Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification

Yun, Jungmin; Kim, Mihyeon; Kim, Youngbin

Full metadata record

DC Field	Value	Language
dc.contributor.author	Yun, Jungmin	-
dc.contributor.author	Kim, Mihyeon	-
dc.contributor.author	Kim, Youngbin	-
dc.date.accessioned	2024-03-13T05:00:45Z	-
dc.date.available	2024-03-13T05:00:45Z	-
dc.date.issued	2023	-
dc.identifier.issn	0000-0000	-
dc.identifier.uri	https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/72809	-
dc.description.abstract	Transformer-based models have achieved dominant performance in numerous NLP tasks. Despite their remarkable successes, pre-trained transformers such as BERT suffer from a computationally expensive self-attention mechanism that interacts with all tokens, including the ones unfavorable to classification performance. To overcome these challenges, we propose integrating two strategies: token pruning and token combining. Token pruning eliminates less important tokens in the attention mechanism's key and value as they pass through the layers. Additionally, we adopt fuzzy logic to handle uncertainty and alleviate potential mispruning risks arising from an imbalanced distribution of each token's importance. Token combining, on the other hand, condenses input sequences into smaller sizes in order to further compress the model. By integrating these two approaches, we not only improve the model's performance but also reduce its computational demands. Experiments with various datasets demonstrate superior performance compared to baseline models, especially with the best improvement over the existing BERT model, achieving +5%p in accuracy and +5.6%p in F1 score. Additionally, memory cost is reduced to 0.61x, and a speedup of 1.64x is achieved. © 2023 Association for Computational Linguistics.	-
dc.format.extent	12	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Association for Computational Linguistics (ACL)	-
dc.title	Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification	-
dc.type	Article	-
dc.identifier.bibliographicCitation	Findings of the Association for Computational Linguistics: EMNLP 2023, pp 13617 - 13628	-
dc.description.isOpenAccess	N	-
dc.identifier.scopusid	2-s2.0-85183303870	-
dc.citation.endPage	13628	-
dc.citation.startPage	13617	-
dc.citation.title	Findings of the Association for Computational Linguistics: EMNLP 2023	-
dc.type.docType	Conference paper	-
dc.description.journalRegisteredClass	scopus	-
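
The abstract describes token pruning as removing less important tokens from the attention mechanism's key and value. The sketch below illustrates that general idea in plain Python; it is not the authors' implementation. The importance score (mean attention a key token receives across all queries), the fixed `keep` budget, and every function name here are assumptions made for illustration. The fuzzy-logic handling of uncertainty and the token-combining step mentioned in the abstract are not modelled.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Scaled dot-product attention weights, one row per query."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


def token_importance(weights):
    """Assumed importance score: mean attention each key token receives."""
    n_queries = len(weights)
    n_keys = len(weights[0])
    return [
        sum(weights[i][j] for i in range(n_queries)) / n_queries
        for j in range(n_keys)
    ]


def prune_keys_values(queries, keys, values, keep):
    """Keep the `keep` highest-scoring key/value tokens; queries are untouched."""
    scores = token_importance(attention_weights(queries, keys))
    top = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:keep]
    kept = sorted(top)  # restore original token order
    return kept, [keys[j] for j in kept], [values[j] for j in kept]


def attend(queries, keys, values):
    """Attention output: each query's weighted sum of the value vectors."""
    weights = attention_weights(queries, keys)
    width = len(values[0])
    return [
        [sum(w * v[c] for w, v in zip(row, values)) for c in range(width)]
        for row in weights
    ]


# Toy input: four tokens with 2-dimensional queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [9.0, 9.0]]

kept, pruned_keys, pruned_values = prune_keys_values(tokens, tokens, values, keep=3)
output = attend(tokens, pruned_keys, pruned_values)
```

In this toy input the all-zero key never receives more attention than any other token, so it is the one dropped. Every query still produces an output row, but each row now attends over three key/value pairs instead of four, which is where the reduction in attention cost comes from in this simplified setting.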

Files in This Item: There are no files associated with this item.

Appears in Collections: Graduate School of Advanced Imaging Sciences, Multimedia and Film > Department of Imaging Science and Arts > 1. Journal Articles