Token Merging with Class Importance Score
- Authors
- 설광수; Roh, Si-Dong; Chung, Ki-Seok
- Issue Date
- Oct-2023
- Keywords
- computer vision; deep learning; model compression
- Citation
- IECON Proceedings (Industrial Electronics Conference), pp 1 - 6
- Pages
- 6
- Indexed
- SCOPUS
- Journal Title
- IECON Proceedings (Industrial Electronics Conference)
- Start Page
- 1
- End Page
- 6
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/196120
- DOI
- 10.1109/IECON51785.2023.10312420
- ISSN
- 2162-4704
- Abstract
- Vision Transformers achieve high performance on computer vision tasks, but their high computational cost and low throughput are weaknesses. Much research has therefore been done to reduce the size of Vision Transformers. Among these efforts, token pruning is actively studied as a way to reduce the number of tokens used in the self-attention computation inside the Vision Transformer. More recently, token merging has been proposed as an alternative approach. It aims to increase throughput with only a small accuracy drop by merging similar tokens instead of pruning them. A previous study finds similar tokens using cosine similarity and merges them with a weighted average. However, merging a large number of tokens at once may lower accuracy because important information is underestimated in the merged tokens. In this paper, we propose ToMeCIS, a method that merges similar tokens through an average weighted by each token's class importance score, in order to reduce the accuracy drop. When ToMeCIS is applied to a pretrained DeiT-S and evaluated on the ImageNet-1k dataset, throughput increases by about 50% with an accuracy drop of less than 1%, without additional training. In addition, importance scores computed with different metrics were evaluated to find the best trade-off between accuracy and throughput.
- Appears in Collections
- College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles
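
The merging step summarized in the abstract (find similar tokens by cosine similarity, then combine them with an importance-weighted average) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the importance scores are taken as given inputs because the abstract does not say how they are computed, the greedy "merge the single most similar pair" rule is a simplification chosen for brevity, and giving the merged token the sum of the two scores is also an assumption. Tokens are assumed to be non-zero vectors of equal length, and at least two tokens are required.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def merge_pair(a, b, score_a, score_b):
    """Average two tokens, weighting each by its importance score."""
    total = score_a + score_b
    return [(score_a * x + score_b * y) / total for x, y in zip(a, b)]


def merge_most_similar(tokens, scores):
    """Merge the single most similar pair of tokens (greedy, illustrative).

    Returns the reduced token list and score list. The merged token is
    appended at the end and carries the sum of the two original scores
    (an assumption made for this sketch).
    """
    best = None  # (similarity, i, j) of the best pair seen so far
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            sim = cosine_similarity(tokens[i], tokens[j])
            if best is None or sim > best[0]:
                best = (sim, i, j)
    _, i, j = best
    merged = merge_pair(tokens[i], tokens[j], scores[i], scores[j])
    kept_tokens = [t for k, t in enumerate(tokens) if k not in (i, j)]
    kept_scores = [s for k, s in enumerate(scores) if k not in (i, j)]
    return kept_tokens + [merged], kept_scores + [scores[i] + scores[j]]
```

For example, calling `merge_most_similar([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]], [3.0, 1.0, 1.0])` merges the first and third tokens, which point in the same direction, and the result lies closer to the first token because its score is higher. With equal scores the merge reduces to a plain average.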