Detailed Information


S-ViT: Sparse Vision Transformer for Accurate Face Recognition

Authors
Kim, Geunsu; Park, Gyudo; Kang, Soohyeok; Woo, Simon S.
Issue Date
Mar-2023
Publisher
Association for Computing Machinery
Keywords
deep learning model compression; face recognition; neural networks; pruning; vision transformer
Citation
Proceedings of the ACM Symposium on Applied Computing, pp. 1130-1138
Pages
9
Indexed
SCOPUS
Journal Title
Proceedings of the ACM Symposium on Applied Computing
Start Page
1130
End Page
1138
URI
https://scholarworks.bwise.kr/skku/handle/2021.sw.skku/106826
DOI
10.1145/3555776.3577640
Abstract
Most existing deep learning-based face recognition applications have leveraged CNN architectures as the feature extractor. However, recent studies have shown that vision transformer-based models often outperform CNN-based models on computer vision tasks. Therefore, in this work, we propose a Sparse Vision Transformer (S-ViT) based on the Vision Transformer (ViT) architecture to improve face recognition. After training, S-ViT tends to have a sparser weight distribution than ViT, hence its name. Unlike the conventional ViT, our proposed S-ViT adopts the image Relative Positional Encoding (iRPE) method for positional encoding. In addition, S-ViT is modified so that all token embeddings, not just the class token, participate in the decoding process. Through extensive experiments, we show that S-ViT achieves better closed-set performance than the other baseline models, including the baseline ViT-based models. For example, when using ArcFace as the loss function in the identification protocol, S-ViT achieved up to 3.27% higher accuracy than ResNet50. We also show that the ArcFace loss yields greater performance gains in S-ViT than in the baseline models. In addition, S-ViT offers a favorable cost-performance trade-off because it tends to be more robust to pruning than the underlying ViT model, so it can be deployed more flexibly on target devices with limited resources. © 2023 ACM.
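
The abstract combines several standard building blocks: a ViT-style encoder, pooling over all token embeddings instead of a class token, the ArcFace angular-margin loss, and post-training weight pruning. The following PyTorch sketch is a minimal, hypothetical illustration of how those pieces fit together; the names TinyViTEncoder and ArcFaceHead, all hyperparameters, and the use of plain learned positional embeddings in place of iRPE are assumptions of this sketch, not the paper's configuration.

# Illustrative sketch only (not the authors' code): a ViT-style encoder whose
# ALL token embeddings are pooled into the face embedding, trained with an
# ArcFace head, then sparsified with magnitude pruning. iRPE is replaced by
# plain learned positional embeddings for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

class ArcFaceHead(nn.Module):
    """Additive angular margin loss head (ArcFace)."""
    def __init__(self, embed_dim, num_classes, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # Cosine similarity between normalized embeddings and class centers.
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin m only to the target-class logit.
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cos) * self.s
        return F.cross_entropy(logits, labels)

class TinyViTEncoder(nn.Module):
    """ViT-style encoder; hyperparameters here are illustrative only."""
    def __init__(self, image_size=112, patch=16, dim=256, depth=6, heads=8):
        super().__init__()
        n_patches = (image_size // patch) ** 2
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        # Learned positional embeddings stand in for iRPE in this sketch.
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2) + self.pos
        tokens = self.encoder(tokens)
        # S-ViT's change as described in the abstract: every token embedding,
        # not just a class token, contributes to the output representation.
        return tokens.mean(dim=1)

model, head = TinyViTEncoder(), ArcFaceHead(embed_dim=256, num_classes=1000)
images = torch.randn(4, 3, 112, 112)
labels = torch.randint(0, 1000, (4,))
loss = head(model(images), labels)
loss.backward()

# After training, exploit weight sparsity: L1 magnitude pruning of linear layers.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)

The magnitude-pruning step at the end mirrors the abstract's claim that a sparser weight distribution tolerates pruning well; in practice one would fine-tune after pruning and choose the sparsity level per deployment target.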
Appears in Collections
Computing and Informatics > Computer Science and Engineering > 1. Journal Articles



Related Researcher

WOO, SIMON SUNGIL
Computing and Informatics (Computer Science and Engineering)
