Adversarial Normalization: I Can visualize Everything (ICE)

Authors
Choi, Hoyoung; Jin, Seungwan; Han, Kyungsik
Issue Date
Jun-2023
Publisher
IEEE Computer Society
Keywords
Explainable computer vision
Citation
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, v.2023-June, pp 12115 - 12124
Pages
10
Indexed
SCIE
SCOPUS
Journal Title
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume
2023-June
Start Page
12115
End Page
12124
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/192191
DOI
10.1109/CVPR52729.2023.01166
ISSN
1063-6919
Abstract
Vision transformers use [CLS] tokens to predict image classes. Their explainability visualization has been studied using relevant information from [CLS] tokens or by focusing on attention scores during self-attention. Such visualization, however, is challenging because of the dependence of the structure of a vision transformer on skip connections and attention operators, the instability of non-linearities in the learning process, and the limited reflection of self-attention scores on relevance. We argue that the output vectors for each input patch token in a vision transformer retain the image information of each patch location, which can facilitate the prediction of an image class. In this paper, we propose ICE (Adversarial Normalization: I Can visualize Everything), a novel method that enables a model to directly predict a class for each patch in an image, thus advancing the effective visualization of the explainability of a vision transformer. Our method distinguishes background from foreground regions by predicting background classes for patches that do not determine image classes. We used the DeiT-S model, the model most commonly employed in studies on the explainability visualization of vision transformers. On the ImageNet-Segmentation dataset, ICE outperformed all explainability visualization methods across four cases depending on the model size. We also conducted quantitative and qualitative analyses on the tasks of weakly-supervised object localization and unsupervised object discovery. On the CUB-200-2011 and PASCAL VOC 07/12 datasets, ICE achieved performance comparable to the state-of-the-art methods. We incorporated ICE into the encoder of DeiT-S and improved efficiency on the ImageNet dataset by 44.01% over the original DeiT-S model. We showed accuracy and efficiency comparable to those of EViT, the state-of-the-art pruning model, demonstrating the effectiveness of ICE.
The code is available at https://github.com/Hanyang-HCC-Lab/ICE.
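The core idea described in the abstract, classifying every patch token so that non-discriminative patches fall into a background class, can be sketched as below. This is a hedged illustration on synthetic NumPy data, not the authors' implementation (see the linked repository for that): the feature dimensions, the linear classifier head, and the choice of index 1000 for the added background class are all assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions, not the ICE code):
# 196 patch tokens from a 14x14 grid with 384-dim features (DeiT-S-like),
# and a linear classifier over 1000 object classes plus one extra
# "background" class for patches that do not determine the image class.
num_patches, dim, num_classes = 196, 384, 1000
background_class = num_classes  # index 1000: the added background class

patch_tokens = rng.standard_normal((num_patches, dim))
classifier_w = rng.standard_normal((dim, num_classes + 1))

# Per-patch logits and a predicted class at every patch location.
patch_logits = patch_tokens @ classifier_w  # shape (196, 1001)
patch_preds = patch_logits.argmax(axis=1)   # shape (196,)

# Foreground mask: patches predicted as any object class,
# reshaped back onto the 14x14 patch grid for visualization.
foreground_mask = (patch_preds != background_class).reshape(14, 14)
print(foreground_mask.shape)  # (14, 14)
```

Thresholding or upsampling this patch-level mask to the input resolution would give the kind of foreground/background visualization the paper evaluates on ImageNet-Segmentation.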
Appears in Collections
ETC > 1. Journal Articles

Related Researcher
Han, Kyungsik
COLLEGE OF ENGINEERING (DEPARTMENT OF INTELLIGENCE COMPUTING)