Fusion Attention for Action Recognition: Integrating Sparse-Dense and Global Attention for Video Action Recognition

Kim, Hyun-Woo; Choi, Yong-Suk

doi:10.3390/s24216842

DC Field	Value	Language
dc.contributor.author	Kim, Hyun-Woo	-
dc.contributor.author	Choi, Yong-Suk	-
dc.date.accessioned	2024-11-28T19:00:57Z	-
dc.date.available	2024-11-28T19:00:57Z	-
dc.date.issued	2024-11	-
dc.identifier.issn	1424-8220	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/198091	-
dc.description.abstract	Conventional approaches to video action recognition perform global attention over all video patches, which may be ineffective due to the temporal redundancy of video frames. Recent works on masked video modeling adopt a high-ratio tube masking and reconstruction strategy as a pre-training method to mitigate the problem of models focusing well on spatial features but not on temporal features. Inspired by this pre-training method, we propose Fusion Attention for Action Recognition (FAR), which fuses sparse-dense attention patterns specialized for temporal features with global attention during fine-tuning. FAR has three main components: head-split sparse-dense attention (HSDA), token-group interaction, and a group-averaged classifier. First, HSDA splits the heads of multi-head self-attention to fuse global and sparse-dense attention. The sparse-dense attention is divided into groups of tube-shaped patches to focus on temporal features. Second, token-group interaction is used to improve information exchange between the divided patch groups. Finally, the group-averaged classifier uses spatio-temporal features from different patch groups to improve performance. The proposed method uses weight parameters pre-trained with VideoMAE and MVD, and achieves higher performance (+0.1 to 0.4%) with less computation than models fine-tuned with global attention on Something-Something V2 and Kinetics-400. Moreover, qualitative comparisons show that FAR captures temporal features quite well in highly redundant video frames. The FAR approach demonstrates improved action recognition with efficient computation, and exploring its adaptability across different pre-training methods presents an interesting direction for future research.	-
dc.format.extent	18	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Multidisciplinary Digital Publishing Institute (MDPI)	-
dc.title	Fusion Attention for Action Recognition: Integrating Sparse-Dense and Global Attention for Video Action Recognition	-
dc.type	Article	-
dc.publisher.location	Switzerland	-
dc.identifier.doi	10.3390/s24216842	-
dc.identifier.scopusid	2-s2.0-85208575427	-
dc.identifier.wosid	001351021200001	-
dc.identifier.bibliographicCitation	Sensors, v.24, no.21, pp 1 - 18	-
dc.citation.title	Sensors	-
dc.citation.volume	24	-
dc.citation.number	21	-
dc.citation.startPage	1	-
dc.citation.endPage	18	-
dc.type.docType	Article	-
dc.description.isOpenAccess	Y	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Chemistry	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalResearchArea	Instruments & Instrumentation	-
dc.relation.journalWebOfScienceCategory	Chemistry, Analytical	-
dc.relation.journalWebOfScienceCategory	Engineering, Electrical & Electronic	-
dc.relation.journalWebOfScienceCategory	Instruments & Instrumentation	-
dc.subject.keywordPlus	Video analysis	-
dc.subject.keywordAuthor	action recognition	-
dc.subject.keywordAuthor	fusion attention	-
dc.subject.keywordAuthor	temporal redundancy	-
dc.identifier.url	https://www.mdpi.com/1424-8220/24/21/6842	-
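
The abstract describes HSDA as splitting the heads of multi-head self-attention so that some heads attend globally while others attend only within groups of tube-shaped patches. The following is a minimal pure-Python sketch of that general idea, not the authors' implementation. The function names, the strided assignment of spatial positions to groups, and the choice of which heads are global are all assumptions made for illustration; the record does not specify how the paper builds its sparse and dense groups.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, k, v, allowed):
    """Scaled dot-product attention for one head.

    q, k, v are lists of per-token vectors; allowed[i] is the set of
    key indices that token i may attend to.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        keys = sorted(allowed[i])
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j][d] for w, j in zip(weights, keys))
                    for d in range(len(v[0]))])
    return out


def tube_groups(num_frames, num_spatial, num_groups):
    """Assign each token (frame t, spatial position s) to a group.

    The group depends only on s, so every group is a set of tubes that
    span all frames. The strided rule (s % num_groups) is an assumption.
    """
    return [s % num_groups
            for _t in range(num_frames) for s in range(num_spatial)]


def head_split_attention(heads_qkv, groups, num_global_heads):
    """Run the first num_global_heads heads with global attention and the
    remaining heads restricted to each token's own tube group, then
    concatenate the per-head outputs for every token."""
    n = len(groups)
    everyone = [set(range(n)) for _ in range(n)]
    same_group = [{j for j in range(n) if groups[j] == groups[i]}
                  for i in range(n)]
    outputs = []
    for h, (q, k, v) in enumerate(heads_qkv):
        allowed = everyone if h < num_global_heads else same_group
        outputs.append(attend(q, k, v, allowed))
    return [[x for head in outputs for x in head[i]] for i in range(n)]


if __name__ == "__main__":
    groups = tube_groups(num_frames=2, num_spatial=4, num_groups=2)
    n = len(groups)

    def toy(seed):
        # Deterministic toy features with head dimension 2.
        return [[math.sin(seed + i + d) for d in range(2)] for i in range(n)]

    heads = [(toy(0), toy(1), toy(2)), (toy(3), toy(4), toy(5))]
    fused = head_split_attention(heads, groups, num_global_heads=1)
    print(len(fused), len(fused[0]))  # tokens, concatenated feature size
```

In this sketch the group-restricted heads score fewer query-key pairs than the global heads, which is one plausible reading of the abstract's "less computation" claim; the actual savings reported in the paper depend on details not given in this record.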

Files in This Item

Fusion Attention for Action Recognition Integrating Sparse-Dense and Global Attention for Video Action Recognition.pdf 10.52 MB

Appears in Collections: Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles

Related Researcher

Choi, Yong Suk: COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)
