Convolutional Method for Modeling Video Temporal Context Effectively in Transformer

Park, Hae Sung; Choi, Yong Suk

doi:10.1145/3555776.3578481

Full metadata record

DC Field	Value	Language
dc.contributor.author	Park, Hae Sung	-
dc.contributor.author	Choi, Yong Suk	-
dc.date.accessioned	2024-11-28T14:00:58Z	-
dc.date.available	2024-11-28T14:00:58Z	-
dc.date.issued	2023-03	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/196713	-
dc.description.abstract	Video understanding remains a challenging task because video understanding models have many parameters to train and must effectively capture detailed spatiotemporal context in video. Recent methods have typically employed either 3D convolution modules or self-attention modules. However, we find that when the self-attention mechanism captures temporal semantics, it often struggles to identify the proper temporal context for video understanding. In this paper, we propose a new method for enhancing temporal modeling by incorporating 3D convolution modules into an attention-based model, the transformer. In particular, we replace the temporal attention of the TimeSformer with a 3D convolution module to improve temporal context learning. In contrast to the TimeSformer, our proposed method can focus on modeling temporal details in the low-level encoders while gradually shifting its focus to more global temporal context in the high-level encoders. Our method surpasses the TimeSformer by a 2.2% margin on Something-Something v2, a dataset that requires complex temporal modeling to achieve high performance.	-
dc.format.extent	4	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	ASSOC COMPUTING MACHINERY	-
dc.title	Convolutional Method for Modeling Video Temporal Context Effectively in Transformer	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.doi	10.1145/3555776.3578481	-
dc.identifier.scopusid	2-s2.0-85162913929	-
dc.identifier.wosid	001124308100172	-
dc.identifier.bibliographicCitation	38TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2023, pp 1205 - 1208	-
dc.citation.title	38TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2023	-
dc.citation.startPage	1205	-
dc.citation.endPage	1208	-
dc.type.docType	Proceedings Paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Interdisciplinary Applications	-
dc.relation.journalWebOfScienceCategory	Computer Science, Theory & Methods	-
dc.subject.keywordPlus	3D modeling	-
dc.subject.keywordPlus	Classification (of information)	-
dc.subject.keywordPlus	Convolution	-
dc.subject.keywordPlus	Semantics	-
dc.subject.keywordPlus	Signal encoding	-
dc.subject.keywordAuthor	Video classification	-
dc.subject.keywordAuthor	Transformer	-
dc.subject.keywordAuthor	3D convolution	-
dc.subject.keywordAuthor	Self-attention	-
dc.subject.keywordAuthor	Temporal feature	-
dc.subject.keywordAuthor	Computer Vision	-
dc.identifier.url	https://dl.acm.org/doi/10.1145/3555776.3578481	-
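
The abstract in the record above describes replacing the temporal attention of the TimeSformer with a 3D convolution module. As a rough illustration of that idea only, the pure-Python sketch below applies a naive 3D convolution over a grid of patch tokens laid out as frames x height x width and adds the result back to the input. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names conv3d_same and temporal_conv_block, the single-channel tokens, the zero padding, the residual connection, and the 3x1x1 kernel values.

```python
def conv3d_same(x, kernel):
    """Naive single-channel 3D convolution (cross-correlation) with zero padding.

    x is a nested list indexed [T][H][W]; kernel is a nested list indexed
    [kt][kh][kw] with odd sizes, so the output keeps the shape of x.
    """
    T, H, W = len(x), len(x[0]), len(x[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    pt, ph, pw = kt // 2, kh // 2, kw // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for h in range(H):
            for w in range(W):
                acc = 0.0
                for dt in range(kt):
                    for dh in range(kh):
                        for dw in range(kw):
                            tt, hh, ww = t + dt - pt, h + dh - ph, w + dw - pw
                            # Positions outside the grid count as zeros.
                            if 0 <= tt < T and 0 <= hh < H and 0 <= ww < W:
                                acc += kernel[dt][dh][dw] * x[tt][hh][ww]
                out[t][h][w] = acc
    return out


def temporal_conv_block(x, kernel):
    """Hypothetical stand-in for a temporal-attention sublayer: x + conv3d(x)."""
    y = conv3d_same(x, kernel)
    T, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[x[t][h][w] + y[t][h][w] for w in range(W)]
             for h in range(H)] for t in range(T)]


if __name__ == "__main__":
    # 4 frames of 2x2 patch tokens; each token's value equals its frame index.
    tokens = [[[float(t)] * 2 for _ in range(2)] for t in range(4)]
    # A 3x1x1 kernel mixes each token only with the same patch in adjacent frames.
    kernel = [[[0.25]], [[0.5]], [[0.25]]]
    mixed = temporal_conv_block(tokens, kernel)
    print(len(mixed), len(mixed[0]), len(mixed[0][0]))
```

With a kernel this small, each output token depends only on its immediate temporal neighbours. Stacking several such blocks widens the temporal receptive field layer by layer, which is one plausible reading of the abstract's remark that low-level encoders model temporal detail while high-level encoders attend to more global context.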

Appears in Collections: Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles

Related Researcher

Choi, Yong Suk: COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)
