Detailed Information


Compositional Video Understanding with Spatiotemporal Structure-based Transformers

Authors
Yun, Hoyeoung; Ahn, Jinwoo; Kim, Minseo; Kim, Eun-Sol
Issue Date
Sep-2024
Publisher
IEEE
Citation
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 18751-18760
Pages
10
Indexed
SCOPUS
Journal Title
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Start Page
18751
End Page
18760
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/212714
DOI
10.1109/CVPR52733.2024.01774
ISSN
1063-6919
2575-7075
Abstract
In this paper, we propose a novel method for understanding complex semantic structures in long video inputs. Conventional video understanding methods have focused on short clips, training convolutional neural networks or transformer architectures to produce visual representations of those clips. However, most real-world videos are long, ranging from minutes to hours, so dividing them into short clips and learning a representation for each clip inherently limits how well the overall semantic structure of a long video can be understood. We propose an algorithm that learns the multi-granular semantic structure of videos by treating spatiotemporal high-order relationships among object-based representations as semantic units. The proposed method consists of a new transformer architecture capable of learning spatiotemporal graphs and a compositional learning method that learns disentangled features for each semantic unit. With this method, we tackle a challenging video task: compositional generalization to unseen videos. Experiments show new state-of-the-art performance on two challenging video datasets.
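The abstract does not specify how the spatiotemporal graph is built or how the transformer consumes it. Purely as an illustration of the general idea of a "transformer capable of learning spatiotemporal graphs", the following is a minimal sketch of one common pattern: graph-masked self-attention over object nodes. It is not the authors' method. The graph construction (spatial edges among all objects in the same frame, temporal edges linking the same object slot in adjacent frames), the absence of learned query/key/value projections, and the function names `spatiotemporal_adjacency` and `graph_masked_attention` are all hypothetical assumptions made for this example.

```python
import math
from typing import List

def spatiotemporal_adjacency(num_frames: int, num_objects: int) -> List[List[bool]]:
    """Hypothetical graph over object nodes (t, k), flattened as t * num_objects + k.

    Assumed edges: (t, k) connects to every object in frame t (spatial edges,
    including itself) and to the same slot k in frames t-1 and t+1 (temporal edges).
    """
    nodes = [(t, k) for t in range(num_frames) for k in range(num_objects)]
    adj = []
    for t1, k1 in nodes:
        row = []
        for t2, k2 in nodes:
            spatial = t1 == t2
            temporal = k1 == k2 and abs(t1 - t2) == 1
            row.append(spatial or temporal)
        adj.append(row)
    return adj

def graph_masked_attention(x: List[List[float]], adj: List[List[bool]]) -> List[List[float]]:
    """Single-head scaled dot-product attention restricted to graph neighbours.

    For simplicity the node features x serve directly as queries, keys and values
    (no learned projections). Node i attends only to nodes j with adj[i][j] True.
    """
    d = len(x[0])
    out = []
    for i, xi in enumerate(x):
        # Masked scores: non-neighbours get -inf so they receive zero weight.
        scores = [
            sum(a * b for a, b in zip(xi, xj)) / math.sqrt(d) if adj[i][j] else -math.inf
            for j, xj in enumerate(x)
        ]
        m = max(scores)  # finite, because every node has a self-loop
        weights = [math.exp(s - m) if adj[i][j] else 0.0 for j, s in enumerate(scores)]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of neighbour features.
        out.append([sum(w * xj[c] for w, xj in zip(weights, x)) for c in range(d)])
    return out
```

For example, with 3 frames and 2 object slots there are 6 nodes; node (0, 0) attends to (0, 0), (0, 1) and (1, 0), while node (1, 0) also reaches (2, 0). Because the mask hard-codes which relations exist, changing a non-neighbour's features leaves a node's output untouched; stacking several such layers is one way information could propagate along longer spatiotemporal paths.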
Appears in Collections
Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Eun Sol
COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)
