Dense but Efficient VideoQA for Intricate Compositional Reasoning
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Jihyeon | - |
dc.contributor.author | Kang, Wooyoung | - |
dc.contributor.author | Kim, Eun Sol | - |
dc.date.accessioned | 2023-03-13T07:21:15Z | - |
dc.date.available | 2023-03-13T07:21:15Z | - |
dc.date.created | 2023-03-08 | - |
dc.date.issued | 2023-01 | - |
dc.identifier.issn | 0000-0000 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/182539 | - |
dc.description.abstract | Most conventional video question answering (VideoQA) datasets consist of easy questions that require only simple reasoning. Long videos, however, inevitably contain complex and compositional semantic structures along the spatio-temporal axis, which a model must understand in order to answer questions about them. In this paper, we propose a new compositional VideoQA method based on a transformer architecture with a deformable attention mechanism to address such complex VideoQA tasks. Deformable attention is introduced to sample a subset of informative visual features from the dense visual feature map, efficiently covering a temporally long range of frames (see the illustrative sketch after this record). Furthermore, the dependency structure of the complex question sentences is combined with the language embeddings so that the relations among question words are readily captured. Extensive experiments and ablation studies show that the proposed dense but efficient model outperforms other baselines. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
dc.title | Dense but Efficient VideoQA for Intricate Compositional Reasoning | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Kim, Eun Sol | - |
dc.identifier.doi | 10.1109/WACV56688.2023.00117 | - |
dc.identifier.scopusid | 2-s2.0-85149030587 | - |
dc.identifier.wosid | 000971500201020 | - |
dc.identifier.bibliographicCitation | Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023, pp.1114 - 1123 | - |
dc.relation.isPartOf | Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023 | - |
dc.citation.title | Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023 | - |
dc.citation.startPage | 1114 | - |
dc.citation.endPage | 1123 | - |
dc.type.rims | ART | - |
dc.type.docType | Proceedings Paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalResearchArea | Imaging Science & Photographic Technology | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.relation.journalWebOfScienceCategory | Imaging Science & Photographic Technology | - |
dc.subject.keywordPlus | Computer vision | - |
dc.subject.keywordPlus | Action recognition | - |
dc.subject.keywordPlus | Algorithm: video recognition and understanding (tracking, action recognition, etc.) | - |
dc.subject.keywordPlus | Compositional reasoning | - |
dc.subject.keywordPlus | Question Answering | - |
dc.subject.keywordPlus | Simple++ | - |
dc.subject.keywordPlus | Video recognition | - |
dc.subject.keywordPlus | Video understanding | - |
dc.subject.keywordPlus | Vision + language and/or other modality | - |
dc.subject.keywordPlus | Visual feature | - |
dc.subject.keywordPlus | Semantics | - |
dc.subject.keywordAuthor | Algorithms: Video recognition and understanding (tracking, action recognition, etc.) | - |
dc.subject.keywordAuthor | Vision + language and/or other modalities | - |
dc.identifier.url | https://ieeexplore.ieee.org/document/10030999 | - |
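
Neither the paper's code nor its exact formulation is included in this record, so the following is a minimal, hypothetical sketch of the temporal deformable-attention sampling idea the abstract describes: each query predicts a handful of temporal offsets and mixing weights, and only those few interpolated frame features are aggregated instead of attending densely over all frames. All names here (`TemporalDeformableAttention`, `n_points`, `ref_points`) are illustrative assumptions, not the authors' API; the structure follows the 1-D analogue of Deformable DETR, not necessarily this paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalDeformableAttention(nn.Module):
    """Each query predicts a few temporal sampling offsets plus mixing
    weights, then aggregates linearly interpolated frame features, so
    attention touches only n_points frames rather than all T frames."""

    def __init__(self, dim: int, n_points: int = 4):
        super().__init__()
        self.n_points = n_points
        self.offset_head = nn.Linear(dim, n_points)  # where to sample (in frames)
        self.weight_head = nn.Linear(dim, n_points)  # how much each sample counts
        self.value_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, queries, frame_feats, ref_points):
        # queries:     (B, Q, D) question-conditioned query tokens
        # frame_feats: (B, T, D) dense per-frame visual features
        # ref_points:  (B, Q)    reference locations in [0, 1] along time
        B, T, D = frame_feats.shape
        values = self.value_proj(frame_feats)                    # (B, T, D)
        offsets = self.offset_head(queries)                      # (B, Q, P)
        weights = F.softmax(self.weight_head(queries), dim=-1)   # (B, Q, P)

        # Normalized sampling locations, clamped inside the clip.
        loc = (ref_points.unsqueeze(-1) + offsets / T).clamp(0.0, 1.0)
        pos = loc * (T - 1)                                      # (B, Q, P)
        lo = pos.floor().long().clamp(0, T - 1)
        hi = (lo + 1).clamp(0, T - 1)
        frac = (pos - lo.float()).unsqueeze(-1)                  # (B, Q, P, 1)

        def gather(idx):
            # Pick values[b, idx[b, q, p], :] for every (b, q, p).
            expanded = values.unsqueeze(1).expand(B, idx.size(1), T, D)
            return torch.gather(expanded, 2,
                                idx.unsqueeze(-1).expand(-1, -1, -1, D))

        # Linear interpolation between the two nearest frames (the 1-D
        # analogue of bilinear sampling in deformable attention).
        sampled = (1.0 - frac) * gather(lo) + frac * gather(hi)  # (B, Q, P, D)
        out = (weights.unsqueeze(-1) * sampled).sum(dim=2)       # (B, Q, D)
        return self.out_proj(out)


# Tiny smoke test with random tensors.
if __name__ == "__main__":
    B, T, Q, D = 2, 64, 8, 256
    attn = TemporalDeformableAttention(dim=D, n_points=4)
    out = attn(torch.randn(B, Q, D), torch.randn(B, T, D), torch.rand(B, Q))
    print(out.shape)  # torch.Size([2, 8, 256])
```

The point of this sampling scheme is the cost model: per query, aggregation drops from O(T) for dense attention to O(n_points), which is what lets a dense frame-level feature map cover a temporally long clip efficiently. The dependency-parse side of the model mentioned in the abstract is a separate language component and is not sketched here.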