Detailed Information


Cascaded MPN: Cascaded Moment Proposal Network for Video Corpus Moment Retrieval (open access)

Authors
Yoon, Sunjae; Kim, Dahyun; Kim, Junyeong; Yoo, Chang D.
Issue Date
Jun-2022
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Keywords
Proposals; Semantics; Streaming media; Cognition; Bipartite graph; Training; Task analysis; Video corpus moment retrieval; cascaded moment proposal; multi-modal interaction; vision-language system
Citation
IEEE ACCESS, v.10, pp. 64560-64568
Pages
9
Journal Title
IEEE ACCESS
Volume
10
Start Page
64560
End Page
64568
URI
https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/61291
DOI
10.1109/ACCESS.2022.3183106
ISSN
2169-3536
Abstract
Video corpus moment retrieval aims to localize temporal moments corresponding to a textual query in a large video corpus. Previous moment retrieval systems largely fall into two categories: (1) anchor-based methods, which preset a set of video segment proposals (via sliding window) and predict the proposal that best matches the query, and (2) anchor-free methods, which directly predict the frame-level start and end times of the moment related to the query (via regression). Both have inherent weaknesses: (1) anchor-based methods are vulnerable to the heuristic rules used to generate video proposals, which restricts the prediction of moments of varying length; and (2) anchor-free methods, being based on frame-level interplay, suffer from insufficient understanding of contextual semantics in long, sequential video. To overcome these challenges, our proposed Cascaded Moment Proposal Network incorporates two main properties: (1) Hierarchical Semantic Reasoning, which provides video understanding from the anchor-free level to the anchor-based level by building a hierarchical video graph, and (2) Cascaded Moment Proposal Generation, which performs precise moment retrieval through cascaded multi-modal feature interaction between anchor-free and anchor-based video semantics. Extensive experiments show state-of-the-art performance on three moment retrieval benchmarks (TVR, ActivityNet, DiDeMo), while qualitative analysis shows improved interpretability. The code will be made publicly available.
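The anchor-based weakness described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, window sizes, and stride ratio below are hypothetical choices made only for illustration. The sketch enumerates sliding-window proposals and scores them against a target moment by temporal IoU, showing that a moment whose length matches no preset window cannot be matched exactly.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def sliding_window_proposals(
    duration: float,
    window_sizes: List[float],
    stride_ratio: float = 0.5,  # assumed overlap heuristic, not from the paper
) -> List[Segment]:
    """Enumerate fixed-length candidate segments over a video."""
    proposals: List[Segment] = []
    for w in window_sizes:
        if w > duration:
            continue  # window longer than the video: no valid proposal
        stride = w * stride_ratio
        start = 0.0
        while start + w <= duration + 1e-9:
            proposals.append((start, start + w))
            start += stride
    return proposals


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def best_proposal(proposals: List[Segment], target: Segment) -> Segment:
    """Pick the preset proposal that overlaps the target moment the most."""
    return max(proposals, key=lambda p: temporal_iou(p, target))


# A 20 s video with 4 s and 8 s windows; the target moment lasts 6 s.
proposals = sliding_window_proposals(20.0, [4.0, 8.0])
target = (5.0, 11.0)
best = best_proposal(proposals, target)
best_iou = temporal_iou(best, target)
```

With these assumed settings, no proposal has the target's 6-second length, so the best achievable overlap stays below 1.0 regardless of how well a model scores the proposals. An anchor-free method avoids this restriction by regressing start and end times directly, at the cost of the contextual weakness the abstract notes.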
Appears in Collections
College of Software > Department of Artificial Intelligence > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Junyeong
College of Software (Department of AI)