Detailed Information

Parallel Pathway Dense Video Captioning With Deformable Transformer (open access)

Authors
Choi, Wangyu; Chen, Jiasi; Yoon, Jongwon
Issue Date
Dec-2022
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Keywords
Machine learning; deep learning; video and language; video captioning
Citation
IEEE Access, vol. 10, pp. 129899-129910
Pages
12
Indexed
SCIE
SCOPUS
Journal Title
IEEE Access
Volume
10
Start Page
129899
End Page
129910
URI
https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/111561
DOI
10.1109/ACCESS.2022.3228821
ISSN
2169-3536
Abstract
Dense video captioning is a challenging task because it requires a high-level understanding of the video story as well as attention to details such as objects and motions in order to produce a consistent and fluent description of the video. Many existing solutions divide the problem into two sub-tasks, event detection and captioning, and solve them sequentially ("localize-then-describe" or the reverse). Consequently, the final outcome depends heavily on the performance of the preceding modules. In this paper, we decompose this sequential approach by proposing a parallel pathway dense video captioning framework (PPVC) that localizes and describes events simultaneously without any bottlenecks. We introduce a representation organization network at the branching point of the parallel pathway to organize the encoded video features by considering the entire storyline. Then, an event localizer localizes events without any event proposal generation network, while a sentence generator describes events with attention to the fluency and coherence of the sentences. Our method has two advantages over existing work: (i) the final output does not depend on the output of preceding modules, and (ii) it improves on existing parallel decoding methods by relieving the information bottleneck. We evaluate PPVC on two large-scale benchmark datasets, ActivityNet Captions and YouCook2. PPVC not only outperforms existing algorithms on the majority of metrics but also improves on the state-of-the-art parallel decoding method by 5.4% and 4.9% on the two datasets.
Appears in Collections
COLLEGE OF COMPUTING > ERICA School of Computer Science > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Yoon, Jongwon
ERICA College of Software Convergence (ERICA School of Computer Science)