Detailed Information

Step by Step: A Gradual Approach for Dense Video Captioning (open access)

Authors
Choi, Wangyu; Chen, Jiasi; Yoon, Jongwon
Issue Date
May-2023
Publisher
Institute of Electrical and Electronics Engineers Inc.
Keywords
Dense video captioning; event captioning; event localization; event proposal generation; video captioning
Citation
IEEE Access, v.11, pp. 51949-51959
Pages
11
Indexed
SCIE
SCOPUS
Journal Title
IEEE Access
Volume
11
Start Page
51949
End Page
51959
URI
https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/112992
DOI
10.1109/ACCESS.2023.3279816
ISSN
2169-3536
Abstract
Dense video captioning aims to localize and describe events for storytelling in untrimmed videos. It is a conceptually challenging task that requires concise, relevant, and coherent captioning based on high-quality event localization. Unlike simple temporal action localization tasks without overlapping events, dense video captioning requires detecting multiple, possibly overlapping regions in order to branch out the video story. Most existing methods either generate numerous candidate event proposals and then eliminate duplicates using an event proposal selection algorithm (e.g., non-maximum suppression), or generate event proposals directly through box prediction and binary classification mechanisms, similar to object detection tasks. Despite these efforts, these approaches tend to fail to localize overlapping events into different stories, hindering high-quality captioning. In this paper, we propose SBS, a dense video captioning framework with a gradual approach that addresses the challenge of localizing overlapping events and ultimately produces high-quality captions. SBS accurately estimates the number of explicit events for each video snippet and then detects the boundaries of contexts/activities, which provide the details needed to generate the event proposals. Based on both the number of events and the boundaries, SBS generates the event proposals. SBS then encodes the context of the event sequence and finally generates sentences describing the event proposals. Our framework is fairly effective in localizing multiple/overlapping events, and experimental results show state-of-the-art performance compared to existing methods.
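The abstract names non-maximum suppression (NMS) as a typical proposal selection step in existing methods. The following is a minimal sketch of generic greedy temporal NMS, included only to illustrate why such a step can discard genuinely overlapping events. It is not the SBS method and is not taken from the paper; the names `Proposal`, `temporal_iou`, `temporal_nms`, and the 0.5 threshold are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical proposal format: (start_sec, end_sec, confidence_score).
Proposal = Tuple[float, float, float]


def temporal_iou(a: Proposal, b: Proposal) -> float:
    """Intersection-over-union of two temporal segments."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def temporal_nms(proposals: List[Proposal], iou_threshold: float = 0.5) -> List[Proposal]:
    """Greedy NMS: keep the highest-scoring proposal, drop any later
    proposal whose IoU with an already kept one exceeds the threshold."""
    kept: List[Proposal] = []
    for candidate in sorted(proposals, key=lambda p: p[2], reverse=True):
        if all(temporal_iou(candidate, k) <= iou_threshold for k in kept):
            kept.append(candidate)
    return kept
```

With the proposals (0, 10, 0.9), (1, 11, 0.8), (2, 9, 0.7), and (20, 30, 0.6), this sketch keeps only (0, 10, 0.9) and (20, 30, 0.6). The segment (2, 9) is dropped even if it corresponds to a distinct event nested inside the first one, which is the kind of failure on overlapping events that the abstract describes.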
Appears in Collections
COLLEGE OF COMPUTING > ERICA School of Computer Science > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Yoon, Jong won
ERICA College of Software Convergence (ERICA School of Computer Science)