Detailed Information

PWS-DVC: Enhancing Weakly Supervised Dense Video Captioning with Pretraining Approach (open access)

Authors
Choi, Wangyu; Chen, Jiasi; Yoon, Jongwon
Issue Date
Nov-2023
Publisher
Institute of Electrical and Electronics Engineers Inc.
Keywords
Cross-modal video-text comprehension; dense video captioning; event localization in videos; fine-tuning for dense captioning; natural language processing in videos; pretraining; retraining for video understanding; video description generation; weakly supervised
Citation
IEEE Access, v.11, pp. 128162-128174
Pages
13
Indexed
SCIE
SCOPUS
Journal Title
IEEE Access
Volume
11
Start Page
128162
End Page
128174
URI
https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/115648
DOI
10.1109/ACCESS.2023.3331756
ISSN
2169-3536
Abstract
In recent times, efforts to jointly understand vision and language have increased notably, driven by the availability of video-related datasets and by advances in language models within natural language processing. Dense video captioning poses a significant challenge: understanding an untrimmed video and generating several event-based sentences that describe it. Numerous efforts have been made to improve dense video captioning through various approaches, such as bottom-up, top-down, parallel-pipeline, and pretraining methods. In contrast, weakly supervised dense video captioning is a highly promising strategy that generates dense video captions from captions alone, without any knowledge of ground-truth events, which distinguishes it from the widely used approaches above. Nevertheless, this approach has a drawback: inadequate captions can hurt both event localization and captioning. This paper introduces PWS-DVC, a novel approach aimed at improving the performance of weakly supervised dense video captioning. PWS-DVC’s event captioning module is first pretrained on video-clip datasets, which are widely available, exploiting the fact that no ground-truth event annotations are required during training; it is then fine-tuned specifically for dense video captioning. To demonstrate the efficacy of PWS-DVC, we conduct comparative experiments with state-of-the-art methods on the ActivityNet Captions dataset. The findings indicate that PWS-DVC outperforms current approaches to weakly supervised dense video captioning.
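
The abstract describes a two-stage training regime: the event captioning module is first pretrained on clip-caption pairs (which need no event-boundary annotations), then fine-tuned for dense video captioning. Below is a minimal, hypothetical sketch of that regime; the module structure, dimensions, learning rates, and toy tensors are illustrative assumptions, not the paper's actual code or architecture.

```python
# Hypothetical sketch of the two-stage regime from the abstract:
# Stage 1 pretrains an event captioning module on clip-caption pairs;
# Stage 2 fine-tunes it for dense video captioning. All names, shapes,
# and data here are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class EventCaptioner(nn.Module):
    """Toy captioning head: maps a clip feature vector to token logits."""
    def __init__(self, feat_dim=512, vocab_size=1000, max_len=20):
        super().__init__()
        self.proj = nn.Linear(feat_dim, vocab_size * max_len)
        self.vocab_size, self.max_len = vocab_size, max_len

    def forward(self, clip_feats):                       # (B, feat_dim)
        logits = self.proj(clip_feats)                   # (B, vocab*max_len)
        return logits.view(-1, self.max_len, self.vocab_size)

def caption_loss(logits, tokens):
    # Standard token-level cross-entropy over the caption sequence.
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens.reshape(-1))

model = EventCaptioner()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Stage 1: pretraining on widely available video-clip/caption pairs
# (random tensors stand in for clip features and tokenized captions).
clip_feats = torch.randn(8, 512)
captions = torch.randint(0, 1000, (8, 20))
for _ in range(3):
    opt.zero_grad()
    caption_loss(model(clip_feats), captions).backward()
    opt.step()

# Stage 2: fine-tuning for dense video captioning at a lower learning
# rate; the weakly supervised event localizer that would supply the
# proposal features is omitted here.
for group in opt.param_groups:
    group["lr"] = 1e-5
proposal_feats = torch.randn(8, 512)   # features pooled over proposed events
dense_caps = torch.randint(0, 1000, (8, 20))
opt.zero_grad()
caption_loss(model(proposal_feats), dense_caps).backward()
opt.step()
```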
Appears in Collections
COLLEGE OF COMPUTING > ERICA Department of Computer Science > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Yoon, Jongwon
ERICA College of Computing (ERICA Department of Computer Science)
