ADVC: Adversarial dense video captioning with unsupervised pretraining
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 윤종원 | - |
dc.date.accessioned | 2025-06-16T08:00:23Z | - |
dc.date.available | 2025-06-16T08:00:23Z | - |
dc.date.issued | 2025-09 | - |
dc.identifier.issn | 0262-8856 | - |
dc.identifier.issn | 1872-8138 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/125641 | - |
dc.description.abstract | Dense video captioning involves detecting and describing events that represent a video story in untrimmed videos using sentences. This task holds great promise for various video analytics-related applications. However, the nondeterministic nature of dense video captioning poses challenges in generating realistic events and captions. Recently, with the advent of large-scale video datasets, pretraining approaches have emerged. Nevertheless, these methods still require strict supervision and often lack accurate localization or are tightly coupled with localization and captioning. To address these challenges, this paper introduces ADVC, a novel approach for dense video captioning that combines unsupervised pre-training and adversarial adaptation. ADVC learns from readily available unlabeled videos and text corpora at scale, thereby reducing the need for strict supervision. It achieves realistic outcomes by directly learning the distribution of human-annotated events and captions through adversarial adaptation. Adversarial adaptation allows for the decoupling of localization and captioning subtasks while effectively considering their interdependence. We evaluate the performance of ADVC using multiple benchmark datasets to showcase the efficacy of our unsupervised pre-training and adversarial adaptation approach. | - |
dc.format.extent | 10 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | ELSEVIER | - |
dc.title | ADVC: Adversarial dense video captioning with unsupervised pretraining | - |
dc.type | Article | - |
dc.publisher.location | Netherlands | - |
dc.identifier.doi | 10.1016/j.imavis.2025.105595 | - |
dc.identifier.scopusid | 2-s2.0-105007787882 | - |
dc.identifier.wosid | 001508997000001 | - |
dc.identifier.bibliographicCitation | IMAGE AND VISION COMPUTING, v.161, pp 1 - 10 | - |
dc.citation.title | IMAGE AND VISION COMPUTING | - |
dc.citation.volume | 161 | - |
dc.citation.startPage | 1 | - |
dc.citation.endPage | 10 | - |
dc.type.docType | Article | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalResearchArea | Optics | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Software Engineering | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Theory & Methods | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.relation.journalWebOfScienceCategory | Optics | - |
dc.subject.keywordAuthor | Dense video captioning | - |
dc.subject.keywordAuthor | Generative adversarial networks | - |
dc.subject.keywordAuthor | Nondeterminism | - |
dc.subject.keywordAuthor | Unsupervised learning | - |
dc.identifier.url | https://www.sciencedirect.com/science/article/pii/S0262885625001830?via%3Dihub | - |