Detailed Information


Cap4Bridge: Caption-Guided Cross-Modal Contextualization with Stochastic Augmentation for Text-Video Retrieval

Full metadata record
DC Field | Value | Language
dc.contributor.author | Jeon, Minju | -
dc.contributor.author | Kim, Hyungee | -
dc.contributor.author | Kim, Si-Woo | -
dc.contributor.author | Oh, Youngtaek | -
dc.contributor.author | Lee, Soeun | -
dc.contributor.author | Kim, Dong-Jin | -
dc.date.accessioned | 2026-05-09T05:02:06Z | -
dc.date.available | 2026-05-09T05:02:06Z | -
dc.date.issued | 2026-04 | -
dc.identifier.issn | 2169-3536 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/212541 | -
dc.description.abstract | A key challenge in text-video retrieval is bridging the semantic gap between information-rich videos and concise text queries. Existing methods often address this by incorporating auxiliary captions from Large Language Models (LLMs) or by employing stochastic modeling. However, these approaches face critical challenges: captions can lack domain-specific relevance, while stochastic methods that directly model text embeddings risk distorting the original query's intent. To overcome these issues, we propose Cap4Bridge, a framework that leverages semantic anchors retrieved from a domain-specific caption anchor bank. Our framework introduces two key components: 1) Caption-Guided Cross-Modality Contextualization, which uses a shared co-attention mechanism to enrich both video and text representations with these anchors, and 2) Similarity-Aware Stochastic Augmentation, which applies Gaussian noise scaled by relevance to the retrieved semantic anchors rather than to the query itself. This integrated strategy bridges the fundamental information imbalance by providing complementary context to both modalities and robustly expanding the semantic representation while preserving the original query's intent. Our method achieves strong performance across most benchmarks, including R@1 scores of 58.5% on MSRVTT, 51.3% on MSVD, and 63.8% on DiDeMo, demonstrating its high efficacy and generalizability, particularly in challenging cross-domain settings. | -
dc.format.extent | 12 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | Cap4Bridge: Caption-Guided Cross-Modal Contextualization with Stochastic Augmentation for Text-Video Retrieval | -
dc.type | Article | -
dc.publisher.location | United States | -
dc.identifier.doi | 10.1109/ACCESS.2026.3680911 | -
dc.identifier.scopusid | 2-s2.0-105035550342 | -
dc.identifier.wosid | 001740812900002 | -
dc.identifier.bibliographicCitation | IEEE ACCESS, v.14, pp 54442 - 54453 | -
dc.citation.title | IEEE ACCESS | -
dc.citation.volume | 14 | -
dc.citation.startPage | 54442 | -
dc.citation.endPage | 54453 | -
dc.type.docType | Article | -
dc.description.isOpenAccess | Y | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Computer Science | -
dc.relation.journalResearchArea | Engineering | -
dc.relation.journalResearchArea | Telecommunications | -
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | -
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | -
dc.relation.journalWebOfScienceCategory | Telecommunications | -
dc.subject.keywordPlus | Benchmarking | -
dc.subject.keywordPlus | Gaussian noise (electronic) | -
dc.subject.keywordPlus | Image retrieval | -
dc.subject.keywordPlus | Learning systems | -
dc.subject.keywordPlus | Modeling languages | -
dc.subject.keywordPlus | Semantics | -
dc.subject.keywordPlus | Stochastic models | -
dc.subject.keywordPlus | Stochastic systems | -
dc.subject.keywordAuthor | Broadcasting | -
dc.subject.keywordAuthor | Broadcast technology | -
dc.subject.keywordAuthor | Filtering | -
dc.subject.keywordAuthor | Filters | -
dc.subject.keywordAuthor | Videos | -
dc.subject.keywordAuthor | Video equipment | -
dc.subject.keywordAuthor | Text to video | -
dc.subject.keywordAuthor | TV | -
dc.subject.keywordAuthor | Video description | -
dc.subject.keywordAuthor | Telecommunications | -
dc.subject.keywordAuthor | Computer vision | -
dc.subject.keywordAuthor | text-video retrieval | -
dc.subject.keywordAuthor | cross-modal learning | -
dc.subject.keywordAuthor | semantic alignment | -
dc.identifier.url | https://ieeexplore.ieee.org/document/11474843 | -
Appears in Collections
Seoul College of Engineering > ETC > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Dong Jin
COLLEGE OF ENGINEERING (DEPARTMENT OF INTELLIGENCE COMPUTING)