
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning

Full metadata record
DC Field: Value
dc.contributor.author: Lee, Soeun
dc.contributor.author: Kim, Si-Woo
dc.contributor.author: Kim, Taewhan
dc.contributor.author: Kim, Dong-Jin
dc.date.accessioned: 2025-03-11T01:30:14Z
dc.date.available: 2025-03-11T01:30:14Z
dc.date.issued: 2024-11
dc.identifier.uri: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206728
dc.description.abstract: Recent advancements in image captioning have explored text-only training methods to overcome the limitations of paired image-text data. However, existing text-only training methods often overlook the modality gap between using text data during training and employing images during inference. To address this issue, we propose a novel approach called Image-like Retrieval, which aligns text features with visually relevant features to mitigate the modality gap. Our method further enhances the accuracy of generated captions by designing a Fusion Module that integrates retrieved captions with input features. Additionally, we introduce a Frequency-based Entity Filtering technique that significantly improves caption quality. We integrate these methods into a unified framework, which we refer to as IFCap (Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning). Through extensive experimentation, our straightforward yet powerful approach has demonstrated its efficacy, outperforming the state-of-the-art methods by a significant margin in both image captioning and video captioning compared to zero-shot captioning based on text-only training.
dc.format.extent: 13
dc.language: English
dc.language.iso: ENG
dc.publisher: Association for Computational Linguistics (ACL)
dc.title: IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
dc.type: Article
dc.identifier.doi: 10.48550/arXiv.2409.18046
dc.identifier.scopusid: 2-s2.0-85217803155
dc.identifier.bibliographicCitation: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, pp 20715 - 20727
dc.citation.title: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
dc.citation.startPage: 20715
dc.citation.endPage: 20727
dc.type.docType: Conference paper
dc.description.isOpenAccess: N
dc.description.journalRegisteredClass: scopus
dc.subject.keywordPlus: Computational linguistics
dc.subject.keywordPlus: Image retrieval
dc.subject.keywordPlus: Zero-shot learning
dc.identifier.url: https://arxiv.org/abs/2409.18046
Appears in Collections
College of Engineering (Seoul) > ETC > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Dong Jin
COLLEGE OF ENGINEERING (DEPARTMENT OF INTELLIGENCE COMPUTING)
