Detailed Information


NuCap: A Numerically Aware Captioning Framework for Improved Numerical Reasoning

Full metadata record
DC Field | Value | Language
dc.contributor.author | Jeong, Yuna | -
dc.contributor.author | Choi, Yongsuk | -
dc.date.accessioned | 2025-06-18T01:00:08Z | -
dc.date.available | 2025-06-18T01:00:08Z | -
dc.date.issued | 2025-05 | -
dc.identifier.issn | 2076-3417 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/207611 | -
dc.description.abstract | Despite advances in image captioning, existing models struggle to generate captions that include accurate numerical information, especially the number of objects. One reason is that the datasets used for training contain few samples that describe numerical information about the image. To address this issue, we propose a new framework, the Numerically Aware Captioning (NuCap) model, to enhance numerical reasoning in caption generation. We extract dual features by combining a region-attended object encoder, which captures finer-grained object features, with a spatially attended grid encoder, which encodes spatially distributed global features. We also propose a number-focused cross-entropy loss component to increase sensitivity to numerical tokens, and we introduce CountCOCO, a dataset for structured understanding of numerical information. Experiments show that our method achieves statistically significant improvements in counting performance over state-of-the-art image captioning models while maintaining comparable captioning performance. In addition to this substantial gain in numerical reasoning, our approach has significantly fewer parameters and lower inference latency than large-scale vision-language models, demonstrating both computational efficiency and stability. NuCap is an image captioning model that can express specific numerical information about a given image, making it well suited to applications that require precise object enumeration, such as automated surveillance, store monitoring, and scientific documentation. | -
dc.format.extent | 17 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | MDPI | -
dc.title | NuCap: A Numerically Aware Captioning Framework for Improved Numerical Reasoning | -
dc.type | Article | -
dc.publisher.location | Switzerland | -
dc.identifier.doi | 10.3390/app15105608 | -
dc.identifier.scopusid | 2-s2.0-105006765467 | -
dc.identifier.wosid | 001495886200001 | -
dc.identifier.bibliographicCitation | Applied Sciences-basel, v.15, no.10, pp. 1-17 | -
dc.citation.title | Applied Sciences-basel | -
dc.citation.volume | 15 | -
dc.citation.number | 10 | -
dc.citation.startPage | 1 | -
dc.citation.endPage | 17 | -
dc.type.docType | Article | -
dc.description.isOpenAccess | Y | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Chemistry | -
dc.relation.journalResearchArea | Engineering | -
dc.relation.journalResearchArea | Materials Science | -
dc.relation.journalResearchArea | Physics | -
dc.relation.journalWebOfScienceCategory | Chemistry, Multidisciplinary | -
dc.relation.journalWebOfScienceCategory | Engineering, Multidisciplinary | -
dc.relation.journalWebOfScienceCategory | Materials Science, Multidisciplinary | -
dc.relation.journalWebOfScienceCategory | Physics, Applied | -
dc.subject.keywordPlus | Image coding | -
dc.subject.keywordPlus | Image enhancement | -
dc.subject.keywordPlus | Multi-task learning | -
dc.subject.keywordPlus | Photointerpretation | -
dc.subject.keywordAuthor | multi-modal learning | -
dc.subject.keywordAuthor | image captioning | -
dc.subject.keywordAuthor | numerical reasoning | -
dc.identifier.url | https://www.mdpi.com/2076-3417/15/10/5608 | -
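The abstract names a "number-focused cross-entropy loss component" but does not give its form. As a minimal sketch, assuming the component is an extra cross-entropy term averaged over target positions whose token is a numeral and added to the ordinary caption loss with a hypothetical weight `lam`, it could look like the following. The numeral vocabulary `NUMBER_WORDS`, the helper names, and the additive weighting scheme are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Sequence

# Hypothetical numeral vocabulary; the paper's actual token set is not stated.
NUMBER_WORDS = {"one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"}


def is_number_token(token: str) -> bool:
    """Treat digit strings and spelled-out numerals as numerical tokens."""
    return token.isdigit() or token.lower() in NUMBER_WORDS


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def number_focused_loss(logits_seq: Sequence[Sequence[float]],
                        target_ids: Sequence[int],
                        vocab: Sequence[str],
                        lam: float = 1.0) -> float:
    """Mean token cross-entropy plus lam * mean cross-entropy on numeral targets.

    Assumed form only: L = CE_all + lam * CE_numeric.
    """
    if len(logits_seq) != len(target_ids) or not target_ids:
        raise ValueError("need one non-empty logits vector per target token")
    # Per-position negative log-likelihood of the target token.
    nll = [-log_softmax(logits)[t] for logits, t in zip(logits_seq, target_ids)]
    base = sum(nll) / len(nll)
    # Extra term restricted to positions whose target token is a numeral.
    numeric = [x for x, t in zip(nll, target_ids) if is_number_token(vocab[t])]
    numeric_term = sum(numeric) / len(numeric) if numeric else 0.0
    return base + lam * numeric_term


if __name__ == "__main__":
    vocab = ["<eos>", "a", "two", "dogs", "3"]
    uniform = [[0.0] * len(vocab)] * 3
    # "two dogs <eos>": one numeral target, so the numeric term is added.
    print(number_focused_loss(uniform, [2, 3, 0], vocab, lam=1.0))
```

With uniform logits every position has loss log |V|, so a caption containing a numeral scores (1 + lam) log |V| while one without numerals scores log |V|; the extra term only raises the penalty where the target is a number.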
Appears in Collections
Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Choi, Yong Suk
COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)
