NuCap: A Numerically Aware Captioning Framework for Improved Numerical Reasoning

Jeong, Yuna; Choi, Yongsuk

doi:10.3390/app15105608

Full metadata record

DC Field	Value	Language
dc.contributor.author	Jeong, Yuna	-
dc.contributor.author	Choi, Yongsuk	-
dc.date.accessioned	2025-06-18T01:00:08Z	-
dc.date.available	2025-06-18T01:00:08Z	-
dc.date.issued	2025-05	-
dc.identifier.issn	2076-3417	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/207611	-
dc.description.abstract	Despite advances in image captioning, existing models struggle to generate captions that include accurate numerical information, especially the number of objects. One reason for this issue is that the dataset used for training has a limited number of samples with numerical information about the image. To address this issue, we propose a new framework, the Numerically Aware Captioning (NuCap) model, to enhance numerical reasoning in caption generation. We extract dual features by combining a region-attended object encoder for finer-grained object features and a spatially attended grid encoder for encoding spatially distributed global features. We also propose a number-focused cross-entropy loss component to increase sensitivity to numerical tokens, and introduce CountCOCO, a dataset for structured understanding of numerical information. Experiments show that our method achieves statistically significant counting performance improvements over state-of-the-art image captioning models while maintaining similar captioning performance. Despite the significant improvement in numerical reasoning power, our proposed approach has significantly fewer parameters and lower inference latency than large-scale vision language models, demonstrating both computational efficiency and stability. NuCap is an image captioning model that can represent specific numerical information in a given image, making it more suitable for applications that require precise object enumeration, such as automated surveillance, store monitoring, and scientific documentation.	-
dc.format.extent	17	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	MDPI	-
dc.title	NuCap: A Numerically Aware Captioning Framework for Improved Numerical Reasoning	-
dc.type	Article	-
dc.publisher.location	Switzerland	-
dc.identifier.doi	10.3390/app15105608	-
dc.identifier.scopusid	2-s2.0-105006765467	-
dc.identifier.wosid	001495886200001	-
dc.identifier.bibliographicCitation	Applied Sciences-Basel, v.15, no.10, pp. 1-17	-
dc.citation.title	Applied Sciences-Basel	-
dc.citation.volume	15	-
dc.citation.number	10	-
dc.citation.startPage	1	-
dc.citation.endPage	17	-
dc.type.docType	Article	-
dc.description.isOpenAccess	Y	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Chemistry	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalResearchArea	Materials Science	-
dc.relation.journalResearchArea	Physics	-
dc.relation.journalWebOfScienceCategory	Chemistry, Multidisciplinary	-
dc.relation.journalWebOfScienceCategory	Engineering, Multidisciplinary	-
dc.relation.journalWebOfScienceCategory	Materials Science, Multidisciplinary	-
dc.relation.journalWebOfScienceCategory	Physics, Applied	-
dc.subject.keywordPlus	Image coding	-
dc.subject.keywordPlus	Image enhancement	-
dc.subject.keywordPlus	Multi-task learning	-
dc.subject.keywordPlus	Photointerpretation	-
dc.subject.keywordAuthor	multi-modal learning	-
dc.subject.keywordAuthor	image captioning	-
dc.subject.keywordAuthor	numerical reasoning	-
dc.identifier.url	https://www.mdpi.com/2076-3417/15/10/5608	-
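
The abstract names a number-focused cross-entropy loss component but this record does not give its formulation. As a minimal sketch only, one common way to make a loss more sensitive to numerical tokens is to upweight them in a token-level cross-entropy. Everything below is an assumption for illustration and not the authors' definition: the function name `number_focused_ce`, the `NUMBER_WORDS` vocabulary, and the `number_weight` value are all hypothetical.

```python
import math

# Hypothetical numeral vocabulary; the paper's actual token set is not stated in this record.
NUMBER_WORDS = {"one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"}


def number_focused_ce(token_probs, target_tokens, number_weight=2.0):
    """Weighted mean negative log-likelihood over caption tokens.

    token_probs:   probability the model assigned to each ground-truth token.
    target_tokens: ground-truth caption tokens, same length as token_probs.
    number_weight: assumed multiplier (> 1) applied to numeral tokens.
    """
    if not token_probs or len(token_probs) != len(target_tokens):
        raise ValueError("inputs must be non-empty and of equal length")

    # Numeral tokens (words or digit strings) get the larger weight.
    weights = [
        number_weight if (tok.lower() in NUMBER_WORDS or tok.isdigit()) else 1.0
        for tok in target_tokens
    ]
    # Clamp probabilities away from zero so the logarithm stays finite.
    losses = [-math.log(max(p, 1e-12)) for p in token_probs]

    # Normalize by the total weight so the result stays on the scale of a mean.
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)
```

With this weighting, a caption whose numeral is poorly predicted (for example, probability 0.25 on "two" in "two dogs") is penalized more than under a uniform mean, while a caption whose tokens are all equally well predicted receives the same loss as before.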

Appears in Collections: Seoul College of Engineering > Seoul School of Computer Science > 1. Journal Articles

Related Researcher

Choi, Yong Suk: COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)