NuCap: A Numerically Aware Captioning Framework for Improved Numerical Reasoning (open access)
- Authors
- Jeong, Yuna; Choi, Yongsuk
- Issue Date
- May-2025
- Publisher
- MDPI
- Keywords
- multi-modal learning; image captioning; numerical reasoning
- Citation
- Applied Sciences-basel, v.15, no.10, pp 1 - 17
- Pages
- 17
- Indexed
- SCIE; SCOPUS
- Journal Title
- Applied Sciences-basel
- Volume
- 15
- Number
- 10
- Start Page
- 1
- End Page
- 17
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/207611
- DOI
- 10.3390/app15105608
- ISSN
- 2076-3417
- Abstract
- Despite advances in image captioning, existing models struggle to generate captions that include accurate numerical information, especially the number of objects. One reason is that training datasets contain few samples with numerical information about the image. To address this, we propose the Numerically Aware Captioning (NuCap) model, a framework that enhances numerical reasoning in caption generation. We extract dual features by combining a region-attended object encoder, which captures finer-grained object features, with a spatially attended grid encoder, which encodes spatially distributed global features. We also propose a number-focused cross-entropy loss component to increase sensitivity to numerical tokens, and introduce CountCOCO, a dataset for structured understanding of numerical information. Experiments show that our method achieves statistically significant improvements in counting performance over state-of-the-art image captioning models while maintaining similar captioning performance. In addition to this improvement in numerical reasoning, our approach has significantly fewer parameters and lower inference latency than large-scale vision-language models, demonstrating both computational efficiency and stability. Because NuCap can represent specific numerical information in a given image, it is better suited to applications that require precise object enumeration, such as automated surveillance, store monitoring, and scientific documentation.
- Appears in Collections
- Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles
