NuCap: A Numerically Aware Captioning Framework for Improved Numerical Reasoning
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Jeong, Yuna | - |
| dc.contributor.author | Choi, Yongsuk | - |
| dc.date.accessioned | 2025-06-18T01:00:08Z | - |
| dc.date.available | 2025-06-18T01:00:08Z | - |
| dc.date.issued | 2025-05 | - |
| dc.identifier.issn | 2076-3417 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/207611 | - |
| dc.description.abstract | Despite advances in image captioning, existing models struggle to generate captions that include accurate numerical information, especially the number of objects. One reason for this issue is that the dataset used for training has a limited number of samples with numerical information about the image. To address this issue, we propose a new framework, the Numerically Aware Captioning (NuCap) model, to enhance numerical reasoning in caption generation. We extract dual features by combining a region-attended object encoder for finer-grained object features and a spatially attended grid encoder for encoding spatially distributed global features. We also propose a number-focused cross-entropy loss component to increase sensitivity to numerical tokens, and introduce CountCOCO, a dataset for structured understanding of numerical information. Experiments show that our method achieves statistically significant counting performance improvements over state-of-the-art image captioning models while maintaining similar captioning performance. Despite the significant improvement in numerical reasoning power, our proposed approach has significantly fewer parameters and lower inference latency than large-scale vision language models, demonstrating both computational efficiency and stability. NuCap is an image captioning model that can represent specific numerical information in a given image, making it more suitable for applications that require precise object enumeration, such as automated surveillance, store monitoring, and scientific documentation. | - |
| dc.format.extent | 17 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | MDPI | - |
| dc.title | NuCap: A Numerically Aware Captioning Framework for Improved Numerical Reasoning | - |
| dc.type | Article | - |
| dc.publisher.location | Switzerland | - |
| dc.identifier.doi | 10.3390/app15105608 | - |
| dc.identifier.scopusid | 2-s2.0-105006765467 | - |
| dc.identifier.wosid | 001495886200001 | - |
| dc.identifier.bibliographicCitation | Applied Sciences-basel, v.15, no.10, pp 1 - 17 | - |
| dc.citation.title | Applied Sciences-basel | - |
| dc.citation.volume | 15 | - |
| dc.citation.number | 10 | - |
| dc.citation.startPage | 1 | - |
| dc.citation.endPage | 17 | - |
| dc.type.docType | Article | - |
| dc.description.isOpenAccess | Y | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Chemistry | - |
| dc.relation.journalResearchArea | Engineering | - |
| dc.relation.journalResearchArea | Materials Science | - |
| dc.relation.journalResearchArea | Physics | - |
| dc.relation.journalWebOfScienceCategory | Chemistry, Multidisciplinary | - |
| dc.relation.journalWebOfScienceCategory | Engineering, Multidisciplinary | - |
| dc.relation.journalWebOfScienceCategory | Materials Science, Multidisciplinary | - |
| dc.relation.journalWebOfScienceCategory | Physics, Applied | - |
| dc.subject.keywordPlus | Image coding | - |
| dc.subject.keywordPlus | Image enhancement | - |
| dc.subject.keywordPlus | Multi-task learning | - |
| dc.subject.keywordPlus | Photointerpretation | - |
| dc.subject.keywordAuthor | multi-modal learning | - |
| dc.subject.keywordAuthor | image captioning | - |
| dc.subject.keywordAuthor | numerical reasoning | - |
| dc.identifier.url | https://www.mdpi.com/2076-3417/15/10/5608 | - |
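
The abstract mentions a number-focused cross-entropy loss component but this record does not give its formula. As a rough illustration only, one common way to make a loss more sensitive to selected tokens is to up-weight them in the average negative log-likelihood. The sketch below assumes that approach; the names `NUMBER_WORDS`, `number_weighted_cross_entropy`, and `number_weight`, as well as the weight value, are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical set of number words; the paper's actual numerical token set
# is not stated in this record.
NUMBER_WORDS = {
    "one", "two", "three", "four", "five",
    "six", "seven", "eight", "nine", "ten",
}


def is_number_token(token):
    """Return True for digit strings ("3") or listed number words ("three")."""
    return token.isdigit() or token.lower() in NUMBER_WORDS


def number_weighted_cross_entropy(token_probs, target_tokens, number_weight=2.0):
    """Weighted mean negative log-likelihood over a caption.

    token_probs   -- probability the model assigned to each target token
    target_tokens -- the ground-truth caption tokens, same length
    number_weight -- assumed multiplier for numerical tokens (1.0 = plain CE)
    """
    if len(token_probs) != len(target_tokens) or not target_tokens:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")

    weights = [
        number_weight if is_number_token(tok) else 1.0
        for tok in target_tokens
    ]
    weighted_nll = sum(w * -math.log(p) for w, p in zip(weights, token_probs))
    # Normalise by the total weight so the scale stays comparable to plain CE.
    return weighted_nll / sum(weights)


# Usage: the model is least sure about the count word "two".
tokens = ["two", "dogs", "run"]
probs = [0.25, 0.5, 0.5]
plain = number_weighted_cross_entropy(probs, tokens, number_weight=1.0)
focused = number_weighted_cross_entropy(probs, tokens, number_weight=2.0)
```

With `number_weight=1.0` the function reduces to ordinary mean cross-entropy; a larger weight makes a mispredicted count word contribute more to the loss than other tokens. How NuCap actually defines and combines its loss terms should be checked in the article itself.
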
