DA-BioNER: data augmentation based on few-shot learning and distant supervision for biomedical named entity recognition
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Park, Yesol | - |
| dc.contributor.author | Son, Gyujin | - |
| dc.contributor.author | Kim, Taeuk | - |
| dc.contributor.author | Rho, Mina | - |
| dc.date.accessioned | 2026-07-02T02:30:28Z | - |
| dc.date.available | 2026-07-02T02:30:28Z | - |
| dc.date.issued | 2026-06 | - |
| dc.identifier.issn | 1367-4803 | - |
| dc.identifier.issn | 1367-4811 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/217789 | - |
| dc.description.abstract | Motivation: Named entity recognition (NER) is a fundamental component of structured knowledge extraction, yet its effectiveness in emerging domains remains constrained by the scarcity of high-quality, domain-specific annotated corpora. Although data augmentation and distant supervision have been explored to alleviate this issue, existing methods often yield limited entity diversity, introduce noisy labels, or disrupt contextual integrity, thereby limiting their generalization ability in low-resource settings. Results: In this study, we propose DA-BioNER, a context-preserving data expansion framework for biomedical NER. DA-BioNER combines multiple base NER models trained on few-shot data to provide coarse annotations, followed by refinement using a large language model (LLM) guided by global biomedical knowledge. Unlike generation-based augmentation methods that synthesize new sentences, DA-BioNER performs annotation refinement within existing sentences, preserving both syntactic structure and semantic context. By constraining the role of the LLM to refinement rather than open-ended generation, the framework effectively reduces hallucination while improving label precision and consistency. We evaluate DA-BioNER on three benchmark datasets (NCBI-Disease, BC5CDR, and BioRED) under low-resource conditions. In 40-shot settings, DA-BioNER achieves F1-scores of 0.750, 0.795, and 0.799, respectively, outperforming state-of-the-art methods, including LSMS, DAGA, and MELM, by up to 0.32. Under more extreme few-shot settings, DA-BioNER further improves F1-scores by up to 0.08, while generating an average of 1,391 additional unique entities, substantially enriching training diversity. These results demonstrate that DA-BioNER provides a scalable and adaptable solution for robust biomedical NER, particularly in domain adaptation and low-resource scenarios. Availability: DA-BioNER is publicly available at https://github.com/DMnBI/DA-BioNER. | - |
| dc.format.extent | 14 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | OXFORD UNIV PRESS | - |
| dc.title | DA-BioNER: data augmentation based on few-shot learning and distant supervision for biomedical named entity recognition | - |
| dc.type | Article | - |
| dc.publisher.location | United Kingdom | - |
| dc.identifier.doi | 10.1093/bioinformatics/btag332 | - |
| dc.identifier.scopusid | 2-s2.0-105042150086 | - |
| dc.identifier.wosid | 001795206100001 | - |
| dc.identifier.bibliographicCitation | BIOINFORMATICS, v.42, no.6, pp 1 - 14 | - |
| dc.citation.title | BIOINFORMATICS | - |
| dc.citation.volume | 42 | - |
| dc.citation.number | 6 | - |
| dc.citation.startPage | 1 | - |
| dc.citation.endPage | 14 | - |
| dc.type.docType | Article | - |
| dc.description.isOpenAccess | Y | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Biochemistry & Molecular Biology | - |
| dc.relation.journalResearchArea | Biotechnology & Applied Microbiology | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalResearchArea | Mathematical & Computational Biology | - |
| dc.relation.journalResearchArea | Mathematics | - |
| dc.relation.journalWebOfScienceCategory | Biochemical Research Methods | - |
| dc.relation.journalWebOfScienceCategory | Biotechnology & Applied Microbiology | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Interdisciplinary Applications | - |
| dc.relation.journalWebOfScienceCategory | Mathematical & Computational Biology | - |
| dc.relation.journalWebOfScienceCategory | Statistics & Probability | - |
| dc.identifier.url | https://academic.oup.com/bioinformatics/article/42/6/btag332/8691815?login=true | - |
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
