Detailed Information


Bridging the Language Gap: Domain-Specific Dataset Construction for Medical LLMs

Full metadata record
DC Field | Value | Language
dc.contributor.author | Kim, Chae Yeon | -
dc.contributor.author | Kim, Song Yeon | -
dc.contributor.author | Cho, Seung Hwan | -
dc.contributor.author | Kim, Young-Min | -
dc.date.accessioned | 2024-11-28T18:31:26Z | -
dc.date.available | 2024-11-28T18:31:26Z | -
dc.date.issued | 2024-08 | -
dc.identifier.issn | 1865-0929 | -
dc.identifier.issn | 1865-0937 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/197962 | -
dc.description.abstract | The advent of large language models (LLMs) has transformed the field of natural language processing (NLP), demonstrating impressive capabilities across a variety of tasks such as text generation, translation, and question answering. However, their effectiveness in specialized domains is constrained by the lack of domain-specific data. This paper presents an effective methodology for constructing domain-specific datasets using domain-specific corpora, thus overcoming the challenges posed by linguistic and cultural differences in non-English speaking regions. By leveraging mining techniques, this methodology facilitates the construction of datasets tailored to local languages and cultures. A Korean medical corpus served as the foundation for dataset construction, leading to the development of a medical language model that demonstrated high performance and versatility across various NLP tasks. A bidirectional encoder representation from transformer-based comparative analysis revealed comparable performance. The objective is to streamline LLM applications across diverse domains, thereby enhancing language model efficiency. In the future, our efforts will be directed towards implementing the proposed methodology across diverse domains and investigating strategies for extracting domain-specific tasks and vocabulary to enhance the quality of domain datasets. | -
dc.format.extent | 13 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | Springer Verlag | -
dc.title | Bridging the Language Gap: Domain-Specific Dataset Construction for Medical LLMs | -
dc.type | Article | -
dc.publisher.location | Germany | -
dc.identifier.doi | 10.1007/978-981-97-6125-8_11 | -
dc.identifier.scopusid | 2-s2.0-85200756998 | -
dc.identifier.wosid | 001317373400011 | -
dc.identifier.bibliographicCitation | Communications in Computer and Information Science, v.2160, pp 134 - 146 | -
dc.citation.title | Communications in Computer and Information Science | -
dc.citation.volume | 2160 | -
dc.citation.startPage | 134 | -
dc.citation.endPage | 146 | -
dc.type.docType | Proceedings Paper | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Computer Science | -
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | -
dc.relation.journalWebOfScienceCategory | Computer Science, Interdisciplinary Applications | -
dc.relation.journalWebOfScienceCategory | Computer Science, Theory & Methods | -
dc.subject.keywordPlus | Computational linguistics | -
dc.subject.keywordPlus | Data mining | -
dc.subject.keywordPlus | Large datasets | -
dc.subject.keywordAuthor | Large Language Model | -
dc.subject.keywordAuthor | Mining | -
dc.subject.keywordAuthor | Domain Dataset | -
dc.identifier.url | https://link.springer.com/chapter/10.1007/978-981-97-6125-8_11 | -
Appears in Collections
Seoul School of Interdisciplinary Industrial Studies > Seoul School of Interdisciplinary Industrial Studies > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Young min
Seoul School of Interdisciplinary Industrial Studies
