Bridging the Language Gap: Domain-Specific Dataset Construction for Medical LLMs

Kim, Chae Yeon; Kim, Song Yeon; Cho, Seung Hwan; Kim, Young-Min

doi:10.1007/978-981-97-6125-8_11

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kim, Chae Yeon	-
dc.contributor.author	Kim, Song Yeon	-
dc.contributor.author	Cho, Seung Hwan	-
dc.contributor.author	Kim, Young-Min	-
dc.date.accessioned	2024-11-28T18:31:26Z	-
dc.date.available	2024-11-28T18:31:26Z	-
dc.date.issued	2024-08	-
dc.identifier.issn	1865-0929	-
dc.identifier.issn	1865-0937	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/197962	-
dc.description.abstract	The advent of large language models (LLMs) has transformed the field of natural language processing (NLP), demonstrating impressive capabilities across a variety of tasks such as text generation, translation, and question answering. However, their effectiveness in specialized domains is constrained by the lack of domain-specific data. This paper presents an effective methodology for constructing domain-specific datasets using domain-specific corpora, thus overcoming the challenges posed by linguistic and cultural differences in non-English speaking regions. By leveraging mining techniques, this methodology facilitates the construction of datasets tailored to local languages and cultures. A Korean medical corpus served as the foundation for dataset construction, leading to the development of a medical language model that demonstrated high performance and versatility across various NLP tasks. A comparative analysis based on Bidirectional Encoder Representations from Transformers (BERT) revealed comparable performance. The objective is to streamline LLM applications across diverse domains, thereby enhancing language model efficiency. In the future, our efforts will be directed towards implementing the proposed methodology across diverse domains and investigating strategies for extracting domain-specific tasks and vocabulary to enhance the quality of domain datasets.	-
dc.format.extent	13	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Springer Verlag	-
dc.title	Bridging the Language Gap: Domain-Specific Dataset Construction for Medical LLMs	-
dc.type	Article	-
dc.publisher.location	Germany	-
dc.identifier.doi	10.1007/978-981-97-6125-8_11	-
dc.identifier.scopusid	2-s2.0-85200756998	-
dc.identifier.wosid	001317373400011	-
dc.identifier.bibliographicCitation	Communications in Computer and Information Science, v.2160, pp 134 - 146	-
dc.citation.title	Communications in Computer and Information Science	-
dc.citation.volume	2160	-
dc.citation.startPage	134	-
dc.citation.endPage	146	-
dc.type.docType	Proceedings Paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.relation.journalWebOfScienceCategory	Computer Science, Interdisciplinary Applications	-
dc.relation.journalWebOfScienceCategory	Computer Science, Theory & Methods	-
dc.subject.keywordPlus	Computational linguistics	-
dc.subject.keywordPlus	Data mining	-
dc.subject.keywordPlus	Large datasets	-
dc.subject.keywordAuthor	Large Language Model	-
dc.subject.keywordAuthor	Mining	-
dc.subject.keywordAuthor	Domain Dataset	-
dc.identifier.url	https://link.springer.com/chapter/10.1007/978-981-97-6125-8_11	-

Appears in Collections: Seoul School of Interdisciplinary Industrial Studies > Seoul School of Interdisciplinary Industrial Studies > 1. Journal Articles

Related Researcher

Kim, Young min: Seoul School of Interdisciplinary Industrial Studies

222, Wangsimni-ro, Seongdong-gu, Seoul, 04763, Korea
+82-2-2220-1366

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.
