Training Set Expansion Using Word Embeddings for Korean Medical Information Extraction
- Authors
- Kim, Young min
- Issue Date
- Aug-2019
- Publisher
- Springer Verlag
- Keywords
- Medical information extraction; Training set; Word embeddings; Korean
- Citation
- Lecture Notes in Computer Science, v.11721, pp 261 - 274
- Pages
- 14
- Indexed
- SCOPUS
- Journal Title
- Lecture Notes in Computer Science
- Volume
- 11721
- Start Page
- 261
- End Page
- 274
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/147317
- DOI
- 10.1007/978-3-030-33752-0_19
- ISSN
- 0302-9743
- 1611-3349
- Abstract
- Entity recognition is an essential part of a task-oriented dialogue system and is commonly formulated as a sequence labeling task. However, constructing a training set for a new domain is extremely expensive and time-consuming. In this work, we propose a simple framework that exploits neural word embeddings in a semi-supervised manner to annotate medical named entities in Korean. The target domain is automatic medical diagnosis, for which disease name, symptom, and body part are defined as the entity types. Different aspects of the word embeddings, such as embedding dimension, window size, and embedding model, are examined to investigate their effects on final performance. Online medical QA data were used for the experiments. Starting from a limited number of pre-annotated words, our framework successfully expanded the training set.
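The abstract does not spell out how the expansion works, so the following is only a minimal sketch under stated assumptions, not the author's method. It assumes expansion is a nearest-neighbour search in embedding space: an unlabeled word receives the entity type of its most cosine-similar seed (pre-annotated) word when that similarity clears a threshold, and the expanded lexicon is then projected onto sentences as BIO tags for sequence labeling. The toy vectors, seed words, the threshold of 0.8, and the names `expand_seeds` and `bio_tag` are all hypothetical; real vectors would come from an embedding model (for example word2vec or fastText) trained on the QA corpus, which is where the embedding dimension and window size mentioned above would be set.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]

# Toy 3-d vectors invented for illustration only (NOT from the paper).
# In practice these would come from embeddings trained on the medical QA corpus.
EMBEDDINGS: Dict[str, Vector] = {
    "두통": [0.90, 0.10, 0.00],  # headache
    "복통": [0.85, 0.15, 0.05],  # stomach ache
    "감기": [0.10, 0.90, 0.00],  # cold
    "독감": [0.15, 0.85, 0.10],  # flu
    "머리": [0.00, 0.10, 0.90],  # head
    "배": [0.05, 0.00, 0.95],    # belly
    "병원": [0.50, 0.50, 0.50],  # hospital (not an entity)
}

# Hypothetical pre-annotated seed words, one per entity type.
SEEDS: Dict[str, str] = {"두통": "SYMPTOM", "감기": "DISEASE", "머리": "BODY_PART"}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_seeds(embeddings: Dict[str, Vector], seeds: Dict[str, str],
                 threshold: float = 0.8) -> Dict[str, str]:
    """Label each non-seed word with the type of its most similar seed,
    but only if that similarity reaches `threshold` (hypothetical rule)."""
    lexicon = dict(seeds)
    for word, vec in embeddings.items():
        if word in lexicon:
            continue  # seeds keep their manual label
        best_type: Optional[str] = None
        best_sim = threshold
        for seed, entity_type in seeds.items():
            if seed not in embeddings:
                continue  # a seed without a vector cannot vote
            sim = cosine(vec, embeddings[seed])
            if sim >= best_sim:
                best_type, best_sim = entity_type, sim
        if best_type is not None:
            lexicon[word] = best_type
    return lexicon


def bio_tag(tokens: List[str], lexicon: Dict[str, str]) -> List[str]:
    """Simplification: every entity is one token, so only B- and O tags appear."""
    return [f"B-{lexicon[t]}" if t in lexicon else "O" for t in tokens]


if __name__ == "__main__":
    lexicon = expand_seeds(EMBEDDINGS, SEEDS)
    # "배가 아프고 독감인지 궁금해요" ("My belly hurts; I wonder if it's the flu"),
    # pre-split into hypothetical tokens
    tokens = ["배", "가", "아프고", "독감", "인지", "궁금해요"]
    print(bio_tag(tokens, lexicon))
    # → ['B-BODY_PART', 'O', 'O', 'B-DISEASE', 'O', 'O']
```

Here 배 and 독감 were never annotated by hand but inherit BODY_PART and DISEASE from their nearest seeds, while 병원 stays unlabeled because it is equally far from every seed. Raising `threshold` trades recall for precision in the expanded lexicon; a realistic pipeline would also need multi-token entities (I- tags) and Korean morphological segmentation, both omitted here.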

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.