Error Analysis and Evaluation of Deep-learning Based Korean Named Entity Recognition (딥러닝 기반 한국어 개체명 인식의 평가와 오류 분석 연구)

유현조; 송영숙; 김민수; 윤기현; 정유남

doi:10.18855/lisoko.2021.46.3.010

Title: Error Analysis and Evaluation of Deep-learning Based Korean Named Entity Recognition (Korean title: 딥러닝 기반 한국어 개체명 인식의 평가와 오류 분석 연구)

Authors: 유현조; 송영숙; 김민수; 윤기현; 정유남

Issue Date: 2021

Publisher: 한국언어학회

Keywords: named entity recognition; Korean language; natural language processing; proper name; terminology

Citation: 언어, v.46, no.3, pp. 803-828

Pages: 26

Journal Title: 언어

Volume: 46

Number: 3

Start Page: 803

End Page: 828

URI: https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/62795

DOI: 10.18855/lisoko.2021.46.3.010

ISSN: 1229-4039; 2734-0481

Abstract: Named entity recognition (NER) is a natural language processing task that recognizes and classifies named entities in unstructured text. The targets of NER are not limited to typical proper names for persons, locations and organizations; they also include date, time and quantity expressions and can be expanded further to names of events, animals, plants, materials and other encyclopedic entities. A real-world NER system is also expected to be tuned to process domain-specific terminology. In this study, the researchers built and tested a BERT-based Korean NER system and proposed methods for evaluation and error analysis. The system was trained on a 140K-word NER corpus and evaluated on a 60K test set. Errors are categorized into four proposed classes: detection, boundary, segmentation, and labelling. Error rates vary greatly between entity labels, from 1% to 30%; the labels fall into three groups: the most accurate time and quantity expressions, relatively accurate proper names, and highly error-prone terminology. The authors expect that the error analysis will provide insights into better methods of data collection and post-processing correction.
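
The abstract names four error classes but does not define them, so the following is only a minimal sketch of how such a categorization might be computed. Everything in it is an assumption made for illustration and not the authors' method: the span format `(start, end, label)` with an exclusive end, the function name `classify_errors`, the label strings in the sample data, and the working definitions (detection = missed or spurious entity, boundary = one overlapping prediction with different offsets, segmentation = an entity split into several predictions or merged with a neighbor, labelling = same offsets but a different label).

```python
from collections import Counter


def overlaps(a, b):
    """True if two (start, end, label) spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def classify_errors(gold, pred):
    """Count outcomes per gold entity, plus spurious predictions.

    The category definitions are illustrative assumptions, not taken
    from the paper.
    """
    counts = Counter()
    for g in gold:
        hits = [p for p in pred if overlaps(g, p)]
        if not hits:
            counts["detection"] += 1        # entity missed entirely
        elif len(hits) > 1:
            counts["segmentation"] += 1     # entity split into several spans
        else:
            p = hits[0]
            merged = [x for x in gold if overlaps(x, p)]
            if len(merged) > 1:
                counts["segmentation"] += 1  # several entities merged into one
            elif (p[0], p[1]) != (g[0], g[1]):
                counts["boundary"] += 1      # offsets differ (label not checked)
            elif p[2] != g[2]:
                counts["labelling"] += 1     # right span, wrong label
            else:
                counts["correct"] += 1
    for p in pred:
        if not any(overlaps(g, p) for g in gold):
            counts["detection"] += 1         # spurious prediction
    return counts


# Hypothetical spans and labels, made up for this example.
gold = [(0, 2, "PER"), (5, 7, "ORG"), (10, 12, "DAT"),
        (15, 18, "TRM"), (20, 22, "LOC")]
pred = [(0, 2, "PER"), (5, 8, "ORG"), (10, 12, "TIM"),
        (15, 16, "TRM"), (17, 18, "TRM"), (30, 31, "QT")]

print(dict(classify_errors(gold, pred)))
```

Counting per gold entity keeps the categories mutually exclusive. A per-label error rate like the 1% to 30% range reported in the abstract could then be obtained by running the same tally separately for each gold label and dividing the non-correct counts by the number of gold entities with that label.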

Appears in Collections: The Office of Research Affairs > Affiliated Research Institute > 1. Journal Articles