Adaptive Named Entity Recognition Using Distant Supervision for Contemporary Written Texts

Kim, Juae; Kim, Yejin; Kang, Sangwoo; Seo, Jungyun

Authors: Kim, Juae; Kim, Yejin; Kang, Sangwoo; Seo, Jungyun

Issue Date: Mar-2021

Publisher: Institute of Electrical and Electronics Engineers Inc.

Keywords: Computational and artificial intelligence; Electronic publishing; Encyclopedias; Information services; Internet; Named entity recognition; Natural language processing; Neural networks; Task analysis; Training; Transfer learning; Weakly supervised learning

Citation: IEEE Access, v.9, pp.80405 - 80414

Journal Title: IEEE Access

Volume: 9

Start Page: 80405

End Page: 80414

URI: https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/81826

DOI: 10.1109/ACCESS.2021.3067315

ISSN: 2169-3536

Abstract: Named entity recognition (NER), the process of categorizing named entities in a given text, has long suffered from a lack of labeled corpora. Deep neural networks have been successfully applied to NER tasks. However, they require a large amount of annotated data. Regardless of how much data is available, annotation requires significant human effort, which is expensive and time-consuming. Moreover, collecting labeled data that reflect contemporary circumstances requires exhaustive follow-up and incurs correspondingly higher costs. Current NER systems typically focus on supervised learning from hand-crafted data. The most well-known dataset for NER shared tasks, which was released at the 2003 Conference on Natural Language Learning, is used for basic training and evaluation. Although the data are of high quality, the dataset has low coverage of timely material. In this paper, we illustrate methods for swiftly labeling up-to-date data via distant supervision. To tackle the difficulty of annotating contemporary written texts, we generate labeled data from articles that reflect the latest issues. We evaluated the proposed methods with a bidirectional long short-term memory conditional random field (BiLSTM-CRF) architecture using static and contextualized embedding methods. Our proposed models outperform state-of-the-art methods, with average F1-scores 3.09% higher with weakly labeled Wikipedia data and 3.47% higher with Cable News Network data. When using the NER model with Flair embedding, our method shows 1.50% and 3.26% higher F1-scores with weakly labeled Wikipedia and news data, respectively. Qualitatively, the proposed model also performs better when extracting contemporary keywords. License: CC BY-NC-ND.
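
Illustrative note: the abstract does not spell out the labeling procedure, so the following is only a generic sketch of how distant supervision can produce weak NER labels, not the authors' pipeline. It assumes a small hypothetical gazetteer (`GAZETTEER`) that maps known entity names to types and tags tokens in the BIO scheme by longest match; the function name `weak_label` and the entity types are likewise assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer: entity surface form (as a token tuple) -> entity type.
# In practice such a list might be harvested from a knowledge base or link anchors.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Cable", "News", "Network"): "ORG",
    ("Wikipedia",): "ORG",
    ("Seoul",): "LOC",
}


def weak_label(tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup against the gazetteer."""
    max_len = max((len(key) for key in gazetteer), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-token names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in gazetteer:
                etype = gazetteer[span]
                tags[i] = f"B-{etype}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{etype}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


if __name__ == "__main__":
    sentence = "Cable News Network reported from Seoul".split()
    for token, tag in zip(sentence, weak_label(sentence, GAZETTEER)):
        print(f"{token}\t{tag}")
```

Labels produced this way are noisy (unlisted entities stay "O", ambiguous names get a single type), which is why data of this kind is described as weakly labeled.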

Appears in Collections: College of IT Convergence > Department of Software > 1. Journal Articles

Related Researcher

Kang, Sang Woo: College of IT Convergence (Department of Software)
