Adaptive Named Entity Recognition Using Distant Supervision for Contemporary Written Texts

Authors
Kim, Juae; Kim, Yejin; Kang, Sangwoo; Seo, Jungyun
Issue Date
Mar-2021
Publisher
Institute of Electrical and Electronics Engineers Inc.
Keywords
Computational and artificial intelligence; Electronic publishing; Encyclopedias; Information services; Internet; named entity recognition; natural language processing; neural networks; Task analysis; Training; Transfer learning; weakly supervised learning
Citation
IEEE Access, v.9, pp.80405 - 80414
Journal Title
IEEE Access
Volume
9
Start Page
80405
End Page
80414
URI
https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/81826
DOI
10.1109/ACCESS.2021.3067315
ISSN
2169-3536
Abstract
Named entity recognition (NER), the process of categorizing named entities in a given text, has long suffered from a lack of labeled corpora. Deep neural networks have been successfully applied to NER tasks. However, they require a large amount of annotated data. Regardless of how much data is available, annotation requires significant human effort, which is expensive and time-consuming. Moreover, collecting labeled data that reflect current events requires exhaustive follow-up and incurs correspondingly higher costs. Current NER systems typically rely on supervised learning from hand-annotated data. The most well-known dataset for NER shared tasks, which was released at the 2003 Conference on Natural Language Learning, is used for basic training and evaluation. Although the data are of high quality, the dataset has low coverage of recent material. In this paper, we illustrate methods for swiftly labeling up-to-date data via distant supervision. To tackle the difficulty of annotating contemporary written texts, we generate labeled data from articles that reflect the latest issues. We evaluated the proposed methods with a bidirectional long short-term memory conditional random field (BiLSTM-CRF) architecture using static and contextualized embedding methods. Our proposed models outperform state-of-the-art methods, with average F1-scores 3.09% higher with weakly labeled Wikipedia data and 3.47% higher with Cable News Network data. When using the NER model with Flair embedding, our method shows 1.50% and 3.26% higher F1-scores with weakly labeled Wikipedia and news data, respectively. Qualitatively, the proposed model also performs better when extracting contemporary keywords. (CC BY-NC-ND)
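For readers unfamiliar with distant supervision, the sketch below shows the general idea of weakly labeling text by matching it against an entity dictionary. It is not the authors' method, which the abstract does not describe in detail. The function `distant_label`, the `GAZETTEER` entries, the label names, and the greedy longest-match strategy are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical entity dictionary: lowercased token tuples mapped to entity types.
# In practice such a dictionary might be derived from an encyclopedia or
# knowledge base; these entries are illustrative only.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("cable", "news", "network"): "ORG",
    ("wikipedia",): "ORG",
    ("seoul",): "LOC",
}


def distant_label(
    tokens: List[str],
    gazetteer: Dict[Tuple[str, ...], str],
    max_len: int = 5,
) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup against a gazetteer."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so that multi-word
        # entities win over their shorter prefixes.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(t.lower() for t in tokens[i:i + n])
            label = gazetteer.get(span)
            if label is not None:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


tokens = "Cable News Network reported from Seoul".split()
tags = distant_label(tokens, GAZETTEER)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Labels produced this way are noisy: dictionary matching misses unlisted entities and mislabels ambiguous strings, which is why such data is called weakly labeled.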
Appears in Collections
College of IT Convergence > Department of Software > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kang, Sang Woo
College of IT Convergence (Department of Software)