Cross-domain Chinese Word Segmentation Based on New Word Discovery
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhang, Jun | - |
dc.contributor.author | Lai, Zhipeng | - |
dc.contributor.author | Li, Xue | - |
dc.contributor.author | Ning, Gengxin | - |
dc.contributor.author | Yang, Cui | - |
dc.date.accessioned | 2023-11-24T02:36:19Z | - |
dc.date.available | 2023-11-24T02:36:19Z | - |
dc.date.issued | 2022-09 | - |
dc.identifier.issn | 1009-5896 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/115726 | - |
dc.description.abstract | Deep neural networks (DNNs) are the dominant approach to Chinese word segmentation. However, the performance of a network trained on one domain degrades significantly when it is applied to other domains, because of Out Of Vocabulary (OOV) words and expression gaps. This paper builds a cross-domain Chinese word segmentation system based on new word discovery to address both problems. To improve on the baseline system, it also proposes an unsupervised new word discovery algorithm based on vector-enhanced mutual information and weighted adjacency entropy, and a Chinese word segmentation model based on adversarial training. Experimental results show that the proposed method outperforms conventional methods in OOV rate, precision, recall and F-score. © 2022 Science Press. All rights reserved. | - |
dc.format.extent | 8 | - |
dc.language | Chinese | - |
dc.language.iso | CHI | - |
dc.publisher | Zhongguo Kexueyuan | - |
dc.title | Cross-domain Chinese Word Segmentation Based on New Word Discovery | - |
dc.title.alternative | 基于新词发现的跨领域中文分词方法 | - |
dc.type | Article | - |
dc.publisher.location | China | - |
dc.identifier.doi | 10.11999/JEIT210675 | - |
dc.identifier.scopusid | 2-s2.0-85139420044 | - |
dc.identifier.wosid | 000889371900006 | - |
dc.identifier.bibliographicCitation | Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, v.44, no.9, pp 3241 - 3248 | - |
dc.citation.title | Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology | - |
dc.citation.volume | 44 | - |
dc.citation.number | 9 | - |
dc.citation.startPage | 3241 | - |
dc.citation.endPage | 3248 | - |
dc.type.docType | Article | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.description.journalRegisteredClass | esci | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.subject.keywordAuthor | Adversarial training | - |
dc.subject.keywordAuthor | Chinese word segmentation | - |
dc.subject.keywordAuthor | Cross-domain | - |
dc.subject.keywordAuthor | New word discovery | - |
dc.subject.keywordAuthor | Vector enhancement mutual information | - |
dc.identifier.url | https://jeit.ac.cn/en/article/doi/10.11999/JEIT210675 | - |
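
The abstract names mutual information and adjacency entropy as the basis of the new word discovery step. As a rough illustration only, the sketch below scores candidate character strings with plain pointwise mutual information and unweighted left/right adjacency entropy. It does not reproduce the paper's vector-enhanced or weighted variants, and the function names (`adjacency_entropy`, `score_candidates`) and the count-over-total-characters probability estimate are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def adjacency_entropy(neighbors):
    """Shannon entropy (bits) of a Counter of neighboring characters."""
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def score_candidates(corpus, max_len=3):
    """Score every 2..max_len character string seen in the corpus.

    Returns {candidate: (pmi, min_adjacency_entropy)}.
    Assumption: p(s) is estimated as count(s) / total number of characters.
    """
    counts = Counter()
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    total_chars = 0

    for sent in corpus:
        total_chars += len(sent)
        for n in range(1, max_len + 1):
            for i in range(len(sent) - n + 1):
                gram = sent[i:i + n]
                counts[gram] += 1
                if n > 1:
                    # Record the characters immediately left and right.
                    if i > 0:
                        left[gram][sent[i - 1]] += 1
                    if i + n < len(sent):
                        right[gram][sent[i + n]] += 1

    scores = {}
    for gram, count in counts.items():
        if len(gram) < 2:
            continue
        p_gram = count / total_chars
        # Cohesion: the weakest binary split of the candidate.
        pmi = min(
            math.log2(
                p_gram
                / ((counts[gram[:k]] / total_chars)
                   * (counts[gram[k:]] / total_chars))
            )
            for k in range(1, len(gram))
        )
        # Boundary freedom: the less varied of the two sides.
        freedom = min(adjacency_entropy(left[gram]),
                      adjacency_entropy(right[gram]))
        scores[gram] = (pmi, freedom)
    return scores


if __name__ == "__main__":
    demo = ["我爱机器学习", "他学机器学习", "机器学习很好"]
    ranked = sorted(score_candidates(demo).items(),
                    key=lambda kv: kv[1], reverse=True)
    for cand, (pmi, freedom) in ranked[:5]:
        print(cand, round(pmi, 3), round(freedom, 3))
```

In a setup like this, candidates with both high cohesion and high boundary freedom would typically be kept as new words and added to the segmenter's lexicon; the thresholds for doing so are left open here.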