Cross-domain Chinese Word Segmentation Based on New Word Discovery
- Other Titles
- 基于新词发现的跨领域中文分词方法
- Authors
- Zhang, Jun; Lai, Zhipeng; Li, Xue; Ning, Gengxin; Yang, Cui
- Issue Date
- Sep-2022
- Publisher
- Zhongguo Kexueyuan
- Keywords
- Adversarial training; Chinese word segmentation; Cross-domain; New word discovery; Vector enhancement mutual information
- Citation
- Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, v.44, no.9, pp 3241 - 3248
- Pages
- 8
- Indexed
- SCOPUS; ESCI
- Journal Title
- Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology
- Volume
- 44
- Number
- 9
- Start Page
- 3241
- End Page
- 3248
- URI
- https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/115726
- DOI
- 10.11999/JEIT210675
- ISSN
- 1009-5896
- Abstract
- Deep Neural Networks (DNNs) are the dominant approach to Chinese word segmentation. However, the performance of a network trained on one domain degrades significantly when it is applied to other domains, owing to Out Of Vocabulary (OOV) words and expression gaps. This paper builds a cross-domain Chinese word segmentation system based on new word discovery to address the OOV word and expression gap problems. An unsupervised new word discovery algorithm based on vector enhanced mutual information and weighted adjacency entropy, and a Chinese word segmentation model based on adversarial training, are also proposed to improve the performance of the baseline system. Experimental results show that the proposed method outperforms conventional methods in OOV rate, precision, recall and F-score. © 2022 Science Press. All rights reserved.
- Appears in Collections
- COLLEGE OF ENGINEERING SCIENCES > SCHOOL OF ELECTRICAL ENGINEERING > 1. Journal Articles
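
The abstract names mutual information and adjacency entropy as the statistics behind the new word discovery step. As a rough illustration only, the sketch below computes the classical, unweighted forms of both statistics over character n-grams and keeps candidates that pass fixed thresholds. It is not the paper's vector enhanced or weighted variant, and the function names, the toy corpus, and the threshold values are all assumptions made for this example.

```python
import math
from collections import Counter


def ngram_counts(text, max_len):
    """Count every character n-gram of length 1..max_len in text."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def pmi(candidate, counts, total_chars):
    """Lowest pointwise mutual information over all two-way splits.

    Probabilities are approximated as count / total_chars for every
    n-gram length, which is a simplification for this sketch.
    """
    p_whole = counts[candidate] / total_chars
    scores = []
    for k in range(1, len(candidate)):
        p_left = counts[candidate[:k]] / total_chars
        p_right = counts[candidate[k:]] / total_chars
        scores.append(math.log2(p_whole / (p_left * p_right)))
    return min(scores)


def entropy(neighbor_counts):
    """Shannon entropy (bits) of a Counter of neighboring characters."""
    total = sum(neighbor_counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in neighbor_counts.values())


def adjacency_entropy(candidate, text):
    """Return (left, right) entropy of the characters next to candidate."""
    left, right = Counter(), Counter()
    n = len(candidate)
    start = text.find(candidate)
    while start != -1:
        if start > 0:
            left[text[start - 1]] += 1
        if start + n < len(text):
            right[text[start + n]] += 1
        start = text.find(candidate, start + 1)
    return entropy(left), entropy(right)


def discover(text, max_len=4, min_count=2, min_pmi=1.0, min_entropy=0.5):
    """Keep n-grams that are frequent, cohesive, and freely combinable."""
    counts = ngram_counts(text, max_len)
    found = []
    for cand, count in counts.items():
        if len(cand) < 2 or count < min_count:
            continue
        left_h, right_h = adjacency_entropy(cand, text)
        cohesive = pmi(cand, counts, len(text)) >= min_pmi
        free = min(left_h, right_h) >= min_entropy
        if cohesive and free:
            found.append(cand)
    return found


# Hypothetical toy corpus: "词语" recurs with different neighbors each time.
toy_text = "甲乙词语丙丁词语戊己词语庚辛"
candidates = discover(toy_text)
```

On this toy string the only n-gram that repeats is "词语", and it appears with three different characters on each side, so it is the only candidate that survives both thresholds. High mutual information indicates that the characters inside a candidate tend to occur together, while high adjacency entropy on both sides indicates that the candidate combines freely with its context, which is the usual argument for treating it as a word.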