Bi-directional Maximal Matching Algorithm to Segment Khmer Words in Sentence
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Makara Mao | - |
dc.contributor.author | Sony Peng | - |
dc.contributor.author | Yixuan Yang | - |
dc.contributor.author | 박두순 | - |
dc.date.accessioned | 2022-10-05T01:41:14Z | - |
dc.date.available | 2022-10-05T01:41:14Z | - |
dc.date.issued | 2022-08 | - |
dc.identifier.issn | 1976-913X | - |
dc.identifier.issn | 2092-805X | - |
dc.identifier.uri | https://scholarworks.bwise.kr/sch/handle/2021.sw.sch/21466 | - |
dc.description.abstract | Khmer script, the official script of Cambodia, is written from left to right without spaces between words; it is complex and calls for further analysis. In the absence of clear standard guidelines, spaces in Khmer are used inconsistently and informally to separate words in sentences. A segmentation method is therefore needed, in combination with future Khmer natural language processing (NLP), to define appropriate rules for Khmer sentences. NLP, with its capacity for large-scale language data analysis, is a critical process to apply in this scenario. One essential component of Khmer language processing is splitting sentences into sequences of words and counting the words used in them. Currently, Microsoft Word cannot count Khmer words correctly. This study therefore presents a systematic library that segments Khmer phrases using the bi-directional maximal matching (BiMM) method to address these constraints. Within the BiMM algorithm, the paper focuses on the bi-directional implementation of forward maximal matching (FMM) and backward maximal matching (BMM) to improve word segmentation accuracy. A digital tree or prefix tree data structure, also known as a trie, improves segmentation accuracy by finding the children of each word's parent node. The accuracy of BiMM is higher than that of FMM or BMM used independently; moreover, the proposed approach improves dictionary structures and reduces the number of errors. On 94,807 Khmer words, the proposed approach reduces the error by 8.57% compared to the FMM and BMM algorithms. | - |
dc.format.extent | 13 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | Korea Information Processing Society | - |
dc.title | Bi-directional Maximal Matching Algorithm to Segment Khmer Words in Sentence | - |
dc.title.alternative | Bi-directional Maximal Matching Algorithm to Segment Khmer Words in Sentence | - |
dc.type | Article | - |
dc.publisher.location | Republic of Korea | - |
dc.identifier.doi | 10.3745/JIPS.04.0250 | - |
dc.identifier.scopusid | 2-s2.0-85138007332 | - |
dc.identifier.wosid | 000862172900009 | - |
dc.identifier.bibliographicCitation | JIPS(Journal of Information Processing Systems), v.18, no.4, pp 549 - 561 | - |
dc.citation.title | JIPS(Journal of Information Processing Systems) | - |
dc.citation.volume | 18 | - |
dc.citation.number | 4 | - |
dc.citation.startPage | 549 | - |
dc.citation.endPage | 561 | - |
dc.type.docType | Article | - |
dc.identifier.kciid | ART002876656 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.description.journalRegisteredClass | esci | - |
dc.description.journalRegisteredClass | kci | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
dc.subject.keywordAuthor | Bi-directional Maximal Matching | - |
dc.subject.keywordAuthor | Khmer Language | - |
dc.subject.keywordAuthor | Natural Language Processing | - |
dc.subject.keywordAuthor | Word Corpus | - |
dc.subject.keywordAuthor | Word Segmentation | - |
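
The abstract describes BiMM as a combination of FMM and BMM over a trie-backed dictionary. The following is a minimal Python sketch of that general technique, not the authors' library. The function names, the toy Latin-letter dictionary, and the rule for choosing between the two results (fewer tokens, then fewer single-character tokens, then the backward result on a tie) are assumptions made for illustration; the abstract does not state the paper's selection rule.

```python
class TrieNode:
    """One node of a prefix tree; children are keyed by character."""
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    """Insert every dictionary word into a fresh trie and return its root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def longest_match(root, text, start):
    """Length of the longest dictionary word starting at text[start], or 0."""
    node, best = root, 0
    for i in range(start, len(text)):
        node = node.children.get(text[i])
        if node is None:
            break
        if node.is_word:
            best = i - start + 1
    return best


def fmm(text, root):
    """Forward maximal matching: greedily take the longest word, left to right."""
    tokens, i = [], 0
    while i < len(text):
        # Assumption: an unknown character becomes a one-character token.
        n = longest_match(root, text, i) or 1
        tokens.append(text[i:i + n])
        i += n
    return tokens


def bmm(text, reversed_root):
    """Backward maximal matching, done as FMM on the reversed text
    against a trie built from reversed dictionary words."""
    tokens = fmm(text[::-1], reversed_root)
    return [t[::-1] for t in reversed(tokens)]


def bimm(text, words):
    """Run FMM and BMM and keep one result (hypothetical selection rule)."""
    forward = fmm(text, build_trie(words))
    backward = bmm(text, build_trie(w[::-1] for w in words))
    if forward == backward:
        return forward

    def cost(tokens):
        # Prefer fewer tokens, then fewer single-character tokens.
        return (len(tokens), sum(len(t) == 1 for t in tokens))

    return forward if cost(forward) < cost(backward) else backward


# Toy dictionary (hypothetical); a real system would load a Khmer word list.
WORDS = ["ab", "abc", "cd", "d"]
segments = bimm("abcd", WORDS)
print(segments)       # ['ab', 'cd']
print(len(segments))  # word count: 2
```

In this toy case FMM yields `['abc', 'd']` and BMM yields `['ab', 'cd']`; both contain two tokens, so the sketch keeps the backward result because it has no single-character token. Counting words in a sentence then reduces to taking the length of the returned list.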