Detailed Information

Bi-directional Maximal Matching Algorithm to Segment Khmer Words in Sentence

Full metadata record
DC Field Value Language
dc.contributor.author Makara Mao -
dc.contributor.author Sony Peng -
dc.contributor.author Yixuan Yang -
dc.contributor.author 박두순 -
dc.date.accessioned 2022-10-05T01:41:14Z -
dc.date.available 2022-10-05T01:41:14Z -
dc.date.issued 2022-08 -
dc.identifier.issn 1976-913X -
dc.identifier.issn 2092-805X -
dc.identifier.uri https://scholarworks.bwise.kr/sch/handle/2021.sw.sch/21466 -
dc.description.abstract The Khmer script is the official script of Cambodia. It is written from left to right without a space separator between words, which makes it complicated and in need of further analysis. Without clear standard guidelines, spaces in Khmer are used inconsistently and informally to separate words in sentences. Therefore, a segmentation method should be considered together with future Khmer natural language processing (NLP) work to define appropriate rules for Khmer sentences. NLP, with its capability for large-scale language data analysis, needs to be applied in this scenario. One of the essential components of Khmer language processing is how to split a sentence into a series of words and count the words used in it. Currently, Microsoft Word cannot count Khmer words correctly. This study therefore presents a systematic library that segments Khmer phrases using the bi-directional maximal matching (BiMM) method to address these constraints. In the BiMM algorithm, the paper focuses on the bi-directional implementation of forward maximal matching (FMM) and backward maximal matching (BMM) to improve word segmentation accuracy. A digital tree or prefix tree data structure, also known as a trie, supports the segmentation procedure by finding the children of each word's parent node. The accuracy of BiMM is higher than that of FMM or BMM used independently; moreover, the proposed approach improves dictionary structures and reduces the number of errors. The result of this study can reduce the error by 8.57% compared to the FMM and BMM algorithms with 94,807 Khmer words. -
dc.format.extent 13 -
dc.language English -
dc.language.iso ENG -
dc.publisher Korea Information Processing Society (한국정보처리학회) -
dc.title Bi-directional Maximal Matching Algorithm to Segment Khmer Words in Sentence -
dc.title.alternative Bi-directional Maximal Matching Algorithm to Segment Khmer Words in Sentence -
dc.type Article -
dc.publisher.location Republic of Korea -
dc.identifier.doi 10.3745/JIPS.04.0250 -
dc.identifier.scopusid 2-s2.0-85138007332 -
dc.identifier.wosid 000862172900009 -
dc.identifier.bibliographicCitation JIPS (Journal of Information Processing Systems), v.18, no.4, pp. 549-561 -
dc.citation.title JIPS (Journal of Information Processing Systems) -
dc.citation.volume 18 -
dc.citation.number 4 -
dc.citation.startPage 549 -
dc.citation.endPage 561 -
dc.type.docType Article -
dc.identifier.kciid ART002876656 -
dc.description.isOpenAccess N -
dc.description.journalRegisteredClass scopus -
dc.description.journalRegisteredClass esci -
dc.description.journalRegisteredClass kci -
dc.relation.journalResearchArea Computer Science -
dc.relation.journalWebOfScienceCategory Computer Science, Information Systems -
dc.subject.keywordAuthor Bi-directional Maximal Matching -
dc.subject.keywordAuthor Khmer Language -
dc.subject.keywordAuthor Natural Language Processing -
dc.subject.keywordAuthor Word Corpus -
dc.subject.keywordAuthor Word Segmentation -
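
The abstract above describes combining forward maximal matching (FMM) and backward maximal matching (BMM) over a trie-backed dictionary. The following is a minimal sketch of that general idea, not the paper's implementation. Everything named here (`Trie`, `fmm`, `bmm`, `bimm`) is hypothetical, the dictionary is a toy Latin-letter word list rather than a Khmer corpus, and the rule for choosing between the two results (fewer tokens, then fewer single-character tokens, then the backward result) is a commonly used heuristic assumed for illustration; the record does not state which rule the authors use.

```python
from typing import Dict, List


class Trie:
    """Prefix tree that stores dictionary words one character per edge."""

    def __init__(self) -> None:
        self.children: Dict[str, "Trie"] = {}
        self.is_word = False

    def insert(self, word: str) -> None:
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def longest_match(self, text: str, start: int) -> int:
        """Length of the longest dictionary word beginning at `start` (0 if none)."""
        node, best = self, 0
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word:
                best = i - start + 1
        return best


def build_trie(words: List[str]) -> Trie:
    trie = Trie()
    for word in words:
        trie.insert(word)
    return trie


def fmm(text: str, trie: Trie) -> List[str]:
    """Forward maximal matching: greedily take the longest word, left to right."""
    tokens, i = [], 0
    while i < len(text):
        n = trie.longest_match(text, i) or 1  # unknown character: emit it alone
        tokens.append(text[i:i + n])
        i += n
    return tokens


def bmm(text: str, reversed_trie: Trie) -> List[str]:
    """Backward maximal matching, done as FMM on the reversed text
    against a trie of reversed dictionary words."""
    reversed_tokens = fmm(text[::-1], reversed_trie)
    return [tok[::-1] for tok in reversed(reversed_tokens)]


def bimm(text: str, trie: Trie, reversed_trie: Trie) -> List[str]:
    """Pick between FMM and BMM results (selection rule is an assumption)."""
    forward, backward = fmm(text, trie), bmm(text, reversed_trie)
    if forward == backward:
        return forward
    if len(forward) != len(backward):
        return forward if len(forward) < len(backward) else backward

    def singles(tokens: List[str]) -> int:
        return sum(len(tok) == 1 for tok in tokens)

    return forward if singles(forward) < singles(backward) else backward


# Toy dictionary (assumption): stands in for a real Khmer word list.
WORDS = ["ab", "abc", "cd", "d"]
trie = build_trie(WORDS)
reversed_trie = build_trie([w[::-1] for w in WORDS])

print(fmm("abcd", trie))                   # ['abc', 'd']
print(bmm("abcd", reversed_trie))          # ['ab', 'cd']
print(bimm("abcd", trie, reversed_trie))   # ['ab', 'cd']
```

In this toy case the two directions disagree: FMM leaves a stray single-character token, while BMM finds two full dictionary words, so the assumed rule keeps the backward result. Real Khmer text would also need handling for combining marks and out-of-vocabulary words, which this sketch does not attempt.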
Files in This Item
There are no files associated with this item.
Appears in Collections
ETC > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.