Development of Language Models for Continuous Uzbek Speech Recognition System
DC Field | Value | Language |
---|---|---|
dc.contributor.author | NURALIEVICH, MUKHAMADIYEV ABDINABI | - |
dc.contributor.author | Mukhiddinov, Mukhriddin | - |
dc.contributor.author | Khujayarov, Ilyos | - |
dc.contributor.author | Ochilov, Mannon | - |
dc.contributor.author | Cho, Jinsoo | - |
dc.date.accessioned | 2023-03-14T08:41:22Z | - |
dc.date.available | 2023-03-14T08:41:22Z | - |
dc.date.issued | 2023-02 | - |
dc.identifier.issn | 1424-8220 | - |
dc.identifier.issn | 1424-3210 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/87129 | - |
dc.description.abstract | Automatic speech recognition systems with a large vocabulary and other natural language processing applications cannot operate without a language model. Most studies on pre-trained language models have focused on more popular languages such as English, Chinese, and various European languages, but there is no publicly available Uzbek speech dataset. Therefore, language models of low-resource languages need to be studied and created. The objective of this study is to address this limitation by developing a low-resource language model for the Uzbek language and understanding linguistic occurrences. We proposed the Uzbek language model named UzLM by examining the performance of statistical and neural-network-based language models that account for the unique features of the Uzbek language. Our Uzbek-specific linguistic representation allows us to construct more robust UzLM, utilizing 80 million words from various sources while using the same or fewer training words, as applied in previous studies. Roughly sixty-eight thousand different words and 15 million sentences were collected for the creation of this corpus. The experimental results of our tests on the continuous recognition of Uzbek speech show that, compared with manual encoding, the use of neural-network-based language models reduced the character error rate to 5.26%. | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | MDPI | - |
dc.title | Development of Language Models for Continuous Uzbek Speech Recognition System | - |
dc.type | Article | - |
dc.identifier.wosid | 000929613900001 | - |
dc.identifier.doi | 10.3390/s23031145 | - |
dc.identifier.bibliographicCitation | SENSORS, v.23, no.3 | - |
dc.description.isOpenAccess | Y | - |
dc.identifier.scopusid | 2-s2.0-85147856060 | - |
dc.citation.title | SENSORS | - |
dc.citation.volume | 23 | - |
dc.citation.number | 3 | - |
dc.type.docType | Article | - |
dc.publisher.location | Switzerland | - |
dc.subject.keywordAuthor | language model | - |
dc.subject.keywordAuthor | Uzbek speech | - |
dc.subject.keywordAuthor | recurrent neural networks | - |
dc.subject.keywordAuthor | automatic speech recognition | - |
dc.subject.keywordAuthor | neural networks | - |
dc.subject.keywordAuthor | character-based language models | - |
dc.subject.keywordAuthor | word-based language models | - |
dc.relation.journalResearchArea | Chemistry | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalResearchArea | Instruments & Instrumentation | - |
dc.relation.journalWebOfScienceCategory | Chemistry, Analytical | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.relation.journalWebOfScienceCategory | Instruments & Instrumentation | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |