Development of Language Models for Continuous Uzbek Speech Recognition System

NURALIEVICH, MUKHAMADIYEV ABDINABI; Mukhiddinov, Mukhriddin; Khujayarov, Ilyos; Ochilov, Mannon; Cho, Jinsoo

Cited 2 times in Web of Science

Cited 2 times in Scopus

Full metadata record

DC Field	Value	Language
dc.contributor.author	NURALIEVICH, MUKHAMADIYEV ABDINABI	-
dc.contributor.author	Mukhiddinov, Mukhriddin	-
dc.contributor.author	Khujayarov, Ilyos	-
dc.contributor.author	Ochilov, Mannon	-
dc.contributor.author	Cho, Jinsoo	-
dc.date.accessioned	2023-03-14T08:41:22Z	-
dc.date.available	2023-03-14T08:41:22Z	-
dc.date.issued	2023-02	-
dc.identifier.issn	1424-8220	-
dc.identifier.issn	1424-3210	-
dc.identifier.uri	https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/87129	-
dc.description.abstract	Automatic speech recognition systems with a large vocabulary and other natural language processing applications cannot operate without a language model. Most studies on pre-trained language models have focused on more popular languages such as English, Chinese, and various European languages, but there is no publicly available Uzbek speech dataset. Therefore, language models of low-resource languages need to be studied and created. The objective of this study is to address this limitation by developing a low-resource language model for the Uzbek language and understanding linguistic occurrences. We proposed the Uzbek language model named UzLM by examining the performance of statistical and neural-network-based language models that account for the unique features of the Uzbek language. Our Uzbek-specific linguistic representation allows us to construct more robust UzLM, utilizing 80 million words from various sources while using the same or fewer training words, as applied in previous studies. Roughly sixty-eight thousand different words and 15 million sentences were collected for the creation of this corpus. The experimental results of our tests on the continuous recognition of Uzbek speech show that, compared with manual encoding, the use of neural-network-based language models reduced the character error rate to 5.26%.	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	MDPI	-
dc.title	Development of Language Models for Continuous Uzbek Speech Recognition System	-
dc.type	Article	-
dc.identifier.wosid	000929613900001	-
dc.identifier.doi	10.3390/s23031145	-
dc.identifier.bibliographicCitation	SENSORS, v.23, no.3	-
dc.description.isOpenAccess	Y	-
dc.identifier.scopusid	2-s2.0-85147856060	-
dc.citation.title	SENSORS	-
dc.citation.volume	23	-
dc.citation.number	3	-
dc.type.docType	Article	-
dc.publisher.location	Switzerland	-
dc.subject.keywordAuthor	language model	-
dc.subject.keywordAuthor	Uzbek speech	-
dc.subject.keywordAuthor	recurrent neural networks	-
dc.subject.keywordAuthor	automatic speech recognition	-
dc.subject.keywordAuthor	neural networks	-
dc.subject.keywordAuthor	character-based language models	-
dc.subject.keywordAuthor	word-based language models	-
dc.relation.journalResearchArea	Chemistry	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalResearchArea	Instruments & Instrumentation	-
dc.relation.journalWebOfScienceCategory	Chemistry, Analytical	-
dc.relation.journalWebOfScienceCategory	Engineering, Electrical & Electronic	-
dc.relation.journalWebOfScienceCategory	Instruments & Instrumentation	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-

Files in This Item: There are no files associated with this item.

Appears in Collections: College of IT Convergence > Department of Computer Engineering > 1. Journal Articles

Related Researcher

ugli, Mukhiddinov Mukhriddin Nuriddin: College of IT Convergence (School of Computer Engineering (Computer Engineering Major))
