Visual speech recognition of Korean words using convolutional neural network
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, S.-W. | - |
dc.contributor.author | Yu, J.-H. | - |
dc.contributor.author | Park, S.M. | - |
dc.contributor.author | Sim, K.-B. | - |
dc.date.available | 2019-06-26T01:37:28Z | - |
dc.date.issued | 2019-03 | - |
dc.identifier.issn | 1598-2645 | - |
dc.identifier.issn | 2093-744X | - |
dc.identifier.uri | https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/26394 | - |
dc.description.abstract | In recent studies, speech recognition performance has been greatly improved by using hidden Markov models (HMMs) and convolutional neural networks (CNNs). HMMs statistically model the voice signal to construct an acoustic model, while CNNs reduce the error rate by predicting speech from images of the mouth region. In this paper, we propose visual speech recognition (VSR) using lip images. To implement VSR, we repeatedly recorded three subjects speaking 53 words chosen from an emergency medical service vocabulary book. Audio signals were used to extract images of the consonants, vowels, and final consonants in the recorded video. The Viola-Jones algorithm was used for lip tracking on the extracted images. The lip-tracking images were grouped and then classified using CNNs. To classify the components of a syllable (consonant, vowel, and final consonant), two CNN structures were used: VGG-S and a modified LeNet-5 with more layers. After all syllable components were classified, the word was identified by Euclidean distance. In this experiment, a classification rate of 72.327% on 318 test words was obtained when VGG-S was used. When the modified LeNet-5 was applied as the word classifier, however, the classification rate was 22.327%. © The Korean Institute of Intelligent Systems. | - |
dc.format.extent | 9 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | Korean Institute of Intelligent Systems | - |
dc.title | Visual speech recognition of Korean words using convolutional neural network | - |
dc.type | Article | - |
dc.identifier.doi | 10.5391/IJFIS.2019.19.1.1 | - |
dc.identifier.bibliographicCitation | International Journal of Fuzzy Logic and Intelligent Systems, v.19, no.1, pp 1 - 9 | - |
dc.identifier.kciid | ART002448465 | - |
dc.description.isOpenAccess | N | - |
dc.identifier.wosid | 000473311100001 | - |
dc.identifier.scopusid | 2-s2.0-85065086404 | - |
dc.citation.endPage | 9 | - |
dc.citation.number | 1 | - |
dc.citation.startPage | 1 | - |
dc.citation.title | International Journal of Fuzzy Logic and Intelligent Systems | - |
dc.citation.volume | 19 | - |
dc.type.docType | Article | - |
dc.publisher.location | Republic of Korea | - |
dc.subject.keywordAuthor | Convolutional neural network | - |
dc.subject.keywordAuthor | Human-robot interaction | - |
dc.subject.keywordAuthor | Korean word recognition | - |
dc.subject.keywordAuthor | Viola-Jones algorithm | - |
dc.subject.keywordAuthor | Visual speech recognition | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Theory & Methods | - |
dc.description.journalRegisteredClass | scopus | - |
dc.description.journalRegisteredClass | esci | - |
dc.description.journalRegisteredClass | kci | - |
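The abstract's final step, matching the classified syllable components to a vocabulary word by Euclidean distance, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the component encodings, the `match_word` helper, and the toy vocabulary entries are all hypothetical; the paper does not specify how words are vectorized.

```python
import math

# Hypothetical sketch: each vocabulary word is encoded as a fixed-length
# vector of syllable-component class indices (consonant, vowel, final
# consonant per syllable), and the CNN's predicted component indices are
# matched to the nearest vocabulary entry by Euclidean distance.

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_word(predicted, vocabulary):
    """Return the vocabulary word whose component vector is nearest."""
    return min(vocabulary, key=lambda w: euclidean(predicted, vocabulary[w]))

# Toy vocabulary with made-up component encodings (indices illustrative).
vocab = {
    "구급차": (1, 8, 0, 0, 13, 0, 10, 0, 0),   # "ambulance"
    "병원":   (7, 3, 4, 11, 13, 4, 0, 0, 0),   # "hospital"
}

# One component is slightly misclassified (9 instead of 10), but the
# nearest-word lookup still recovers the intended word.
pred = (1, 8, 0, 0, 13, 0, 9, 0, 0)
print(match_word(pred, vocab))
```

Distance-based matching of this kind tolerates single-component CNN errors, which is presumably why per-word accuracy (72.327%) can exceed what perfect per-component classification alone would require.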