Detailed Information

Cited 3 times in Web of Science · Cited 0 times in Scopus

Visual speech recognition of Korean words using convolutional neural network

Full metadata record
DC Field: Value
dc.contributor.author: Lee, S.-W.
dc.contributor.author: Yu, J.-H.
dc.contributor.author: Park, S.M.
dc.contributor.author: Sim, K.-B.
dc.date.available: 2019-06-26T01:37:28Z
dc.date.issued: 2019-03
dc.identifier.issn: 1598-2645
dc.identifier.issn: 2093-744X
dc.identifier.uri: https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/26394
dc.description.abstract: In recent studies, speech recognition performance has been greatly improved by using HMMs and CNNs: HMMs are used for statistical modeling of the voice to construct an acoustic model, while CNNs reduce the error rate by predicting speech from images of the mouth region. In this paper, we propose visual speech recognition (VSR) using lip images. To implement VSR, we repeatedly recorded three subjects speaking 53 words chosen from an emergency medical service vocabulary book. Audio signals were used to extract images of the consonants, vowels, and final consonants in the recorded video. The Viola-Jones algorithm was used for lip tracking on the extracted images. The lip-tracking images were grouped and then classified using CNNs. To classify the components of a syllable (consonants, vowels, and final consonants), the CNN structures used were VGG-S and a modified LeNet-5 with additional layers. After all syllable components were classified, the word was found by Euclidean distance. From this experiment, a classification rate of 72.327% on 318 test words was obtained when VGG-S was used; when LeNet-5 was applied as the word classifier, however, the classification rate was 22.327%. © The Korean Institute of Intelligent Systems
dc.format.extent: 9
dc.language: English
dc.language.iso: ENG
dc.publisher: Korean Institute of Intelligent Systems
dc.title: Visual speech recognition of Korean words using convolutional neural network
dc.type: Article
dc.identifier.doi: 10.5391/IJFIS.2019.19.1.1
dc.identifier.bibliographicCitation: International Journal of Fuzzy Logic and Intelligent Systems, v.19, no.1, pp. 1-9
dc.identifier.kciid: ART002448465
dc.description.isOpenAccess: N
dc.identifier.wosid: 000473311100001
dc.identifier.scopusid: 2-s2.0-85065086404
dc.citation.endPage: 9
dc.citation.number: 1
dc.citation.startPage: 1
dc.citation.title: International Journal of Fuzzy Logic and Intelligent Systems
dc.citation.volume: 19
dc.type.docType: Article
dc.publisher.location: Republic of Korea
dc.subject.keywordAuthor: Convolutional neural network
dc.subject.keywordAuthor: Human-robot interaction
dc.subject.keywordAuthor: Korean word recognition
dc.subject.keywordAuthor: Viola-Jones algorithm
dc.subject.keywordAuthor: Visual speech recognition
dc.relation.journalResearchArea: Computer Science
dc.relation.journalWebOfScienceCategory: Computer Science, Theory & Methods
dc.description.journalRegisteredClass: scopus
dc.description.journalRegisteredClass: esci
dc.description.journalRegisteredClass: kci
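The abstract's final step, matching the classified syllable components of a word to the closest vocabulary entry by Euclidean distance, could be sketched as below. This is a minimal illustration, not the authors' implementation; the function names, vocabulary, and feature vectors are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_word(predicted, vocabulary):
    """Return the vocabulary word whose reference vector is closest
    (smallest Euclidean distance) to the predicted component vector."""
    return min(vocabulary, key=lambda w: euclidean(predicted, vocabulary[w]))

# Toy vocabulary: each word maps to a reference vector of
# syllable-component scores (illustrative values only).
vocab = {
    "word_a": [1.0, 0.0, 0.0],
    "word_b": [0.0, 1.0, 0.0],
}

print(match_word([0.9, 0.1, 0.0], vocab))  # closest to word_a
```

In the paper's pipeline, the predicted vector would come from the CNN classifiers (VGG-S or the modified LeNet-5) applied to the lip-tracking images, with one entry per classified consonant, vowel, or final consonant.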
Files in This Item
There are no files associated with this item.
Appears in
Collections
College of ICT Engineering > School of Electrical and Electronics Engineering > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
