Visual speech recognition of Korean words using convolutional neural network
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, S.-W. | - |
dc.contributor.author | Yu, J.-H. | - |
dc.contributor.author | Park, S.M. | - |
dc.contributor.author | Sim, K.-B. | - |
dc.date.available | 2019-06-26T01:37:28Z | - |
dc.date.issued | 2019-03 | - |
dc.identifier.issn | 1598-2645 | - |
dc.identifier.issn | 2093-744X | - |
dc.identifier.uri | https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/26394 | - |
dc.description.abstract | In recent studies, speech recognition performance has been greatly improved by using hidden Markov models (HMMs) and convolutional neural networks (CNNs). HMMs statistically model the voice signal to construct an acoustic model, while CNNs reduce the error rate by predicting speech from images of the mouth region. In this paper, we propose visual speech recognition (VSR) using lip images. To implement VSR, we repeatedly recorded three subjects speaking 53 words chosen from an emergency medical service vocabulary book. Audio signals were used to extract images of the consonants, vowels, and final consonants in the recorded video. The Viola-Jones algorithm was used for lip tracking on the extracted images. The lip-tracking images were grouped and then classified using CNNs. To classify the components of a syllable (consonant, vowel, and final consonant), two CNN structures were used: VGG-S and a modified LeNet-5 with more layers. After all syllable components were classified, the word was identified by Euclidean distance. In this experiment, a classification rate of 72.327% on 318 test words was obtained when VGG-S was used. When the modified LeNet-5 was applied as the word classifier, however, the classification rate was 22.327%. © The Korean Institute of Intelligent Systems. | - |
dc.format.extent | 9 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | Korean Institute of Intelligent Systems | - |
dc.title | Visual speech recognition of Korean words using convolutional neural network | - |
dc.type | Article | - |
dc.identifier.doi | 10.5391/IJFIS.2019.19.1.1 | - |
dc.identifier.bibliographicCitation | International Journal of Fuzzy Logic and Intelligent Systems, v.19, no.1, pp 1 - 9 | - |
dc.identifier.kciid | ART002448465 | - |
dc.description.isOpenAccess | N | - |
dc.identifier.wosid | 000473311100001 | - |
dc.identifier.scopusid | 2-s2.0-85065086404 | - |
dc.citation.endPage | 9 | - |
dc.citation.number | 1 | - |
dc.citation.startPage | 1 | - |
dc.citation.title | International Journal of Fuzzy Logic and Intelligent Systems | - |
dc.citation.volume | 19 | - |
dc.type.docType | Article | - |
dc.publisher.location | Republic of Korea | - |
dc.subject.keywordAuthor | Convolutional neural network | - |
dc.subject.keywordAuthor | Human-robot interaction | - |
dc.subject.keywordAuthor | Korean word recognition | - |
dc.subject.keywordAuthor | Viola-Jones algorithm | - |
dc.subject.keywordAuthor | Visual speech recognition | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Theory & Methods | - |
dc.description.journalRegisteredClass | scopus | - |
dc.description.journalRegisteredClass | esci | - |
dc.description.journalRegisteredClass | kci | - |
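The abstract's final step, matching the classified syllable components to a vocabulary word by Euclidean distance, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the component encodings, the `match_word` helper, and the toy vocabulary entries are all hypothetical; the paper does not specify how words are vectorized.

```python
import math

# Hypothetical sketch: each vocabulary word is encoded as a fixed-length
# vector of syllable-component class indices (consonant, vowel, final
# consonant per syllable), and the CNN's predicted component indices are
# matched to the nearest vocabulary entry by Euclidean distance.

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_word(predicted, vocabulary):
    """Return the vocabulary word whose component vector is nearest."""
    return min(vocabulary, key=lambda w: euclidean(predicted, vocabulary[w]))

# Toy vocabulary with made-up component encodings (indices illustrative).
vocab = {
    "구급차": (1, 8, 0, 0, 13, 0, 10, 0, 0),   # "ambulance"
    "병원":   (7, 3, 4, 11, 13, 4, 0, 0, 0),   # "hospital"
}

# One component is slightly misclassified (9 instead of 10), but the
# nearest-word lookup still recovers the intended word.
pred = (1, 8, 0, 0, 13, 0, 9, 0, 0)
print(match_word(pred, vocab))
```

Distance-based matching of this kind tolerates single-component CNN errors, which is presumably why per-word accuracy (72.327%) can exceed what perfect per-component classification alone would require.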