End-to-End Speech Endpoint Detection Utilizing Acoustic and Language Modeling Knowledge for Online Low-Latency Speech Recognition

Full metadata record
DC Field | Value | Language
dc.contributor.author | Hwang, Inyoung | -
dc.contributor.author | Chang, Joon-Hyuk | -
dc.date.accessioned | 2021-08-02T08:53:07Z | -
dc.date.available | 2021-08-02T08:53:07Z | -
dc.date.created | 2021-05-12 | -
dc.date.issued | 2020-08 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/9009 | -
dc.description.abstract | Speech endpoint detection (EPD) benefits from the decoder state features (DSFs) of an online automatic speech recognition (ASR) system. However, the DSFs are obtained via the ASR decoding process, which can become prohibitively expensive, especially in limited-resource scenarios such as embedded devices. To address this problem, this paper proposes a language model (LM)-based end-of-utterance (EOU) predictor, which is trained in an end-to-end manner to determine the framewise probabilities of the EOU token conditioned on the previous word history obtained from the 1-best decoding hypothesis of the ASR system, without an actual decoding process in the test step. Further, a novel end-to-end EPD strategy is presented to incorporate phonetic embedding (PE)-based acoustic modeling knowledge and the proposed EOU predictor-based language modeling knowledge into an acoustic feature embedding (AFE)-based EPD approach within a recurrent neural network (RNN)-based EPD framework. The proposed EPD algorithm is built upon ensemble RNNs that are trained independently for three parts, each in accordance with its own target: the proposed LM-based EOU predictor, the AFE-based EPD, and the PE-based acoustic model (AM). The ensemble RNNs are concatenated at the level of the last hidden layers and then attached to a fully-connected deep neural network (DNN)-based EPD classifier, which is trained in accordance with the ultimate EPD target. Thereafter, they are jointly retrained in the second step of the DNN training to yield a lower endpoint error. The proposed EPD framework was evaluated in terms of endpoint accuracy and word error rate on the CHiME-3 and large-scale ASR tasks. The experimental results show that the proposed EPD algorithm outperforms the conventional EPD approaches. | -
dc.language | English | -
dc.language.iso | en | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | End-to-End Speech Endpoint Detection Utilizing Acoustic and Language Modeling Knowledge for Online Low-Latency Speech Recognition | -
dc.type | Article | -
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | -
dc.identifier.doi | 10.1109/ACCESS.2020.3020696 | -
dc.identifier.scopusid | 2-s2.0-85091302562 | -
dc.identifier.wosid | 000570215000001 | -
dc.identifier.bibliographicCitation | IEEE ACCESS, v.8, pp.161109 - 161123 | -
dc.relation.isPartOf | IEEE ACCESS | -
dc.citation.title | IEEE ACCESS | -
dc.citation.volume | 8 | -
dc.citation.startPage | 161109 | -
dc.citation.endPage | 161123 | -
dc.type.rims | ART | -
dc.type.docType | Article | -
dc.description.journalClass | 1 | -
dc.description.isOpenAccess | Y | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Computer Science | -
dc.relation.journalResearchArea | Engineering | -
dc.relation.journalResearchArea | Telecommunications | -
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | -
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | -
dc.relation.journalWebOfScienceCategory | Telecommunications | -
dc.subject.keywordPlus | Computational linguistics | -
dc.subject.keywordPlus | Decoding | -
dc.subject.keywordPlus | Deep neural networks | -
dc.subject.keywordPlus | Embeddings | -
dc.subject.keywordPlus | Modeling languages | -
dc.subject.keywordPlus | Multilayer neural networks | -
dc.subject.keywordPlus | Recurrent neural networks | -
dc.subject.keywordPlus | Speech recognition | -
dc.subject.keywordPlus | Acoustic and language models | -
dc.subject.keywordPlus | Acoustic features | -
dc.subject.keywordPlus | Decoding process | -
dc.subject.keywordPlus | Embedded device | -
dc.subject.keywordPlus | On-line automatic speech recognition | -
dc.subject.keywordPlus | Recurrent neural network (RNN) | -
dc.subject.keywordPlus | Speech endpoint detection | -
dc.subject.keywordPlus | Word error rate | -
dc.subject.keywordAuthor | Feature extraction | -
dc.subject.keywordAuthor | Decoding | -
dc.subject.keywordAuthor | Acoustics | -
dc.subject.keywordAuthor | Speech recognition | -
dc.subject.keywordAuthor | Task analysis | -
dc.subject.keywordAuthor | Convolution | -
dc.subject.keywordAuthor | Time-frequency analysis | -
dc.subject.keywordAuthor | Acoustic model (AM) | -
dc.subject.keywordAuthor | end-of-turn detection | -
dc.subject.keywordAuthor | end-of-utterance (EOU) detection | -
dc.subject.keywordAuthor | feature embedding | -
dc.subject.keywordAuthor | language model (LM) | -
dc.subject.keywordAuthor | online speech recognition | -
dc.subject.keywordAuthor | pause hesitation | -
dc.subject.keywordAuthor | speech endpoint detection (EPD) | -
dc.subject.keywordAuthor | spoken dialogue system | -
dc.identifier.url | https://ieeexplore.ieee.org/document/9181510 | -
Appears in Collections
College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)