End-to-End Speech Endpoint Detection Utilizing Acoustic and Language Modeling Knowledge for Online Low-Latency Speech Recognition

Hwang, Inyoung; Chang, Joon-Hyuk

doi:10.1109/ACCESS.2020.3020696

Full metadata record

DC Field	Value	Language
dc.contributor.author	Hwang, Inyoung	-
dc.contributor.author	Chang, Joon-Hyuk	-
dc.date.accessioned	2021-08-02T08:53:07Z	-
dc.date.available	2021-08-02T08:53:07Z	-
dc.date.created	2021-05-12	-
dc.date.issued	2020-08	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/9009	-
dc.description.abstract	Speech endpoint detection (EPD) benefits from the decoder state features (DSFs) of an online automatic speech recognition (ASR) system. However, the DSFs are obtained via the ASR decoding process, which can become prohibitively expensive, especially in resource-limited scenarios such as embedded devices. To address this problem, this paper proposes a language model (LM)-based end-of-utterance (EOU) predictor, which is trained end to end to estimate the framewise probabilities of the EOU token conditioned on the previous word history obtained from the 1-best decoding hypothesis of the ASR system, and which requires no actual decoding process at test time. Further, a novel end-to-end EPD strategy is presented that incorporates phonetic embedding (PE)-based acoustic modeling knowledge and the proposed EOU predictor-based language modeling knowledge into an acoustic feature embedding (AFE)-based EPD approach within a recurrent neural network (RNN)-based EPD framework. The proposed EPD algorithm is built upon an ensemble of three RNNs (the proposed LM-based EOU predictor, the AFE-based EPD, and the PE-based acoustic model (AM)), each trained independently on its own target. The ensemble RNNs are concatenated at the level of their last hidden layers and then attached to a fully connected deep neural network (DNN)-based EPD classifier, which is trained on the ultimate EPD target. Thereafter, they are jointly retrained in a second step of DNN training to yield a lower endpoint error. The proposed EPD framework was evaluated in terms of endpoint accuracy and word error rate on the CHiME-3 and large-scale ASR tasks. The experimental results show that the proposed EPD algorithm outperforms conventional EPD approaches.	-
dc.language	English	-
dc.language.iso	en	-
dc.publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC	-
dc.title	End-to-End Speech Endpoint Detection Utilizing Acoustic and Language Modeling Knowledge for Online Low-Latency Speech Recognition	-
dc.type	Article	-
dc.contributor.affiliatedAuthor	Chang, Joon-Hyuk	-
dc.identifier.doi	10.1109/ACCESS.2020.3020696	-
dc.identifier.scopusid	2-s2.0-85091302562	-
dc.identifier.wosid	000570215000001	-
dc.identifier.bibliographicCitation	IEEE ACCESS, v.8, pp.161109 - 161123	-
dc.relation.isPartOf	IEEE ACCESS	-
dc.citation.title	IEEE ACCESS	-
dc.citation.volume	8	-
dc.citation.startPage	161109	-
dc.citation.endPage	161123	-
dc.type.rims	ART	-
dc.type.docType	Article	-
dc.description.journalClass	1	-
dc.description.isOpenAccess	Y	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalResearchArea	Telecommunications	-
dc.relation.journalWebOfScienceCategory	Computer Science, Information Systems	-
dc.relation.journalWebOfScienceCategory	Engineering, Electrical & Electronic	-
dc.relation.journalWebOfScienceCategory	Telecommunications	-
dc.subject.keywordPlus	Computational linguistics	-
dc.subject.keywordPlus	Decoding	-
dc.subject.keywordPlus	Deep neural networks	-
dc.subject.keywordPlus	Embeddings	-
dc.subject.keywordPlus	Modeling languages	-
dc.subject.keywordPlus	Multilayer neural networks	-
dc.subject.keywordPlus	Recurrent neural networks	-
dc.subject.keywordPlus	Speech recognition	-
dc.subject.keywordPlus	Acoustic and language models	-
dc.subject.keywordPlus	Acoustic features	-
dc.subject.keywordPlus	Decoding process	-
dc.subject.keywordPlus	Embedded device	-
dc.subject.keywordPlus	On-line automatic speech recognition	-
dc.subject.keywordPlus	Recurrent neural network (RNN)	-
dc.subject.keywordPlus	Speech endpoint detection	-
dc.subject.keywordPlus	Word error rate	-
dc.subject.keywordAuthor	Feature extraction	-
dc.subject.keywordAuthor	Decoding	-
dc.subject.keywordAuthor	Acoustics	-
dc.subject.keywordAuthor	Speech recognition	-
dc.subject.keywordAuthor	Task analysis	-
dc.subject.keywordAuthor	Convolution	-
dc.subject.keywordAuthor	Time-frequency analysis	-
dc.subject.keywordAuthor	Acoustic model (AM)	-
dc.subject.keywordAuthor	end-of-turn detection	-
dc.subject.keywordAuthor	end-of-utterance (EOU) detection	-
dc.subject.keywordAuthor	feature embedding	-
dc.subject.keywordAuthor	language model (LM)	-
dc.subject.keywordAuthor	online speech recognition	-
dc.subject.keywordAuthor	pause hesitation	-
dc.subject.keywordAuthor	speech endpoint detection (EPD)	-
dc.subject.keywordAuthor	spoken dialogue system	-
dc.identifier.url	https://ieeexplore.ieee.org/document/9181510	-
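
The architecture described in the abstract (three recurrent branches whose last hidden layers are concatenated and passed to a fully connected EPD classifier) can be sketched roughly as follows. This is only an illustration of the wiring, not the authors' implementation: the class names TinyRNN and EnsembleEPD, the layer sizes, the Elman-style recurrence, the random untrained weights, and the assumption that all three branches read the same acoustic frame are hypothetical choices made for the example. Training, including the independent pretraining of each branch and the joint retraining step, is not shown.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (placeholder for trained weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyRNN:
    """Elman-style recurrent layer standing in for one branch
    (EOU predictor, AFE-based EPD, or PE-based AM). Hypothetical."""

    def __init__(self, in_dim, hid_dim, rng):
        self.w_in = make_matrix(hid_dim, in_dim, rng)
        self.w_rec = make_matrix(hid_dim, hid_dim, rng)
        self.h = [0.0] * hid_dim

    def step(self, frame):
        a = matvec(self.w_in, frame)
        b = matvec(self.w_rec, self.h)
        self.h = [math.tanh(p + q) for p, q in zip(a, b)]
        return self.h


class EnsembleEPD:
    """Concatenates the last hidden states of three branches and feeds
    them to a fully connected classifier with a 2-way softmax output."""

    def __init__(self, feat_dim, hid_dim, seed=0):
        rng = random.Random(seed)
        self.branches = [TinyRNN(feat_dim, hid_dim, rng) for _ in range(3)]
        self.w_hidden = make_matrix(hid_dim, 3 * hid_dim, rng)
        self.w_out = make_matrix(2, hid_dim, rng)

    def step(self, frame):
        # Concatenate the three branch hidden states for this frame.
        joint = []
        for branch in self.branches:
            joint.extend(branch.step(frame))
        # One ReLU hidden layer, then a numerically stable softmax.
        hidden = [max(0.0, z) for z in matvec(self.w_hidden, joint)]
        logits = matvec(self.w_out, hidden)
        peak = max(logits)
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Usage: five identical 4-dimensional frames (made-up values).
frames = [[0.1, 0.2, 0.3, 0.4] for _ in range(5)]
model = EnsembleEPD(feat_dim=4, hid_dim=3)
posteriors = [model.step(frame) for frame in frames]
```

Each entry of posteriors is a pair of framewise probabilities that sums to one. In a real system one of the two classes would be read as "endpoint reached" and compared against a threshold, but the weights here are random, so the values themselves carry no meaning.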

Files in This Item

End-to-End_Speech_Endpoint_Detection_Utilizing_Acoustic_and_Language_Modeling_Knowledge_for_Online_Low-Latency_Speech_Recognition.pdf 7.38 MB

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)