Detailed Information

Attention-based latent features for jointly trained end-to-end automatic speech recognition with modified speech enhancement

Full metadata record
DC Field  Value
dc.contributor.author  Yang, Da-Hee
dc.contributor.author  Chang, Joon-Hyuk
dc.date.accessioned  2023-10-10T02:30:23Z
dc.date.available  2023-10-10T02:30:23Z
dc.date.created  2023-04-06
dc.date.issued  2023-03
dc.identifier.issn  1319-1578
dc.identifier.uri  https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191747
dc.description.abstract  In this paper, we propose a joint training framework that efficiently combines time-domain speech enhancement (SE) with an end-to-end (E2E) automatic speech recognition (ASR) system by means of attention-based latent features. Because the E2E ASR is trained on these latent features, various time-domain SE models can be applied for noise-robust ASR; to our knowledge, our modified framework is the first such approach. By adopting a time-domain SE model, we implement a fully E2E pipeline from SE to ASR that requires neither domain knowledge nor short-time Fourier transform (STFT) consistency constraints. The core idea of our framework is therefore to use the latent features of the time-domain SE model as the input features for ASR. Furthermore, we apply an attention mechanism to the time-domain SE model so that it selectively concentrates on the latent features most relevant to the recognition task. Detailed experiments are conducted on the hybrid CTC/attention architecture for E2E ASR, and we demonstrate the superiority of our approach over baseline ASR systems trained on Mel filter bank coefficients. Compared to the baseline ASR model trained only on clean data, the proposed joint training method achieves 63.6% and 86.8% relative error reductions on the TIMIT and WSJ “matched” test sets, respectively. (An illustrative sketch of this architecture follows the metadata record.)
dc.language  English
dc.language.iso  en
dc.publisher  King Saud University
dc.title  Attention-based latent features for jointly trained end-to-end automatic speech recognition with modified speech enhancement
dc.type  Article
dc.contributor.affiliatedAuthor  Chang, Joon-Hyuk
dc.identifier.doi  10.1016/j.jksuci.2023.02.007
dc.identifier.scopusid  2-s2.0-85149206123
dc.identifier.wosid  000991156900001
dc.identifier.bibliographicCitation  Journal of King Saud University - Computer and Information Sciences, v.35, no.3, pp.202-210
dc.relation.isPartOf  Journal of King Saud University - Computer and Information Sciences
dc.citation.title  Journal of King Saud University - Computer and Information Sciences
dc.citation.volume  35
dc.citation.number  3
dc.citation.startPage  202
dc.citation.endPage  210
dc.type.rims  ART
dc.type.docType  Article
dc.description.journalClass  1
dc.description.isOpenAccess  Y
dc.description.journalRegisteredClass  scie
dc.description.journalRegisteredClass  scopus
dc.relation.journalResearchArea  Computer Science
dc.relation.journalWebOfScienceCategory  Computer Science, Information Systems
dc.subject.keywordPlus  OPTIMIZATION
dc.subject.keywordPlus  FRAMEWORK
dc.subject.keywordPlus  NOISE
dc.subject.keywordPlus  CNN
dc.subject.keywordAuthor  Time-domain speech enhancement
dc.subject.keywordAuthor  End-to-end automatic speech recognition
dc.subject.keywordAuthor  Attention-based latent feature
dc.subject.keywordAuthor  Joint training framework
dc.identifier.url  https://www.sciencedirect.com/science/article/pii/S1319157823000368?via%3Dihub
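
Illustrative sketch

The abstract describes a pipeline in which the latent features of a time-domain SE model, re-weighted by attention, serve directly as the input to an E2E ASR encoder, with both models trained jointly and no STFT stage in between. The following minimal PyTorch sketch shows one way such a bridge could look. Every name, layer choice (e.g., the Conv-TasNet-style learned filterbank and mask), and hyperparameter here is an illustrative assumption, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class AttentiveLatentBridge(nn.Module):
    """Time-domain SE front end whose attended latent features feed an E2E ASR encoder."""

    def __init__(self, n_filters=256, kernel_size=20, stride=10, asr_dim=256):
        super().__init__()
        # Learned analysis filterbank in place of STFT features (no consistency constraint).
        self.encoder = nn.Conv1d(1, n_filters, kernel_size, stride=stride)
        # Enhancement mask estimated on the latent representation.
        self.mask_net = nn.Sequential(
            nn.Conv1d(n_filters, n_filters, 1), nn.ReLU(),
            nn.Conv1d(n_filters, n_filters, 1), nn.Sigmoid(),
        )
        # Synthesis back to the waveform so an SE loss can be applied during joint training.
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel_size, stride=stride)
        # Attention that selectively re-weights latent features before the ASR encoder.
        self.attn = nn.MultiheadAttention(n_filters, num_heads=4, batch_first=True)
        self.to_asr = nn.Linear(n_filters, asr_dim)

    def forward(self, noisy_wav):
        # noisy_wav: (batch, samples)
        z = torch.relu(self.encoder(noisy_wav.unsqueeze(1)))  # latent features (B, F, T)
        z_enh = z * self.mask_net(z)                          # masked (enhanced) latents
        enhanced_wav = self.decoder(z_enh).squeeze(1)         # waveform for the SE loss
        h = z_enh.transpose(1, 2)                             # (B, T, F) for attention
        attended, _ = self.attn(h, h, h)                      # attention-based latent features
        return enhanced_wav, self.to_asr(attended)            # ASR input (B, T, asr_dim)

model = AttentiveLatentBridge()
wav = torch.randn(2, 16000)       # two 1-second utterances at 16 kHz
enhanced, asr_feats = model(wav)  # asr_feats: (2, 1599, 256), fed to the ASR encoder

# Hypothetical joint objective (weights alpha and lam are placeholders):
#   L = alpha * L_SE(enhanced, clean) + (1 - alpha) * (lam * L_CTC + (1 - lam) * L_att)
```

In such a setup the SE loss on the enhanced waveform would be combined with the hybrid CTC/attention ASR loss, as in the closing comment. The relative error reductions quoted in the abstract follow the usual definition, (WER_baseline - WER_proposed) / WER_baseline.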
Appears in Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)