Detailed Information

Attention-based latent features for jointly trained end-to-end automatic speech recognition with modified speech enhancement

Full metadata record
DC Field  Value
dc.contributor.author  Yang, Da-Hee
dc.contributor.author  Chang, Joon-Hyuk
dc.date.accessioned  2023-10-10T02:30:23Z
dc.date.available  2023-10-10T02:30:23Z
dc.date.created  2023-04-06
dc.date.issued  2023-03
dc.identifier.issn  1319-1578
dc.identifier.uri  https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191747
dc.description.abstract  In this paper, we propose a joint training framework that efficiently combines time-domain speech enhancement (SE) with an end-to-end (E2E) automatic speech recognition (ASR) system by means of attention-based latent features. Because the E2E ASR is trained on these latent features, various time-domain SE models can be applied for noise-robust ASR; to our knowledge, our modified framework is the first such approach. By adopting a time-domain SE model, we implement a fully E2E pipeline from SE to ASR that requires neither domain knowledge nor short-time Fourier transform (STFT) consistency constraints. The core idea of our framework is therefore to use the latent features of the time-domain SE model as the input features for ASR. Furthermore, we apply an attention mechanism to the time-domain SE model so that it selectively concentrates on the latent features most relevant to the recognition task. Detailed experiments are conducted on the hybrid CTC/attention architecture for E2E ASR, and we demonstrate the superiority of our approach over baseline ASR systems trained on Mel filter bank coefficients. Compared to the baseline ASR model trained only on clean data, the proposed joint training method achieves 63.6% and 86.8% relative error reductions on the TIMIT and WSJ “matched” test sets, respectively. (An illustrative sketch of this architecture follows the metadata record.)
dc.language  English
dc.language.iso  en
dc.publisher  King Saud University
dc.title  Attention-based latent features for jointly trained end-to-end automatic speech recognition with modified speech enhancement
dc.type  Article
dc.contributor.affiliatedAuthor  Chang, Joon-Hyuk
dc.identifier.doi  10.1016/j.jksuci.2023.02.007
dc.identifier.scopusid  2-s2.0-85149206123
dc.identifier.wosid  000991156900001
dc.identifier.bibliographicCitation  Journal of King Saud University - Computer and Information Sciences, v.35, no.3, pp.202-210
dc.relation.isPartOf  Journal of King Saud University - Computer and Information Sciences
dc.citation.title  Journal of King Saud University - Computer and Information Sciences
dc.citation.volume  35
dc.citation.number  3
dc.citation.startPage  202
dc.citation.endPage  210
dc.type.rims  ART
dc.type.docType  Article
dc.description.journalClass  1
dc.description.isOpenAccess  Y
dc.description.journalRegisteredClass  scie
dc.description.journalRegisteredClass  scopus
dc.relation.journalResearchArea  Computer Science
dc.relation.journalWebOfScienceCategory  Computer Science, Information Systems
dc.subject.keywordPlus  OPTIMIZATION
dc.subject.keywordPlus  FRAMEWORK
dc.subject.keywordPlus  NOISE
dc.subject.keywordPlus  CNN
dc.subject.keywordAuthor  Time-domain speech enhancement
dc.subject.keywordAuthor  End-to-end automatic speech recognition
dc.subject.keywordAuthor  Attention-based latent feature
dc.subject.keywordAuthor  Joint training framework
dc.identifier.url  https://www.sciencedirect.com/science/article/pii/S1319157823000368?via%3Dihub
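
Illustrative sketch

The abstract describes a pipeline in which the latent features of a time-domain SE model, re-weighted by attention, serve directly as the input to an E2E ASR encoder, with both models trained jointly and no STFT stage in between. The following minimal PyTorch sketch shows one way such a bridge could look. Every name, layer choice (e.g., the Conv-TasNet-style learned filterbank and mask), and hyperparameter here is an illustrative assumption, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class AttentiveLatentBridge(nn.Module):
    """Time-domain SE front end whose attended latent features feed an E2E ASR encoder."""

    def __init__(self, n_filters=256, kernel_size=20, stride=10, asr_dim=256):
        super().__init__()
        # Learned analysis filterbank in place of STFT features (no consistency constraint).
        self.encoder = nn.Conv1d(1, n_filters, kernel_size, stride=stride)
        # Enhancement mask estimated on the latent representation.
        self.mask_net = nn.Sequential(
            nn.Conv1d(n_filters, n_filters, 1), nn.ReLU(),
            nn.Conv1d(n_filters, n_filters, 1), nn.Sigmoid(),
        )
        # Synthesis back to the waveform so an SE loss can be applied during joint training.
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel_size, stride=stride)
        # Attention that selectively re-weights latent features before the ASR encoder.
        self.attn = nn.MultiheadAttention(n_filters, num_heads=4, batch_first=True)
        self.to_asr = nn.Linear(n_filters, asr_dim)

    def forward(self, noisy_wav):
        # noisy_wav: (batch, samples)
        z = torch.relu(self.encoder(noisy_wav.unsqueeze(1)))  # latent features (B, F, T)
        z_enh = z * self.mask_net(z)                          # masked (enhanced) latents
        enhanced_wav = self.decoder(z_enh).squeeze(1)         # waveform for the SE loss
        h = z_enh.transpose(1, 2)                             # (B, T, F) for attention
        attended, _ = self.attn(h, h, h)                      # attention-based latent features
        return enhanced_wav, self.to_asr(attended)            # ASR input (B, T, asr_dim)

model = AttentiveLatentBridge()
wav = torch.randn(2, 16000)       # two 1-second utterances at 16 kHz
enhanced, asr_feats = model(wav)  # asr_feats: (2, 1599, 256), fed to the ASR encoder

# Hypothetical joint objective (weights alpha and lam are placeholders):
#   L = alpha * L_SE(enhanced, clean) + (1 - alpha) * (lam * L_CTC + (1 - lam) * L_att)
```

In such a setup the SE loss on the enhanced waveform would be combined with the hybrid CTC/attention ASR loss, as in the closing comment. The relative error reductions quoted in the abstract follow the usual definition, (WER_baseline - WER_proposed) / WER_baseline.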
Appears in Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)