Detailed Information


Attention-based latent features for jointly trained end-to-end automatic speech recognition with modified speech enhancement (open access)

Authors
Yang, Da-Hee; Chang, Joon-Hyuk
Issue Date
Mar-2023
Publisher
King Saud bin Abdulaziz University
Keywords
Time-domain speech enhancement; End-to-end automatic speech recognition; Attention-based latent feature; Joint training framework
Citation
Journal of King Saud University - Computer and Information Sciences, v.35, no.3, pp.202-210
Indexed
SCIE
SCOPUS
Journal Title
Journal of King Saud University - Computer and Information Sciences
Volume
35
Number
3
Start Page
202
End Page
210
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191747
DOI
10.1016/j.jksuci.2023.02.007
ISSN
1319-1578
Abstract
In this paper, we propose a joint training framework that efficiently combines time-domain speech enhancement (SE) with an end-to-end (E2E) automatic speech recognition (ASR) system by utilizing attention-based latent features. Using latent features to train the E2E ASR model means that various time-domain SE models can be applied for noise-robust ASR; to the best of our knowledge, our modified framework is the first such approach. By applying a time-domain SE model, we implement a fully E2E pipeline from SE to ASR that requires neither domain knowledge nor short-time Fourier transform (STFT) consistency constraints. Accordingly, the core of our framework is using the latent features of the time-domain SE model as input features for ASR. Furthermore, we apply an attention mechanism to the time-domain SE model to selectively concentrate on certain latent features, yielding features more relevant to the recognition task. Detailed experiments are conducted on the hybrid CTC/attention architecture for E2E ASR, and we demonstrate the superiority of our approach over baseline ASR systems trained with Mel filter-bank coefficient features as input. Compared to the baseline ASR model trained only on clean data, the proposed joint training method achieves 63.6% and 86.8% relative error reductions on the TIMIT and WSJ “matched” test sets, respectively.
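The abstract's core idea — attention weighting applied to the SE model's latent features before they replace Mel filter-bank inputs to the ASR encoder — can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's architecture: the dot-product scoring, the `query` parameter, and all dimensions are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_latent(latent, query):
    """Reweight each time step of an SE latent sequence by its relevance.

    latent: (T, D) latent features from a time-domain SE encoder.
    query:  (D,)   learned query vector (hypothetical parameter).
    Returns latent features of the same shape, scaled per frame by
    attention weights, plus the weights themselves.
    """
    scores = latent @ query / np.sqrt(latent.shape[1])  # (T,) scaled dot products
    weights = softmax(scores)                            # (T,), sums to 1
    return latent * weights[:, None], weights

rng = np.random.default_rng(0)
latent = rng.standard_normal((50, 64))   # T=50 frames, D=64 latent channels
query = rng.standard_normal(64)
features, w = attend_latent(latent, query)
# `features` would then be fed to the ASR encoder in place of
# Mel filter-bank coefficients, and both models trained jointly.
```

In the actual joint framework, the gradient of the ASR loss would flow through this attention step back into the SE model, which is what couples the two objectives during training.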
Appears in Collections
Seoul Campus, College of Engineering > Department of Electronic Engineering > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
