Detailed Information

Improving Generalization of End-to-End ASR through Diversity and Independence Regularization

Full metadata record
DC Field | Value | Language
dc.contributor.author | Ko, Ye-Eun | -
dc.contributor.author | Lee, Mun-Hak | -
dc.contributor.author | Kim, Dong-Hyun | -
dc.contributor.author | Chang, Joon-Hyuk | -
dc.date.accessioned | 2025-11-20T01:30:39Z | -
dc.date.available | 2025-11-20T01:30:39Z | -
dc.date.issued | 2025-08 | -
dc.identifier.issn | 2958-1796 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209225 | -
dc.description.abstract | Automatic speech recognition (ASR) has been driven by representative end-to-end model architectures, including connectionist temporal classification (CTC), attention-based encoder-decoder (AED), and recurrent neural network transducer (RNN-T). However, these models are prone to overfitting during training, which degrades their generalization performance. In this paper, we propose a novel regularization technique applicable to various ASR models: diversity loss and independence loss. Diversity loss reduces the similarity between feature representations, encouraging the model to learn diverse patterns. Independence loss minimizes the covariance between feature vectors, ensuring that they contain independent information and reducing redundancy. We apply these techniques to CTC, AED, and RNN-T models and demonstrate that the proposed regularization method effectively improves the model generalization performance and robustness through extensive experiments. | -
dc.format.extent | 5 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | International Speech Communication Association | -
dc.title | Improving Generalization of End-to-End ASR through Diversity and Independence Regularization | -
dc.type | Article | -
dc.identifier.doi | 10.21437/Interspeech.2025-1309 | -
dc.identifier.scopusid | 2-s2.0-105020065992 | -
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 3578 - 3582 | -
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | -
dc.citation.startPage | 3578 | -
dc.citation.endPage | 3582 | -
dc.type.docType | Conference paper | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.subject.keywordPlus | Speech communication | -
dc.subject.keywordAuthor | diversity loss | -
dc.subject.keywordAuthor | independence loss | -
dc.subject.keywordAuthor | regularization | -
dc.subject.keywordAuthor | speech recognition | -
dc.identifier.url | https://www.isca-archive.org/interspeech_2025/ko25_interspeech.html | -
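
The abstract describes the two regularizers only in words, so the following is a minimal sketch under stated assumptions, not the authors' formulation. It assumes that "similarity" means cosine similarity between pairs of feature vectors, and that "covariance" is measured between feature dimensions across those vectors. The function names `diversity_loss` and `independence_loss`, the squaring of each term, and the averaging are all illustrative choices that do not come from this record.

```python
import math
from itertools import combinations


def diversity_loss(features):
    """Mean squared cosine similarity over all distinct pairs of vectors.

    Assumption: the paper's "similarity" is cosine similarity; the exact
    form used by the authors is not given in the abstract.
    """
    pairs = list(combinations(features, 2))
    total = 0.0
    for u, v in pairs:
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        cos = dot / (norm_u * norm_v + 1e-12)  # epsilon guards zero vectors
        total += cos * cos
    return total / len(pairs)


def independence_loss(features):
    """Mean squared off-diagonal entry of the sample covariance matrix.

    Assumption: covariance is taken between feature dimensions, estimated
    over the given vectors (rows), with the unbiased 1 / (n - 1) factor.
    """
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                cov = sum(row[i] * row[j] for row in centered) / (n - 1)
                total += cov * cov
    return total / (d * (d - 1))


# Toy inputs: each inner list is one feature vector.
orthogonal = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
uncorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]

print(diversity_loss(orthogonal))
print(independence_loss(correlated))
print(independence_loss(uncorrelated))
```

In a training setup, terms like these would typically be added to the base CTC, AED, or RNN-T objective with small weighting coefficients; the weights and the layer from which features are taken are not stated in this record.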
Appears in Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)