Improving Generalization of End-to-End ASR through Diversity and Independence Regularization

Ko, Ye-Eun; Lee, Mun-Hak; Kim, Dong-Hyun; Chang, Joon-Hyuk

doi:10.21437/Interspeech.2025-1309

Detailed Information

Cited 0 times in Web of Science

Cited 0 times in Scopus

Full metadata record

DC Field	Value	Language
dc.contributor.author	Ko, Ye-Eun	-
dc.contributor.author	Lee, Mun-Hak	-
dc.contributor.author	Kim, Dong-Hyun	-
dc.contributor.author	Chang, Joon-Hyuk	-
dc.date.accessioned	2025-11-20T01:30:39Z	-
dc.date.available	2025-11-20T01:30:39Z	-
dc.date.issued	2025-08	-
dc.identifier.issn	2958-1796	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209225	-
dc.description.abstract	Automatic speech recognition (ASR) has been driven by representative end-to-end model architectures, including connectionist temporal classification (CTC), attention-based encoder-decoder (AED), and recurrent neural network transducer (RNN-T). However, these models are prone to overfitting during training, which degrades their generalization performance. In this paper, we propose novel regularization techniques applicable to various ASR models: diversity loss and independence loss. Diversity loss reduces the similarity between feature representations, encouraging the model to learn diverse patterns. Independence loss minimizes the covariance between feature vectors, ensuring that they contain independent information and reducing redundancy. We apply these techniques to CTC, AED, and RNN-T models and demonstrate through extensive experiments that the proposed regularization method effectively improves the generalization performance and robustness of the models.	-
dc.format.extent	5	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	International Speech Communication Association	-
dc.title	Improving Generalization of End-to-End ASR through Diversity and Independence Regularization	-
dc.type	Article	-
dc.identifier.doi	10.21437/Interspeech.2025-1309	-
dc.identifier.scopusid	2-s2.0-105020065992	-
dc.identifier.bibliographicCitation	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 3578 - 3582	-
dc.citation.title	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH	-
dc.citation.startPage	3578	-
dc.citation.endPage	3582	-
dc.type.docType	Conference paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordPlus	Speech communication	-
dc.subject.keywordAuthor	diversity loss	-
dc.subject.keywordAuthor	independence loss	-
dc.subject.keywordAuthor	regularization	-
dc.subject.keywordAuthor	speech recognition	-
dc.identifier.url	https://www.isca-archive.org/interspeech_2025/ko25_interspeech.html	-
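The abstract describes the two regularizers only in words. As a rough illustration, the Python sketch below assumes one plausible formulation of each; it is not the authors' implementation, and the exact definitions, the layer they are applied to, and their weights are specified only in the paper itself. Here, diversity loss is assumed to be the mean squared cosine similarity between distinct feature vectors, and independence loss is assumed to be the sum of squared off-diagonal entries of the feature covariance matrix (a VICReg-style covariance penalty). The function names and the weights `lam_div` and `lam_ind` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # N rows (feature vectors) x D columns (dimensions)


def diversity_loss(feats: Matrix, eps: float = 1e-8) -> float:
    """Mean squared cosine similarity over all distinct pairs of rows.

    Assumption: each row is one feature vector (e.g. one encoder frame).
    Driving this toward 0 pushes the vectors toward mutual orthogonality,
    i.e. toward less similar representations.
    """
    norms = [math.sqrt(sum(x * x for x in v)) + eps for v in feats]
    n = len(feats)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(feats[i], feats[j]))
            cos = dot / (norms[i] * norms[j])
            total += cos * cos
            pairs += 1
    return total / pairs if pairs else 0.0


def independence_loss(feats: Matrix) -> float:
    """Sum of squared off-diagonal covariance entries, divided by D.

    Assumption (VICReg-style): each of the D dimensions is a variable
    observed over the N rows; nonzero off-diagonal covariance means two
    dimensions carry redundant information.
    """
    n, d = len(feats), len(feats[0])
    if n < 2:
        raise ValueError("need at least two feature vectors for a covariance")
    means = [sum(row[k] for row in feats) / n for k in range(d)]
    centered = [[row[k] - means[k] for k in range(d)] for row in feats]
    total = 0.0
    for p in range(d):
        for q in range(d):
            if p != q:
                # Unbiased sample covariance between dimensions p and q.
                cov = sum(r[p] * r[q] for r in centered) / (n - 1)
                total += cov * cov
    return total / d


def regularized_loss(task_loss: float, feats: Matrix,
                     lam_div: float = 0.1, lam_ind: float = 0.1) -> float:
    """Hypothetical combined objective: task loss (e.g. CTC, AED, or RNN-T)
    plus weighted diversity and independence penalties."""
    return (task_loss
            + lam_div * diversity_loss(feats)
            + lam_ind * independence_loss(feats))


if __name__ == "__main__":
    correlated = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    print(diversity_loss(correlated), independence_loss(correlated))
    print(regularized_loss(1.0, correlated))
```

Under this formulation, identical or parallel feature vectors give a diversity loss near 1 while mutually orthogonal vectors give 0, and perfectly correlated feature dimensions are penalized by the independence term while uncorrelated ones are not. In an actual training loop these terms would be computed on framework tensors (for example in PyTorch) so that gradients flow through them; plain lists are used here only to keep the sketch self-contained.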

Appears in Collections: College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
