Improving Generalization of End-to-End ASR through Diversity and Independence Regularization
- Authors
- Ko, Ye-Eun; Lee, Mun-Hak; Kim, Dong-Hyun; Chang, Joon-Hyuk
- Issue Date
- Aug-2025
- Publisher
- International Speech Communication Association
- Keywords
- diversity loss; independence loss; regularization; speech recognition
- Citation
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 3578-3582
- Pages
- 5
- Indexed
- SCOPUS
- Journal Title
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
- Start Page
- 3578
- End Page
- 3582
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209225
- DOI
- 10.21437/Interspeech.2025-1309
- ISSN
- 2958-1796
- Abstract
- Automatic speech recognition (ASR) has been driven by representative end-to-end model architectures, including connectionist temporal classification (CTC), attention-based encoder-decoder (AED), and recurrent neural network transducer (RNN-T) models. However, these models are prone to overfitting during training, which degrades their generalization performance. In this paper, we propose two novel regularization techniques applicable to various ASR models: diversity loss and independence loss. Diversity loss reduces the similarity between feature representations, encouraging the model to learn diverse patterns. Independence loss minimizes the covariance between feature vectors so that they carry independent information, reducing redundancy. We apply these techniques to CTC, AED, and RNN-T models, and extensive experiments demonstrate that the proposed regularization effectively improves the models' generalization performance and robustness.
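- The abstract does not give the loss formulas, so the following is only a minimal sketch of how the two regularizers could be computed on a batch of N feature vectors of dimension D (for example, encoder frames). Several assumptions are involved. The diversity term is taken to be the mean squared cosine similarity between distinct vectors. The independence term is modeled on the covariance term of VICReg: squared off-diagonal covariances between feature dimensions, divided by D. The combination with the ASR loss uses fixed weights. The names `diversity_loss`, `independence_loss`, `regularized_loss`, `lambda_div`, and `lambda_ind` are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def diversity_loss(feats: Sequence[Vector], eps: float = 1e-8) -> float:
    """Assumed form: mean squared cosine similarity over ordered pairs of distinct vectors.

    Lower values mean the feature vectors point in more diverse directions.
    """
    n = len(feats)
    if n < 2:
        return 0.0
    # Guard against zero-length vectors with a small epsilon.
    norms = [max(math.sqrt(sum(x * x for x in v)), eps) for v in feats]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                dot = sum(a * b for a, b in zip(feats[i], feats[j]))
                cos = dot / (norms[i] * norms[j])
                total += cos * cos
    return total / (n * (n - 1))


def independence_loss(feats: Sequence[Vector]) -> float:
    """Assumed form (VICReg-style): squared off-diagonal sample covariances
    between feature dimensions, divided by the dimension D.

    Zero when the feature dimensions are pairwise uncorrelated across the batch.
    """
    n, d = len(feats), len(feats[0])
    if n < 2:
        return 0.0
    means = [sum(v[k] for v in feats) / n for k in range(d)]
    total = 0.0
    for p in range(d):
        for q in range(d):
            if p != q:
                cov = sum((v[p] - means[p]) * (v[q] - means[q]) for v in feats) / (n - 1)
                total += cov * cov
    return total / d


def regularized_loss(asr_loss: float, feats: Sequence[Vector],
                     lambda_div: float = 0.1, lambda_ind: float = 0.1) -> float:
    """Hypothetical total objective: ASR loss (CTC, AED, or RNN-T) plus weighted regularizers."""
    return (asr_loss
            + lambda_div * diversity_loss(feats)
            + lambda_ind * independence_loss(feats))


# Usage: two perfectly anti-aligned, perfectly correlated vectors are penalized by both terms.
feats = [[1.0, 1.0], [-1.0, -1.0]]
print(diversity_loss(feats), independence_loss(feats), regularized_loss(2.0, feats, 0.5, 0.25))
```

Plain Python is used here only to make the arithmetic explicit. In actual training, the same terms would be written with an autograd framework so that gradients reach the encoder, and the O(N²D) loops would be replaced by matrix products.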
- Appears in Collections
- College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.