Improving Generalization of End-to-End ASR through Diversity and Independence Regularization
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Ko, Ye-Eun | - |
| dc.contributor.author | Lee, Mun-Hak | - |
| dc.contributor.author | Kim, Dong-Hyun | - |
| dc.contributor.author | Chang, Joon-Hyuk | - |
| dc.date.accessioned | 2025-11-20T01:30:39Z | - |
| dc.date.available | 2025-11-20T01:30:39Z | - |
| dc.date.issued | 2025-08 | - |
| dc.identifier.issn | 2958-1796 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209225 | - |
| dc.description.abstract | Automatic speech recognition (ASR) has been driven by representative end-to-end model architectures, including connectionist temporal classification (CTC), attention-based encoder-decoder (AED), and recurrent neural network transducer (RNN-T) models. However, these models are prone to overfitting during training, which degrades their generalization performance. In this paper, we propose a novel regularization method, applicable to various ASR models, that consists of two losses: a diversity loss and an independence loss. The diversity loss reduces the similarity between feature representations, encouraging the model to learn diverse patterns. The independence loss minimizes the covariance between feature vectors, ensuring that they contain independent information and reducing redundancy. We apply these losses to CTC, AED, and RNN-T models and demonstrate through extensive experiments that the proposed regularization method effectively improves generalization performance and robustness. | - |
| dc.format.extent | 5 pages | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | International Speech Communication Association | - |
| dc.title | Improving Generalization of End-to-End ASR through Diversity and Independence Regularization | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.21437/Interspeech.2025-1309 | - |
| dc.identifier.scopusid | 2-s2.0-105020065992 | - |
| dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 3578 - 3582 | - |
| dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
| dc.citation.startPage | 3578 | - |
| dc.citation.endPage | 3582 | - |
| dc.type.docType | Conference paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.subject.keywordPlus | Speech communication | - |
| dc.subject.keywordAuthor | diversity loss | - |
| dc.subject.keywordAuthor | independence loss | - |
| dc.subject.keywordAuthor | regularization | - |
| dc.subject.keywordAuthor | speech recognition | - |
| dc.identifier.url | https://www.isca-archive.org/interspeech_2025/ko25_interspeech.html | - |
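The abstract describes the two regularizers only in words, so the following is one possible reading rather than the authors' formulation. It assumes the diversity loss is the mean pairwise cosine similarity between feature vectors (for example, encoder frame outputs) and that the independence loss is the mean squared covariance between pairs of feature vectors, treating each vector's entries as paired samples. The paper may instead compute covariance across feature dimensions or normalize differently. All function names and the weights `lambda_div` and `lambda_ind` are hypothetical. Plain Python lists are used to show the arithmetic; a training implementation would compute the same quantities on batched tensors with automatic differentiation.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def diversity_loss(features: List[Vector]) -> float:
    """Hypothetical diversity loss: mean pairwise cosine similarity.

    Lower values mean the feature vectors point in more different directions.
    """
    n = len(features)
    if n < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += cosine_similarity(features[i], features[j])
            pairs += 1
    return total / pairs


def covariance(a: Vector, b: Vector) -> float:
    """Sample covariance of two vectors, treating their entries as paired samples."""
    d = len(a)
    if d < 2:
        raise ValueError("need at least two entries per vector")
    mean_a = sum(a) / d
    mean_b = sum(b) / d
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (d - 1)


def independence_loss(features: List[Vector]) -> float:
    """Hypothetical independence loss: mean squared covariance over vector pairs."""
    n = len(features)
    if n < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += covariance(features[i], features[j]) ** 2
            pairs += 1
    return total / pairs


def regularized_loss(asr_loss: float, features: List[Vector],
                     lambda_div: float = 0.1, lambda_ind: float = 0.1) -> float:
    """Add both penalties to a base ASR loss (e.g. CTC, AED, or RNN-T).

    The weighted-sum form and the default weights are assumptions.
    """
    return (asr_loss
            + lambda_div * diversity_loss(features)
            + lambda_ind * independence_loss(features))
```

Under this reading, two identical vectors `[1.0, 2.0, 3.0]` give a diversity loss of about 1.0 and an independence loss of 1.0, while the zero-mean, orthogonal pair `[1.0, -1.0, 0.0]` and `[1.0, 1.0, -2.0]` gives 0.0 for both, which is the direction the regularizers push the representations.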
