Balanced-Wav2Vec: Enhancing Stability and Robustness of Representation Learning Through Sample Reweighting Techniques
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Lee, Mun-Hak | - |
| dc.contributor.author | Lee, Jae-Hong | - |
| dc.contributor.author | Kim, DoHee | - |
| dc.contributor.author | Ko, Ye-Eun | - |
| dc.contributor.author | Chang, Joon-Hyuk | - |
| dc.date.accessioned | 2025-02-12T06:01:40Z | - |
| dc.date.available | 2025-02-12T06:01:40Z | - |
| dc.date.issued | 2024-09 | - |
| dc.identifier.issn | 1990-9772 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206460 | - |
| dc.description.abstract | Mode collapse refers to the phenomenon where a representation model fits only a subset of modes in the feature space. Today, numerous self-supervised learning algorithms, including Wav2Vec 2.0, encounter the problem of reduced expressiveness due to mode collapse or dimension collapse. In this study, we experimentally verify that the highly skewed codebook distribution of Wav2Vec 2.0 exacerbates the mode collapse problem. Based on this empirical finding, we propose the balanced-infoNCE loss, which suppresses the emergence of over-represented modes. We show that the Wav2Vec 2.0 model trained with balanced-infoNCE loss maintains high codebook entropy and converges stably. Furthermore, through fine-tuning experiments on a multilingual dataset for the ASR task, we demonstrate that balanced-Wav2Vec 2.0 models exhibit superior generalization performance. | - |
| dc.format.extent | 5 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.title | Balanced-Wav2Vec: Enhancing Stability and Robustness of Representation Learning Through Sample Reweighting Techniques | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.21437/Interspeech.2024-1875 | - |
| dc.identifier.scopusid | 2-s2.0-85214827710 | - |
| dc.identifier.wosid | 001331850105034 | - |
| dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 5058 - 5062 | - |
| dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
| dc.citation.startPage | 5058 | - |
| dc.citation.endPage | 5062 | - |
| dc.type.docType | Proceedings Paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
| dc.subject.keywordPlus | Adversarial machine learning | - |
| dc.subject.keywordPlus | Contrastive Learning | - |
| dc.subject.keywordPlus | Federated learning | - |
| dc.subject.keywordPlus | Learning algorithms | - |
| dc.subject.keywordPlus | Semi-supervised learning | - |
| dc.subject.keywordPlus | Speech recognition | - |
| dc.subject.keywordAuthor | diversity loss | - |
| dc.subject.keywordAuthor | mode collapse | - |
| dc.subject.keywordAuthor | self-supervised learning | - |
| dc.subject.keywordAuthor | speech recognition | - |
| dc.subject.keywordAuthor | Wav2Vec 2.0 | - |
| dc.identifier.url | https://www.isca-archive.org/interspeech_2024/lee24k_interspeech.html | - |
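
The abstract contrasts a "highly skewed codebook distribution" with "high codebook entropy". As a rough illustration only, the sketch below computes the Shannon entropy of an empirical code-usage distribution. It assumes that "codebook entropy" means the entropy of how often each quantizer code is selected; this record does not give the paper's exact definition, and the function name `codebook_entropy` and the toy assignments are hypothetical. It does not implement the balanced-infoNCE loss.

```python
import math
from collections import Counter


def codebook_entropy(assignments, codebook_size):
    """Return (entropy in nats, entropy normalized to [0, 1]).

    `assignments` is a sequence of integer code indices, one per frame.
    This is an illustrative helper, not the paper's code.
    """
    counts = Counter(assignments)
    total = len(assignments)
    entropy = 0.0
    for count in counts.values():
        p = count / total  # empirical probability of this code
        entropy -= p * math.log(p)
    # The maximum possible entropy is log(codebook_size), reached when
    # every code is used equally often.
    return entropy, entropy / math.log(codebook_size)


# Toy data (assumed): 100 frames quantized with a 4-entry codebook.
balanced = [0, 1, 2, 3] * 25      # every code used equally often
skewed = [0] * 97 + [1, 2, 3]     # one code dominates

print(codebook_entropy(balanced, 4))
print(codebook_entropy(skewed, 4))
```

Under this reading, a normalized value near 1 indicates that the codes are used evenly, while a value near 0 indicates that a few codes dominate, which is the skew the abstract associates with mode collapse.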
