Balanced-Wav2Vec: Enhancing Stability and Robustness of Representation Learning Through Sample Reweighting Techniques

Full metadata record
DC Field | Value | Language
dc.contributor.author | Lee, Mun-Hak | -
dc.contributor.author | Lee, Jae-Hong | -
dc.contributor.author | Kim, DoHee | -
dc.contributor.author | Ko, Ye-Eun | -
dc.contributor.author | Chang, Joon-Hyuk | -
dc.date.accessioned | 2025-02-12T06:01:40Z | -
dc.date.available | 2025-02-12T06:01:40Z | -
dc.date.issued | 2024-09 | -
dc.identifier.issn | 1990-9772 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206460 | -
dc.description.abstract | Mode collapse refers to the phenomenon where a representation model fits only a subset of modes in the feature space. Today, numerous self-supervised learning algorithms, including Wav2Vec 2.0, encounter the problem of reduced expressiveness due to mode collapse or dimension collapse. In this study, we experimentally verify that the highly skewed codebook distribution of the Wav2Vec 2.0 exacerbates the mode collapse problem. Based on this empirical finding, we propose the balanced-infoNCE loss, which suppresses the emergence of over-represented modes. We show that the Wav2Vec 2.0 model trained with balanced-infoNCE loss maintains high codebook entropy and converges stably. Furthermore, through finetuning experiments on a multilingual dataset for the ASR task, we demonstrate that balanced-Wav2Vec 2.0 models exhibit superior generalization performance. | -
dc.format.extent | 5 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.title | Balanced-Wav2Vec: Enhancing Stability and Robustness of Representation Learning Through Sample Reweighting Techniques | -
dc.type | Article | -
dc.identifier.doi | 10.21437/Interspeech.2024-1875 | -
dc.identifier.scopusid | 2-s2.0-85214827710 | -
dc.identifier.wosid | 001331850105034 | -
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 5058 - 5062 | -
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | -
dc.citation.startPage | 5058 | -
dc.citation.endPage | 5062 | -
dc.type.docType | Proceedings Paper | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Computer Science | -
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | -
dc.subject.keywordPlus | Adversarial machine learning | -
dc.subject.keywordPlus | Contrastive Learning | -
dc.subject.keywordPlus | Federated learning | -
dc.subject.keywordPlus | Learning algorithms | -
dc.subject.keywordPlus | Semi-supervised learning | -
dc.subject.keywordPlus | Speech recognition | -
dc.subject.keywordAuthor | diversity loss | -
dc.subject.keywordAuthor | mode collapse | -
dc.subject.keywordAuthor | self-supervised learning | -
dc.subject.keywordAuthor | speech recognition | -
dc.subject.keywordAuthor | Wav2Vec 2.0 | -
dc.identifier.url | https://www.isca-archive.org/interspeech_2024/lee24k_interspeech.html | -
Appears in Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles
