Balanced-Wav2Vec: Enhancing Stability and Robustness of Representation Learning Through Sample Reweighting Techniques

Lee, Mun-Hak; Lee, Jae-Hong; Kim, DoHee; Ko, Ye-Eun; Chang, Joon-Hyuk

doi:10.21437/Interspeech.2024-1875

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lee, Mun-Hak	-
dc.contributor.author	Lee, Jae-Hong	-
dc.contributor.author	Kim, DoHee	-
dc.contributor.author	Ko, Ye-Eun	-
dc.contributor.author	Chang, Joon-Hyuk	-
dc.date.accessioned	2025-02-12T06:01:40Z	-
dc.date.available	2025-02-12T06:01:40Z	-
dc.date.issued	2024-09	-
dc.identifier.issn	1990-9772	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206460	-
dc.description.abstract	Mode collapse refers to the phenomenon where a representation model fits only a subset of modes in the feature space. Today, numerous self-supervised learning algorithms, including Wav2Vec 2.0, encounter the problem of reduced expressiveness due to mode collapse or dimension collapse. In this study, we experimentally verify that the highly skewed codebook distribution of Wav2Vec 2.0 exacerbates the mode collapse problem. Based on this empirical finding, we propose the balanced-infoNCE loss, which suppresses the emergence of over-represented modes. We show that the Wav2Vec 2.0 model trained with balanced-infoNCE loss maintains high codebook entropy and converges stably. Furthermore, through fine-tuning experiments on a multilingual dataset for the ASR task, we demonstrate that balanced-Wav2Vec 2.0 models exhibit superior generalization performance.	-
dc.format.extent	5	-
dc.language	English	-
dc.language.iso	ENG	-
dc.title	Balanced-Wav2Vec: Enhancing Stability and Robustness of Representation Learning Through Sample Reweighting Techniques	-
dc.type	Article	-
dc.identifier.doi	10.21437/Interspeech.2024-1875	-
dc.identifier.scopusid	2-s2.0-85214827710	-
dc.identifier.wosid	001331850105034	-
dc.identifier.bibliographicCitation	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 5058 - 5062	-
dc.citation.title	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH	-
dc.citation.startPage	5058	-
dc.citation.endPage	5062	-
dc.type.docType	Proceedings Paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.subject.keywordPlus	Adversarial machine learning	-
dc.subject.keywordPlus	Contrastive Learning	-
dc.subject.keywordPlus	Federated learning	-
dc.subject.keywordPlus	Learning algorithms	-
dc.subject.keywordPlus	Semi-supervised learning	-
dc.subject.keywordPlus	Speech recognition	-
dc.subject.keywordAuthor	diversity loss	-
dc.subject.keywordAuthor	mode collapse	-
dc.subject.keywordAuthor	self-supervised learning	-
dc.subject.keywordAuthor	speech recognition	-
dc.subject.keywordAuthor	Wav2Vec 2.0	-
dc.identifier.url	https://www.isca-archive.org/interspeech_2024/lee24k_interspeech.html	-

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
