Whisper Multilingual Downstream Task Tuning Using Task Vectors

Kang, Ji-Hun; Lee, Jae-Hong; Lee, Mun-Hak; Chang, Joon-Hyuk

doi:10.21437/Interspeech.2024-513

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kang, Ji-Hun	-
dc.contributor.author	Lee, Jae-Hong	-
dc.contributor.author	Lee, Mun-Hak	-
dc.contributor.author	Chang, Joon-Hyuk	-
dc.date.accessioned	2025-02-13T02:00:15Z	-
dc.date.available	2025-02-13T02:00:15Z	-
dc.date.issued	2024-09	-
dc.identifier.issn	1990-9772	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206479	-
dc.description.abstract	Recently, the size of automatic speech recognition (ASR) models has been increasing, similar to large language models (LLMs), and efficient tuning to enhance the performance of downstream tasks with limited resources remains a challenge. In this paper, we propose a simple and effective downstream task tuning method using task vectors. We utilize task vectors to orient the pre-trained Whisper model in the weight space, moving in that direction to achieve downstream task adaptation. We demonstrate that the model can be adjusted through arithmetic operations of the task vector, and this adjustment is reflected in the Whisper. Furthermore, we can efficiently construct a generalized model by summing vectors. We set the direction of the model weight space for each multilingual language as the task vector to evaluate its effectiveness. We confirm that the task vector serves as a simple and effective approach for tuning downstream tasks in ASR using the Common Voice multilingual dataset.	-
dc.format.extent	5	-
dc.language	English	-
dc.language.iso	ENG	-
dc.title	Whisper Multilingual Downstream Task Tuning Using Task Vectors	-
dc.type	Article	-
dc.identifier.doi	10.21437/Interspeech.2024-513	-
dc.identifier.scopusid	2-s2.0-85214813487	-
dc.identifier.wosid	001331850102106	-
dc.identifier.bibliographicCitation	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 2385 - 2389	-
dc.citation.title	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH	-
dc.citation.startPage	2385	-
dc.citation.endPage	2389	-
dc.type.docType	Proceedings Paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.subject.keywordPlus	Character recognition	-
dc.subject.keywordPlus	Speech enhancement	-
dc.subject.keywordPlus	Speech recognition	-
dc.subject.keywordPlus	Vector spaces	-
dc.subject.keywordAuthor	downstream tasks adaptation	-
dc.subject.keywordAuthor	multilingual	-
dc.subject.keywordAuthor	speech recognition	-
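
The abstract describes three operations: forming a task vector in weight space, moving the pre-trained model along it, and summing vectors to build a more general model. The sketch below illustrates that arithmetic on toy weights. It is not the authors' implementation, which this record does not include. The helper names, the `scale` coefficient, the language labels `ko` and `de`, and the list-of-floats weight format are all assumptions made for illustration. The definition of a task vector as fine-tuned weights minus pre-trained weights follows the usual task-arithmetic convention, which the record itself does not spell out.

```python
from typing import Dict, List

# Hypothetical weight format: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def task_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """Task vector = fine-tuned weights minus pre-trained weights (assumed convention)."""
    return {
        name: [f - p for p, f in zip(pretrained[name], finetuned[name])]
        for name in pretrained
    }


def sum_vectors(vectors: List[Weights]) -> Weights:
    """Element-wise sum of several task vectors, e.g. one per language."""
    names = vectors[0].keys()
    return {
        name: [sum(vals) for vals in zip(*(v[name] for v in vectors))]
        for name in names
    }


def apply_vector(pretrained: Weights, vector: Weights, scale: float = 1.0) -> Weights:
    """Move the pre-trained weights along a task vector; `scale` is an assumed knob."""
    return {
        name: [p + scale * d for p, d in zip(pretrained[name], vector[name])]
        for name in pretrained
    }


# Toy values only; a real Whisper checkpoint has many large tensors.
pretrained = {"w": [1.0, 2.0]}
finetuned_ko = {"w": [1.5, 2.0]}  # hypothetical model fine-tuned on one language
finetuned_de = {"w": [1.0, 3.0]}  # hypothetical model fine-tuned on another

tv_ko = task_vector(pretrained, finetuned_ko)
tv_de = task_vector(pretrained, finetuned_de)

# Summing the per-language vectors gives one combined direction.
combined = apply_vector(pretrained, sum_vectors([tv_ko, tv_de]))
```

With real checkpoints the same three functions would operate on tensors rather than lists, and the scaling coefficient would normally be chosen on held-out data.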

Files in This Item: There are no files associated with this item.

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)

222, Wangsimni-ro, Seongdong-gu, Seoul, 04763, Korea
+82-2-2220-1366

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.
