Whisper Multilingual Downstream Task Tuning Using Task Vectors

Full metadata record
DC Field | Value | Language
dc.contributor.author | Kang, Ji-Hun | -
dc.contributor.author | Lee, Jae-Hong | -
dc.contributor.author | Lee, Mun-Hak | -
dc.contributor.author | Chang, Joon-Hyuk | -
dc.date.accessioned | 2025-02-13T02:00:15Z | -
dc.date.available | 2025-02-13T02:00:15Z | -
dc.date.issued | 2024-09 | -
dc.identifier.issn | 1990-9772 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206479 | -
dc.description.abstract | Recently, the size of automatic speech recognition (ASR) models has been increasing, similar to large language models (LLMs), and efficient tuning to enhance the performance of downstream tasks with limited resources remains a challenge. In this paper, we propose a simple and effective downstream task tuning method using task vectors. We utilize task vectors to orient the pre-trained Whisper model in the weight space, moving in that direction to achieve downstream task adaptation. We demonstrate that the model can be adjusted through arithmetic operations of the task vector, and this adjustment is reflected in the Whisper. Furthermore, we can efficiently construct a generalized model by summing vectors. We set the direction of the model weight space for each multilingual language as the task vector to evaluate its effectiveness. We confirm that the task vector serves as a simple and effective approach for tuning downstream tasks in ASR using the Common Voice multilingual dataset. | -
dc.format.extent | 5 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.title | Whisper Multilingual Downstream Task Tuning Using Task Vectors | -
dc.type | Article | -
dc.identifier.doi | 10.21437/Interspeech.2024-513 | -
dc.identifier.scopusid | 2-s2.0-85214813487 | -
dc.identifier.wosid | 001331850102106 | -
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 2385 - 2389 | -
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | -
dc.citation.startPage | 2385 | -
dc.citation.endPage | 2389 | -
dc.type.docType | Proceedings Paper | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Computer Science | -
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | -
dc.subject.keywordPlus | Character recognition | -
dc.subject.keywordPlus | Speech enhancement | -
dc.subject.keywordPlus | Speech recognition | -
dc.subject.keywordPlus | Vector spaces | -
dc.subject.keywordAuthor | downstream tasks adaptation | -
dc.subject.keywordAuthor | multilingual | -
dc.subject.keywordAuthor | speech recognition | -
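
Illustrative note on the abstract: the record does not give the formula for a task vector, so the sketch below assumes the common task-arithmetic definition, in which a task vector is the element-wise difference between fine-tuned and pre-trained weights. The names `task_vector`, `add_vectors`, `apply_vector`, the `scale` parameter, and the toy weights are all hypothetical. Real Whisper checkpoints would hold tensors in a state dict rather than short lists of floats, and this is not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat weights.
Weights = Dict[str, List[float]]


def task_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """Assumed definition: fine-tuned weights minus pre-trained weights."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def add_vectors(vectors: List[Weights]) -> Weights:
    """Sum several task vectors parameter by parameter."""
    return {
        name: [sum(vals) for vals in zip(*(v[name] for v in vectors))]
        for name in vectors[0]
    }


def apply_vector(pretrained: Weights, vector: Weights, scale: float = 1.0) -> Weights:
    """Move the pre-trained weights along the task vector by a chosen scale."""
    return {
        name: [p + scale * t for p, t in zip(pretrained[name], vector[name])]
        for name in pretrained
    }


# Toy weights only; no real checkpoint values are used here.
pretrained = {"w": [1.0, 2.0]}
finetuned_ko = {"w": [1.5, 2.0]}  # pretend: fine-tuned on one language
finetuned_de = {"w": [1.0, 1.0]}  # pretend: fine-tuned on another language

tau_ko = task_vector(pretrained, finetuned_ko)
tau_de = task_vector(pretrained, finetuned_de)
merged = add_vectors([tau_ko, tau_de])                 # summed per-language vectors
adapted = apply_vector(pretrained, merged, scale=0.5)  # scale is a free choice
```

Under this assumption, a negative `scale` moves the weights away from a task direction, which is one way to read the abstract's "arithmetic operations of the task vector".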
Appears in Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)