Whisper Multilingual Downstream Task Tuning Using Task Vectors
- Authors
- Kang, Ji-Hun; Lee, Jae-Hong; Lee, Mun-Hak; Chang, Joon-Hyuk
- Issue Date
- Sep-2024
- Keywords
- downstream tasks adaptation; multilingual; speech recognition
- Citation
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 2385 - 2389
- Pages
- 5
- Indexed
- SCOPUS
- Journal Title
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
- Start Page
- 2385
- End Page
- 2389
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206479
- DOI
- 10.21437/Interspeech.2024-513
- ISSN
- 1990-9772
- Abstract
- Recently, automatic speech recognition (ASR) models have grown in size, much like large language models (LLMs), and efficiently tuning them to improve downstream task performance with limited resources remains a challenge. In this paper, we propose a simple and effective downstream task tuning method based on task vectors. A task vector gives a direction in the weight space of the pre-trained Whisper model, and moving the weights in that direction achieves downstream task adaptation. We demonstrate that the model can be adjusted through arithmetic operations on task vectors and that these adjustments are reflected in Whisper's behavior. Furthermore, a generalized model can be constructed efficiently by summing task vectors. To evaluate effectiveness, we define a task vector for each language as the corresponding direction in the model's weight space. Using the Common Voice multilingual dataset, we confirm that task vectors are a simple and effective approach for tuning downstream tasks in ASR.
- Appears in Collections
- College of Engineering (Seoul) > Division of Convergence Electronic Engineering (Seoul) > 1. Journal Articles
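
The task-vector arithmetic described in the abstract can be illustrated with a minimal sketch. It assumes the common task-arithmetic definition, in which a task vector is the element-wise difference between fine-tuned and pre-trained weights; the record itself does not state the exact formulation. The function names, the `scale` coefficient, and the toy weight values are all hypothetical, and plain Python lists stand in for real Whisper parameters.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def task_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """Element-wise difference: fine-tuned minus pre-trained weights."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def sum_vectors(vectors: List[Weights]) -> Weights:
    """Add several task vectors parameter by parameter."""
    return {
        name: [sum(vals) for vals in zip(*(v[name] for v in vectors))]
        for name in vectors[0]
    }


def apply_vector(pretrained: Weights, vector: Weights, scale: float = 1.0) -> Weights:
    """Move the pre-trained weights along a task vector (scale is an assumed knob)."""
    return {
        name: [p + scale * d for p, d in zip(pretrained[name], vector[name])]
        for name in pretrained
    }


# Hypothetical toy values, not real Whisper weights.
pretrained = {"layer.weight": [1.0, 2.0, 3.0]}
finetuned_lang_a = {"layer.weight": [1.5, 2.0, 2.0]}
finetuned_lang_b = {"layer.weight": [1.0, 3.0, 3.5]}

tau_a = task_vector(pretrained, finetuned_lang_a)
tau_b = task_vector(pretrained, finetuned_lang_b)

# Summing per-language vectors gives one combined direction for a multilingual model.
merged = apply_vector(pretrained, sum_vectors([tau_a, tau_b]), scale=0.5)
print(merged)
```

With real models the same arithmetic would be applied tensor by tensor over the checkpoint's parameters, and the scaling coefficient would typically be chosen on held-out data.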
