Whisper Multilingual Downstream Task Tuning Using Task Vectors
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Kang, Ji-Hun | - |
| dc.contributor.author | Lee, Jae-Hong | - |
| dc.contributor.author | Lee, Mun-Hak | - |
| dc.contributor.author | Chang, Joon-Hyuk | - |
| dc.date.accessioned | 2025-02-13T02:00:15Z | - |
| dc.date.available | 2025-02-13T02:00:15Z | - |
| dc.date.issued | 2024-09 | - |
| dc.identifier.issn | 1990-9772 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206479 | - |
| dc.description.abstract | Recently, the size of automatic speech recognition (ASR) models has been increasing, similar to large language models (LLMs), and efficient tuning to enhance the performance of downstream tasks with limited resources remains a challenge. In this paper, we propose a simple and effective downstream task tuning method using task vectors. We utilize task vectors to orient the pre-trained Whisper model in the weight space, moving in that direction to achieve downstream task adaptation. We demonstrate that the model can be adjusted through arithmetic operations of the task vector, and this adjustment is reflected in the Whisper. Furthermore, we can efficiently construct a generalized model by summing vectors. We set the direction of the model weight space for each multilingual language as the task vector to evaluate its effectiveness. We confirm that the task vector serves as a simple and effective approach for tuning downstream tasks in ASR using the Common Voice multilingual dataset. | - |
| dc.format.extent | 5 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.title | Whisper Multilingual Downstream Task Tuning Using Task Vectors | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.21437/Interspeech.2024-513 | - |
| dc.identifier.scopusid | 2-s2.0-85214813487 | - |
| dc.identifier.wosid | 001331850102106 | - |
| dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 2385 - 2389 | - |
| dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
| dc.citation.startPage | 2385 | - |
| dc.citation.endPage | 2389 | - |
| dc.type.docType | Proceedings Paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
| dc.subject.keywordPlus | Character recognition | - |
| dc.subject.keywordPlus | Speech enhancement | - |
| dc.subject.keywordPlus | Speech recognition | - |
| dc.subject.keywordPlus | Vector spaces | - |
| dc.subject.keywordAuthor | downstream tasks adaptation | - |
| dc.subject.keywordAuthor | multilingual | - |
| dc.subject.keywordAuthor | speech recognition | - |
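
The abstract describes adjusting a pre-trained model through arithmetic on task vectors and building a generalized model by summing them. The record gives no formulas, so the following is only a minimal sketch. It assumes the common definition of a task vector as fine-tuned weights minus pre-trained weights, and it uses toy Python lists in place of real Whisper weights. The names `task_vector`, `sum_vectors`, `apply_vector`, the `scale` parameter, and the language labels `ko` and `ja` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat weights.
Weights = Dict[str, List[float]]


def task_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """Assumed definition: fine-tuned weights minus pre-trained weights."""
    return {
        name: [f - p for p, f in zip(pretrained[name], finetuned[name])]
        for name in pretrained
    }


def sum_vectors(vectors: List[Weights]) -> Weights:
    """Element-wise sum of several task vectors (e.g. one per language)."""
    names = vectors[0].keys()
    return {
        name: [sum(vals) for vals in zip(*(v[name] for v in vectors))]
        for name in names
    }


def apply_vector(pretrained: Weights, vector: Weights, scale: float = 1.0) -> Weights:
    """Move the pre-trained weights along a task vector by an assumed scale."""
    return {
        name: [p + scale * d for p, d in zip(pretrained[name], vector[name])]
        for name in pretrained
    }


# Toy weights; real models would have many named tensors.
pretrained = {"w": [1.0, 2.0]}
finetuned_ko = {"w": [1.5, 2.0]}  # hypothetical model tuned on one language
finetuned_ja = {"w": [1.0, 3.0]}  # hypothetical model tuned on another

tau_ko = task_vector(pretrained, finetuned_ko)
tau_ja = task_vector(pretrained, finetuned_ja)

# Sum the per-language vectors and step from the pre-trained weights.
merged = apply_vector(pretrained, sum_vectors([tau_ko, tau_ja]), scale=0.5)
print(merged)
```

With real checkpoints the same three operations would run over each tensor in the model's parameter dictionary. How the scaling factor is chosen is not stated in this record.
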
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
