W2V2-Light: A Lightweight Version of Wav2vec 2.0 for Automatic Speech Recognition
- Authors
- Kim, Dong-Hyun; Lee, Jae-Hong; Mo, Ji-Hwan; Chang, Joon-Hyuk
- Issue Date
- Sep-2022
- Publisher
- International Speech Communication Association
- Keywords
- attention alignment; Automatic speech recognition; parameter sharing; representation learning; semi-supervised learning
- Citation
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2022-September, pp.3038 - 3042
- Indexed
- SCOPUS
- Journal Title
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
- Volume
- 2022-September
- Start Page
- 3038
- End Page
- 3042
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/173086
- DOI
- 10.21437/Interspeech.2022-10339
- ISSN
- 2308-457X
- Abstract
- Wav2vec 2.0 (W2V2) has shown remarkable speech recognition performance by pre-training only on unlabeled data and fine-tuning with a small amount of labeled data. However, the practical application of W2V2 is hindered by hardware memory limitations, as it contains 317 million parameters. To address this issue, we propose W2V2-Light, a lightweight version of W2V2. We introduce two simple sharing methods that reduce both the memory consumption and the computational cost of W2V2. Compared to W2V2, our model has 91% fewer parameters and runs 1.31 times faster, with only minor degradation in downstream task performance. Moreover, by quantifying the stability of representations, we provide empirical insight into why our model maintains competitive performance despite the significant reduction in memory.
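The abstract does not detail the two sharing methods, but the headline numbers (91% fewer parameters at roughly unchanged depth) are characteristic of cross-layer parameter sharing, where every Transformer layer reuses one set of weights. The sketch below is a generic, hypothetical illustration of that idea on a toy layer stack, not the paper's actual architecture; the layer structure, dimensions, and counts are all assumptions for illustration only.

```python
import numpy as np

def count_params(layers):
    # Count unique parameter arrays; arrays shared across layers
    # are keyed by object identity and so counted only once.
    seen = {id(w): w.size for layer in layers for w in layer}
    return sum(seen.values())

def build_encoder(num_layers, dim, share=False):
    """Toy feed-forward encoder stack: each layer holds two dim x dim
    weight matrices. With share=True, all layers reuse the same pair
    (cross-layer parameter sharing); memory no longer grows with depth."""
    rng = np.random.default_rng(0)
    if share:
        w1, w2 = rng.standard_normal((dim, dim)), rng.standard_normal((dim, dim))
        return [(w1, w2) for _ in range(num_layers)]
    return [(rng.standard_normal((dim, dim)), rng.standard_normal((dim, dim)))
            for _ in range(num_layers)]

def forward(layers, x):
    # Each layer applies linear -> ReLU -> linear with a residual add;
    # compute cost per layer is identical with or without sharing.
    for w1, w2 in layers:
        x = x + np.maximum(x @ w1, 0.0) @ w2
    return x

base = build_encoder(num_layers=12, dim=8)           # independent weights
light = build_encoder(num_layers=12, dim=8, share=True)  # shared weights
print(count_params(base))   # 12 layers * 2 matrices * 64 = 1536
print(count_params(light))  # 2 matrices * 64 = 128, ~92% fewer parameters
```

Note that sharing alone shrinks memory but not per-layer compute; the paper's reported 1.31x speedup presumably comes from its second method, which the abstract does not describe.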
- Appears in Collections
- Seoul College of Engineering > Seoul Division of Convergence Electronics Engineering > 1. Journal Articles