General-purpose Adversarial Training for Enhanced Automatic Speech Recognition Model Generalization
- Authors
- Kim, Dohee; Shim, Daeyeol; Chang, Joon-Hyuk
- Issue Date
- Aug-2023
- Publisher
- International Speech Communication Association
- Keywords
- adversarial training; data augmentation; speech recognition
- Citation
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2023-August, pp.889 - 893
- Indexed
- SCOPUS
- Journal Title
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
- Volume
- 2023-August
- Start Page
- 889
- End Page
- 893
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191796
- DOI
- 10.21437/Interspeech.2023-2389
- ISSN
- 2308-457X
- Abstract
- We present General-purpose Adversarial Training (GPAT), a new adversarial training method that improves the performance of automatic speech recognition (ASR) models. GPAT has two components: (1) a plausible adversarial examples converter (PAC) and (2) a distribution matching regularization term (DM reg.). Whereas previous studies compute gradients directly with respect to the input, PAC incorporates non-linearity, which improves performance and eliminates the need for extra forward passes. Furthermore, unlike previous studies that use fixed norms, GPAT uses DM reg. to generate samples that are similar yet diverse. We demonstrate that GPAT improves the performance of various models on the LibriSpeech dataset. Specifically, applying GPAT to the Conformer model yields an average relative improvement of 5.3%. In the wav2vec 2.0 experiments, our method achieves word error rates of 2.0%/4.4% on the LibriSpeech test sets without a language model.
- Appears in Collections
- College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles
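The abstract contrasts GPAT with earlier adversarial training that computes gradients directly with respect to the input under a fixed norm budget. For readers unfamiliar with that baseline, the following is a minimal NumPy sketch of one such method, the fast gradient sign method (FGSM). It uses a toy logistic-regression model as a stand-in for an acoustic model. This is not PAC or DM reg., whose details are not given in this record. The model, the feature shapes, and `epsilon` are hypothetical choices made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(x, y, w, b):
    """Mean binary cross-entropy of a toy logistic-regression 'model' (hypothetical stand-in)."""
    p = sigmoid(x @ w + b)
    eps = 1e-12  # avoid log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def input_gradient(x, y, w, b):
    """Analytic gradient of the mean BCE loss with respect to the INPUT x.
    For logistic regression, dL/dx_i = (p_i - y_i) * w / N.
    In a real ASR network this would require an extra forward/backward pass."""
    p = sigmoid(x @ w + b)
    return np.outer(p - y, w) / x.shape[0]

def fgsm_perturb(x, y, w, b, epsilon):
    """Fixed-norm (L-infinity) FGSM perturbation: x + epsilon * sign(dL/dx)."""
    g = input_gradient(x, y, w, b)
    return x + epsilon * np.sign(g)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))          # hypothetical batch: 8 frames x 4 features
w = rng.normal(size=4)               # hypothetical model weights
b = 0.1
y = (x @ w + b > 0).astype(float)    # labels consistent with the model

x_adv = fgsm_perturb(x, y, w, b, epsilon=0.1)
print(bce_loss(x, y, w, b), bce_loss(x_adv, y, w, b))
```

Because `epsilon` is fixed, every feature moves by exactly the same amount in the direction that increases the loss. According to the abstract, GPAT relaxes this kind of fixed-norm constraint through DM reg. to obtain more diverse samples, and PAC avoids the extra forward pass that the gradient step requires.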