General-purpose Adversarial Training for Enhanced Automatic Speech Recognition Model Generalization
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kim, Dohee | - |
dc.contributor.author | Shim, Daeyeol | - |
dc.contributor.author | Chang, Joon-Hyuk | - |
dc.date.accessioned | 2023-10-10T02:36:18Z | - |
dc.date.available | 2023-10-10T02:36:18Z | - |
dc.date.created | 2023-10-04 | - |
dc.date.issued | 2023-08 | - |
dc.identifier.issn | 2308-457X | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191796 | - |
dc.description.abstract | We present a new adversarial training method called General-purpose adversarial training (GPAT) that enhances the performance of automatic speech recognition models. GPAT comprises two components: (1) a plausible adversarial examples converter (PAC); and (2) a distribution matching regularization term (DM reg.). Unlike previous studies that directly compute gradients with respect to the input, PAC incorporates non-linearity, which improves performance while eliminating the need for extra forward passes. Furthermore, unlike previous studies that use fixed norms, GPAT can generate similar yet diverse samples through DM reg. We demonstrate that GPAT improves the performance of various models on the LibriSpeech dataset. Specifically, applying GPAT to the conformer model yielded an average relative improvement of 5.3%. In the wav2vec 2.0 experiments, our method achieved word error rates of 2.0%/4.4% on the LibriSpeech test sets without a language model. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | International Speech Communication Association | - |
dc.title | General-purpose Adversarial Training for Enhanced Automatic Speech Recognition Model Generalization | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | - |
dc.identifier.doi | 10.21437/Interspeech.2023-2389 | - |
dc.identifier.scopusid | 2-s2.0-85171526253 | - |
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2023-August, pp.889 - 893 | - |
dc.relation.isPartOf | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.volume | 2023-August | - |
dc.citation.startPage | 889 | - |
dc.citation.endPage | 893 | - |
dc.type.rims | ART | - |
dc.type.docType | Conference paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.subject.keywordPlus | Speech communication | - |
dc.subject.keywordPlus | Adversarial training | - |
dc.subject.keywordPlus | Automatic speech recognition | - |
dc.subject.keywordPlus | Data augmentation | - |
dc.subject.keywordPlus | Distribution matching | - |
dc.subject.keywordPlus | Model generalization | - |
dc.subject.keywordPlus | Performance | - |
dc.subject.keywordPlus | Recognition models | - |
dc.subject.keywordPlus | Regularization terms | - |
dc.subject.keywordPlus | Training methods | - |
dc.subject.keywordPlus | Word error rate | - |
dc.subject.keywordPlus | Speech recognition | - |
dc.subject.keywordAuthor | adversarial training | - |
dc.subject.keywordAuthor | data augmentation | - |
dc.subject.keywordAuthor | speech recognition | - |
dc.identifier.url | https://www.isca-speech.org/archive/interspeech_2023/kim23l_interspeech.html | - |
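The abstract contrasts GPAT with earlier methods that "directly compute gradients with respect to the input" and "use fixed norms". For readers unfamiliar with that baseline, the following is a minimal sketch of the conventional idea in the style of the fast gradient sign method (FGSM), on a toy logistic-regression model with a hand-derived input gradient so it runs with the standard library only. It is not GPAT, PAC, or the DM regularizer, whose details are not given in this record; the model, weights, `epsilon`, and all function names are illustrative assumptions.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(w: List[float], b: float, x: List[float]) -> float:
    """Hypothetical toy model: P(y = 1 | x) under logistic regression."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(p: float, y: int) -> float:
    """Binary cross-entropy for a single example with label y in {0, 1}."""
    tiny = 1e-12  # guards against log(0)
    return -(y * math.log(p + tiny) + (1 - y) * math.log(1.0 - p + tiny))


def input_gradient(w: List[float], b: float, x: List[float], y: int) -> List[float]:
    """d(loss)/d(x) for logistic regression, derived by hand as (p - y) * w.
    Obtaining p requires a forward pass on the clean input."""
    p = forward(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v: float) -> int:
    # Returns +1, 0, or -1.
    return (v > 0) - (v < 0)


def fixed_norm_perturb(w: List[float], b: float, x: List[float], y: int,
                       epsilon: float) -> List[float]:
    """FGSM-style adversarial example: one step of fixed L-infinity size
    epsilon along the sign of the input gradient (an assumed baseline,
    not the GPAT method)."""
    g = input_gradient(w, b, x, y)
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, g)]


if __name__ == "__main__":
    w, b = [0.8, -0.5, 0.3], 0.1   # hypothetical toy weights
    x, y = [0.2, 0.4, -0.1], 1     # hypothetical clean input and label
    x_adv = fixed_norm_perturb(w, b, x, y, epsilon=0.05)
    clean = bce_loss(forward(w, b, x), y)
    adv = bce_loss(forward(w, b, x_adv), y)  # extra forward pass on the perturbed input
    print(f"clean loss {clean:.4f} -> adversarial loss {adv:.4f}")
```

The usage lines show the two costs the abstract attributes to earlier work: an extra forward pass (clean input to get the gradient, then the perturbed input for training) and a single fixed perturbation budget `epsilon`. According to the abstract, GPAT avoids the former through PAC and replaces the latter with DM reg.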