Detailed Information

General-purpose Adversarial Training for Enhanced Automatic Speech Recognition Model Generalization

Full metadata record
dc.contributor.author: Kim, Dohee
dc.contributor.author: Shim, Daeyeol
dc.contributor.author: Chang, Joon-Hyuk
dc.date.accessioned: 2023-10-10T02:36:18Z
dc.date.available: 2023-10-10T02:36:18Z
dc.date.created: 2023-10-04
dc.date.issued: 2023-08
dc.identifier.issn: 2308-457X
dc.identifier.uri: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191796
dc.description.abstract: We present a new adversarial training method called General-purpose adversarial training (GPAT) that enhances the performance of automatic speech recognition models. In GPAT, we propose the following: (1) a plausible adversarial examples converter (PAC) and (2) a distribution matching regularization term (DM reg.). Compared with previous studies that directly compute gradients with respect to the input, PAC incorporates non-linearity to improve performance while eliminating the need for extra forward passes. Furthermore, unlike previous studies that use fixed norms, GPAT can generate similar yet diverse samples through DM reg. We demonstrate that GPAT improves the performance of various models on the LibriSpeech dataset. Specifically, by applying GPAT to the conformer model, we achieved a 5.3% average relative improvement. In the wav2vec 2.0 experiments, our method yielded 2.0%/4.4% word error rates on the LibriSpeech test sets without a language model.
dc.language: English
dc.language.iso: en
dc.publisher: International Speech Communication Association
dc.title: General-purpose Adversarial Training for Enhanced Automatic Speech Recognition Model Generalization
dc.type: Article
dc.contributor.affiliatedAuthor: Chang, Joon-Hyuk
dc.identifier.doi: 10.21437/Interspeech.2023-2389
dc.identifier.scopusid: 2-s2.0-85171526253
dc.identifier.bibliographicCitation: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2023-August, pp.889 - 893
dc.relation.isPartOf: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
dc.citation.title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
dc.citation.volume: 2023-August
dc.citation.startPage: 889
dc.citation.endPage: 893
dc.type.rims: ART
dc.type.docType: Conference paper
dc.description.journalClass: 1
dc.description.isOpenAccess: N
dc.description.journalRegisteredClass: scopus
dc.subject.keywordPlus: Speech communication
dc.subject.keywordPlus: Adversarial training
dc.subject.keywordPlus: Automatic speech recognition
dc.subject.keywordPlus: Data augmentation
dc.subject.keywordPlus: Distribution matching
dc.subject.keywordPlus: Model generalization
dc.subject.keywordPlus: Performance
dc.subject.keywordPlus: Recognition models
dc.subject.keywordPlus: Regularization terms
dc.subject.keywordPlus: Training methods
dc.subject.keywordPlus: Word error rate
dc.subject.keywordPlus: Speech recognition
dc.subject.keywordAuthor: adversarial training
dc.subject.keywordAuthor: data augmentation
dc.subject.keywordAuthor: speech recognition
dc.identifier.url: https://www.isca-speech.org/archive/interspeech_2023/kim23l_interspeech.html
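
As a rough illustration of the input-gradient adversarial training that the abstract above builds on, the sketch below perturbs acoustic features along the sign of the loss gradient (FGSM-style) and trains on both clean and perturbed batches. This is a minimal generic sketch, not the paper's GPAT: the toy frame-level model, feature shape, per-frame targets, and fixed epsilon are illustrative assumptions, and GPAT's PAC converter and DM regularization are not implemented here.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy frame-level "acoustic model": (batch, time, 80) features -> 32-way logits.
# A real system would be a conformer or wav2vec 2.0; this stand-in is an assumption.
model = nn.Sequential(nn.Linear(80, 128), nn.ReLU(), nn.Linear(128, 32))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

features = torch.randn(4, 100, 80)        # dummy batch of acoustic features
targets = torch.randint(0, 32, (4, 100))  # dummy per-frame token targets
epsilon = 0.1                             # fixed perturbation size; GPAT instead
                                          # draws diverse perturbations via DM reg.

for step in range(3):
    # 1) Forward/backward on clean inputs to get the gradient w.r.t. the input.
    features.requires_grad_(True)
    clean_loss = criterion(model(features).transpose(1, 2), targets)
    input_grad = torch.autograd.grad(clean_loss, features)[0]

    # 2) FGSM-style adversarial features: one step along the sign of that gradient.
    adv_features = (features + epsilon * input_grad.sign()).detach()

    # 3) Update the model on clean + adversarial examples.
    optimizer.zero_grad()
    loss = (criterion(model(features.detach()).transpose(1, 2), targets)
            + criterion(model(adv_features).transpose(1, 2), targets))
    loss.backward()
    optimizer.step()
    print(f"step {step}: combined loss {loss.item():.3f}")

Methods such as GPAT replace the fixed-epsilon, sign-of-gradient perturbation in step 2 with a learned, non-linear transformation of the input gradient and a regularizer that keeps the perturbed samples close to, but not identical to, the clean data distribution.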
Appears in Collections:
College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
