Detailed Information

Cited 0 times in Web of Science · Cited 0 times in Scopus

Regularizing Transformer-based Acoustic Models by Penalizing Attention Weights for Robust Speech Recognition

Full metadata record
dc.contributor.author: Lee, Mun-Hak
dc.contributor.author: Lee, Sang-Eon
dc.contributor.author: Seong, Ju-Seok
dc.contributor.author: Chang, Joon-Hyuk
dc.contributor.author: Kwon, Haeyoung
dc.contributor.author: Park, Chanhee
dc.date.accessioned: 2022-12-20T06:24:47Z
dc.date.available: 2022-12-20T06:24:47Z
dc.date.created: 2022-11-02
dc.date.issued: 2022-09
dc.identifier.issn: 2308-457X
dc.identifier.uri: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/173085
dc.description.abstract: The application of deep learning has significantly advanced the performance of automatic speech recognition (ASR) systems. Various components make up an ASR system, such as the acoustic model (AM), language model (LM), and lexicon. Generally, the AM has benefited the most from deep learning. Numerous types of neural network-based AMs have been studied, but the structure that has received the most attention in recent years is the Transformer [1]. In this study, we demonstrate that the Transformer model is more vulnerable to input sparsity than the convolutional neural network (CNN) and analyze the cause of this performance degradation through the structural characteristics of the Transformer. Moreover, we propose a novel regularization method that makes the Transformer model robust against input sparsity. The proposed sparsity regularization method directly regulates attention weights using silence label information from forced alignment and has the advantage of requiring neither additional module training nor excessive computation. We tested the proposed method on five benchmarks and observed an average relative error rate reduction (RERR) of 4.7%.
dc.language: English
dc.language.iso: en
dc.publisher: International Speech Communication Association
dc.title: Regularizing Transformer-based Acoustic Models by Penalizing Attention Weights for Robust Speech Recognition
dc.type: Article
dc.contributor.affiliatedAuthor: Chang, Joon-Hyuk
dc.identifier.doi: 10.21437/Interspeech.2022-362
dc.identifier.scopusid: 2-s2.0-85140072608
dc.identifier.wosid: 000900724500012
dc.identifier.bibliographicCitation: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2022-September, pp. 56-60
dc.relation.isPartOf: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
dc.citation.title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
dc.citation.volume: 2022-September
dc.citation.startPage: 56
dc.citation.endPage: 60
dc.type.rims: ART
dc.type.docType: Proceedings Paper
dc.description.journalClass: 1
dc.description.isOpenAccess: N
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Acoustics
dc.relation.journalResearchArea: Audiology & Speech-Language Pathology
dc.relation.journalResearchArea: Computer Science
dc.relation.journalResearchArea: Engineering
dc.relation.journalWebOfScienceCategory: Acoustics
dc.relation.journalWebOfScienceCategory: Audiology & Speech-Language Pathology
dc.relation.journalWebOfScienceCategory: Computer Science, Artificial Intelligence
dc.relation.journalWebOfScienceCategory: Engineering, Electrical & Electronic
dc.subject.keywordPlus: Convolutional neural networks
dc.subject.keywordPlus: Deep learning
dc.subject.keywordPlus: Speech communication
dc.subject.keywordPlus: Speech recognition
dc.subject.keywordPlus: Acoustics model
dc.subject.keywordPlus: Automatic speech recognition
dc.subject.keywordPlus: Automatic speech recognition system
dc.subject.keywordPlus: HMM based hybrid automatic speech recognition
dc.subject.keywordPlus: HMM-based
dc.subject.keywordPlus: Regularization methods
dc.subject.keywordPlus: Robust speech recognition
dc.subject.keywordPlus: Sparse features
dc.subject.keywordPlus: Transformer
dc.subject.keywordPlus: Transformer modeling
dc.subject.keywordAuthor: Acoustic Model
dc.subject.keywordAuthor: HMM based hybrid ASR
dc.subject.keywordAuthor: Sparse Feature
dc.subject.keywordAuthor: Speech Recognition
dc.subject.keywordAuthor: Transformer
dc.identifier.url: https://www.isca-speech.org/archive/interspeech_2022/lee22b_interspeech.html
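The abstract describes a regularizer that penalizes attention weights using silence labels obtained from forced alignment. A minimal sketch of one way such a penalty could be computed is below; the function name `attention_silence_penalty`, the tensor shapes, and the uniform-attention example are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def attention_silence_penalty(attn_weights, silence_mask):
    """Illustrative penalty: mean attention mass placed on silence frames.

    attn_weights : (heads, T_query, T_key) array; each query row is a
                   softmax distribution over the T_key input frames.
    silence_mask : (T_key,) boolean array, True where the forced
                   alignment labeled the frame as silence.
    """
    # For every head and query, sum the attention assigned to silence keys.
    mass_on_silence = (attn_weights * silence_mask[None, None, :]).sum(axis=-1)
    # Averaging gives a scalar that could be added to the training loss
    # (scaled by a regularization coefficient) to discourage the model
    # from attending to non-speech frames.
    return float(mass_on_silence.mean())

# With uniform attention over 4 frames, 2 of which are silent,
# half of the attention mass falls on silence:
attn = np.full((2, 3, 4), 0.25)
sil = np.array([True, True, False, False])
print(attention_silence_penalty(attn, sil))  # 0.5
```

In practice such a term would be weighted and added to the cross-entropy objective; because it only reuses attention weights already computed in the forward pass, it adds essentially no extra computation, consistent with the abstract's claim.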
Files in This Item
Appears in Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher


Chang, Joon-Hyuk
College of Engineering (School of Electronic Engineering)

