Regularizing Transformer-based Acoustic Models by Penalizing Attention Weights for Robust Speech Recognition
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Mun-Hak | - |
dc.contributor.author | Lee, Sang-Eon | - |
dc.contributor.author | Seong, Ju-Seok | - |
dc.contributor.author | Chang, Joon-Hyuk | - |
dc.contributor.author | Kwon, Haeyoung | - |
dc.contributor.author | Park, Chanhee | - |
dc.date.accessioned | 2022-12-20T06:24:47Z | - |
dc.date.available | 2022-12-20T06:24:47Z | - |
dc.date.created | 2022-11-02 | - |
dc.date.issued | 2022-09 | - |
dc.identifier.issn | 2308-457X | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/173085 | - |
dc.description.abstract | The application of deep learning has significantly advanced the performance of automatic speech recognition (ASR) systems. An ASR system comprises several components, such as the acoustic model (AM), language model (LM), and lexicon; of these, the AM has generally benefited the most from deep learning. Numerous types of neural network-based AMs have been studied, but the structure that has received the most attention in recent years is the Transformer [1]. In this study, we demonstrate that the Transformer model is more vulnerable to input sparsity than the convolutional neural network (CNN) and analyze the cause of this performance degradation through the structural characteristics of the Transformer. We also propose a novel regularization method that makes the Transformer model robust against input sparsity. The proposed sparsity regularization method directly regulates attention weights using silence label information from forced alignment, and has the advantage of requiring neither additional module training nor excessive computation. We tested the proposed method on five benchmarks and observed an average relative error rate reduction (RERR) of 4.7%. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | International Speech Communication Association | - |
dc.title | Regularizing Transformer-based Acoustic Models by Penalizing Attention Weights for Robust Speech Recognition | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | - |
dc.identifier.doi | 10.21437/Interspeech.2022-362 | - |
dc.identifier.scopusid | 2-s2.0-85140072608 | - |
dc.identifier.wosid | 000900724500012 | - |
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2022-September, pp.56 - 60 | - |
dc.relation.isPartOf | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.volume | 2022-September | - |
dc.citation.startPage | 56 | - |
dc.citation.endPage | 60 | - |
dc.type.rims | ART | - |
dc.type.docType | Proceedings Paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Acoustics | - |
dc.relation.journalResearchArea | Audiology & Speech-Language Pathology | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Acoustics | - |
dc.relation.journalWebOfScienceCategory | Audiology & Speech-Language Pathology | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.subject.keywordPlus | Convolutional neural networks | - |
dc.subject.keywordPlus | Deep learning | - |
dc.subject.keywordPlus | Speech communication | - |
dc.subject.keywordPlus | Speech recognition | - |
dc.subject.keywordPlus | Acoustics model | - |
dc.subject.keywordPlus | Automatic speech recognition | - |
dc.subject.keywordPlus | Automatic speech recognition system | - |
dc.subject.keywordPlus | HMM based hybrid automatic speech recognition | - |
dc.subject.keywordPlus | HMM-based | - |
dc.subject.keywordPlus | Regularization methods | - |
dc.subject.keywordPlus | Robust speech recognition | - |
dc.subject.keywordPlus | Sparse features | - |
dc.subject.keywordPlus | Transformer | - |
dc.subject.keywordPlus | Transformer modeling | - |
dc.subject.keywordAuthor | Acoustic Model | - |
dc.subject.keywordAuthor | HMM based hybrid ASR | - |
dc.subject.keywordAuthor | Sparse Feature | - |
dc.subject.keywordAuthor | Speech Recognition | - |
dc.subject.keywordAuthor | Transformer | - |
dc.identifier.url | https://www.isca-speech.org/archive/interspeech_2022/lee22b_interspeech.html | - |
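The abstract describes a regularizer that penalizes attention mass assigned to frames labeled as silence by forced alignment. A minimal sketch of that general idea follows; this is not the paper's exact loss, and the function names, toy attention scores, and silence mask are illustrative assumptions:

```python
import math

def softmax(scores):
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def silence_attention_penalty(attention_rows, silence_mask):
    """Average attention mass that query frames place on silence frames.

    attention_rows: list of attention-weight rows, each summing to 1.
    silence_mask:   1 where forced alignment labels the key frame as silence.
    The penalty would be scaled by a coefficient and added to the training loss,
    discouraging the model from attending to non-speech regions.
    """
    total = 0.0
    for row in attention_rows:
        total += sum(w for w, is_sil in zip(row, silence_mask) if is_sil)
    return total / len(attention_rows)

# Toy example: 3 query frames attending over 4 key frames;
# frames 2 and 3 are silence according to a (hypothetical) alignment.
scores = [[2.0, 1.0, 0.1, 0.1],
          [1.5, 2.0, 0.2, 0.1],
          [0.5, 0.5, 1.0, 1.0]]
attn = [softmax(row) for row in scores]
silence = [0, 0, 1, 1]
penalty = silence_attention_penalty(attn, silence)
```

Because the penalty is computed directly from attention weights and a precomputed alignment, it adds no trainable parameters, which matches the abstract's claim of requiring neither additional module training nor excessive computation.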