Regularizing Transformer-based Acoustic Models by Penalizing Attention Weights for Robust Speech Recognition
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Mun-Hak | - |
dc.contributor.author | Lee, Sang-Eon | - |
dc.contributor.author | Seong, Ju-Seok | - |
dc.contributor.author | Chang, Joon-Hyuk | - |
dc.contributor.author | Kwon, Haeyoung | - |
dc.contributor.author | Park, Chanhee | - |
dc.date.accessioned | 2022-12-20T06:24:47Z | - |
dc.date.available | 2022-12-20T06:24:47Z | - |
dc.date.created | 2022-11-02 | - |
dc.date.issued | 2022-09 | - |
dc.identifier.issn | 2308-457X | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/173085 | - |
dc.description.abstract | The application of deep learning has significantly advanced the performance of automatic speech recognition (ASR) systems. An ASR system comprises several components, such as the acoustic model (AM), language model (LM), and lexicon; of these, the AM has generally benefited the most from deep learning. Numerous types of neural network-based AMs have been studied, but the structure that has received the most attention in recent years is the Transformer [1]. In this study, we demonstrate that the Transformer model is more vulnerable to input sparsity than the convolutional neural network (CNN) and analyze the cause of this performance degradation through the structural characteristics of the Transformer. We also propose a novel regularization method that makes the Transformer model robust against input sparsity. The proposed sparsity regularization method directly regulates attention weights using silence label information from forced alignment, and has the advantage of requiring neither additional module training nor excessive computation. We tested the proposed method on five benchmarks and observed an average relative error rate reduction (RERR) of 4.7%. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | International Speech Communication Association | - |
dc.title | Regularizing Transformer-based Acoustic Models by Penalizing Attention Weights for Robust Speech Recognition | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | - |
dc.identifier.doi | 10.21437/Interspeech.2022-362 | - |
dc.identifier.scopusid | 2-s2.0-85140072608 | - |
dc.identifier.wosid | 000900724500012 | - |
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2022-September, pp.56 - 60 | - |
dc.relation.isPartOf | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.volume | 2022-September | - |
dc.citation.startPage | 56 | - |
dc.citation.endPage | 60 | - |
dc.type.rims | ART | - |
dc.type.docType | Proceedings Paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Acoustics | - |
dc.relation.journalResearchArea | Audiology & Speech-Language Pathology | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Acoustics | - |
dc.relation.journalWebOfScienceCategory | Audiology & Speech-Language Pathology | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.subject.keywordPlus | Convolutional neural networks | - |
dc.subject.keywordPlus | Deep learning | - |
dc.subject.keywordPlus | Speech communication | - |
dc.subject.keywordPlus | Speech recognition | - |
dc.subject.keywordPlus | Acoustics model | - |
dc.subject.keywordPlus | Automatic speech recognition | - |
dc.subject.keywordPlus | Automatic speech recognition system | - |
dc.subject.keywordPlus | HMM based hybrid automatic speech recognition | - |
dc.subject.keywordPlus | HMM-based | - |
dc.subject.keywordPlus | Regularization methods | - |
dc.subject.keywordPlus | Robust speech recognition | - |
dc.subject.keywordPlus | Sparse features | - |
dc.subject.keywordPlus | Transformer | - |
dc.subject.keywordPlus | Transformer modeling | - |
dc.subject.keywordAuthor | Acoustic Model | - |
dc.subject.keywordAuthor | HMM based hybrid ASR | - |
dc.subject.keywordAuthor | Sparse Feature | - |
dc.subject.keywordAuthor | Speech Recognition | - |
dc.subject.keywordAuthor | Transformer | - |
dc.identifier.url | https://www.isca-speech.org/archive/interspeech_2022/lee22b_interspeech.html | - |
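The abstract describes a regularizer that penalizes attention mass assigned to frames labeled as silence by forced alignment. A minimal sketch of that general idea follows; this is not the paper's exact loss, and the function names, toy attention scores, and silence mask are illustrative assumptions:

```python
import math

def softmax(scores):
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def silence_attention_penalty(attention_rows, silence_mask):
    """Average attention mass that query frames place on silence frames.

    attention_rows: list of attention-weight rows, each summing to 1.
    silence_mask:   1 where forced alignment labels the key frame as silence.
    The penalty would be scaled by a coefficient and added to the training loss,
    discouraging the model from attending to non-speech regions.
    """
    total = 0.0
    for row in attention_rows:
        total += sum(w for w, is_sil in zip(row, silence_mask) if is_sil)
    return total / len(attention_rows)

# Toy example: 3 query frames attending over 4 key frames;
# frames 2 and 3 are silence according to a (hypothetical) alignment.
scores = [[2.0, 1.0, 0.1, 0.1],
          [1.5, 2.0, 0.2, 0.1],
          [0.5, 0.5, 1.0, 1.0]]
attn = [softmax(row) for row in scores]
silence = [0, 0, 1, 1]
penalty = silence_attention_penalty(attn, silence)
```

Because the penalty is computed directly from attention weights and a precomputed alignment, it adds no trainable parameters, which matches the abstract's claim of requiring neither additional module training nor excessive computation.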