UNDERSTANDING THE ROLE OF SELF ATTENTION FOR EFFICIENT SPEECH RECOGNITION

Shim, Kyuhong; Choi, Jung wook; Sung, Wonyong

Full metadata record

DC Field	Value	Language
dc.contributor.author	Shim, Kyuhong	-
dc.contributor.author	Choi, Jung wook	-
dc.contributor.author	Sung, Wonyong	-
dc.date.accessioned	2023-05-03T09:39:50Z	-
dc.date.available	2023-05-03T09:39:50Z	-
dc.date.created	2023-04-06	-
dc.date.issued	2022-04	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/184848	-
dc.description.abstract	Self-attention (SA) is a critical component of Transformer neural networks that have succeeded in automatic speech recognition (ASR). In this paper, we analyze the role of SA in Transformer-based ASR models, both to understand the mechanism behind the improved recognition accuracy and to lower the computational complexity. We reveal that SA performs two distinct roles: phonetic and linguistic localization. In particular, we show experimentally that phonetic localization in the lower layers extracts phonologically meaningful features from speech and reduces the phonetic variance in the utterance, which enables proper linguistic localization in the upper layers. From this understanding, we discover that attention maps can be reused as long as their localization capability is preserved. To evaluate this idea, we implement layer-wise attention map reuse on real GPU platforms and achieve up to a 1.96 times speedup in inference and 33% savings in training time, with noticeably improved ASR performance on the challenging LibriSpeech dev-other and test-other sets.	-
dc.language	English	-
dc.language.iso	en	-
dc.publisher	International Conference on Learning Representations, ICLR	-
dc.title	UNDERSTANDING THE ROLE OF SELF ATTENTION FOR EFFICIENT SPEECH RECOGNITION	-
dc.type	Article	-
dc.contributor.affiliatedAuthor	Choi, Jung wook	-
dc.identifier.scopusid	2-s2.0-85150355364	-
dc.identifier.bibliographicCitation	ICLR 2022 - 10th International Conference on Learning Representations, pp.1 - 19	-
dc.relation.isPartOf	ICLR 2022 - 10th International Conference on Learning Representations	-
dc.citation.title	ICLR 2022 - 10th International Conference on Learning Representations	-
dc.citation.startPage	1	-
dc.citation.endPage	19	-
dc.type.rims	ART	-
dc.type.docType	Conference Paper	-
dc.description.journalClass	1	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordPlus	Benchmarking	-
dc.subject.keywordPlus	Speech recognition	-
dc.subject.keywordPlus	Linguistics	-
dc.subject.keywordPlus	Automatic speech recognition	-
dc.subject.keywordPlus	Critical component	-
dc.subject.keywordPlus	Layer-wise	-
dc.subject.keywordPlus	Localisation	-
dc.subject.keywordPlus	Neural-networks	-
dc.subject.keywordPlus	Recognition accuracy	-
dc.subject.keywordPlus	Recognition models	-
dc.subject.keywordPlus	Reuse	-
dc.subject.keywordPlus	Training time	-
dc.subject.keywordPlus	Upper layer	-
dc.identifier.url	https://openreview.net/forum?id=AvcfxqRy4Y	-
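
The abstract's central idea, layer-wise attention map reuse, can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which layers share a map or how the model is configured, so the grouping rule (`reuse_span`), the single-head layers without feed-forward blocks or normalization, and all function names below are assumptions made for illustration only. The sketch shows that a layer which reuses an earlier attention map skips its own query/key projections and softmax and only computes its value projection.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(scores):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_map(x, wq, wk):
    """Scaled dot-product attention map softmax(QK^T / sqrt(d)), shape T x T."""
    q, k = matmul(x, wq), matmul(x, wk)
    scale = 1.0 / math.sqrt(len(wq[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    return softmax_rows(scores)


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encoder_forward(x, layers, reuse_span):
    """Run simplified SA layers, recomputing the map only every `reuse_span` layers.

    `reuse_span` is a hypothetical parameter: layers i with i % reuse_span != 0
    reuse the most recently computed attention map instead of building their own.
    """
    attn, maps_computed = None, 0
    for i, layer in enumerate(layers):
        if i % reuse_span == 0:
            attn = attention_map(x, layer["wq"], layer["wk"])
            maps_computed += 1
        v = matmul(x, layer["wv"])   # every layer keeps its own value projection
        ctx = matmul(attn, v)        # apply the (possibly reused) attention map
        x = [[a + b for a, b in zip(rx, rc)] for rx, rc in zip(x, ctx)]  # residual
    return x, maps_computed


rng = random.Random(0)
T, D, NUM_LAYERS = 6, 4, 4           # toy sizes: frames, feature dim, layers
layers = [
    {name: random_matrix(D, D, rng) for name in ("wq", "wk", "wv")}
    for _ in range(NUM_LAYERS)
]
x = random_matrix(T, D, rng)

out, n_maps = encoder_forward(x, layers, reuse_span=2)
_, n_maps_baseline = encoder_forward(x, layers, reuse_span=1)
print(n_maps, n_maps_baseline)
```

With `reuse_span=1` every layer computes its own map, which corresponds to an ordinary stack of self-attention layers; larger values trade per-layer attention maps for fewer map computations. Whether accuracy is preserved depends on the localization behavior described in the abstract, which this toy example does not measure.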

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Choi, Jung wook: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
