UNDERSTANDING THE ROLE OF SELF ATTENTION FOR EFFICIENT SPEECH RECOGNITION
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Shim, Kyuhong | - |
| dc.contributor.author | Choi, Jung wook | - |
| dc.contributor.author | Sung, Wonyong | - |
| dc.date.accessioned | 2023-05-03T09:39:50Z | - |
| dc.date.available | 2023-05-03T09:39:50Z | - |
| dc.date.created | 2023-04-06 | - |
| dc.date.issued | 2022-04 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/184848 | - |
| dc.description.abstract | Self-attention (SA) is a critical component of Transformer neural networks, which have been successful in automatic speech recognition (ASR). In this paper, we analyze the role of SA in Transformer-based ASR models, both to understand the mechanism behind their improved recognition accuracy and to lower their computational complexity. We reveal that SA performs two distinct roles: phonetic and linguistic localization. In particular, we show experimentally that phonetic localization in the lower layers extracts phonologically meaningful features from speech and reduces the phonetic variance in the utterance, enabling proper linguistic localization in the upper layers. From this understanding, we discover that attention maps can be reused as long as their localization capability is preserved. To evaluate this idea, we implement layer-wise attention map reuse on real GPU platforms and achieve up to a 1.96 times speedup in inference and 33% savings in training time, with noticeably improved ASR performance on the challenging LibriSpeech dev-other and test-other benchmarks. | - |
| dc.language | English | - |
| dc.language.iso | en | - |
| dc.publisher | International Conference on Learning Representations, ICLR | - |
| dc.title | UNDERSTANDING THE ROLE OF SELF ATTENTION FOR EFFICIENT SPEECH RECOGNITION | - |
| dc.type | Article | - |
| dc.contributor.affiliatedAuthor | Choi, Jung wook | - |
| dc.identifier.scopusid | 2-s2.0-85150355364 | - |
| dc.identifier.bibliographicCitation | ICLR 2022 - 10th International Conference on Learning Representations, pp.1 - 19 | - |
| dc.relation.isPartOf | ICLR 2022 - 10th International Conference on Learning Representations | - |
| dc.citation.title | ICLR 2022 - 10th International Conference on Learning Representations | - |
| dc.citation.startPage | 1 | - |
| dc.citation.endPage | 19 | - |
| dc.type.rims | ART | - |
| dc.type.docType | Conference Paper | - |
| dc.description.journalClass | 1 | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.subject.keywordPlus | Benchmarking | - |
| dc.subject.keywordPlus | Speech recognition | - |
| dc.subject.keywordPlus | Linguistics | - |
| dc.subject.keywordPlus | Automatic speech recognition | - |
| dc.subject.keywordPlus | Critical component | - |
| dc.subject.keywordPlus | Layer-wise | - |
| dc.subject.keywordPlus | Localisation | - |
| dc.subject.keywordPlus | Neural-networks | - |
| dc.subject.keywordPlus | Recognition accuracy | - |
| dc.subject.keywordPlus | Recognition models | - |
| dc.subject.keywordPlus | Reuse | - |
| dc.subject.keywordPlus | Training time | - |
| dc.subject.keywordPlus | Upper layer | - |
| dc.identifier.url | https://openreview.net/forum?id=AvcfxqRy4Y | - |
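
The abstract's central efficiency idea, computing an attention map in one layer and reusing it in later layers, can be illustrated with a toy sketch. This is not the authors' implementation: it assumes single-head self-attention, omits layer normalization, feed-forward blocks, and multi-head splitting, and the reuse schedule (`compute_map`) and every function name below are hypothetical. A layer marked `False` skips its query/key projections and the softmax(QK^T / sqrt(d)) computation and instead applies the most recently computed map to its own value projection.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain (n x k) @ (k x m) product.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    # Numerically stable row-wise softmax.
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def attention_map(x: Matrix, wq: Matrix, wk: Matrix) -> Matrix:
    # Scaled dot-product attention weights: softmax(Q K^T / sqrt(d)).
    q, k = matmul(x, wq), matmul(x, wk)
    scale = 1.0 / math.sqrt(len(wq[0]))
    scores = matmul(q, transpose(k))
    return softmax_rows([[s * scale for s in row] for row in scores])


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def encoder_with_reuse(x: Matrix, layers: List[dict], compute_map: List[bool]):
    """Run a stack of simplified SA layers. Layer i computes a fresh attention
    map only if compute_map[i] is True (or no map exists yet); otherwise it
    reuses the cached map. Returns the output and the number of maps computed."""
    cached = None
    n_computed = 0
    for params, fresh in zip(layers, compute_map):
        if fresh or cached is None:
            cached = attention_map(x, params["wq"], params["wk"])
            n_computed += 1
        v = matmul(x, params["wv"])   # each layer keeps its own value projection
        attn_out = matmul(cached, v)  # apply the (possibly reused) map
        # Residual connection.
        x = [[xi + ai for xi, ai in zip(xr, ar)] for xr, ar in zip(x, attn_out)]
    return x, n_computed


if __name__ == "__main__":
    rng = random.Random(0)
    T, D = 6, 8  # hypothetical: 6 frames, model width 8
    layers = [{"wq": random_matrix(D, D, rng), "wk": random_matrix(D, D, rng),
               "wv": random_matrix(D, D, rng)} for _ in range(6)]
    x = random_matrix(T, D, rng)
    # Hypothetical schedule: fresh maps at layers 0 and 3, reused elsewhere.
    y, n = encoder_with_reuse(x, layers, [True, False, False, True, False, False])
    print(n)  # prints 2
```

In this toy setting the saving comes only from the skipped Q/K projections and the T x T score computation in reuse layers; the paper's reported speedups come from its own GPU implementation and are not reproduced by this sketch.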
