Partitioning Attention Weight: Mitigating Adverse Effect of Incorrect Pseudo-labels for Self-Supervised ASR
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Lee, Jae-Hong | - |
| dc.contributor.author | Chang, Joon-Hyuk | - |
| dc.date.accessioned | 2024-11-28T13:31:21Z | - |
| dc.date.available | 2024-11-28T13:31:21Z | - |
| dc.date.issued | 2023-12 | - |
| dc.identifier.issn | 2329-9290 | - |
| dc.identifier.issn | 2329-9304 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/196609 | - |
| dc.description.abstract | The performance of automatic speech recognition (ASR) models has been significantly improved owing to advances in deep learning and end-to-end approaches. However, these approaches require a large amount of labeled data, which is expensive to obtain. Semi-supervised learning techniques, such as pseudo-labeling and self-supervised learning, have emerged as potential solutions to reduce the reliance on labeled data. Recently, some studies have combined self-supervised learning and pseudo-labeling to further enhance ASR performance. However, these methods suffer from incorrect pseudo-labels that propagate errors and reduce ASR performance. In this paper, we propose a novel method called partitioning attention weight (PAW) to mitigate the adverse effects of incorrect labels without requiring additional language models. The proposed method isolates audio segments by partitioning a fully connected attention weight into sub-attention weights. This prevents the model from learning the wrong context across the entire attention weights from incorrect labels, and also guards against overfitting. The proposed method is simple, requiring few changes to existing learning frameworks, and leverages the alignment information obtained during the pseudo-labeling process. Our experimental results show consistent improvements in ASR performance across various semi-supervised learning scenarios. | - |
| dc.format.extent | 15 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | IEEE | - |
| dc.title | Partitioning Attention Weight: Mitigating Adverse Effect of Incorrect Pseudo-labels for Self-Supervised ASR | - |
| dc.type | Article | - |
| dc.publisher.location | United States | - |
| dc.identifier.doi | 10.1109/TASLP.2023.3343615 | - |
| dc.identifier.scopusid | 2-s2.0-85180358086 | - |
| dc.identifier.wosid | 001134412800001 | - |
| dc.identifier.bibliographicCitation | IEEE/ACM Transactions on Audio, Speech, and Language Processing, v.32, pp 891 - 905 | - |
| dc.citation.title | IEEE/ACM Transactions on Audio, Speech, and Language Processing | - |
| dc.citation.volume | 32 | - |
| dc.citation.startPage | 891 | - |
| dc.citation.endPage | 905 | - |
| dc.type.docType | Article | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Acoustics | - |
| dc.relation.journalResearchArea | Engineering | - |
| dc.relation.journalWebOfScienceCategory | Acoustics | - |
| dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
| dc.subject.keywordPlus | DEEP-NEURAL-NETWORKS | - |
| dc.subject.keywordPlus | SPEECH RECOGNITION | - |
| dc.subject.keywordAuthor | Computational modeling | - |
| dc.subject.keywordAuthor | Data augmentation | - |
| dc.subject.keywordAuthor | Data models | - |
| dc.subject.keywordAuthor | end-to-end speech recognition | - |
| dc.subject.keywordAuthor | pseudo-labeling | - |
| dc.subject.keywordAuthor | self-supervised learning | - |
| dc.subject.keywordAuthor | Self-supervised learning | - |
| dc.subject.keywordAuthor | self-training | - |
| dc.subject.keywordAuthor | Semi-supervised learning | - |
| dc.subject.keywordAuthor | Semisupervised learning | - |
| dc.subject.keywordAuthor | Task analysis | - |
| dc.subject.keywordAuthor | Transformers | - |
| dc.identifier.url | https://ieeexplore.ieee.org/document/10361275 | - |
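
The abstract describes partitioning a fully connected attention weight into sub-attention weights so that audio segments are isolated from one another. The record gives no implementation details, so the following is only a minimal sketch of one plausible reading: a block-diagonal attention mask built from segment boundaries. The function names `partition_mask` and `masked_softmax`, and the boundary format (a list of segment start indices), are assumptions made for illustration and are not taken from the paper.

```python
import math


def partition_mask(boundaries, num_frames):
    """Build a block-diagonal boolean mask.

    Frame i may attend to frame j only if both lie in the same segment.
    `boundaries` lists the start index of every segment after the first
    (a hypothetical input format, assumed here for illustration).
    """
    cuts = set(boundaries)
    segment_id = []
    current = 0
    for t in range(num_frames):
        if t in cuts:
            current += 1
        segment_id.append(current)
    return [
        [segment_id[i] == segment_id[j] for j in range(num_frames)]
        for i in range(num_frames)
    ]


def masked_softmax(scores, mask):
    """Row-wise softmax in which masked-out positions receive zero weight."""
    weights = []
    for row, allowed in zip(scores, mask):
        # The diagonal is always allowed, so `kept` is never empty.
        kept = [s for s, a in zip(row, allowed) if a]
        peak = max(kept)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - peak) if a else 0.0 for s, a in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Toy example: 5 frames split into segments [0, 1] and [2, 3, 4].
mask = partition_mask([2], 5)
scores = [[0.0] * 5 for _ in range(5)]  # uniform scores, for illustration only
weights = masked_softmax(scores, mask)
```

In this toy case each frame spreads its attention evenly over its own segment and gives zero weight to frames in the other segment, which is the isolation effect the abstract describes. How the published method derives segment boundaries from the pseudo-labeling alignment is not stated in this record.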
