Partitioning Attention Weight: Mitigating Adverse Effect of Incorrect Pseudo-labels for Self-Supervised ASR

Lee, Jae-Hong; Chang, Joon-Hyuk

doi:10.1109/TASLP.2023.3343615

Authors: Lee, Jae-Hong; Chang, Joon-Hyuk

Issue Date: Dec-2023

Publisher: IEEE

Keywords: Computational modeling; Data augmentation; Data models; End-to-end speech recognition; Pseudo-labeling; Self-supervised learning; Self-training; Semi-supervised learning; Task analysis; Transformers

Citation: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 891-905

Pages: 15

Indexed: SCIE; SCOPUS

Journal Title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Volume: 32

Start Page: 891

End Page: 905

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/196609

DOI: 10.1109/TASLP.2023.3343615

ISSN: 2329-9290; 2329-9304

Abstract: The performance of automatic speech recognition (ASR) models has improved significantly owing to advances in deep learning and end-to-end approaches. However, these models require large amounts of labeled data, which is expensive to obtain. Semi-supervised learning techniques, such as pseudo-labeling and self-supervised learning, have emerged as potential ways to reduce the reliance on labeled data. Recently, some studies have combined self-supervised learning and pseudo-labeling to further enhance ASR performance. However, these methods suffer from incorrect pseudo-labels, which propagate errors and degrade ASR performance. In this paper, we propose a novel method called partitioning attention weight (PAW) to mitigate the adverse effects of incorrect labels without requiring additional language models. The proposed method isolates audio segments by partitioning a fully connected attention weight into sub-attention weights. This keeps incorrect labels from teaching the model the wrong context across the entire attention weight and also helps prevent overfitting. The method is simple, requires few changes to existing learning frameworks, and leverages the alignment information obtained during the pseudo-labeling process. Our experimental results show consistent improvements in ASR performance across various semi-supervised learning scenarios.
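
Illustrative sketch (not from the paper): the abstract does not say how the partition is computed, so the following is only one plausible reading of "partitioning a fully connected attention weight into sub-attention weights". It assumes each audio frame carries a segment index derived from the pseudo-label alignment, and it restricts self-attention to frames that share a segment, which yields a block-diagonal attention weight. The names `partition_mask`, `partitioned_attention`, and `segment_ids` are hypothetical, and plain NumPy stands in for whatever framework the authors used.

```python
import numpy as np


def partition_mask(segment_ids):
    """Boolean (T, T) mask: True where two frames share a segment index.

    `segment_ids` is an assumed per-frame segment label, e.g. taken from
    the alignment produced while generating pseudo-labels.
    """
    seg = np.asarray(segment_ids)
    return seg[:, None] == seg[None, :]


def partitioned_attention(q, k, v, segment_ids):
    """Scaled dot-product attention restricted to within-segment pairs."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)

    # Block cross-segment pairs so each frame attends only inside its segment.
    mask = partition_mask(segment_ids)
    scores = np.where(mask, scores, -np.inf)

    # Numerically stable softmax; each row keeps at least its diagonal entry.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy usage: 5 frames, 4-dimensional features, two segments (2 + 3 frames).
rng = np.random.default_rng(0)
q = rng.normal(size=(5, 4))
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 4))
out, w = partitioned_attention(q, k, v, segment_ids=[0, 0, 1, 1, 1])
```

In this sketch the off-diagonal blocks of `w` are exactly zero, so an incorrect pseudo-label in one segment cannot influence the context learned for frames in another segment. Whether PAW uses hard masking of this kind is an assumption here, not something the abstract states.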

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
