TEXT-ONLY UNSUPERVISED DOMAIN ADAPTATION FOR NEURAL TRANSDUCER-BASED ASR PERSONALIZATION USING SYNTHESIZED DATA
- Authors
- Kim, Dong-Hyun; Lee, Jae-Hong; Chang, Joon-Hyuk
- Issue Date
- Mar-2024
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Keywords
- automatic speech recognition; neural transducer; personalization; synthesized data; unsupervised domain adaptation
- Citation
- ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp 11131 - 11135
- Pages
- 5
- Indexed
- SCOPUS
- Journal Title
- ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
- Start Page
- 11131
- End Page
- 11135
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/211633
- DOI
- 10.1109/ICASSP48485.2024.10446454
- ISSN
- 1520-6149; 2379-190X
- Abstract
- Research on personalizing neural transducer-based automatic speech recognition (ASR) systems using text-only data is currently flourishing. Among various approaches, utilizing synthesized speech offers the advantage of adapting the entire ASR system. In this study, we explore the problem of personalization from a domain adaptation perspective and highlight the potential risk of overfitting associated with synthesized speech. To mitigate this risk, we propose the text-only unsupervised domain adaptation (ToUDA) strategy, which robustly finetunes the generic ASR model on synthesized speech by incorporating parameter averaging over time, model freezing, and filtering of out-of-distribution instances. Through various experiments, we not only showcase the effectiveness of our approach but also uncover a noteworthy limitation when it comes to personalizing atypical speech.
- Appears in Collections
- College of Engineering (Seoul) > Department of Electronic Engineering (Seoul) > 1. Journal Articles
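
The abstract names three ingredients of ToUDA: parameter averaging over time, model freezing, and filtering of out-of-distribution instances. The following is only a toy sketch of how such ingredients might fit together, not the authors' implementation. It assumes an exponential moving average as the averaging rule, scalar stand-ins for model weights, and a precomputed per-utterance score for filtering; all function names, parameter names, and thresholds are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Params = Dict[str, float]  # toy stand-in for model weights: one scalar per name


def filter_instances(batch: List[Tuple[str, float]],
                     max_score: float) -> List[Tuple[str, float]]:
    """Keep (utterance, score) pairs whose hypothetical OOD score is <= max_score."""
    return [(utt, score) for utt, score in batch if score <= max_score]


def sgd_step(params: Params, grads: Params, lr: float,
             frozen: Set[str]) -> Params:
    """Plain gradient step that leaves every parameter named in `frozen` untouched."""
    return {name: value if name in frozen else value - lr * grads[name]
            for name, value in params.items()}


def update_average(avg: Params, params: Params, decay: float) -> Params:
    """Exponential moving average of parameters (an assumed averaging rule)."""
    return {name: decay * avg[name] + (1.0 - decay) * params[name]
            for name in avg}
```

In a finetuning loop under these assumptions, each batch of synthesized utterances would first pass through `filter_instances`, the surviving instances would drive `sgd_step`, and `update_average` would maintain the averaged weights used at inference time.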