Enhancing Target-speaker Automatic Speech Recognition Using Multiple Speaker Embedding Extractors with Virtual Speaker Embedding
- Authors
- Seong, Ju-Seok; Choi, Jeong-Hwan; Jeoung, Ye-Rin; Kim, Ilseok; Chang, Joon-Hyuk
- Issue Date
- Aug-2025
- Publisher
- International Speech Communication Association
- Keywords
- lightweight speaker embedding extractor; speaker embedding; Target-speaker automatic speech recognition; virtual speaker embedding
- Citation
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 4918-4922
- Pages
- 5
- Indexed
- SCOPUS
- Journal Title
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
- Start Page
- 4918
- End Page
- 4922
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209226
- DOI
- 10.21437/Interspeech.2025-2486
- ISSN
- 2958-1796
- Abstract
- Target-speaker automatic speech recognition (TS-ASR) uses speaker embeddings to identify a target speaker in multi-talker environments. While high-performance speaker embedding extractors provide discriminative embeddings, their computational demands limit practical deployment. In this study, we present two novel methods that use lightweight extractors to enhance TS-ASR performance. First, we propose a multiple embeddings modulation that effectively transfers comprehensive speaker information to the ASR module, improving overall performance and robustness against embedding variations. Second, we present a virtual speaker embedding augmentation technique that synthesizes embeddings of unseen speakers, reducing dependence on any specific extractor while enhancing the independent contribution of each extractor. Experimental results on the Libri2Mix dataset demonstrate that the proposed methods achieve significant word error rate (WER) reductions compared with the baseline model.
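The abstract does not state how the embeddings from several extractors are combined or how virtual speakers are synthesized. The sketch below is purely illustrative and is not the authors' method: it assumes a FiLM-style (feature-wise scale and shift) modulation in which each extractor's embedding contributes its own affine terms to the ASR features, and a mixup-style interpolation between two real speakers' embeddings as a stand-in for virtual speaker synthesis. The names `MultiEmbeddingModulation` and `virtual_speaker_embedding`, the dimensions, and the random projection weights are all hypothetical.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def l2_normalize(x: Vector) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiEmbeddingModulation:
    """Hypothetical FiLM-style modulation: each extractor's embedding is
    projected to a per-dimension scale and shift, and the contributions of
    all extractors are summed before being applied to every ASR frame."""

    def __init__(self, emb_dims: list[int], feat_dim: int, seed: int = 0):
        rng = random.Random(seed)
        # One scale projection and one shift projection per extractor
        # (random here; these would be learned in a real model).
        self.to_scale = [random_matrix(feat_dim, e, rng) for e in emb_dims]
        self.to_shift = [random_matrix(feat_dim, e, rng) for e in emb_dims]

    def __call__(self, frames: list[Vector], embeddings: list[Vector]) -> list[Vector]:
        feat_dim = len(frames[0])
        scale = [1.0] * feat_dim  # identity scale when no speaker information
        shift = [0.0] * feat_dim
        for w_s, w_b, emb in zip(self.to_scale, self.to_shift, embeddings):
            emb = l2_normalize(emb)
            scale = [s + d for s, d in zip(scale, matvec(w_s, emb))]
            shift = [b + d for b, d in zip(shift, matvec(w_b, emb))]
        return [[s * f + b for s, f, b in zip(scale, frame, shift)] for frame in frames]


def virtual_speaker_embedding(emb_a: Vector, emb_b: Vector, rng: random.Random,
                              alpha_range: tuple[float, float] = (0.3, 0.7)) -> Vector:
    """Hypothetical augmentation: interpolate two real speakers' embeddings
    from the same extractor and renormalize to get an 'unseen' speaker."""
    alpha = rng.uniform(*alpha_range)
    mixed = [alpha * a + (1.0 - alpha) * b for a, b in zip(emb_a, emb_b)]
    return l2_normalize(mixed)


# Usage: two hypothetical lightweight extractors with different embedding sizes.
rng = random.Random(42)
emb_dims = [4, 6]
feat_dim = 3
module = MultiEmbeddingModulation(emb_dims, feat_dim, seed=1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)] for _ in range(5)]
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(d)] for d in emb_dims]
out = module(frames, embeddings)
print(len(out), len(out[0]))  # 5 3

# Synthesize a virtual speaker separately in each extractor's embedding space.
spk_a = [[rng.gauss(0.0, 1.0) for _ in range(d)] for d in emb_dims]
spk_b = [[rng.gauss(0.0, 1.0) for _ in range(d)] for d in emb_dims]
virtual = [virtual_speaker_embedding(a, b, rng) for a, b in zip(spk_a, spk_b)]
```

Keeping a separate projection per extractor lets each embedding contribute independently, and interpolating within each extractor's own space keeps the synthetic embeddings compatible with that extractor; both choices are assumptions made for this illustration.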

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.