Enhancing Target-speaker Automatic Speech Recognition Using Multiple Speaker Embedding Extractors with Virtual Speaker Embedding

Seong, Ju-Seok; Choi, Jeong-Hwan; Jeoung, Ye-Rin; Kim, Ilseok; Chang, Joon-Hyuk

doi:10.21437/Interspeech.2025-2486

Authors: Seong, Ju-Seok; Choi, Jeong-Hwan; Jeoung, Ye-Rin; Kim, Ilseok; Chang, Joon-Hyuk

Issue Date: Aug-2025

Publisher: International Speech Communication Association

Keywords: lightweight speaker embedding extractor; speaker embedding; Target-speaker automatic speech recognition; virtual speaker embedding

Citation: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 4918 - 4922

Pages: 5

Indexed: SCOPUS

Journal Title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Start Page: 4918

End Page: 4922

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209226

DOI: 10.21437/Interspeech.2025-2486

ISSN: 2958-1796

Abstract: Target-speaker automatic speech recognition (TS-ASR) utilizes speaker embeddings to identify a target speaker in multi-talker environments. While high-performance speaker embedding extractors provide discriminative embeddings, their computational demands limit practical deployment. In this study, we present two novel methods that utilize lightweight extractors to enhance TS-ASR performance. First, we propose a multiple embeddings modulation that transfers comprehensive speaker information to the ASR module, thereby improving overall performance and robustness against embedding variations. Second, we present a virtual speaker embedding augmentation technique that synthesizes embeddings of unseen speakers, reducing dependence on specific extractors while enhancing independent contributions from each extractor. Experimental results on the Libri2Mix dataset demonstrate that our proposed methods achieve significant word error rate (WER) reductions compared to the baseline model.
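Note on the metric: the abstract reports its results as WER reductions. As a reminder of what that figure measures, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition only; the function name `word_error_rate` is hypothetical, whitespace tokenization is an assumption, and this is not the scoring tool or text normalization used in the paper, which this record does not specify.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deletion ("the") and one substitution ("off" -> "on") over 4 reference words.
print(word_error_rate("turn the lights off", "turn lights on"))  # 0.5
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.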

Appears in Collections: College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
