Advanced Speaker Embedding with Predictive Variance of Gaussian Distribution for Speaker Adaptation in TTS
- Authors
- Lee, Jaeuk; Chang, Joon-Hyuk
- Issue Date
- Sep-2022
- Publisher
- International Speech Communication Association
- Keywords
- multi-speaker; speaker adaptation; voice cloning
- Citation
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2022-September, pp.2988 - 2992
- Indexed
- SCOPUS
- Journal Title
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
- Volume
- 2022-September
- Start Page
- 2988
- End Page
- 2992
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/173083
- DOI
- 10.21437/Interspeech.2022-10193
- ISSN
- 2308-457X
- Abstract
- Speaker adaptation in text-to-speech (TTS) has three goals: high-quality audio, a small amount of data needed to adapt to a new speaker, and few fine-tuned parameters, for storage efficiency in commercial custom-voice services. In this paper, we introduce a novel adaptation method to achieve these three goals. First, we estimate variances from a speaker embedding and add them back to the speaker embedding. This operation broadens the distribution of each speaker in the latent space. Moreover, we design a prediction model that generates a speaker embedding approximately representing the new speaker's timbre. Using the prediction model together with a search process for the starting point of fine-tuning, we obtain a speaker embedding that represents the new speaker's timbre well. We also observe how performance changes with the number of fine-tuned parameters. Finally, we evaluate the proposed method using the mean opinion score (MOS) to demonstrate its performance.
- Appears in Collections
- Seoul College of Engineering > Seoul Division of Electronic Engineering > 1. Journal Articles
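The first step described in the abstract (estimating variances from a speaker embedding and adding them back to it) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not specify the model, so the single linear layer, the softplus activation, and the names `predict_variance`, `add_variance`, and `make_demo` are all assumptions chosen only to make the idea concrete.

```python
import math
import random


def softplus(x):
    """Numerically stable softplus; used here (an assumption) to keep variances positive."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def predict_variance(embedding, weights, bias):
    """Hypothetical variance predictor: one linear layer followed by softplus.

    The real predictor in the paper may be structured very differently.
    """
    return [
        softplus(sum(w * e for w, e in zip(row, embedding)) + b)
        for row, b in zip(weights, bias)
    ]


def add_variance(embedding, variance):
    """Add the predicted per-dimension variance back to the speaker embedding."""
    return [e + v for e, v in zip(embedding, variance)]


def make_demo(dim=4, seed=0):
    """Build a random embedding and random predictor weights for illustration only."""
    rng = random.Random(seed)
    embedding = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    bias = [0.0] * dim
    return embedding, weights, bias


if __name__ == "__main__":
    emb, w, b = make_demo()
    var = predict_variance(emb, w, b)
    print("embedding:", emb)
    print("variance: ", var)
    print("augmented:", add_variance(emb, var))
```

In this sketch every predicted variance is strictly positive, so each dimension of the augmented embedding is shifted upward. Whether the paper adds the variance directly or uses it to sample around the embedding is not stated in this record, so the direct addition shown here is only one possible reading.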