Zero-Shot Voice Cloning Using Variational Embedding with Attention Mechanism
- Authors
- Lee, Jaeuk; Kim, Jiye; Chang, Joon-Hyuk
- Issue Date
- Jan-2022
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Keywords
- Global style token; Multi-speaker; Text-to-speech; Voice cloning
- Citation
- Proceedings of 2021 7th IEEE International Conference on Network Intelligence and Digital Content, IC-NIDC 2021, pp.344 - 348
- Indexed
- SCOPUS
- Journal Title
- Proceedings of 2021 7th IEEE International Conference on Network Intelligence and Digital Content, IC-NIDC 2021
- Start Page
- 344
- End Page
- 348
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/139794
- DOI
- 10.1109/IC-NIDC54101.2021.9660599
- Abstract
- Many voice cloning studies based on multi-speaker text-to-speech (TTS) have been conducted. Among voice cloning techniques, we focus on zero-shot voice cloning, where the most important design choice is which speaker embedding is used. In this study, two types of speaker embeddings are used. One is extracted from the mel spectrogram using a speaker encoder, and the other is stored in an embedding dictionary, as in a vector quantized-variational autoencoder (VQ-VAE). To extract an embedding from the embedding dictionary, an attention mechanism is applied, which we call attention-VAE (AT-VAE). The embedding extracted by the speaker encoder is used as the query in the attention mechanism, and attention weights are calculated over the entries of the embedding dictionary. This mechanism allows the extraction of a speaker embedding that represents unseen speakers. In addition, training is applied to make the model robust to unseen speakers, which further improves the system. The performance of the proposed method was validated in terms of various metrics, and it was demonstrated that the proposed method enables voice cloning without adaptation training.
- Appears in
Collections - College of Engineering (Seoul) > Department of Electronic Engineering (Seoul) > 1. Journal Articles
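
The abstract describes using the speaker-encoder output as a query over an embedding dictionary. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes scaled dot-product attention, a dictionary whose entries act as both keys and values, and arbitrary sizes (8 entries, 16 dimensions) filled with random numbers. The function name `attend_over_dictionary` and all variable names are hypothetical.

```python
import numpy as np


def attend_over_dictionary(query, dictionary):
    """Attend over a (K, D) embedding dictionary with a (D,) query.

    Assumption: scaled dot-product attention with the dictionary entries
    used as both keys and values. The abstract does not specify the
    scoring function, so this choice is illustrative only.
    """
    dim = dictionary.shape[1]
    scores = dictionary @ query / np.sqrt(dim)   # (K,) similarity per entry
    scores = scores - scores.max()               # stabilize the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum()            # attention weights, sum to 1
    embedding = weights @ dictionary             # (D,) weighted sum of entries
    return weights, embedding


# Hypothetical stand-ins: a learned dictionary and a speaker-encoder output.
rng = np.random.default_rng(0)
dictionary = rng.standard_normal((8, 16))
query = rng.standard_normal(16)

weights, speaker_embedding = attend_over_dictionary(query, dictionary)
print(weights.shape, speaker_embedding.shape)
```

Because the weights are non-negative and sum to one, the resulting embedding is a convex combination of dictionary entries. Under these assumptions, an unseen speaker is represented as a mixture of stored entries rather than a single nearest code.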