Zero-Shot Voice Cloning Using Variational Embedding with Attention Mechanism
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Jaeuk | - |
dc.contributor.author | Kim, Jiye | - |
dc.contributor.author | Chang, Joon-Hyuk | - |
dc.date.accessioned | 2022-07-06T10:38:19Z | - |
dc.date.available | 2022-07-06T10:38:19Z | - |
dc.date.created | 2022-03-07 | - |
dc.date.issued | 2022-01 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/139794 | - |
dc.description.abstract | Many voice cloning studies based on multi-speaker text-to-speech (TTS) have been conducted. Among voice cloning techniques, we focus on zero-shot voice cloning, whose most important design choice is which speaker embedding is used. In this study, two types of speaker embeddings are used: one is extracted from the mel spectrogram by a speaker encoder, and the other is stored in an embedding dictionary, as in a vector quantized-variational autoencoder (VQ-VAE). To extract an embedding from the dictionary, an attention mechanism is applied, which we call attention-VAE (AT-VAE). Using the embedding extracted by the speaker encoder as the query, attention weights are computed over the entries of the embedding dictionary, allowing the extraction of a speaker embedding that can represent unseen speakers. In addition, a training scheme is applied to make the model robust to unseen speakers, further improving the system. The performance of the proposed method was validated on various metrics, demonstrating that it enables voice cloning without adaptation training. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
dc.title | Zero-Shot Voice Cloning Using Variational Embedding with Attention Mechanism | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | - |
dc.identifier.doi | 10.1109/IC-NIDC54101.2021.9660599 | - |
dc.identifier.scopusid | 2-s2.0-85124799580 | - |
dc.identifier.bibliographicCitation | Proceedings of 2021 7th IEEE International Conference on Network Intelligence and Digital Content, IC-NIDC 2021, pp. 344-348 | - |
dc.relation.isPartOf | Proceedings of 2021 7th IEEE International Conference on Network Intelligence and Digital Content, IC-NIDC 2021 | - |
dc.citation.title | Proceedings of 2021 7th IEEE International Conference on Network Intelligence and Digital Content, IC-NIDC 2021 | - |
dc.citation.startPage | 344 | - |
dc.citation.endPage | 348 | - |
dc.type.rims | ART | - |
dc.type.docType | Conference Paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.subject.keywordPlus | Cloning | - |
dc.subject.keywordPlus | Signal encoding | - |
dc.subject.keywordPlus | Attention mechanisms | - |
dc.subject.keywordPlus | Auto encoders | - |
dc.subject.keywordPlus | Embeddings | - |
dc.subject.keywordPlus | Global style token | - |
dc.subject.keywordPlus | Multi-speaker | - |
dc.subject.keywordPlus | Performance | - |
dc.subject.keywordPlus | Spectrograms | - |
dc.subject.keywordPlus | Text to speech | - |
dc.subject.keywordPlus | Voice cloning | - |
dc.subject.keywordAuthor | Global style token | - |
dc.subject.keywordAuthor | Multi-speaker | - |
dc.subject.keywordAuthor | Text-to-speech | - |
dc.subject.keywordAuthor | Voice cloning | - |
dc.identifier.url | https://ieeexplore.ieee.org/document/9660599 | - |
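
The abstract above describes the core AT-VAE step: the speaker-encoder output serves as the attention query, and attention weights over a learned embedding dictionary produce a soft mixture that can represent unseen speakers. Below is a minimal, hypothetical PyTorch sketch of that attention-over-dictionary idea; the class name, query projection, dimensions, and scaling are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionVAEEmbedding(nn.Module):
    """Hypothetical sketch: attend over a learned embedding dictionary
    (analogous to a VQ-VAE codebook) with the speaker-encoder embedding
    as the query."""

    def __init__(self, num_entries: int = 256, dim: int = 128):
        super().__init__()
        # Learned dictionary of candidate speaker embeddings.
        self.dictionary = nn.Parameter(torch.randn(num_entries, dim))
        self.query_proj = nn.Linear(dim, dim)

    def forward(self, encoder_embedding: torch.Tensor) -> torch.Tensor:
        # encoder_embedding: (batch, dim), extracted from the mel
        # spectrogram by a speaker encoder.
        query = self.query_proj(encoder_embedding)           # (batch, dim)
        scores = query @ self.dictionary.t()                 # (batch, num_entries)
        weights = F.softmax(scores / self.dictionary.shape[1] ** 0.5, dim=-1)
        # Soft combination of dictionary entries: because the weights are
        # computed from the query, an unseen speaker maps to a new mixture
        # of known entries rather than a single nearest code.
        return weights @ self.dictionary                     # (batch, dim)

# Usage sketch: the attended embedding would condition the TTS decoder,
# alongside or fused with the speaker-encoder embedding itself.
model = AttentionVAEEmbedding()
spk = torch.randn(4, 128)        # stand-in for speaker-encoder output
dict_embedding = model(spk)      # (4, 128) attended speaker embedding
```

The soft attention here (rather than a hard nearest-code lookup as in a plain VQ-VAE) is what lets the dictionary generalize to speakers not seen during training, which matches the zero-shot goal stated in the abstract.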