Zero-Shot Voice Cloning Using Variational Embedding with Attention Mechanism
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Jaeuk | - |
dc.contributor.author | Kim, Jiye | - |
dc.contributor.author | Chang, Joon-Hyuk | - |
dc.date.accessioned | 2022-07-06T10:38:19Z | - |
dc.date.available | 2022-07-06T10:38:19Z | - |
dc.date.created | 2022-03-07 | - |
dc.date.issued | 2022-01 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/139794 | - |
dc.description.abstract | Many voice cloning studies based on multi-speaker text-to-speech (TTS) have been conducted. Among voice cloning techniques, we focus on zero-shot voice cloning, whose most important design choice is which speaker embedding is used. In this study, two types of speaker embeddings are used: one is extracted from the mel spectrogram by a speaker encoder, and the other is stored in an embedding dictionary, as in a vector quantized-variational autoencoder (VQ-VAE). To extract an embedding from the dictionary, an attention mechanism is applied, which we call attention-VAE (AT-VAE). Using the embedding extracted by the speaker encoder as the query, attention weights are computed over the entries of the embedding dictionary, allowing the extraction of a speaker embedding that can represent unseen speakers. In addition, a training scheme is applied to make the model robust to unseen speakers, further improving the system. The performance of the proposed method was validated on various metrics, demonstrating that it enables voice cloning without adaptation training. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
dc.title | Zero-Shot Voice Cloning Using Variational Embedding with Attention Mechanism | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | - |
dc.identifier.doi | 10.1109/IC-NIDC54101.2021.9660599 | - |
dc.identifier.scopusid | 2-s2.0-85124799580 | - |
dc.identifier.bibliographicCitation | Proceedings of 2021 7th IEEE International Conference on Network Intelligence and Digital Content, IC-NIDC 2021, pp. 344-348 | - |
dc.relation.isPartOf | Proceedings of 2021 7th IEEE International Conference on Network Intelligence and Digital Content, IC-NIDC 2021 | - |
dc.citation.title | Proceedings of 2021 7th IEEE International Conference on Network Intelligence and Digital Content, IC-NIDC 2021 | - |
dc.citation.startPage | 344 | - |
dc.citation.endPage | 348 | - |
dc.type.rims | ART | - |
dc.type.docType | Conference Paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.subject.keywordPlus | Cloning | - |
dc.subject.keywordPlus | Signal encoding | - |
dc.subject.keywordPlus | Attention mechanisms | - |
dc.subject.keywordPlus | Auto encoders | - |
dc.subject.keywordPlus | Embeddings | - |
dc.subject.keywordPlus | Global style token | - |
dc.subject.keywordPlus | Multi-speaker | - |
dc.subject.keywordPlus | Performance | - |
dc.subject.keywordPlus | Spectrograms | - |
dc.subject.keywordPlus | Text to speech | - |
dc.subject.keywordPlus | Voice cloning | - |
dc.subject.keywordAuthor | Global style token | - |
dc.subject.keywordAuthor | Multi-speaker | - |
dc.subject.keywordAuthor | Text-to-speech | - |
dc.subject.keywordAuthor | Voice cloning | - |
dc.identifier.url | https://ieeexplore.ieee.org/document/9660599 | - |
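
The abstract above describes the core AT-VAE step: the speaker-encoder output serves as the attention query, and attention weights over a learned embedding dictionary produce a soft mixture that can represent unseen speakers. Below is a minimal, hypothetical PyTorch sketch of that attention-over-dictionary idea; the class name, query projection, dimensions, and scaling are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionVAEEmbedding(nn.Module):
    """Hypothetical sketch: attend over a learned embedding dictionary
    (analogous to a VQ-VAE codebook) with the speaker-encoder embedding
    as the query."""

    def __init__(self, num_entries: int = 256, dim: int = 128):
        super().__init__()
        # Learned dictionary of candidate speaker embeddings.
        self.dictionary = nn.Parameter(torch.randn(num_entries, dim))
        self.query_proj = nn.Linear(dim, dim)

    def forward(self, encoder_embedding: torch.Tensor) -> torch.Tensor:
        # encoder_embedding: (batch, dim), extracted from the mel
        # spectrogram by a speaker encoder.
        query = self.query_proj(encoder_embedding)           # (batch, dim)
        scores = query @ self.dictionary.t()                 # (batch, num_entries)
        weights = F.softmax(scores / self.dictionary.shape[1] ** 0.5, dim=-1)
        # Soft combination of dictionary entries: because the weights are
        # computed from the query, an unseen speaker maps to a new mixture
        # of known entries rather than a single nearest code.
        return weights @ self.dictionary                     # (batch, dim)

# Usage sketch: the attended embedding would condition the TTS decoder,
# alongside or fused with the speaker-encoder embedding itself.
model = AttentionVAEEmbedding()
spk = torch.randn(4, 128)        # stand-in for speaker-encoder output
dict_embedding = model(spk)      # (4, 128) attended speaker embedding
```

The soft attention here (rather than a hard nearest-code lookup as in a plain VQ-VAE) is what lets the dictionary generalize to speakers not seen during training, which matches the zero-shot goal stated in the abstract.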