Zero-Shot Voice Cloning Using Variational Embedding with Attention Mechanism

Full metadata record
DC Field | Value | Language
dc.contributor.author | Lee, Jaeuk | -
dc.contributor.author | Kim, Jiye | -
dc.contributor.author | Chang, Joon-Hyuk | -
dc.date.accessioned | 2022-07-06T10:38:19Z | -
dc.date.available | 2022-07-06T10:38:19Z | -
dc.date.created | 2022-03-07 | -
dc.date.issued | 2022-01 | -
dc.identifier.issn | 0000-0000 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/139794 | -
dc.description.abstract | Many voice cloning studies based on multi-speaker text-to-speech (TTS) have been conducted. Among voice cloning techniques, we focus on zero-shot voice cloning. The most important aspect of zero-shot voice cloning is the choice of speaker embedding. In this study, two types of speaker embeddings are used. One is extracted from the mel spectrogram using a speaker encoder, and the other is stored in an embedding dictionary, as in a vector quantized-variational autoencoder (VQ-VAE). To extract an embedding from the embedding dictionary, an attention mechanism is applied, which we call attention-VAE (AT-VAE). The embedding extracted by the speaker encoder is used as the query in the attention mechanism, and the attention weights are calculated over the entries of the embedding dictionary. This mechanism allows the extraction of a speaker embedding that represents unseen speakers. In addition, training is applied to make our model robust to unseen speakers, and this training stage further improves the system. The performance of the proposed method was validated in terms of various metrics, and it was demonstrated that the proposed method enables voice cloning without adaptation training. | -
dc.language | English | -
dc.language.iso | en | -
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | -
dc.title | Zero-Shot Voice Cloning Using Variational Embedding with Attention Mechanism | -
dc.type | Article | -
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | -
dc.identifier.doi | 10.1109/IC-NIDC54101.2021.9660599 | -
dc.identifier.scopusid | 2-s2.0-85124799580 | -
dc.identifier.bibliographicCitation | Proceedings of 2021 7th IEEE International Conference on Network Intelligence and Digital Content, IC-NIDC 2021, pp. 344-348 | -
dc.relation.isPartOf | Proceedings of 2021 7th IEEE International Conference on Network Intelligence and Digital Content, IC-NIDC 2021 | -
dc.citation.title | Proceedings of 2021 7th IEEE International Conference on Network Intelligence and Digital Content, IC-NIDC 2021 | -
dc.citation.startPage | 344 | -
dc.citation.endPage | 348 | -
dc.type.rims | ART | -
dc.type.docType | Conference Paper | -
dc.description.journalClass | 1 | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.subject.keywordPlus | Cloning | -
dc.subject.keywordPlus | Signal encoding | -
dc.subject.keywordPlus | Attention mechanisms | -
dc.subject.keywordPlus | Auto encoders | -
dc.subject.keywordPlus | Embeddings | -
dc.subject.keywordPlus | Global style token | -
dc.subject.keywordPlus | Multi-speaker | -
dc.subject.keywordPlus | Performance | -
dc.subject.keywordPlus | Spectrograms | -
dc.subject.keywordPlus | Text to speech | -
dc.subject.keywordPlus | Voice cloning | -
dc.subject.keywordPlus | Embeddings | -
dc.subject.keywordAuthor | Global style token | -
dc.subject.keywordAuthor | Multi-speaker | -
dc.subject.keywordAuthor | Text-to-speech | -
dc.subject.keywordAuthor | Voice cloning | -
dc.identifier.url | https://ieeexplore.ieee.org/document/9660599 | -
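
The attention step described in the abstract (speaker-encoder output used as the query, attention weights computed over an embedding dictionary) can be illustrated with a minimal sketch. This is not the authors' implementation: the scaled dot-product scoring, the function names `softmax` and `attend_over_dictionary`, and the toy two-dimensional dictionary are all assumptions made for illustration, since the record does not specify the scoring function or dimensions.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_dictionary(
    query: Sequence[float],
    dictionary: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, weighted-sum embedding).

    Assumption: scaled dot-product scoring between the query and each
    dictionary entry; the paper may use a different scoring function.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, entry)) / math.sqrt(dim)
        for entry in dictionary
    ]
    weights = softmax(scores)
    # The output embedding is a convex combination of dictionary entries.
    embedding = [
        sum(w * entry[i] for w, entry in zip(weights, dictionary))
        for i in range(dim)
    ]
    return weights, embedding


# Hypothetical toy data: three dictionary entries of dimension 2.
dictionary = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
speaker_encoder_output = [2.0, 0.0]  # stands in for the query vector
weights, embedding = attend_over_dictionary(speaker_encoder_output, dictionary)
```

Because the output is a weighted combination of dictionary entries rather than a single nearest entry, a query from a speaker not seen in training still maps to a point inside the span of the learned entries, which is consistent with the abstract's claim about representing unseen speakers.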
Appears in Collections
College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)