Detailed Information

Cited 0 times in Web of Science; cited 1 time in Scopus

DNN based multi-speaker speech synthesis with temporal auxiliary speaker ID embedding

Authors
Lee, Junmo; Song, Kwangsub; Noh, Kyoungjin; Park, Tae-Jun; Chang, Joon-Hyuk
Issue Date
May-2019
Publisher
Institute of Electrical and Electronics Engineers Inc.
Keywords
Deep learning; Multi speaker speech synthesis; Sequence to sequence; Speech synthesis
Citation
ICEIC 2019 - International Conference on Electronics, Information, and Communication, pp. 1-4
Indexed
SCOPUS
Journal Title
ICEIC 2019 - International Conference on Electronics, Information, and Communication
Start Page
1
End Page
4
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/4572
DOI
10.23919/ELINFOCOM.2019.8706390
ISSN
0000-0000
Abstract
In this paper, multi-speaker speech synthesis using speaker embedding is proposed. The proposed model is based on the Tacotron network, but the post-processing network of the model is modified with dilated convolution layers, as used in the WaveNet architecture, to make it more adaptive to speech. The model can generate multiple speakers' voices with a single neural network by feeding an auxiliary input, the speaker embedding, to the network. The model successfully generates two speakers' voices without significant deterioration of speech quality.
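The core idea in the abstract, conditioning one synthesis network on a speaker identity by supplying a speaker embedding as an auxiliary temporal input, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the lookup table, and the helper `add_speaker_id` are all hypothetical, and the embedding is simply tiled over time and concatenated with the per-frame features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; not taken from the paper.
num_speakers, embed_dim = 2, 16   # speakers in the lookup table, embedding width
T, feat_dim = 100, 80             # frames x acoustic feature dimension

# Learned speaker lookup table (randomly initialized here for the sketch).
speaker_table = rng.standard_normal((num_speakers, embed_dim))

def add_speaker_id(features, speaker_id):
    """Tile the speaker embedding across all T frames and concatenate it
    with the per-frame features, yielding a speaker-conditioned input."""
    emb = speaker_table[speaker_id]                   # (embed_dim,)
    tiled = np.tile(emb, (features.shape[0], 1))      # (T, embed_dim)
    return np.concatenate([features, tiled], axis=1)  # (T, feat_dim + embed_dim)

x = rng.standard_normal((T, feat_dim))   # stand-in for encoder/frame features
y = add_speaker_id(x, speaker_id=1)
print(y.shape)  # (100, 96)
```

Because the speaker identity enters only through this auxiliary channel, the same network weights serve every speaker; switching voices is just a different row of the lookup table.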
Appears in Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)