DNN based multi-speaker speech synthesis with temporal auxiliary speaker ID embedding
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Junmo | - |
dc.contributor.author | Song, Kwangsub | - |
dc.contributor.author | Noh, Kyoungjin | - |
dc.contributor.author | Park, Tae-Jun | - |
dc.contributor.author | Chang, Joon-Hyuk | - |
dc.date.accessioned | 2021-07-30T05:23:06Z | - |
dc.date.available | 2021-07-30T05:23:06Z | - |
dc.date.created | 2021-05-13 | - |
dc.date.issued | 2019-05 | - |
dc.identifier.issn | 0000-0000 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/4572 | - |
dc.description.abstract | In this paper, multi-speaker speech synthesis using speaker embedding is proposed. The proposed model is based on the Tacotron network, but the post-processing network of the model is modified with dilated convolution layers, as used in the WaveNet architecture, to make it more adaptive to speech. The model can generate multiple speakers' voices with a single neural network model by providing auxiliary input data, a speaker embedding, to the network. This model shows successful results in generating two speakers' voices without significant deterioration of speech quality. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
dc.title | DNN based multi-speaker speech synthesis with temporal auxiliary speaker ID embedding | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | - |
dc.identifier.doi | 10.23919/ELINFOCOM.2019.8706390 | - |
dc.identifier.scopusid | 2-s2.0-85065886857 | - |
dc.identifier.wosid | 000470015800014 | - |
dc.identifier.bibliographicCitation | ICEIC 2019 - International Conference on Electronics, Information, and Communication, pp.1 - 4 | - |
dc.relation.isPartOf | ICEIC 2019 - International Conference on Electronics, Information, and Communication | - |
dc.citation.title | ICEIC 2019 - International Conference on Electronics, Information, and Communication | - |
dc.citation.startPage | 1 | - |
dc.citation.endPage | 4 | - |
dc.type.rims | ART | - |
dc.type.docType | Conference Paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Telecommunications | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
dc.relation.journalWebOfScienceCategory | Telecommunications | - |
dc.subject.keywordPlus | Deep learning | - |
dc.subject.keywordPlus | Deep neural networks | - |
dc.subject.keywordPlus | Deterioration | - |
dc.subject.keywordPlus | Embeddings | - |
dc.subject.keywordPlus | Auxiliary inputs | - |
dc.subject.keywordPlus | Neural network model | - |
dc.subject.keywordPlus | Post processing | - |
dc.subject.keywordPlus | Sequence to sequence | - |
dc.subject.keywordPlus | Significant deteriorations | - |
dc.subject.keywordPlus | Speaker id | - |
dc.subject.keywordPlus | Speech quality | - |
dc.subject.keywordPlus | Speech synthesis | - |
dc.subject.keywordAuthor | Deep learning | - |
dc.subject.keywordAuthor | Multi speaker speech synthesis | - |
dc.subject.keywordAuthor | Sequence to sequence | - |
dc.subject.keywordAuthor | Speech synthesis | - |
dc.identifier.url | https://ieeexplore.ieee.org/document/8706390 | - |
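The abstract describes conditioning a Tacotron-style post-processing network on a speaker-ID embedding, with dilated convolutions borrowed from WaveNet. A minimal illustrative sketch of that idea follows; it is not the paper's exact architecture, and all layer sizes (embedding dimension, channel counts, dilation schedule) are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class SpeakerConditionedPostNet(nn.Module):
    """Sketch of the abstract's idea (assumed details): a speaker-ID
    embedding is broadcast over time, concatenated with mel features,
    and refined by a stack of dilated 1-D convolutions (WaveNet-style)."""

    def __init__(self, n_speakers=2, mel_dim=80, emb_dim=16, hidden=128):
        super().__init__()
        # Lookup table mapping each speaker ID to a learned vector
        self.speaker_emb = nn.Embedding(n_speakers, emb_dim)
        layers = []
        in_ch = mel_dim + emb_dim
        for d in (1, 2, 4, 8):  # exponentially increasing dilation, as in WaveNet
            layers += [
                nn.Conv1d(in_ch, hidden, kernel_size=3, dilation=d, padding=d),
                nn.ReLU(),
            ]
            in_ch = hidden
        layers.append(nn.Conv1d(hidden, mel_dim, kernel_size=1))
        self.net = nn.Sequential(*layers)

    def forward(self, mel, speaker_id):
        # mel: (batch, mel_dim, time); speaker_id: (batch,)
        e = self.speaker_emb(speaker_id)                   # (batch, emb_dim)
        e = e.unsqueeze(-1).expand(-1, -1, mel.size(-1))   # repeat at every frame
        return self.net(torch.cat([mel, e], dim=1))        # speaker-conditioned output

mel = torch.randn(2, 80, 100)                  # two utterances, 100 frames each
out = SpeakerConditionedPostNet()(mel, torch.tensor([0, 1]))
print(out.shape)  # torch.Size([2, 80, 100])
```

Because the same network weights are shared across speakers and only the auxiliary embedding input changes, one model can produce multiple voices, which is the key claim of the abstract.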