Prior-free Guided TTS: An Improved and Efficient Diffusion-based Text-Guided Speech Synthesis
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Choi, Won-Gook | - |
dc.contributor.author | Kim, So-Jeong | - |
dc.contributor.author | Kim, TaeHo | - |
dc.contributor.author | Chang, Joon-Hyuk | - |
dc.date.accessioned | 2023-10-10T02:36:23Z | - |
dc.date.available | 2023-10-10T02:36:23Z | - |
dc.date.created | 2023-10-04 | - |
dc.date.issued | 2023-08 | - |
dc.identifier.issn | 2308-457X | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191797 | - |
dc.description.abstract | Recently, diffusion models have exhibited higher sample quality with guidance, such as classifier guidance and classifier-free guidance. However, these guidance methods have limitations: they require extra classifiers or joint training and incur additional sampling cost. In this study, we propose a prior-free guidance diffusion model and prior-free guided text-to-speech (PfGuided-TTS), which can generate speech at a quality as high as that of other guidance methods without extra training resources or computational cost. PfGuided-TTS generates speech of higher human perceptual quality than existing autoregressive (AR) and non-AR models, including diffusion-based TTS, on LJSpeech. In addition, we provide a schematic describing why and how classifier- and prior-free guided scores produce high-fidelity samples. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | International Speech Communication Association | - |
dc.title | Prior-free Guided TTS: An Improved and Efficient Diffusion-based Text-Guided Speech Synthesis | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | - |
dc.identifier.doi | 10.21437/Interspeech.2023-506 | - |
dc.identifier.scopusid | 2-s2.0-85171583207 | - |
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2023-August, pp.4289 - 4293 | - |
dc.relation.isPartOf | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.volume | 2023-August | - |
dc.citation.startPage | 4289 | - |
dc.citation.endPage | 4293 | - |
dc.type.rims | ART | - |
dc.type.docType | Conference paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.subject.keywordPlus | Speech communication | - |
dc.subject.keywordPlus | Speech synthesis | - |
dc.subject.keywordPlus | Additional sampling | - |
dc.subject.keywordPlus | Auto-regressive | - |
dc.subject.keywordPlus | Autoregressive modelling | - |
dc.subject.keywordPlus | Computational costs | - |
dc.subject.keywordPlus | Diffusion model | - |
dc.subject.keywordPlus | Guided score | - |
dc.subject.keywordPlus | Perceptual quality | - |
dc.subject.keywordPlus | Resource costs | - |
dc.subject.keywordPlus | Sample quality | - |
dc.subject.keywordPlus | Text to speech | - |
dc.subject.keywordPlus | Diffusion | - |
dc.subject.keywordAuthor | diffusion model | - |
dc.subject.keywordAuthor | guided score | - |
dc.subject.keywordAuthor | text-to-speech | - |
dc.identifier.url | https://www.isca-speech.org/archive/interspeech_2023/choi23c_interspeech.html | - |
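As background for the abstract above, which contrasts the proposed prior-free guidance with classifier-free guidance, here is a minimal sketch of how a classifier-free guided score is typically formed from conditional and unconditional model outputs. The guidance weight `w` and the toy noise estimates are illustrative assumptions, not values from the paper:

```python
import numpy as np

def cfg_score(eps_cond, eps_uncond, w):
    """Classifier-free guided noise estimate: extrapolate from the
    unconditional output toward the conditional one with weight w."""
    return (1.0 + w) * eps_cond - w * eps_uncond

# Toy conditional/unconditional noise estimates (illustrative only).
eps_c = np.array([1.0, 2.0])
eps_u = np.array([0.5, 1.0])

# w = 0 recovers the purely conditional estimate.
print(cfg_score(eps_c, eps_u, 0.0))  # -> [1. 2.]

# w > 0 amplifies the conditioning signal.
print(cfg_score(eps_c, eps_u, 1.0))  # -> [1.5 3. ]
```

The extra unconditional forward pass per sampling step is the "additional sampling cost" the abstract refers to; the proposed prior-free guidance aims to avoid it.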