Detailed Information

Prior-free Guided TTS: An Improved and Efficient Diffusion-based Text-Guided Speech Synthesis

Full metadata record
dc.contributor.author: Choi, Won-Gook
dc.contributor.author: Kim, So-Jeong
dc.contributor.author: Kim, TaeHo
dc.contributor.author: Chang, Joon-Hyuk
dc.date.accessioned: 2023-10-10T02:36:23Z
dc.date.available: 2023-10-10T02:36:23Z
dc.date.created: 2023-10-04
dc.date.issued: 2023-08
dc.identifier.issn: 2308-457X
dc.identifier.uri: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191797
dc.description.abstract: Recently, diffusion models have exhibited higher sample quality with guidance, such as classifier guidance and classifier-free guidance. However, these guidance methods have limitations: they require extra classifiers or joint training, and they incur additional sampling cost. In this study, we propose a prior-free guidance diffusion model and prior-free guided text-to-speech (PfGuided-TTS), which can generate speech at a quality as high as that of other guidance methods without extra training resources or computational cost. PfGuided-TTS generates speech of higher human perceptual quality than existing autoregressive (AR) and non-AR models, including diffusion-based TTS, on LJSpeech. In addition, we provide a schematic describing why and how classifier- and prior-free guided scores produce high-fidelity samples.
dc.language: English
dc.language.iso: en
dc.publisher: International Speech Communication Association
dc.title: Prior-free Guided TTS: An Improved and Efficient Diffusion-based Text-Guided Speech Synthesis
dc.type: Article
dc.contributor.affiliatedAuthor: Chang, Joon-Hyuk
dc.identifier.doi: 10.21437/Interspeech.2023-506
dc.identifier.scopusid: 2-s2.0-85171583207
dc.identifier.bibliographicCitation: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2023-August, pp.4289-4293
dc.relation.isPartOf: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
dc.citation.title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
dc.citation.volume: 2023-August
dc.citation.startPage: 4289
dc.citation.endPage: 4293
dc.type.rims: ART
dc.type.docType: Conference paper
dc.description.journalClass: 1
dc.description.isOpenAccess: N
dc.description.journalRegisteredClass: scopus
dc.subject.keywordPlus: Speech communication
dc.subject.keywordPlus: Speech synthesis
dc.subject.keywordPlus: Additional sampling
dc.subject.keywordPlus: Auto-regressive
dc.subject.keywordPlus: Autoregressive modelling
dc.subject.keywordPlus: Computational costs
dc.subject.keywordPlus: Diffusion model
dc.subject.keywordPlus: Guided score
dc.subject.keywordPlus: Perceptual quality
dc.subject.keywordPlus: Resource costs
dc.subject.keywordPlus: Sample quality
dc.subject.keywordPlus: Text to speech
dc.subject.keywordPlus: Diffusion
dc.subject.keywordAuthor: diffusion model
dc.subject.keywordAuthor: guided score
dc.subject.keywordAuthor: text-to-speech
dc.identifier.url: https://www.isca-speech.org/archive/interspeech_2023/choi23c_interspeech.html
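The abstract above contrasts classifier and classifier-free guidance with the proposed prior-free guidance. As background for the "additional sampling cost" it mentions, the following is a minimal, hypothetical Python sketch of classifier-free guidance in a diffusion sampler: the guided score blends a conditional and an unconditional network evaluation, so each reverse step costs two forward passes. All names here (score_model, cfg_score, the toy shapes) are assumptions for illustration; this is not the paper's prior-free method, which is proposed precisely to avoid the extra pass and training overhead.

```python
import numpy as np

# Hypothetical score network: in a diffusion-based TTS model this would be a
# neural network predicting the score of the noisy mel-spectrogram x_t,
# optionally conditioned on a text embedding. A toy stand-in is used here.
def score_model(x_t, t, text_emb=None):
    cond = 0.0 if text_emb is None else 0.1 * text_emb
    return -(x_t - cond) / max(t, 1e-5)

def cfg_score(x_t, t, text_emb, guidance_scale=1.5):
    """Classifier-free guidance: s_guided = (1 + w) * s_cond - w * s_uncond.

    Note the two model evaluations per sampling step; this doubled cost is
    the "additional sampling cost" referred to in the abstract.
    """
    s_cond = score_model(x_t, t, text_emb)   # conditional score
    s_uncond = score_model(x_t, t, None)     # unconditional score
    return (1.0 + guidance_scale) * s_cond - guidance_scale * s_uncond

# Toy usage: one Euler step of reverse diffusion on a dummy 80-bin mel frame.
x_t = np.random.randn(80)        # noisy mel frame (illustrative shape)
text_emb = np.random.randn(80)   # aligned text embedding (hypothetical)
dt = 0.01
x_prev = x_t + cfg_score(x_t, t=0.5, text_emb=text_emb) * dt
print(x_prev.shape)
```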
Appears in Collections
College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
College of Engineering (School of Electronic Engineering)
