
Differentiable Duration Refinement Using Internal Division for Non-Autoregressive Text-to-Speech

Full metadata record
DC Field | Value | Language
dc.contributor.author | Lee, Jaeuk | -
dc.contributor.author | Shin, Yoonsoo | -
dc.contributor.author | Chang, Joon-Hyuk | -
dc.date.accessioned | 2024-12-06T05:30:19Z | -
dc.date.available | 2024-12-06T05:30:19Z | -
dc.date.issued | 2024-11 | -
dc.identifier.issn | 1070-9908 | -
dc.identifier.issn | 1558-2361 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/202076 | -
dc.description.abstract | Most non-autoregressive text-to-speech (TTS) models acquire target phoneme duration (target duration) from internal or external aligners. They transform the speech-phoneme alignment produced by the aligner into the target duration. Since this transformation is not differentiable, the gradient of the loss function that maximizes the TTS model's likelihood of speech (e.g., mel spectrogram or waveform) cannot be propagated to the target duration. In other words, the target duration is produced regardless of the TTS model's likelihood of speech. Hence, we introduce a differentiable duration refinement that produces a learnable target duration for maximizing the likelihood of speech. The proposed method uses an internal division to locate the phoneme boundary, which is determined to improve the performance of the TTS model. Additionally, we propose a duration distribution loss to enhance the performance of the duration predictor. Our baseline model is JETS, a representative end-to-end TTS model, and we apply the proposed methods to the baseline model. Experimental results show that the proposed method outperforms the baseline model in terms of subjective naturalness and character error rate. | -
dc.format.extent | 5 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | Institute of Electrical and Electronics Engineers | -
dc.title | Differentiable Duration Refinement Using Internal Division for Non-Autoregressive Text-to-Speech | -
dc.type | Article | -
dc.publisher.location | United States | -
dc.identifier.doi | 10.1109/LSP.2024.3495578 | -
dc.identifier.scopusid | 2-s2.0-85209731795 | -
dc.identifier.wosid | 001360422900007 | -
dc.identifier.bibliographicCitation | IEEE Signal Processing Letters, v.31, pp 3154 - 3158 | -
dc.citation.title | IEEE Signal Processing Letters | -
dc.citation.volume | 31 | -
dc.citation.startPage | 3154 | -
dc.citation.endPage | 3158 | -
dc.type.docType | Article | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Engineering | -
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | -
dc.subject.keywordPlus | Maximum likelihood | -
dc.subject.keywordPlus | Spectrographs | -
dc.subject.keywordPlus | Speech enhancement | -
dc.subject.keywordAuthor | Hidden Markov models | -
dc.subject.keywordAuthor | Spectrogram | -
dc.subject.keywordAuthor | Training | -
dc.subject.keywordAuthor | Accuracy | -
dc.subject.keywordAuthor | Transformers | -
dc.subject.keywordAuthor | Predictive models | -
dc.subject.keywordAuthor | Decoding | -
dc.subject.keywordAuthor | Text to speech | -
dc.subject.keywordAuthor | Reactive power | -
dc.subject.keywordAuthor | Error analysis | -
dc.subject.keywordAuthor | Alignment | -
dc.subject.keywordAuthor | duration modeling | -
dc.subject.keywordAuthor | text-to-speech | -
dc.identifier.url | https://ieeexplore.ieee.org/document/10750273 | -
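
The abstract contrasts a non-differentiable alignment-to-duration transform with a boundary located by internal division. The following is a minimal illustrative sketch of that contrast, not the authors' formulation: the record gives no equations, so the function names, the choice of phoneme centers as the two endpoints of the division, and the sigmoid-parameterized weights are all assumptions made here for illustration. Counting frames per phoneme yields integers that do not respond to small parameter changes, whereas a boundary placed at an internally dividing point between two anchors varies smoothly with its weight.

```python
import math


def hard_durations(frame_to_phoneme, num_phonemes):
    """Count the frames assigned to each phoneme by a hard alignment.

    The result is integer-valued, so it offers no useful gradient.
    """
    durations = [0] * num_phonemes
    for phoneme_index in frame_to_phoneme:
        durations[phoneme_index] += 1
    return durations


def refined_durations(hard, logits):
    """Relocate each interior boundary by internal division (hypothetical scheme).

    Assumption: boundary i is placed between the centers of phonemes i and
    i + 1 at the point (1 - w) * left + w * right, with w = sigmoid(logit).
    """
    if len(logits) != len(hard) - 1:
        raise ValueError("need one logit per interior boundary")

    # Cumulative hard boundaries: [0, d0, d0 + d1, ...].
    boundaries = [0.0]
    for duration in hard:
        boundaries.append(boundaries[-1] + duration)

    # Center of each phoneme segment under the hard alignment.
    centers = [(boundaries[i] + boundaries[i + 1]) / 2.0 for i in range(len(hard))]

    # The first and last boundaries stay fixed; interior ones move smoothly.
    refined = [boundaries[0]]
    for i, logit in enumerate(logits):
        w = 1.0 / (1.0 + math.exp(-logit))
        refined.append((1.0 - w) * centers[i] + w * centers[i + 1])
    refined.append(boundaries[-1])

    return [refined[i + 1] - refined[i] for i in range(len(hard))]


alignment = [0, 0, 1, 1, 1, 2]          # frame index -> phoneme index
hard = hard_durations(alignment, 3)     # [2, 3, 1]
soft = refined_durations(hard, [0.0, 0.0])
print(hard, soft)                       # [2, 3, 1] [2.25, 2.25, 1.5]
```

In this sketch the refined durations still sum to the total number of frames, and each one is a smooth function of the logits, so a loss computed downstream could in principle push the boundaries in an automatic-differentiation framework.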
Appears in Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)