Differentiable Duration Refinement Using Internal Division for Non-Autoregressive Text-to-Speech

Lee, Jaeuk; Shin, Yoonsoo; Chang, Joon-Hyuk

doi:10.1109/LSP.2024.3495578

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lee, Jaeuk	-
dc.contributor.author	Shin, Yoonsoo	-
dc.contributor.author	Chang, Joon-Hyuk	-
dc.date.accessioned	2024-12-06T05:30:19Z	-
dc.date.available	2024-12-06T05:30:19Z	-
dc.date.issued	2024-11	-
dc.identifier.issn	1070-9908	-
dc.identifier.issn	1558-2361	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/202076	-
dc.description.abstract	Most non-autoregressive text-to-speech (TTS) models acquire target phoneme duration (target duration) from internal or external aligners. They transform the speech-phoneme alignment produced by the aligner into the target duration. Since this transformation is not differentiable, the gradient of the loss function that maximizes the TTS model's likelihood of speech (e.g., mel spectrogram or waveform) cannot be propagated to the target duration. In other words, the target duration is produced regardless of the TTS model's likelihood of speech. Hence, we introduce a differentiable duration refinement that produces a learnable target duration for maximizing the likelihood of speech. The proposed method uses an internal division to locate the phoneme boundary, which is determined to improve the performance of the TTS model. Additionally, we propose a duration distribution loss to enhance the performance of the duration predictor. Our baseline model is JETS, a representative end-to-end TTS model, and we apply the proposed methods to the baseline model. Experimental results show that the proposed method outperforms the baseline model in terms of subjective naturalness and character error rate.	-
dc.format.extent	5	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Institute of Electrical and Electronics Engineers	-
dc.title	Differentiable Duration Refinement Using Internal Division for Non-Autoregressive Text-to-Speech	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.doi	10.1109/LSP.2024.3495578	-
dc.identifier.scopusid	2-s2.0-85209731795	-
dc.identifier.wosid	001360422900007	-
dc.identifier.bibliographicCitation	IEEE Signal Processing Letters, v.31, pp 3154 - 3158	-
dc.citation.title	IEEE Signal Processing Letters	-
dc.citation.volume	31	-
dc.citation.startPage	3154	-
dc.citation.endPage	3158	-
dc.type.docType	Article	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalWebOfScienceCategory	Engineering, Electrical & Electronic	-
dc.subject.keywordPlus	Maximum likelihood	-
dc.subject.keywordPlus	Spectrographs	-
dc.subject.keywordPlus	Speech enhancement	-
dc.subject.keywordAuthor	Hidden Markov models	-
dc.subject.keywordAuthor	Spectrogram	-
dc.subject.keywordAuthor	Training	-
dc.subject.keywordAuthor	Accuracy	-
dc.subject.keywordAuthor	Transformers	-
dc.subject.keywordAuthor	Predictive models	-
dc.subject.keywordAuthor	Decoding	-
dc.subject.keywordAuthor	Text to speech	-
dc.subject.keywordAuthor	Reactive power	-
dc.subject.keywordAuthor	Error analysis	-
dc.subject.keywordAuthor	Alignment	-
dc.subject.keywordAuthor	duration modeling	-
dc.subject.keywordAuthor	text-to-speech	-
dc.identifier.url	https://ieeexplore.ieee.org/document/10750273	-
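
Illustrative sketch (not taken from the paper): the abstract states that the method uses an internal division to locate each phoneme boundary so that the target duration becomes learnable. The Python below shows one way such a scheme could look. Every name in it (`refine_durations`, `internal_division`, `logits`) is hypothetical, and the choice to let each boundary move between the midpoints of its two neighbouring phonemes is an assumption made only for this example; the published formulation may differ.

```python
import math


def sigmoid(x):
    """Map a real-valued logit to a ratio in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def internal_division(a, b, w):
    """Point that divides the segment [a, b] internally in the ratio w : (1 - w)."""
    return (1.0 - w) * a + w * b


def refine_durations(hard_durations, logits):
    """Turn aligner durations into soft durations (hypothetical scheme).

    hard_durations: frame counts per phoneme from an aligner (length N).
    logits: one free parameter per interior boundary (length N - 1).
    Assumption: each interior boundary may move between the midpoints of
    the two phonemes it separates, which keeps the boundaries ordered.
    """
    # Start frame of every phoneme, plus the total length as the last entry.
    starts = [0.0]
    for d in hard_durations:
        starts.append(starts[-1] + d)

    # Midpoint of every phoneme under the hard alignment.
    mids = [(starts[i] + starts[i + 1]) / 2.0 for i in range(len(hard_durations))]

    # Each interior boundary is an internal division of two adjacent midpoints.
    boundaries = [0.0]
    for i, logit in enumerate(logits):
        boundaries.append(internal_division(mids[i], mids[i + 1], sigmoid(logit)))
    boundaries.append(starts[-1])

    # Durations are the gaps between consecutive boundaries.
    return [boundaries[i + 1] - boundaries[i] for i in range(len(hard_durations))]


soft = refine_durations([2, 4, 3], [0.0, 0.0])
print(soft)  # [2.5, 3.25, 3.25]
```

Because the output is built only from additions, multiplications, and a sigmoid, it is a smooth function of `logits`. Written in an automatic-differentiation framework, a speech-likelihood loss could therefore send gradients back to the boundary positions, which a rounded, hard alignment does not allow. The soft durations also always sum to the total number of frames.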

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)

222, Wangsimni-ro, Seongdong-gu, Seoul, 04763, Korea; Tel. +82-2-2220-1366

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.
