TSP-TTS: Text-based Style Predictor with Residual Vector Quantization for Expressive Text-to-Speech

Seong, Donghyun; Lee, Hoyoung; Chang, Joon-Hyuk

doi:10.21437/Interspeech.2024-1734

Full metadata record

DC Field	Value	Language
dc.contributor.author	Seong, Donghyun	-
dc.contributor.author	Lee, Hoyoung	-
dc.contributor.author	Chang, Joon-Hyuk	-
dc.date.accessioned	2025-02-12T07:00:41Z	-
dc.date.available	2025-02-12T07:00:41Z	-
dc.date.issued	2024-09	-
dc.identifier.issn	1990-9772	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206466	-
dc.description.abstract	Expressive text-to-speech (TTS) aims to synthesize more human-like speech by incorporating diverse speech styles or emotions. While most expressive TTS models rely on reference speech to condition the style of the generated speech, they often fail to generate speech of consistent quality. To ensure consistent speech quality, we propose an expressive TTS model conditioned on a style representation extracted from the text itself. To implement this text-based style predictor, we design a style module incorporating residual vector quantization. Furthermore, the style representation is enhanced through style-to-text alignment and a mel decoder with style hierarchical layer normalization (SHLN). Our experimental findings demonstrate that the proposed model accurately estimates the style representation, enabling the generation of high-quality speech without the need for reference speech.	-
dc.format.extent	5	-
dc.language	English	-
dc.language.iso	ENG	-
dc.title	TSP-TTS: Text-based Style Predictor with Residual Vector Quantization for Expressive Text-to-Speech	-
dc.type	Article	-
dc.identifier.doi	10.21437/Interspeech.2024-1734	-
dc.identifier.scopusid	2-s2.0-85214799703	-
dc.identifier.wosid	001331850101185	-
dc.identifier.bibliographicCitation	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 1780 - 1784	-
dc.citation.title	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH	-
dc.citation.startPage	1780	-
dc.citation.endPage	1784	-
dc.type.docType	Proceedings Paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.subject.keywordPlus	Condition	-
dc.subject.keywordPlus	Expressive speech synthesis	-
dc.subject.keywordPlus	Human like	-
dc.subject.keywordPlus	Residual vector quantizations	-
dc.subject.keywordPlus	Speech emotions	-
dc.subject.keywordPlus	Speech models	-
dc.subject.keywordPlus	Speech quality	-
dc.subject.keywordPlus	Speech style	-
dc.subject.keywordPlus	Text alignments	-
dc.subject.keywordPlus	Text to speech	-
dc.subject.keywordAuthor	expressive speech synthesis	-
dc.subject.keywordAuthor	residual vector quantization	-
dc.subject.keywordAuthor	Text-to-speech	-

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
