
TSP-TTS: Text-based Style Predictor with Residual Vector Quantization for Expressive Text-to-Speech

Full metadata record
DC Field | Value | Language
dc.contributor.author | Seong, Donghyun | -
dc.contributor.author | Lee, Hoyoung | -
dc.contributor.author | Chang, Joon-Hyuk | -
dc.date.accessioned | 2025-02-12T07:00:41Z | -
dc.date.available | 2025-02-12T07:00:41Z | -
dc.date.issued | 2024-09 | -
dc.identifier.issn | 1990-9772 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206466 | -
dc.description.abstract | Expressive text-to-speech (TTS) aims to synthesize more human-like speech by incorporating diverse speech styles or emotions. While most expressive TTS models rely on reference speech to condition the style of the generated speech, they often fail to generate speech of consistent quality. To ensure consistent speech quality, we propose an expressive TTS model conditioned on a style representation extracted from the text itself. To implement this text-based style predictor, we design a style module that incorporates residual vector quantization. Furthermore, the style representation is enhanced through style-to-text alignment and a mel decoder with style hierarchical layer normalization (SHLN). Experimental results demonstrate that the proposed model accurately estimates the style representation, enabling the generation of high-quality speech without the need for reference speech. | -
dc.format.extent | 5 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.title | TSP-TTS: Text-based Style Predictor with Residual Vector Quantization for Expressive Text-to-Speech | -
dc.type | Article | -
dc.identifier.doi | 10.21437/Interspeech.2024-1734 | -
dc.identifier.scopusid | 2-s2.0-85214799703 | -
dc.identifier.wosid | 001331850101185 | -
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 1780-1784 | -
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | -
dc.citation.startPage | 1780 | -
dc.citation.endPage | 1784 | -
dc.type.docType | Proceedings Paper | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Computer Science | -
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | -
dc.subject.keywordPlus | Condition | -
dc.subject.keywordPlus | Expressive speech synthesis | -
dc.subject.keywordPlus | Human like | -
dc.subject.keywordPlus | Residual vector quantizations | -
dc.subject.keywordPlus | Speech emotions | -
dc.subject.keywordPlus | Speech models | -
dc.subject.keywordPlus | Speech quality | -
dc.subject.keywordPlus | Speech style | -
dc.subject.keywordPlus | Text alignments | -
dc.subject.keywordPlus | Text to speech | -
dc.subject.keywordAuthor | expressive speech synthesis | -
dc.subject.keywordAuthor | residual vector quantization | -
dc.subject.keywordAuthor | Text-to-speech | -
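The abstract names residual vector quantization (RVQ) as the core of the style module but does not describe the mechanism. The following Python sketch illustrates generic RVQ only: each stage picks the codeword nearest to the residual left by earlier stages, adds it to the reconstruction, and subtracts it from the residual. The function names, the toy two-dimensional codebooks, and the input vector are hypothetical illustrations, not the TSP-TTS configuration; codebook learning and the text-based style prediction itself are not shown.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def residual_vq(
    x: Sequence[float], codebooks: Sequence[Sequence[Sequence[float]]]
) -> Tuple[List[int], Vector]:
    """Generic residual vector quantization (hypothetical helper, not the paper's code).

    Each stage quantizes whatever the previous stages failed to capture,
    so later codebooks refine the approximation left by earlier ones.
    """
    residual = [float(v) for v in x]
    quantized = [0.0] * len(x)
    indices: List[int] = []
    for codebook in codebooks:
        # Nearest codeword to the current residual.
        idx = min(range(len(codebook)), key=lambda k: sq_dist(residual, codebook[k]))
        code = codebook[idx]
        indices.append(idx)
        quantized = [q + c for q, c in zip(quantized, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, quantized


# Toy example with hand-picked (assumed) codebooks; real codebooks would be learned.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]],  # stage 1: coarse codewords
    [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]],  # stage 2: finer corrections
]
x = [1.4, 1.1]
indices, x_hat = residual_vq(x, codebooks)
print(indices, x_hat)  # [1, 1] [1.5, 1.0]
```

Including a zero codeword in every codebook, as in the toy example, guarantees that the reconstruction error never increases from one stage to the next, because each stage can always choose to add nothing.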
Files in This Item
There are no files associated with this item.
Appears in Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)