H4C-TTS: Leveraging Multi-Modal Historical Context for Conversational Text-to-Speech

Full metadata record
DC Field | Value | Language
dc.contributor.author | Seong, Donghyun | -
dc.contributor.author | Chang, Joon-Hyuk | -
dc.date.accessioned | 2025-02-13T02:00:11Z | -
dc.date.available | 2025-02-13T02:00:11Z | -
dc.date.issued | 2024-09 | -
dc.identifier.issn | 1990-9772 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206475 | -
dc.description.abstract | Conversational text-to-speech (TTS) aims to synthesize natural voices appropriate to a situation by considering the context of past conversations as well as the current text. However, analyzing and modeling the context of a conversation remains challenging. Most conversational TTS systems use the content of historical and recent conversations without distinguishing between them and often generate speech that does not fit the situation. Hence, we introduce a novel conversational TTS model, H4C-TTS, that leverages multi-modal historical context to realize contextually appropriate natural speech synthesis. To facilitate conversational context modeling, we design a context encoder that incorporates historical and recent contexts and a multi-modal encoder that processes textual and acoustic inputs. Experimental results demonstrate that the proposed model significantly improves the naturalness and quality of speech in conversational contexts compared with existing conversational TTS systems. | -
dc.format.extent | 5 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.title | H4C-TTS: Leveraging Multi-Modal Historical Context for Conversational Text-to-Speech | -
dc.type | Article | -
dc.identifier.doi | 10.21437/Interspeech.2024-1480 | -
dc.identifier.scopusid | 2-s2.0-85214843687 | -
dc.identifier.wosid | 001331850105009 | -
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 4933 - 4937 | -
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | -
dc.citation.startPage | 4933 | -
dc.citation.endPage | 4937 | -
dc.type.docType | Proceedings Paper | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Computer Science | -
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | -
dc.subject.keywordPlus | 'current | -
dc.subject.keywordPlus | Context models | -
dc.subject.keywordPlus | Conversational speech | -
dc.subject.keywordPlus | Conversational speech synthesis | -
dc.subject.keywordPlus | Multi-modal | -
dc.subject.keywordPlus | Natural speech | -
dc.subject.keywordPlus | Quality of speech | -
dc.subject.keywordPlus | Text to speech | -
dc.subject.keywordAuthor | conversational speech synthesis | -
dc.subject.keywordAuthor | multi-modal | -
dc.subject.keywordAuthor | Text-to-speech | -
Appears in Collections
College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)