Vector Field Decomposition-based Flow Matching for Zero-Shot Cross-Lingual Text-to-Speech

Lee, Jaeuk; Song, Nam-Seok; Chang, Joon-Hyuk

doi:10.1109/LSP.2025.3571407

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lee, Jaeuk	-
dc.contributor.author	Song, Nam-Seok	-
dc.contributor.author	Chang, Joon-Hyuk	-
dc.date.accessioned	2025-11-26T08:00:55Z	-
dc.date.available	2025-11-26T08:00:55Z	-
dc.date.issued	2025-05	-
dc.identifier.issn	1070-9908	-
dc.identifier.issn	1558-2361	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209338	-
dc.description.abstract	Zero-shot text-to-speech (TTS) has recently achieved remarkable performance by leveraging a speech prompt instead of a speaker embedding, as it provides richer information. However, zero-shot cross-lingual tasks synthesize speech in multiple languages according to a given language ID, regardless of the language of the speech prompt. Consequently, the inherent language-specific characteristics of the speech prompt may conflict with the language ID, potentially affecting the accuracy of language representation in speech. Thus, we propose vector field decomposition-based flow matching that decomposes the vector field into speaker and language components. These components are trained to be activated in different frequency bins, as speaker and language identity are distributed across distinct frequency ranges in speech. This approach is particularly effective for cross-lingual TTS, as it minimizes conflicts between speech prompts and language IDs. As a result, the summation of the two components directly forms the vector field that represents the probability path from a Gaussian distribution to the target data distribution (e.g., mel spectrogram). Experimental results demonstrate that the proposed method outperforms the conventional method in terms of both subjective and objective evaluations.	-
dc.format.extent	5	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Institute of Electrical and Electronics Engineers	-
dc.title	Vector Field Decomposition-based Flow Matching for Zero-Shot Cross-Lingual Text-to-Speech	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.doi	10.1109/LSP.2025.3571407	-
dc.identifier.scopusid	2-s2.0-105005878653	-
dc.identifier.wosid	001579027500007	-
dc.identifier.bibliographicCitation	IEEE Signal Processing Letters, vol. 32, pp. 3560-3564	-
dc.citation.title	IEEE Signal Processing Letters	-
dc.citation.volume	32	-
dc.citation.startPage	3560	-
dc.citation.endPage	3564	-
dc.type.docType	Article	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalWebOfScienceCategory	Engineering, Electrical & Electronic	-
dc.subject.keywordPlus	TTS	-
dc.subject.keywordAuthor	Flow matching	-
dc.subject.keywordAuthor	speech prompt	-
dc.subject.keywordAuthor	zero-shot cross-lingual text-to-speech	-
dc.identifier.url	https://ieeexplore.ieee.org/document/11006940	-
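
The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the straight-line probability path with `sigma_min`, the hard split of mel bins at `split_bin`, the choice of which band goes to which component, and all function names are assumptions made for illustration only. The abstract states only that the speaker and language components are trained to be active in different frequency bins and that their sum forms the vector field.

```python
import numpy as np


def sample_path(x0, x1, t, sigma_min=1e-4):
    """Point on a straight-line conditional path and its target velocity.

    x0 is Gaussian noise, x1 is the target (e.g. a mel spectrogram).
    The path form is an assumption; the record does not specify it.
    """
    xt = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    ut = x1 - (1.0 - sigma_min) * x0  # time derivative of xt
    return xt, ut


def band_masks(n_mels, split_bin):
    """Two complementary hard masks over mel bins (hypothetical split)."""
    low = np.zeros((n_mels, 1))
    low[:split_bin] = 1.0
    return low, 1.0 - low


def decomposed_field(v_speaker, v_language, speaker_mask, language_mask):
    """Sum of band-limited speaker and language components."""
    return speaker_mask * v_speaker + language_mask * v_language


def flow_matching_loss(v_pred, ut):
    """Mean squared error between predicted and target vector fields."""
    return float(np.mean((v_pred - ut) ** 2))


rng = np.random.default_rng(0)
n_mels, n_frames = 80, 10
x1 = rng.standard_normal((n_mels, n_frames))  # stand-in for a mel spectrogram
x0 = rng.standard_normal((n_mels, n_frames))  # Gaussian noise sample
xt, ut = sample_path(x0, x1, t=0.3)

# Which band is assigned to which component is arbitrary in this sketch.
speaker_mask, language_mask = band_masks(n_mels, split_bin=40)

# Stand-ins for two network outputs; here both equal the target, so the
# masked sum reproduces the target field and the loss vanishes.
v = decomposed_field(ut, ut, speaker_mask, language_mask)
loss = flow_matching_loss(v, ut)
```

In a trained model, `v_speaker` and `v_language` would come from separate network outputs conditioned on the speech prompt and the language ID respectively; the masks here only mimic the idea that the two components occupy distinct frequency ranges.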

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
