An experimental study of diffusion-based general speech restoration with predictive-guided conditioning

Yang, Da-Hee; Chang, Joon-Hyuk

doi:10.1016/j.csl.2026.101940

Full metadata record

DC Field	Value	Language
dc.contributor.author	Yang, Da-Hee	-
dc.contributor.author	Chang, Joon-Hyuk	-
dc.date.accessioned	2026-03-23T02:00:28Z	-
dc.date.available	2026-03-23T02:00:28Z	-
dc.date.issued	2026-07	-
dc.identifier.issn	0885-2308	-
dc.identifier.issn	1095-8363	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/211432	-
dc.description.abstract	This study presents a hybrid speech restoration framework that integrates predictive-guided conditioning into a diffusion-based generative model to address complex distortions, including noise, reverberation, and bandwidth reduction. The proposed method employs the outputs of a predictive model to guide the diffusion process, enabling more accurate reconstruction under challenging acoustic conditions. Furthermore, during the final sampling stage, the outputs of the predictive and generative models are fused with a tunable ratio, balancing signal fidelity and perceptual naturalness. Experimental results demonstrate that the proposed approach significantly improves objective restoration metrics compared to conventional diffusion baselines. However, the perceptual quality varies with the fusion ratio, revealing a trade-off between objective gains and subjective preference. These findings highlight the potential of predictive-guided conditioning for robust speech restoration and provide insights into optimizing the balance between predictive and generative contributions.	-
dc.format.extent	11	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD	-
dc.title	An experimental study of diffusion-based general speech restoration with predictive-guided conditioning	-
dc.type	Article	-
dc.publisher.location	United Kingdom	-
dc.identifier.doi	10.1016/j.csl.2026.101940	-
dc.identifier.scopusid	2-s2.0-105027726418	-
dc.identifier.wosid	001674828400001	-
dc.identifier.bibliographicCitation	COMPUTER SPEECH AND LANGUAGE, v.99, pp 1 - 11	-
dc.citation.title	COMPUTER SPEECH AND LANGUAGE	-
dc.citation.volume	99	-
dc.citation.startPage	1	-
dc.citation.endPage	11	-
dc.type.docType	Article	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.subject.keywordPlus	Acoustic noise	-
dc.subject.keywordPlus	Architectural acoustics	-
dc.subject.keywordPlus	Diffusion	-
dc.subject.keywordPlus	Restoration	-
dc.subject.keywordPlus	Speech communication	-
dc.subject.keywordPlus	Speech enhancement	-
dc.subject.keywordAuthor	Score-based diffusion model	-
dc.subject.keywordAuthor	Predictive-guided conditioning	-
dc.subject.keywordAuthor	General speech restoration	-
dc.subject.keywordAuthor	Speech enhancement	-
dc.identifier.url	https://www.sciencedirect.com/science/article/pii/S0885230826000045?via%3Dihub	-
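
The abstract says that, at the final sampling stage, the predictive and generative outputs are fused with a tunable ratio. The record does not give the fusion formula, so the following is only a minimal sketch that assumes a simple linear blend of two time-aligned waveforms. The function name `fuse_outputs`, the parameter `ratio`, and the sample values are hypothetical and are not taken from the paper.

```python
def fuse_outputs(predictive, generative, ratio):
    """Blend two equal-length waveforms sample by sample.

    ratio = 1.0 returns the predictive output unchanged,
    ratio = 0.0 returns the generative output unchanged.
    The linear form is an assumption, not the paper's stated method.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    if len(predictive) != len(generative):
        raise ValueError("signals must have the same length")
    return [ratio * p + (1.0 - ratio) * g
            for p, g in zip(predictive, generative)]


# Toy signals standing in for the two model outputs (illustrative values only).
predictive = [0.0, 0.5, 1.0]
generative = [0.2, 0.1, 0.6]
fused = fuse_outputs(predictive, generative, 0.5)
```

Under this assumption, sweeping `ratio` between 0 and 1 is one way to explore the trade-off the abstract describes between objective metrics and perceptual preference.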

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
