Guided conditioning with predictive network on score-based diffusion model for speech enhancement

Kim, Dail; Yang, Da-Hee; Kim, Donghyun; Chang, Joon-Hyuk; Yang, Jaemo; Choi, Jeonghwan; Lee, Moa; Moon, Han-gil

doi:10.21437/Interspeech.2024-1545

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kim, Dail	-
dc.contributor.author	Yang, Da-Hee	-
dc.contributor.author	Kim, Donghyun	-
dc.contributor.author	Chang, Joon-Hyuk	-
dc.contributor.author	Yang, Jaemo	-
dc.contributor.author	Choi, Jeonghwan	-
dc.contributor.author	Lee, Moa	-
dc.contributor.author	Moon, Han-gil	-
dc.date.accessioned	2025-04-10T08:30:17Z	-
dc.date.available	2025-04-10T08:30:17Z	-
dc.date.issued	2024-09	-
dc.identifier.issn	1990-9772	-
dc.identifier.issn	2308-457X	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/207030	-
dc.description.abstract	Although diffusion-based speech enhancement (SE) models have emerged, they exhibit lower ability in noise removal than other predictive-based SE models. This reflects a trade-off between generative models, which are capable of producing more natural speech based on estimated target distribution, and predictive models, which are more effective in noise removal. To mitigate this trade-off, we propose a novel conditioning method for score-based diffusion models. The proposed approach involves guiding the diffusion model with a pretrained predictive model without joint training, thereby enabling enhanced speech to offer the proper direction to the diffusion model. The effectiveness of the proposed method is highlighted by outperforming the baseline method, with only half the number of sampling steps.	-
dc.format.extent	5	-
dc.language	English	-
dc.language.iso	ENG	-
dc.title	Guided conditioning with predictive network on score-based diffusion model for speech enhancement	-
dc.type	Article	-
dc.identifier.doi	10.21437/Interspeech.2024-1545	-
dc.identifier.scopusid	2-s2.0-85206651720	-
dc.identifier.wosid	001331850101067	-
dc.identifier.bibliographicCitation	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 1190-1194	-
dc.citation.title	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH	-
dc.citation.startPage	1190	-
dc.citation.endPage	1194	-
dc.type.docType	Proceedings Paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.subject.keywordPlus	Speech enhancement	-
dc.subject.keywordAuthor	Speech enhancement	-
dc.subject.keywordAuthor	score-based diffusion models	-
dc.subject.keywordAuthor	generative modeling	-
dc.subject.keywordAuthor	predictive modeling	-
dc.subject.keywordAuthor	conditioning	-
dc.identifier.url	https://www.isca-archive.org/interspeech_2024/kim24o_interspeech.html	-

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
