Guided conditioning with predictive network on score-based diffusion model for speech enhancement
- Authors
- Kim, Dail; Yang, Da-Hee; Kim, Donghyun; Chang, Joon-Hyuk; Yang, Jaemo; Choi, Jeonghwan; Lee, Moa; Moon, Han-gil
- Issue Date
- Sep-2024
- Keywords
- Speech enhancement; score-based diffusion models; generative modeling; predictive modeling; conditioning
- Citation
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 1190 - 1194
- Pages
- 5
- Indexed
- SCOPUS
- Journal Title
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
- Start Page
- 1190
- End Page
- 1194
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/207030
- DOI
- 10.21437/Interspeech.2024-1545
- ISSN
- 1990-9772; 2308-457X
- Abstract
- Although diffusion-based speech enhancement (SE) models have emerged, they remove noise less effectively than predictive SE models. This reflects a trade-off between generative models, which can produce more natural speech based on an estimated target distribution, and predictive models, which are more effective at noise removal. To mitigate this trade-off, we propose a novel conditioning method for score-based diffusion models. The proposed approach guides the diffusion model with a pretrained predictive model, without joint training, so that the enhanced speech offers the proper direction to the diffusion model. The proposed method outperforms the baseline method while using only half the number of sampling steps.
- Appears in Collections
- College of Engineering (Seoul) > Department of Electronic Engineering (Seoul) > 1. Journal Articles
