Tokenized Generative Speech Enhancement With Language Model and Flow Matching
- Authors
- Yang, Da-Hee; Lee, Jaeuk; Chang, Joon-Hyuk
- Issue Date
- Jul-2025
- Publisher
- Institute of Electrical and Electronics Engineers
- Keywords
- Spectrogram; Noise measurement; Speech enhancement; Tokenization; Decoding; Training; Noise; Indexes; Computational modeling; Acoustics; tokenization; language model; flow-matching
- Citation
- IEEE Signal Processing Letters, v.32, pp 2828 - 2832
- Pages
- 5
- Indexed
- SCIE; SCOPUS
- Journal Title
- IEEE Signal Processing Letters
- Volume
- 32
- Start Page
- 2828
- End Page
- 2832
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/208580
- DOI
- 10.1109/LSP.2025.3589128
- ISSN
- 1070-9908; 1558-2361
- Abstract
- We propose a novel generative speech enhancement (SE) framework that integrates a language model (LM) and a flow-matching model. To use an LM with discrete tokens, we introduce dMel, which discretizes Mel spectrograms into a predefined set of quantized values on a linear scale without requiring additional neural networks. dMel preserves both semantic and acoustic characteristics, providing a compact and effective token-based alternative to Mel spectrograms. We design the first encoder-decoder LM for SE, which learns to map noisy dMel tokens to enhanced ones. Subsequently, flow matching de-quantizes the enhanced dMel into a continuous representation and refines it by learning the optimal transport-based probability path, improving perceptual quality. This unified approach enables structured reconstruction while effectively suppressing noise. Experimental results demonstrate the effectiveness of our method in enhancing speech quality, establishing a new paradigm for generative SE without reliance on neural codec-based representations.
- Appears in Collections
- College of Engineering (Seoul) > Department of Electronic Engineering (Seoul) > 1. Journal Articles
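
The dMel tokenization described in the abstract (mapping each Mel spectrogram value to one of a predefined set of linearly spaced levels, with no neural network involved) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the number of levels (16), the value range, and the nearest-level rounding rule are all assumptions chosen for illustration, since the record does not state them.

```python
from typing import List


def make_levels(lo: float, hi: float, num_bins: int) -> List[float]:
    """Return num_bins evenly spaced (linear-scale) levels from lo to hi.

    lo, hi, and num_bins are illustrative assumptions, not values from the paper.
    """
    if num_bins < 2:
        raise ValueError("num_bins must be at least 2")
    step = (hi - lo) / (num_bins - 1)
    return [lo + i * step for i in range(num_bins)]


def quantize(mel: List[List[float]], lo: float, hi: float,
             num_bins: int) -> List[List[int]]:
    """Map each Mel value to the index of its nearest level (a discrete token)."""
    step = (hi - lo) / (num_bins - 1)
    tokens = []
    for frame in mel:
        row = []
        for value in frame:
            clipped = min(max(value, lo), hi)  # values outside the range saturate
            row.append(int(round((clipped - lo) / step)))
        tokens.append(row)
    return tokens


def dequantize(tokens: List[List[int]], lo: float, hi: float,
               num_bins: int) -> List[List[float]]:
    """Naive de-quantization: replace each token with its level value."""
    levels = make_levels(lo, hi, num_bins)
    return [[levels[t] for t in row] for row in tokens]


# Toy "spectrogram": 2 frames x 3 Mel bins, with made-up values.
toy_mel = [[0.0, 5.2, 15.0], [-3.0, 9.8, 99.0]]
toy_tokens = quantize(toy_mel, lo=0.0, hi=15.0, num_bins=16)
print(toy_tokens)  # [[0, 5, 15], [0, 10, 15]]
print(dequantize(toy_tokens, lo=0.0, hi=15.0, num_bins=16))
```

The table-lookup `dequantize` above only recovers the level values, so in-range inputs are reconstructed to within half a step. In the framework the abstract describes, that de-quantization and refinement role is played by the flow-matching model rather than by a fixed lookup.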