Detailed Information

Tokenized Generative Speech Enhancement With Language Model and Flow Matching

Full metadata record
DC Field: Value
dc.contributor.author: Yang, Da-Hee
dc.contributor.author: Lee, Jaeuk
dc.contributor.author: Chang, Joon-Hyuk
dc.date.accessioned: 2025-08-26T02:00:11Z
dc.date.available: 2025-08-26T02:00:11Z
dc.date.issued: 2025-07
dc.identifier.issn: 1070-9908
dc.identifier.issn: 1558-2361
dc.identifier.uri: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/208580
dc.description.abstract: We propose a novel generative speech enhancement (SE) framework that integrates a language model (LM) and a flow-matching model. To utilize an LM with discrete tokens, we introduce dMel, which discretizes Mel spectrograms into a predefined set of quantized values on a linear scale without requiring additional neural networks. dMel preserves both semantic and acoustic characteristics, providing a compact and effective token-based alternative to Mel spectrograms. We design the first encoder-decoder LM for SE, which learns to map noisy dMel to enhanced ones. Subsequently, flow matching de-quantizes the enhanced dMel into a continuous representation and refines it by learning the optimal transport-based probability path, improving perceptual quality. This unified approach enables structured reconstruction while effectively suppressing noise. Experimental results demonstrate the effectiveness of our method in enhancing speech quality, establishing a new paradigm for generative SE without reliance on neural codec-based representations.
dc.format.extent: 5
dc.language: English
dc.language.iso: ENG
dc.publisher: Institute of Electrical and Electronics Engineers
dc.title: Tokenized Generative Speech Enhancement With Language Model and Flow Matching
dc.type: Article
dc.publisher.location: United States
dc.identifier.doi: 10.1109/LSP.2025.3589128
dc.identifier.scopusid: 2-s2.0-105012356240
dc.identifier.wosid: 001536693600005
dc.identifier.bibliographicCitation: IEEE Signal Processing Letters, v.32, pp 2828 - 2832
dc.citation.title: IEEE Signal Processing Letters
dc.citation.volume: 32
dc.citation.startPage: 2828
dc.citation.endPage: 2832
dc.type.docType: Article
dc.description.isOpenAccess: N
dc.description.journalRegisteredClass: scie
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Engineering
dc.relation.journalWebOfScienceCategory: Engineering, Electrical & Electronic
dc.subject.keywordPlus: Computational linguistics
dc.subject.keywordPlus: Neural networks
dc.subject.keywordPlus: Optimization
dc.subject.keywordPlus: Semantics
dc.subject.keywordPlus: Speech coding
dc.subject.keywordPlus: Speech communication
dc.subject.keywordAuthor: Spectrogram
dc.subject.keywordAuthor: Noise measurement
dc.subject.keywordAuthor: Speech enhancement
dc.subject.keywordAuthor: Tokenization
dc.subject.keywordAuthor: Decoding
dc.subject.keywordAuthor: Training
dc.subject.keywordAuthor: Noise
dc.subject.keywordAuthor: Indexes
dc.subject.keywordAuthor: Computational modeling
dc.subject.keywordAuthor: Acoustics
dc.subject.keywordAuthor: tokenization
dc.subject.keywordAuthor: language model
dc.subject.keywordAuthor: flow-matching
dc.identifier.url: https://ieeexplore.ieee.org/document/11079998
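
The abstract describes dMel as mapping Mel-spectrogram values onto a predefined set of linearly spaced quantized levels, with no learned tokenizer. The sketch below illustrates that general idea only and is not the authors' implementation: the function names, the value range (`lo`, `hi`), the number of levels, and the nearest-level rounding rule are all assumptions made for illustration.

```python
import math

def quantize(frame, lo, hi, num_bins):
    """Map each value in `frame` to the index of the nearest of
    `num_bins` evenly spaced levels between `lo` and `hi`.
    (Hypothetical helper; range and level count are assumed.)"""
    step = (hi - lo) / (num_bins - 1)
    tokens = []
    for value in frame:
        clipped = min(max(value, lo), hi)  # clamp out-of-range values
        tokens.append(int(math.floor((clipped - lo) / step + 0.5)))
    return tokens

def dequantize(tokens, lo, hi, num_bins):
    """Map token indices back to the value of their level."""
    step = (hi - lo) / (num_bins - 1)
    return [lo + t * step for t in tokens]

# Example: 16 levels spanning an assumed value range of [0, 15].
frame = [-2.0, 0.2, 7.6, 14.9, 20.0]
tokens = quantize(frame, lo=0.0, hi=15.0, num_bins=16)
recovered = dequantize(tokens, lo=0.0, hi=15.0, num_bins=16)
```

For in-range inputs, the reconstruction error of this uniform scheme is at most half the spacing between levels. In the framework described by the abstract, that residual coarseness is what the flow-matching stage is said to refine when it de-quantizes the enhanced tokens into a continuous representation.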
Appears in Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.