DiffATSM: High quality adaptive time-scale modification using diffusion-based post-processing

Jang, Sohee; Kim, Yeon-Ju; Chang, Joon-Hyuk

doi:10.1016/j.csl.2025.101895


Full metadata record

DC Field	Value	Language
dc.contributor.author	Jang, Sohee	-
dc.contributor.author	Kim, Yeon-Ju	-
dc.contributor.author	Chang, Joon-Hyuk	-
dc.date.accessioned	2025-12-02T00:00:10Z	-
dc.date.available	2025-12-02T00:00:10Z	-
dc.date.issued	2026-03	-
dc.identifier.issn	0885-2308	-
dc.identifier.issn	1095-8363	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209408	-
dc.description.abstract	The advent of adaptive time-scale modification (ATSM) has marked a significant evolution in audio processing, applying adaptive speaking rates that surpass the performance of conventional time-scale modification (TSM) systems employing a fixed speaking rate. However, ATSM requires audio transcriptions and additional phoneme localization modules, which limit its applicability when such resources are unavailable. Furthermore, traditional signal processing approaches in the time domain often degrade audio quality due to artifacts resulting from phase mismatches. To overcome these limitations, we propose DiffATSM, a novel deep learning-based TSM framework that directly generates time-scaled speech from raw waveforms without requiring transcription. DiffATSM comprises two main components: an adaptive neural generator and a post-processing network using a diffusion probabilistic model. The adaptive neural generator modulates the temporal scale of the mel spectrogram by conditioning on phonetic posteriorgrams (PPG), which are extracted from a self-supervised speech model. These PPG features serve as auxiliary information to preserve phonetic structure during time scaling. The generated spectrogram is further refined by the diffusion-based post-processing network, which enhances fidelity by modeling complex speech distributions. Our experimental results demonstrate that DiffATSM significantly outperforms existing TSM algorithms, including ATSM, in subjective and objective evaluations.	-
dc.format.extent	11	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Academic Press	-
dc.title	DiffATSM: High quality adaptive time-scale modification using diffusion-based post-processing	-
dc.type	Article	-
dc.publisher.location	United Kingdom	-
dc.identifier.doi	10.1016/j.csl.2025.101895	-
dc.identifier.scopusid	2-s2.0-105021086812	-
dc.identifier.wosid	001614130800001	-
dc.identifier.bibliographicCitation	Computer Speech and Language, v.97, pp 1 - 11	-
dc.citation.title	Computer Speech and Language	-
dc.citation.volume	97	-
dc.citation.startPage	1	-
dc.citation.endPage	11	-
dc.type.docType	Article	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.subject.keywordPlus	Audio signal processing	-
dc.subject.keywordPlus	Audio systems	-
dc.subject.keywordPlus	Deep learning	-
dc.subject.keywordPlus	Diffusion	-
dc.subject.keywordPlus	Linguistics	-
dc.subject.keywordPlus	Neural networks	-
dc.subject.keywordPlus	Spectrographs	-
dc.subject.keywordPlus	Speech analysis	-
dc.subject.keywordPlus	Speech communication	-
dc.subject.keywordPlus	Speech processing	-
dc.subject.keywordPlus	Speech transmission	-
dc.subject.keywordPlus	Time domain analysis	-
dc.subject.keywordPlus	Time measurement	-
dc.subject.keywordAuthor	Time-scale modification	-
dc.subject.keywordAuthor	Adaptive time-scale modification	-
dc.subject.keywordAuthor	Phonetic posteriorgrams	-
dc.subject.keywordAuthor	Diffusion probabilistic model	-
dc.identifier.url	https://www.sciencedirect.com/science/article/pii/S0885230825001202?via%3Dihub	-
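
The abstract contrasts conventional TSM, which applies one fixed speaking rate, with adaptive TSM, which varies the rate over time. The sketch below illustrates only that contrast on a sequence of spectrogram-like frames. It is not the DiffATSM generator or its diffusion post-processor: the function name warp_frames, the nearest-frame mapping, and the hand-picked per-frame rates are all assumptions made for illustration, whereas the paper derives its time scaling from a neural generator conditioned on PPG features.

```python
from bisect import bisect_right
from itertools import accumulate


def warp_frames(frames, rates):
    """Time-scale a sequence of frames using one rate per input frame.

    Hypothetical helper, not from the paper. rates[i] > 1 shortens
    frame i (faster speech); rates[i] < 1 stretches it (slower speech).
    """
    if len(frames) != len(rates):
        raise ValueError("need exactly one rate per frame")
    if not frames:
        return []
    if any(r <= 0 for r in rates):
        raise ValueError("rates must be positive")

    # Output-time position at which each input frame ends.
    ends = list(accumulate(1.0 / r for r in rates))
    n_out = max(1, round(ends[-1]))

    out = []
    for t in range(n_out):
        # Centre of output frame t, clamped to stay inside the last frame.
        centre = min(t + 0.5, ends[-1] - 1e-9)
        # Index of the input frame whose output-time span covers the centre.
        out.append(frames[bisect_right(ends, centre)])
    return out


frames = ["a", "b", "c", "d"]  # stand-ins for mel-spectrogram frames

# Fixed rate (conventional TSM): every frame is sped up by 2x.
fixed = warp_frames(frames, [2, 2, 2, 2])

# Adaptive rate (assumed values): keep the first half, slow the second half.
adaptive = warp_frames(frames, [1, 1, 0.5, 0.5])
```

With a fixed rate every region is compressed equally, so half of the frames are dropped regardless of content. With per-frame rates the first two frames pass through unchanged and the last two are each held for two output frames, which is the kind of content-dependent scaling that adaptive TSM aims for.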

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: College of Engineering (School of Electronic Engineering)
