DiffATSM: High quality adaptive time-scale modification using diffusion-based post-processing
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Jang, Sohee | - |
| dc.contributor.author | Kim, Yeon-Ju | - |
| dc.contributor.author | Chang, Joon-Hyuk | - |
| dc.date.accessioned | 2025-12-02T00:00:10Z | - |
| dc.date.available | 2025-12-02T00:00:10Z | - |
| dc.date.issued | 2026-03 | - |
| dc.identifier.issn | 0885-2308 | - |
| dc.identifier.issn | 1095-8363 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209408 | - |
| dc.description.abstract | The advent of adaptive time-scale modification (ATSM) has marked a significant evolution in audio processing, applying adaptive speaking rates that surpass the performance of conventional time-scale modification (TSM) systems employing a fixed speaking rate. However, ATSM requires audio transcriptions and additional phoneme localization modules, which limit its applicability when such resources are unavailable. Furthermore, traditional signal processing approaches in the time domain often degrade audio quality due to artifacts resulting from phase mismatches. To overcome these limitations, we propose DiffATSM, a novel deep learning-based TSM framework that directly generates time-scaled speech from raw waveforms without requiring transcription. DiffATSM comprises two main components: an adaptive neural generator and a post-processing network using a diffusion probabilistic model. The adaptive neural generator modulates the temporal scale of the mel spectrogram by conditioning on phonetic posteriorgrams (PPG), which are extracted from a self-supervised speech model. These PPG features serve as auxiliary information to preserve phonetic structure during time scaling. The generated spectrogram is further refined by the diffusion-based post-processing network, which enhances fidelity by modeling complex speech distributions. Our experimental results demonstrate that DiffATSM significantly outperforms existing TSM algorithms, including ATSM, in subjective and objective evaluations. | - |
| dc.format.extent | 11 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | Academic Press | - |
| dc.title | DiffATSM: High quality adaptive time-scale modification using diffusion-based post-processing | - |
| dc.type | Article | - |
| dc.publisher.location | United Kingdom | - |
| dc.identifier.doi | 10.1016/j.csl.2025.101895 | - |
| dc.identifier.scopusid | 2-s2.0-105021086812 | - |
| dc.identifier.wosid | 001614130800001 | - |
| dc.identifier.bibliographicCitation | Computer Speech and Language, v.97, pp 1 - 11 | - |
| dc.citation.title | Computer Speech and Language | - |
| dc.citation.volume | 97 | - |
| dc.citation.startPage | 1 | - |
| dc.citation.endPage | 11 | - |
| dc.type.docType | Article | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
| dc.subject.keywordPlus | Audio signal processing | - |
| dc.subject.keywordPlus | Audio systems | - |
| dc.subject.keywordPlus | Deep learning | - |
| dc.subject.keywordPlus | Diffusion | - |
| dc.subject.keywordPlus | Linguistics | - |
| dc.subject.keywordPlus | Neural networks | - |
| dc.subject.keywordPlus | Spectrographs | - |
| dc.subject.keywordPlus | Speech analysis | - |
| dc.subject.keywordPlus | Speech communication | - |
| dc.subject.keywordPlus | Speech processing | - |
| dc.subject.keywordPlus | Speech transmission | - |
| dc.subject.keywordPlus | Time domain analysis | - |
| dc.subject.keywordPlus | Time measurement | - |
| dc.subject.keywordAuthor | Time-scale modification | - |
| dc.subject.keywordAuthor | Adaptive time-scale modification | - |
| dc.subject.keywordAuthor | Phonetic posteriorgrams | - |
| dc.subject.keywordAuthor | Diffusion probabilistic model | - |
| dc.identifier.url | https://www.sciencedirect.com/science/article/pii/S0885230825001202?via%3Dihub | - |
