Neural ATSM: Fully Neural Network-based Adaptive Time-Scale Modification Using Sentence-Specific Dynamic Control

Lee, Jaeuk; Jang, Sohee; Chang, Joon-Hyuk

doi:10.21437/Interspeech.2024-2380

Detailed Information

Authors: Lee, Jaeuk; Jang, Sohee; Chang, Joon-Hyuk

Issue Date: Sep-2024

Keywords: Adaptive time-scale modification; attention mechanism; Gaussian upsampling; speaking rate predictor

Citation: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 4903 - 4907

Pages: 5

Indexed: SCOPUS

Journal Title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Start Page: 4903

End Page: 4907

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206471

DOI: 10.21437/Interspeech.2024-2380

ISSN: 1990-9772

Abstract: Adaptive time-scale modification (ATSM) adjusts audio speed adaptively and improves upon previous systems by tailoring the time scale to each phoneme in two steps: phoneme positioning with the Montreal Forced Aligner (MFA) and reconstruction at an adaptive speaking rate. However, ATSM assigns each phoneme a fixed rate regardless of the sentence it appears in, and MFA struggles to align phonemes precisely in synthetic speech. Motivated by these limitations, we propose a fully neural network-based ATSM (Neural ATSM) that dynamically controls each phoneme's speaking rate so that it varies from sentence to sentence. Neural ATSM predicts phoneme-level rates with a speaking rate predictor and flexibly adjusts the scales to fit the sentence context using Gaussian upsampling and an attention mechanism, ensuring feature similarity through a soft dynamic time warping (Soft-DTW) loss. We also integrate a variational autoencoder (VAE) and flow models to enhance the time-scaled signals. Experimental results show that Neural ATSM outperforms ATSM on both real and synthesized speech.
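The abstract names Gaussian upsampling and a Soft-DTW loss without giving formulas. As a rough illustration only, the sketch below assumes NumPy, phoneme durations measured in frames, and per-phoneme speaking rates that divide those durations (rate > 1 speeds a phoneme up). Gaussian upsampling follows the general Non-Attentive Tacotron formulation, and Soft-DTW follows Cuturi and Blondel's recursion with a squared-Euclidean cost. The helper names `scale_durations`, `gaussian_upsample`, and `soft_dtw` are hypothetical. This is not the authors' implementation: the speaking rate predictor, attention mechanism, VAE, and flow models are not shown.

```python
import numpy as np

def scale_durations(durations, rates):
    """Hypothetical helper: divide each phoneme duration (frames) by its
    speaking rate; rate > 1 shortens the phoneme, rate < 1 lengthens it."""
    return np.asarray(durations, dtype=float) / np.asarray(rates, dtype=float)

def gaussian_upsample(h, durations, sigma):
    """Gaussian upsampling (Non-Attentive Tacotron style), assumed form.

    h:         (N, D) phoneme-level features
    durations: (N,) possibly fractional durations in frames
    sigma:     scalar or (N,) standard deviations in frames
    Returns (T, D) frame-level features with T = ceil(sum(durations)).
    """
    h = np.asarray(h, dtype=float)
    d = np.asarray(durations, dtype=float)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), d.shape)
    centers = np.cumsum(d) - 0.5 * d              # c_i = sum_{j<=i} d_j - d_i / 2
    T = int(np.ceil(d.sum()))
    t = np.arange(T, dtype=float)[:, None] + 0.5  # frame centres, shape (T, 1)
    # Gaussian log-density of each frame under each phoneme (constants dropped)
    logits = -0.5 * ((t - centers[None, :]) / sigma[None, :]) ** 2 - np.log(sigma)[None, :]
    logits -= logits.max(axis=1, keepdims=True)   # numerically stable softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)             # each frame's weights sum to 1
    return w @ h                                  # weighted mix of phoneme features

def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW discrepancy between x (n, D) and y (m, D) with a
    squared-Euclidean local cost; approaches hard DTW as gamma -> 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = np.array([R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]])
            mn = prev.min()
            # softmin_gamma(a) = -gamma * log(sum(exp(-a / gamma))), shifted by the min
            softmin = mn - gamma * np.log(np.exp(-(prev - mn) / gamma).sum())
            R[i, j] = cost[i - 1, j - 1] + softmin
    return R[n, m]
```

For example, three phonemes with durations `[4, 2, 6]` and hypothetical rates `[1.0, 2.0, 0.5]` become durations `[4, 1, 12]`, so `gaussian_upsample` would emit 17 frames. Because the Gaussian weights are soft, fractional durations produced by arbitrary rates remain differentiable, which is one reason this kind of upsampling suits rate control better than hard frame repetition. Soft-DTW likewise gives a differentiable alignment-tolerant distance between a time-scaled output and a reference whose lengths may differ.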

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
