Task-Specific Optimization of Virtual Channel Linear Prediction-Based Speech Dereverberation Front-End for Far-Field Speaker Verification

Yang, Joon-Young; Chang, Joon-Hyuk

doi:10.1109/TASLP.2022.3205752

Authors: Yang, Joon-Young; Chang, Joon-Hyuk

Issue Date: Sep-2022

Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Keywords: Noise reduction; Training; Noise measurement; Task analysis; Optimization; Microphones; Reverberation; Deep neural network; offline processing; speaker verification; speech dereverberation; single microphone; virtual acoustic channel expansion; weighted prediction error

Citation: IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, v.30, pp.3144 - 3159

Indexed: SCIE; SCOPUS

Journal Title: IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING

Volume: 30

Start Page: 3144

End Page: 3159

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/173113

DOI: 10.1109/TASLP.2022.3205752

ISSN: 2329-9290

Abstract: Developing a single-microphone speech denoising or dereverberation front-end for robust automatic speaker verification (ASV) in noisy far-field speaking scenarios is challenging. To address this problem, we present a novel front-end design that involves a recently proposed extension of the weighted prediction error (WPE) speech dereverberation algorithm, the virtual acoustic channel expansion (VACE)-WPE. It is demonstrated experimentally in this study that, unlike the conventional WPE algorithm, the VACE-WPE can be explicitly trained to cancel out both late reverberation and background noise. To build the front-end, the VACE-WPE is first (pre)trained to preserve the noise components in the input signals and produce "noisy" dereverberated output signals, thus inductively biasing the front-end to preserve as many noise components as possible and to perform dereverberation only. Subsequently, given a pretrained speaker embedding model, the VACE-WPE is additionally fine-tuned within a task-specific optimization (TSO) framework, so that the speaker embedding extracted from the processed signal becomes similar to that extracted from the "noise-free" target signal. Consequently, the front-end is optimized not to perform unnecessarily excessive denoising, thus achieving "generally safe" dereverberation and denoising for far-field ASV. Moreover, to prevent the front-end from adversely affecting the unconstrained "in-the-wild" ASV performance under more general, non-far-field conditions, we propose a distortion regularization method within the TSO framework. The effectiveness of the proposed approach is verified on both far-field and in-the-wild ASV benchmarks, demonstrating its superiority over fully neural front-ends and other TSO methods in various cases.
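
As a rough illustration of the fine-tuning objective outlined in the abstract, the Python sketch below combines an embedding-similarity term with a distortion penalty. It is not the authors' implementation: the abstract does not state which similarity measure or regularizer is used, so the cosine distance, the mean-squared-error penalty on an input that needs no processing, the weight `reg_weight`, and the names `tso_loss`, `front_end`, and `embed` are all assumptions made for illustration.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    """Return 1 - cosine similarity (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps guards against division by zero for all-zero vectors.
    return 1.0 - dot / (norm_a * norm_b + eps)


def mean_squared_error(a: Vector, b: Vector) -> float:
    """Average squared difference between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def tso_loss(
    front_end: Callable[[Vector], Vector],  # hypothetical stand-in for the front-end
    embed: Callable[[Vector], Vector],      # hypothetical pretrained embedding model
    far_field_input: Vector,
    noise_free_target: Vector,
    clean_input: Vector,
    reg_weight: float = 0.1,                # assumed trade-off weight
) -> float:
    """Embedding-similarity term plus an assumed distortion penalty."""
    processed = front_end(far_field_input)
    # Pull the embedding of the processed signal toward that of the target.
    embedding_term = cosine_distance(embed(processed), embed(noise_free_target))
    # Assumed regularizer: an input that needs no processing should pass
    # through the front-end nearly unchanged.
    distortion_term = mean_squared_error(front_end(clean_input), clean_input)
    return embedding_term + reg_weight * distortion_term
```

In an actual training setup the two callables would be differentiable models and the loss would be minimized with respect to the front-end parameters only, with the speaker embedding model kept fixed; the plain-list version here only shows how the two terms fit together.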

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
