Task-Specific Optimization of Virtual Channel Linear Prediction-Based Speech Dereverberation Front-End for Far-Field Speaker Verification
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yang, Joon-Young | - |
dc.contributor.author | Chang, Joon-Hyuk | - |
dc.date.accessioned | 2022-12-20T06:28:27Z | - |
dc.date.available | 2022-12-20T06:28:27Z | - |
dc.date.created | 2022-11-02 | - |
dc.date.issued | 2022-09 | - |
dc.identifier.issn | 2329-9290 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/173113 | - |
dc.description.abstract | Developing a single-microphone speech denoising or dereverberation front-end for robust automatic speaker verification (ASV) in noisy far-field speaking scenarios is challenging. To address this problem, we present a novel front-end design that involves a recently proposed extension of the weighted prediction error (WPE) speech dereverberation algorithm, the virtual acoustic channel expansion (VACE)-WPE. We demonstrate experimentally that, unlike the conventional WPE algorithm, the VACE-WPE can be explicitly trained to cancel out both late reverberation and background noise. To build the front-end, the VACE-WPE is first pretrained to preserve the noise components in the input signals and produce "noisy" dereverberated output signals, thus inductively biasing the front-end to preserve as many noise components as possible and to perform dereverberation only. Subsequently, given a pretrained speaker embedding model, the VACE-WPE is fine-tuned within a task-specific optimization (TSO) framework so that the speaker embedding extracted from the processed signal becomes similar to that extracted from the "noise-free" target signal. Consequently, the front-end is optimized to avoid excessive denoising, thus achieving "generally safe" dereverberation and denoising for far-field ASV. Moreover, to prevent the front-end from degrading unconstrained "in-the-wild" ASV performance under more general, non-far-field conditions, we propose a distortion regularization method within the TSO framework. The effectiveness of the proposed approach is verified on both far-field and in-the-wild ASV benchmarks, demonstrating its superiority over fully neural front-ends and other TSO methods in various cases. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | - |
dc.title | Task-Specific Optimization of Virtual Channel Linear Prediction-Based Speech Dereverberation Front-End for Far-Field Speaker Verification | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | - |
dc.identifier.doi | 10.1109/TASLP.2022.3205752 | - |
dc.identifier.scopusid | 2-s2.0-85139437720 | - |
dc.identifier.wosid | 000865086500002 | - |
dc.identifier.bibliographicCitation | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, v.30, pp.3144 - 3159 | - |
dc.relation.isPartOf | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | - |
dc.citation.title | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | - |
dc.citation.volume | 30 | - |
dc.citation.startPage | 3144 | - |
dc.citation.endPage | 3159 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Acoustics | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Acoustics | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.subject.keywordPlus | NEURAL-NETWORKS | - |
dc.subject.keywordPlus | ENHANCEMENT | - |
dc.subject.keywordPlus | VOICES | - |
dc.subject.keywordAuthor | Noise reduction | - |
dc.subject.keywordAuthor | Training | - |
dc.subject.keywordAuthor | Noise measurement | - |
dc.subject.keywordAuthor | Task analysis | - |
dc.subject.keywordAuthor | Optimization | - |
dc.subject.keywordAuthor | Microphones | - |
dc.subject.keywordAuthor | Reverberation | - |
dc.subject.keywordAuthor | Deep neural network | - |
dc.subject.keywordAuthor | offline processing | - |
dc.subject.keywordAuthor | speaker verification | - |
dc.subject.keywordAuthor | speech dereverberation | - |
dc.subject.keywordAuthor | single microphone | - |
dc.subject.keywordAuthor | virtual acoustic channel expansion | - |
dc.subject.keywordAuthor | weighted prediction error | - |
dc.identifier.url | https://ieeexplore.ieee.org/document/9889165 | - |
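
The fine-tuning stage described in the abstract (matching the speaker embedding of the processed signal to that of the "noise-free" target, plus a distortion regularizer) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record does not give the paper's loss definitions, so the cosine-distance embedding loss, the mean-squared distortion penalty, the `reg_weight` parameter, and all function names are hypothetical stand-ins rather than the authors' formulation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def embedding_loss(emb_processed: Sequence[float],
                   emb_target: Sequence[float]) -> float:
    """Hypothetical task loss: cosine distance between the embedding of the
    front-end output and the embedding of the noise-free target."""
    return 1.0 - cosine_similarity(emb_processed, emb_target)


def distortion_penalty(processed: Sequence[float],
                       unprocessed: Sequence[float]) -> float:
    """Hypothetical regularizer: mean squared difference between the
    front-end output and the signal it was given, discouraging the
    front-end from altering the input more than necessary."""
    if len(processed) != len(unprocessed):
        raise ValueError("signals must have the same length")
    return sum((p - u) ** 2 for p, u in zip(processed, unprocessed)) / len(processed)


def tso_objective(emb_processed: Sequence[float],
                  emb_target: Sequence[float],
                  processed: Sequence[float],
                  unprocessed: Sequence[float],
                  reg_weight: float = 0.1) -> float:
    """Assumed combined objective: task loss plus weighted distortion term."""
    return (embedding_loss(emb_processed, emb_target)
            + reg_weight * distortion_penalty(processed, unprocessed))


if __name__ == "__main__":
    # Toy 2-D embeddings and 2-sample "signals", purely for illustration.
    print(tso_objective([1.0, 0.0], [0.0, 1.0], [1.0, 2.0], [1.0, 4.0], reg_weight=0.5))
```

In an actual training setup the embeddings would come from the pretrained speaker embedding model, whose parameters are presumably kept fixed while gradients flow back into the front-end; this sketch only shows how the two terms of such an objective might be combined.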