Detailed Information

Task-Specific Optimization of Virtual Channel Linear Prediction-Based Speech Dereverberation Front-End for Far-Field Speaker Verification

Full metadata record
DC Field | Value | Language
dc.contributor.author | Yang, Joon-Young | -
dc.contributor.author | Chang, Joon-Hyuk | -
dc.date.accessioned | 2022-12-20T06:28:27Z | -
dc.date.available | 2022-12-20T06:28:27Z | -
dc.date.created | 2022-11-02 | -
dc.date.issued | 2022-09 | -
dc.identifier.issn | 2329-9290 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/173113 | -
dc.description.abstract | Developing a single-microphone speech denoising or dereverberation front-end for robust automatic speaker verification (ASV) in noisy far-field speaking scenarios is challenging. To address this problem, we present a novel front-end design that involves a recently proposed extension of the weighted prediction error (WPE) speech dereverberation algorithm, the virtual acoustic channel expansion (VACE)-WPE. This study demonstrates experimentally that, unlike the conventional WPE algorithm, the VACE-WPE can be explicitly trained to cancel out both late reverberation and background noise. To build the front-end, the VACE-WPE is first (pre)trained to preserve the noise components in the input signals and produce "noisy" dereverberated output signals, thus inductively biasing the front-end to preserve as many noise components as possible and to perform dereverberation only. Subsequently, given a pretrained speaker embedding model, the VACE-WPE is further fine-tuned within a task-specific optimization (TSO) framework so that the speaker embedding extracted from the processed signal becomes similar to that extracted from the "noise-free" target signal. Consequently, the front-end is optimized to avoid unnecessarily aggressive denoising, thus achieving "generally safe" dereverberation and denoising for far-field ASV. Moreover, to prevent the front-end from adversely affecting unconstrained "in-the-wild" ASV performance under more general, non-far-field conditions, we propose a distortion regularization method within the TSO framework. The effectiveness of the proposed approach is verified on both far-field and in-the-wild ASV benchmarks, demonstrating its superiority over fully neural front-ends and other TSO methods in various cases. | -
dc.language | English | -
dc.language.iso | en | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | Task-Specific Optimization of Virtual Channel Linear Prediction-Based Speech Dereverberation Front-End for Far-Field Speaker Verification | -
dc.type | Article | -
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | -
dc.identifier.doi | 10.1109/TASLP.2022.3205752 | -
dc.identifier.scopusid | 2-s2.0-85139437720 | -
dc.identifier.wosid | 000865086500002 | -
dc.identifier.bibliographicCitation | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, v.30, pp.3144 - 3159 | -
dc.relation.isPartOf | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | -
dc.citation.title | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | -
dc.citation.volume | 30 | -
dc.citation.startPage | 3144 | -
dc.citation.endPage | 3159 | -
dc.type.rims | ART | -
dc.type.docType | Article | -
dc.description.journalClass | 1 | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Acoustics | -
dc.relation.journalResearchArea | Engineering | -
dc.relation.journalWebOfScienceCategory | Acoustics | -
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | -
dc.subject.keywordPlus | NEURAL-NETWORKS | -
dc.subject.keywordPlus | ENHANCEMENT | -
dc.subject.keywordPlus | VOICES | -
dc.subject.keywordAuthor | Noise reduction | -
dc.subject.keywordAuthor | Training | -
dc.subject.keywordAuthor | Noise measurement | -
dc.subject.keywordAuthor | Task analysis | -
dc.subject.keywordAuthor | Optimization | -
dc.subject.keywordAuthor | Microphones | -
dc.subject.keywordAuthor | Reverberation | -
dc.subject.keywordAuthor | Deep neural network | -
dc.subject.keywordAuthor | offline processing | -
dc.subject.keywordAuthor | speaker verification | -
dc.subject.keywordAuthor | speech dereverberation | -
dc.subject.keywordAuthor | single microphone | -
dc.subject.keywordAuthor | virtual acoustic channel expansion | -
dc.subject.keywordAuthor | weighted prediction error | -
dc.identifier.url | https://ieeexplore.ieee.org/document/9889165 | -
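
The abstract describes fine-tuning the front-end so that the speaker embedding of the processed signal approaches that of the noise-free target, with an added distortion regularizer. The record does not give the actual loss, so the following is only a minimal sketch of how such an objective could be composed. The cosine distance, the mean-squared distortion term, the names `cosine_distance` and `tso_loss`, and the weight `reg_weight` are all assumptions for illustration, not the authors' formulation.

```python
import numpy as np

def cosine_distance(a, b, eps=1e-8):
    """1 - cosine similarity between two embedding vectors (assumed similarity measure)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def tso_loss(emb_processed, emb_target, processed, reference, reg_weight=0.1):
    """Hypothetical task-specific objective.

    emb_processed: speaker embedding of the front-end output
    emb_target:    speaker embedding of the noise-free target signal
    processed:     front-end output samples
    reference:     signal the output should not drift away from (assumed)
    reg_weight:    weight of the distortion penalty (assumed hyperparameter)
    """
    processed = np.asarray(processed, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Embedding term: pull the processed-signal embedding toward the target embedding.
    embedding_term = cosine_distance(emb_processed, emb_target)
    # Distortion term: mean squared deviation of the output from the reference.
    distortion_term = float(np.mean((processed - reference) ** 2))
    return embedding_term + reg_weight * distortion_term
```

In an actual training setup the embeddings would come from the pretrained speaker embedding model and the loss would be written in an autodiff framework; NumPy is used here only to keep the arithmetic visible.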
Appears in Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)