Detailed Information

Improved CNN-Transformer Using Broadcasted Residual Learning for Text-Independent Speaker Verification

Full metadata record
DC Field | Value | Language
dc.contributor.author | Choi, Jeong-Hwan | -
dc.contributor.author | Yang, Joon-Young | -
dc.contributor.author | Jeoung, Ye-Rin | -
dc.contributor.author | Chang, Joon-Hyuk | -
dc.date.accessioned | 2022-12-20T06:25:17Z | -
dc.date.available | 2022-12-20T06:25:17Z | -
dc.date.created | 2022-11-02 | -
dc.date.issued | 2022-09 | -
dc.identifier.issn | 2308-457X | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/173091 | -
dc.description.abstract | This study proposes a novel speaker embedding extractor architecture that effectively combines convolutional neural networks (CNNs) and Transformers. Based on the recently proposed CNNs-meet-vision-Transformers (CMT) architecture, we propose two strategies for efficient speaker embedding extraction modeling. First, we apply broadcast residual learning techniques to the building blocks of the CMT, allowing us to extract frequency-aware temporal features shared across frequency dimensions with a reduced set of parameters. Second, frequency-statistics-dependent attentive statistics pooling is proposed to aggregate attentive temporal statistics acquired from the means and standard deviations of input feature maps weighted along the frequency axis using an attention mechanism. The experimental results on the VoxCeleb-1 dataset show that the proposed model outperforms several CNN- and Transformer-based models with a similar number of model parameters. Moreover, the effectiveness of the proposed modifications to the CMT architecture is validated through ablation studies. | -
dc.language | English | -
dc.language.iso | en | -
dc.publisher | International Speech Communication Association | -
dc.title | Improved CNN-Transformer Using Broadcasted Residual Learning for Text-Independent Speaker Verification | -
dc.type | Article | -
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | -
dc.identifier.doi | 10.21437/Interspeech.2022-88 | -
dc.identifier.scopusid | 2-s2.0-85140099933 | -
dc.identifier.wosid | 000900724502080 | -
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2022-September, pp.2223-2227 | -
dc.relation.isPartOf | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | -
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | -
dc.citation.volume | 2022-September | -
dc.citation.startPage | 2223 | -
dc.citation.endPage | 2227 | -
dc.type.rims | ART | -
dc.type.docType | Proceedings Paper | -
dc.description.journalClass | 1 | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Acoustics | -
dc.relation.journalResearchArea | Audiology & Speech-Language Pathology | -
dc.relation.journalResearchArea | Computer Science | -
dc.relation.journalResearchArea | Engineering | -
dc.relation.journalWebOfScienceCategory | Acoustics | -
dc.relation.journalWebOfScienceCategory | Audiology & Speech-Language Pathology | -
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | -
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | -
dc.subject.keywordPlus | Convolutional neural networks | -
dc.subject.keywordPlus | Embeddings | -
dc.subject.keywordPlus | Network architecture | -
dc.subject.keywordPlus | Speech communication | -
dc.subject.keywordPlus | Speech recognition | -
dc.subject.keywordPlus | Deep neural networks | -
dc.subject.keywordPlus | Attentive statistics pooling | -
dc.subject.keywordPlus | Building blocks | -
dc.subject.keywordPlus | Convolutional neural network | -
dc.subject.keywordPlus | Extraction modeling | -
dc.subject.keywordPlus | Hybrid deep neural network | -
dc.subject.keywordPlus | Learning techniques | -
dc.subject.keywordPlus | Temporal features | -
dc.subject.keywordPlus | Text-independent speaker verification | -
dc.subject.keywordPlus | Transformer | -
dc.subject.keywordAuthor | attentive statistics pooling | -
dc.subject.keywordAuthor | hybrid deep neural network | -
dc.subject.keywordAuthor | Text-independent speaker verification | -
dc.subject.keywordAuthor | Transformer | -
dc.identifier.url | https://www.isca-speech.org/archive/interspeech_2022/choi22_interspeech.html | -
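
The abstract above names two mechanisms: a broadcasted residual path that models time on frequency-averaged features and shares the result across all frequency bins, and an attentive statistics pooling whose time attention is driven by frequency-axis means and standard deviations. Below is a minimal PyTorch-style sketch of how these two ideas might look; it is a reconstruction under stated assumptions, not the authors' implementation. All class names, the depthwise/pointwise factorization, and the attention MLP shape are illustrative; only the high-level ideas come from the abstract.

```python
# Hedged sketch (not the authors' code) of the two components described
# in the abstract. Every module and parameter name here is an assumption.
import torch
import torch.nn as nn


class BroadcastedResidualBlock(nn.Module):
    """Average over frequency, model time with a cheap 1-D depthwise conv,
    then broadcast the shared temporal features back over frequency."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.freq_dw = nn.Conv2d(channels, channels, (kernel_size, 1),
                                 padding=(kernel_size // 2, 0), groups=channels)
        self.temp_dw = nn.Conv1d(channels, channels, kernel_size,
                                 padding=kernel_size // 2, groups=channels)
        self.pw = nn.Conv1d(channels, channels, 1)  # pointwise channel mixing
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, freq, time)
        f = self.freq_dw(x)                     # frequency-wise 2-D branch
        t = f.mean(dim=2)                       # collapse frequency -> (B, C, T)
        t = self.pw(self.act(self.temp_dw(t)))  # temporal 1-D branch
        # broadcast the temporal features back across the frequency axis
        return x + f + t.unsqueeze(2)


class FreqStatsAttentiveStatsPooling(nn.Module):
    """One guess at the mechanism: attention weights over time computed
    from per-frame frequency-axis mean/std, then attentive mean and std."""

    def __init__(self, channels: int, hidden: int = 128):
        super().__init__()
        # attention MLP input: concatenated frequency-axis mean and std
        self.attn = nn.Sequential(
            nn.Conv1d(2 * channels, hidden, 1), nn.Tanh(),
            nn.Conv1d(hidden, channels, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, freq, time)
        mu_f = x.mean(dim=2)                              # (B, C, T)
        sd_f = x.std(dim=2)                               # (B, C, T)
        w = torch.softmax(self.attn(torch.cat([mu_f, sd_f], dim=1)), dim=-1)
        mean = (w * mu_f).sum(dim=-1)                     # attentive mean
        var = (w * mu_f.pow(2)).sum(dim=-1) - mean.pow(2)
        std = var.clamp(min=1e-8).sqrt()                  # attentive std
        return torch.cat([mean, std], dim=-1)             # (B, 2C)


if __name__ == "__main__":
    x = torch.randn(2, 64, 20, 100)           # (batch, channels, freq, frames)
    x = BroadcastedResidualBlock(64)(x)
    emb = FreqStatsAttentiveStatsPooling(64)(x)
    print(emb.shape)                          # torch.Size([2, 128])
```

If this reading is right, the parameter saving claimed in the abstract would come from the 1-D temporal branch: it operates on frequency-averaged features, so its cost does not grow with the number of frequency bins.
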
Appears in Collections
College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)