Improved CNN-Transformer Using Broadcasted Residual Learning for Text-Independent Speaker Verification
- Authors
- Choi, Jeong-Hwan; Yang, Joon-Young; Jeoung, Ye-Rin; Chang, Joon-Hyuk
- Issue Date
- Sep-2022
- Publisher
- International Speech Communication Association
- Keywords
- attentive statistics pooling; hybrid deep neural network; text-independent speaker verification; Transformer
- Citation
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2022-September, pp.2223 - 2227
- Indexed
- SCOPUS
- Journal Title
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
- Volume
- 2022-September
- Start Page
- 2223
- End Page
- 2227
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/173091
- DOI
- 10.21437/Interspeech.2022-88
- ISSN
- 2308-457X
- Abstract
- This study proposes a novel speaker embedding extractor architecture that effectively combines convolutional neural networks (CNNs) and Transformers. Based on the recently proposed CNNs-meet-vision-Transformers (CMT) architecture, we propose two strategies for efficient speaker embedding extraction modeling. First, we apply broadcast residual learning techniques to the building blocks of the CMT, allowing us to extract frequency-aware temporal features shared across frequency dimensions with a reduced set of parameters. Second, frequency-statistics-dependent attentive statistics pooling is proposed to aggregate attentive temporal statistics acquired from the means and standard deviations of input feature maps weighted along the frequency axis using an attention mechanism. The experimental results on the VoxCeleb-1 dataset show that the proposed model outperforms several CNN- and Transformer-based models with a similar number of model parameters. Moreover, the effectiveness of the proposed modifications to the CMT architecture is validated through ablation studies.
- Appears in Collections
- Seoul College of Engineering > Seoul Department of Electronic Engineering > 1. Journal Articles
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
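The two ideas named in the abstract, broadcast residual learning and attentive statistics pooling, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it shows a generic broadcast-residual connection (frequency-averaged temporal features added back across all frequency bins) and the standard attentive statistics pooling over time, not the frequency-statistics-dependent variant or the CMT blocks proposed in the paper. All names (`broadcast_residual`, `attentive_stats_pool`, `freq_fn`, `temporal_fn`, `w`, `b`, `v`) and all tensor shapes are assumptions made for illustration.

```python
import numpy as np


def broadcast_residual(x, freq_fn, temporal_fn):
    """Generic broadcast-residual connection (illustrative, hypothetical helper).

    x: array of shape (channels, freq, time).
    freq_fn: maps (C, F, T) -> (C, F, T); stands in for a 2-D, frequency-aware branch.
    temporal_fn: maps (C, 1, T) -> (C, 1, T); stands in for a 1-D temporal branch.
    """
    y2d = freq_fn(x)                           # frequency-aware 2-D features
    pooled = y2d.mean(axis=1, keepdims=True)   # average over frequency -> (C, 1, T)
    y1d = temporal_fn(pooled)                  # temporal features shared by all bins
    return x + y2d + y1d                       # (C, 1, T) is broadcast over the F axis


def attentive_stats_pool(h, w, b, v):
    """Standard attentive statistics pooling over time (not the paper's variant).

    h: (T, D) frame-level features; w: (D, A); b: (A,); v: (A,).
    Returns a (2 * D,) vector: attention-weighted mean and standard deviation.
    """
    scores = np.tanh(h @ w + b) @ v            # one scalar score per frame, shape (T,)
    scores = scores - scores.max()             # subtract max for numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum()                # softmax over the time axis
    mean = alpha @ h                           # weighted mean, shape (D,)
    var = alpha @ (h ** 2) - mean ** 2         # weighted variance, shape (D,)
    std = np.sqrt(np.clip(var, 1e-8, None))    # clip to avoid sqrt of tiny negatives
    return np.concatenate([mean, std])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8, 10))        # assumed (channels, freq, time) layout
    out = broadcast_residual(x, lambda z: 0.5 * z, np.tanh)
    h = rng.standard_normal((10, 6))           # assumed (time, feature) layout
    emb = attentive_stats_pool(h, rng.standard_normal((6, 3)),
                               np.zeros(3), rng.standard_normal(3))
    print(out.shape, emb.shape)
```

Because the temporal branch operates on a tensor with a single frequency bin, its parameters are shared across the frequency axis, which is the source of the parameter savings the abstract refers to; the pooling function returns a fixed-length vector regardless of the number of frames.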