Experimental Case Study of Self-supervised Learning for Voice Spoofing Detection

Lee, Y.; Kim, N.; Jeong, J.; Kwak, I.

doi:10.1109/ACCESS.2023.3254880

Open access

Authors: Lee, Y.; Kim, N.; Jeong, J.; Kwak, I.

Issue Date: 2023

Publisher: Institute of Electrical and Electronics Engineers Inc.

Keywords: Contrastive learning; Deep learning; Microphones; Self-supervised learning; Speech processing; Spoofing detection; Supervised learning; Task analysis; Training

Citation: IEEE Access, v.11, pp 24216 - 24226

Pages: 11

Journal Title: IEEE Access

Volume: 11

Start Page: 24216

End Page: 24226

URI: https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/69760

DOI: 10.1109/ACCESS.2023.3254880

ISSN: 2169-3536

Abstract: This study aims to improve the performance of voice spoofing attack detection through self-supervised pre-training. Supervised learning needs appropriate input variables and corresponding labels for constructing the machine learning models that are to be applied. It is necessary to secure a large number of labeled datasets to improve the performance of supervised learning processes. However, labeling requires substantial inputs of time and effort. One of the methods for managing this requirement is self-supervised learning, which uses pseudo-labeling without the necessity for substantial human input. This study experimented with contrastive learning, a well-performing self-supervised learning approach, to construct a voice spoofing detection model. We applied MoCo's dynamic dictionary, SimCLR's symmetric loss, and COLA's bilinear similarity in our contrastive learning framework. Our model was trained using VoxCeleb data and voice data extracted from YouTube videos. Our self-supervised model improved the performance of the baseline model from 6.93% to 5.26% for a logical access (LA) scenario and improved the performance of the baseline model from 0.60% to 0.40% for a physical access (PA) scenario. In the case of PA, the best performance was achieved when random crop augmentation was applied, and in the case of LA, the best performance was obtained when random crop and random shifting augmentations were considered.
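The abstract names three ingredients of the contrastive framework (a MoCo-style dictionary of extra negatives, a SimCLR-style symmetric loss, and a COLA-style bilinear similarity) plus two augmentations (random crop and random shifting). The NumPy sketch below shows one way such pieces might fit together. It is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the dictionary is modeled as a plain array of stored embeddings with no momentum encoder, and random shifting is assumed to be a circular shift of the waveform.

```python
import numpy as np


def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def bilinear_similarity(a, b, W):
    # COLA-style bilinear score: s[i, j] = a_i^T W b_j.
    return a @ W @ b.T


def symmetric_contrastive_loss(z1, z2, W, queue=None):
    """Symmetric InfoNCE-style loss over two views of the same batch.

    z1, z2 : (N, D) embeddings of two augmented views; row i of z1 and
             row i of z2 form the positive pair.
    W      : (D, D) bilinear weight matrix.
    queue  : optional (K, D) array of stored embeddings used as extra
             negatives (an assumed stand-in for a MoCo-style dictionary).
    """
    n = z1.shape[0]
    keys_for_z1 = z2 if queue is None else np.concatenate([z2, queue], axis=0)
    keys_for_z2 = z1 if queue is None else np.concatenate([z1, queue], axis=0)

    logits_12 = bilinear_similarity(z1, keys_for_z1, W)  # (N, N + K)
    logits_21 = bilinear_similarity(z2, keys_for_z2, W)  # (N, N + K)

    idx = np.arange(n)  # the positive for row i sits in column i
    loss_12 = -log_softmax(logits_12, axis=1)[idx, idx].mean()
    loss_21 = -log_softmax(logits_21, axis=1)[idx, idx].mean()
    # SimCLR-style symmetry: average the loss over both directions.
    return 0.5 * (loss_12 + loss_21)


def random_crop(wave, length, rng):
    # Take a contiguous segment of `length` samples at a random offset.
    start = rng.integers(0, len(wave) - length + 1)
    return wave[start:start + length]


def random_shift(wave, max_shift, rng):
    # Assumption: "shifting" is modeled as a circular shift in time.
    shift = rng.integers(-max_shift, max_shift + 1)
    return np.roll(wave, shift)
```

In this sketch, two views of a clip could be produced by calling `random_crop` (and optionally `random_shift`) twice with the same `np.random.default_rng()` generator, encoding each view, and passing the two embedding batches to `symmetric_contrastive_loss`.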

Files in This Item

Experimental Case Study of Self-supervised Learning for Voice Spoofing Detection.pdf 1.4 MB

Appears in Collections: ETC > 1. Journal Articles

Related Researcher

Kwak, Il-Youp: Graduate School (Department of Statistics and Data Science)
