Voice Spoofing Detection Through Residual Network, Max Feature Map, and Depthwise Separable Convolution

Full metadata record
DC Field | Value | Language
dc.contributor.author | Kwak, Il-Youp | -
dc.contributor.author | Kwag, Sungsu | -
dc.contributor.author | Lee, Junhee | -
dc.contributor.author | Jeon, Youngbae | -
dc.contributor.author | Hwang, Jeonghwan | -
dc.contributor.author | Choi, Hyo-Jung | -
dc.contributor.author | Yang, Jong-Hoon | -
dc.contributor.author | Han, So-Yul | -
dc.contributor.author | Huh, Jun Ho | -
dc.contributor.author | Lee, Choong-Hoon | -
dc.contributor.author | Yoon, Ji Won | -
dc.date.accessioned | 2024-01-09T01:04:28Z | -
dc.date.available | 2024-01-09T01:04:28Z | -
dc.date.issued | 2023 | -
dc.identifier.issn | 2169-3536 | -
dc.identifier.uri | https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/69709 | -
dc.description.abstract | The goal of the '2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge' (ASVspoof) was to facilitate the development of systems that detect voice spoofing attacks with high accuracy. However, the competition did not emphasize model complexity or latency, even though both are stringent requirements for real-world deployment. Most of the top-performing solutions in the competition used ensembles that combined numerous sophisticated deep learning models to maximize detection accuracy. Such approaches are difficult to deploy on voice assistants, which have limited resources. We combined skip connections (from ResNet) with the max feature map (from Light CNN) to create a compact system, and we tested its performance on the ASVspoof 2019 dataset. Using an optimized constant Q transform (CQT) feature, our single model achieved a replay attack detection equal error rate (EER) of 0.30% on the evaluation set, outperforming the top ensemble system in the competition, which scored an EER of 0.39%. We also experimented with depthwise separable convolutions (from MobileNet) to reduce model size; this reduced the parameter count by 84.3% (from 286K to 45K) while maintaining similar performance (EER of 0.36%). Additionally, we used Grad-CAM to show which spectrogram regions contribute most to the detection of fake data. © 2013 IEEE. | -
dc.format.extent | 13 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | -
dc.title | Voice Spoofing Detection Through Residual Network, Max Feature Map, and Depthwise Separable Convolution | -
dc.type | Article | -
dc.identifier.doi | 10.1109/ACCESS.2023.3275790 | -
dc.identifier.bibliographicCitation | IEEE Access, v.11, pp 49140 - 49152 | -
dc.description.isOpenAccess | Y | -
dc.identifier.wosid | 001005689300001 | -
dc.identifier.scopusid | 2-s2.0-85161259283 | -
dc.citation.endPage | 49152 | -
dc.citation.startPage | 49140 | -
dc.citation.title | IEEE Access | -
dc.citation.volume | 11 | -
dc.type.docType | Article | -
dc.publisher.location | United States | -
dc.subject.keywordAuthor | Voice assistant security | -
dc.subject.keywordAuthor | voice presentation attack detection | -
dc.subject.keywordAuthor | voice spoofing attack | -
dc.subject.keywordAuthor | voice synthesis attack | -
dc.relation.journalResearchArea | Computer Science | -
dc.relation.journalResearchArea | Engineering | -
dc.relation.journalResearchArea | Telecommunications | -
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | -
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | -
dc.relation.journalWebOfScienceCategory | Telecommunications | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
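
Note on the abstract: the parameter saving it reports comes from replacing standard convolutions with depthwise separable ones, and the max feature map halves the channel count by taking an element-wise maximum. The sketch below only illustrates those two ideas and checks the quoted 84.3% figure; it is not the authors' code. The function names and the 3x3, 32-to-64-channel layer are assumptions chosen for illustration, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def max_feature_map(channels):
    """Split the channels into two halves and keep the element-wise maximum."""
    half = len(channels) // 2
    first, second = channels[:half], channels[half:]
    return [[max(a, b) for a, b in zip(x, y)] for x, y in zip(first, second)]


def relative_reduction(before, after):
    """Fraction of parameters removed when going from `before` to `after`."""
    return (before - after) / before


# Hypothetical layer (not from the paper): 3x3 kernel, 32 -> 64 channels.
std = standard_conv_params(3, 32, 64)
dws = depthwise_separable_params(3, 32, 64)
layer_saving = relative_reduction(std, dws)

# Figures quoted in the abstract: 286K -> 45K parameters.
abstract_saving = relative_reduction(286_000, 45_000)
print(f"{abstract_saving:.1%}")

# Four channels of three values each collapse to two channels.
mfm_out = max_feature_map([[1, 5, 2], [0, 0, 9], [4, 3, 2], [7, -1, 8]])
```

The per-layer saving depends on kernel size and channel widths, so the hypothetical layer above is not expected to reproduce the model-wide figure from the abstract.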
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kwak, Il-Youp
Graduate School (Department of Statistics and Data Science)
