Resolution Consistency Training on Time-Frequency Domain for Semi-Supervised Sound Event Detection
- Authors
- Choi, Won-Gook; Chang, Joon-Hyuk
- Issue Date
- Aug-2023
- Publisher
- International Speech Communication Association
- Keywords
- data augmentation; multi-resolutional training; semi-supervised learning; sound event detection
- Citation
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v. 2023-August, pp. 286-290
- Indexed
- SCOPUS
- Journal Title
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
- Volume
- 2023-August
- Start Page
- 286
- End Page
- 290
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191789
- DOI
- 10.21437/Interspeech.2023-350
- ISSN
- 2308-457X
- Abstract
- The ability to exploit unlabeled data during training is highly relevant to polyphonic sound event detection (PSED) because frame-wise labeling is costly. While semi-supervised learning (SSL) for image tasks has been extensively developed, SSL for PSED has not been substantially explored due to data augmentation limitations. In this paper, we propose a novel SSL strategy for PSED called resolution consistency training (ResCT), which combines unsupervised terms with the mean teacher and uses different resolutions of a spectrogram for data augmentation. The proposed method regularizes the consistency between the model predictions for different resolutions, which are obtained by controlling the sampling rate and window size. Experimental results show that ResCT outperforms other SSL methods on various evaluation metrics: event-F1 score, intersection-F1 score, and PSDSs. Finally, we report ablation studies for the weak and strong augmentation policies.
- Appears in Collections
- College of Engineering (Seoul) > Department of Electronic Engineering (Seoul) > 1. Journal Articles
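The abstract describes regularizing the consistency between model predictions obtained from spectrograms of different resolutions, where the resolution is controlled by the sampling rate and window size, within a mean teacher setup. This record does not give the paper's architecture, loss terms, or resolution settings, so the following is only a minimal NumPy sketch of the general idea. All function names, the toy one-parameter "model", the crude linear resampler, the MSE loss, and the specific rates and window sizes are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def resample_linear(x, sr_in, sr_out):
    """Crude resampler via linear interpolation (illustrative only, no anti-aliasing)."""
    n_out = int(round(len(x) * sr_out / sr_in))
    t_out = np.arange(n_out) * (sr_in / sr_out)
    return np.interp(t_out, np.arange(len(x)), x)

def log_spectrogram(x, win, hop):
    """Log-magnitude STFT with a Hann window; returns shape (frames, win // 2 + 1)."""
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(win)
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-6)

def toy_model(spec, weight):
    """Hypothetical stand-in for a detector: one probability per frame."""
    return 1.0 / (1.0 + np.exp(-weight * spec.mean(axis=1)))

def align_frames(p, n):
    """Interpolate a frame-wise prediction sequence to n frames."""
    return np.interp(np.linspace(0, len(p) - 1, n), np.arange(len(p)), p)

def consistency_loss(p_student, p_teacher):
    """MSE between predictions from two resolutions after aligning frame counts."""
    n = min(len(p_student), len(p_teacher))
    diff = align_frames(p_student, n) - align_frames(p_teacher, n)
    return float(np.mean(diff ** 2))

def ema_update(teacher_w, student_w, alpha=0.999):
    """Mean-teacher style exponential moving average of the weights."""
    return alpha * teacher_w + (1.0 - alpha) * student_w

rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)  # one second of noise at an assumed 16 kHz

# View A: original sampling rate, long window. View B: lower rate, short window.
spec_a = log_spectrogram(audio, win=1024, hop=256)
spec_b = log_spectrogram(resample_linear(audio, 16000, 8000), win=256, hop=64)

student_w, teacher_w = 0.5, 0.4  # assumed scalar "weights" of the toy model
loss = consistency_loss(toy_model(spec_a, student_w), toy_model(spec_b, teacher_w))
teacher_w = ema_update(teacher_w, student_w)
```

The two views have different frame counts and frequency-bin counts, which is why the sketch aligns the frame-wise predictions before comparing them. How the paper itself aligns predictions across resolutions is not stated in this record.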