Resolution Consistency Training on Time-Frequency Domain for Semi-Supervised Sound Event Detection
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Choi, Won-Gook | - |
dc.contributor.author | Chang, Joon-Hyuk | - |
dc.date.accessioned | 2023-10-10T02:35:13Z | - |
dc.date.available | 2023-10-10T02:35:13Z | - |
dc.date.created | 2023-10-04 | - |
dc.date.issued | 2023-08 | - |
dc.identifier.issn | 2308-457X | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191789 | - |
dc.description.abstract | The fact that unlabeled data can be used for supervised learning is of considerable relevance concerning polyphonic sound event detection (PSED) because of the high costs of frame-wise labeling. While semi-supervised learning (SSL) for image tasks has been extensively developed, SSL for PSED has not been substantially explored due to data augmentation limitations. In this paper, we propose a novel SSL strategy for PSED called resolution consistency training (ResCT), combining unsupervised terms with the mean teacher using different resolutions of a spectrogram for data augmentation. The proposed method regularizes the consistency between the model predictions for different resolutions by controlling the sampling rate and window size. Experimental results show that ResCT outperforms other SSL methods on various evaluation metrics: event-f1 score, intersection-f1 score, and PSDSs. Finally, we report on some ablation studies for the weak and strong augmentation policies. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | International Speech Communication Association | - |
dc.title | Resolution Consistency Training on Time-Frequency Domain for Semi-Supervised Sound Event Detection | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | - |
dc.identifier.doi | 10.21437/Interspeech.2023-350 | - |
dc.identifier.scopusid | 2-s2.0-85171547365 | - |
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2023-August, pp.286 - 290 | - |
dc.relation.isPartOf | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.volume | 2023-August | - |
dc.citation.startPage | 286 | - |
dc.citation.endPage | 290 | - |
dc.type.rims | ART | - |
dc.type.docType | Conference paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.subject.keywordPlus | Frequency domain analysis | - |
dc.subject.keywordPlus | Speech communication | - |
dc.subject.keywordPlus | Supervised learning | - |
dc.subject.keywordPlus | Personnel training | - |
dc.subject.keywordPlus | Data augmentation | - |
dc.subject.keywordPlus | Different resolutions | - |
dc.subject.keywordPlus | F1 scores | - |
dc.subject.keywordPlus | Multi-resolutional training | - |
dc.subject.keywordPlus | Polyphonic sounds | - |
dc.subject.keywordPlus | Semi-supervised | - |
dc.subject.keywordPlus | Semi-supervised learning | - |
dc.subject.keywordPlus | Sound event detection | - |
dc.subject.keywordPlus | Time frequency domain | - |
dc.subject.keywordPlus | Unlabeled data | - |
dc.subject.keywordAuthor | data augmentation | - |
dc.subject.keywordAuthor | multi-resolutional training | - |
dc.subject.keywordAuthor | semi-supervised learning | - |
dc.subject.keywordAuthor | sound event detection | - |
dc.identifier.url | https://www.isca-speech.org/archive/interspeech_2023/choi23b_interspeech.html | - |
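The abstract above describes regularizing consistency between model predictions on different time-frequency resolutions of the same audio (via sampling rate and window size) within a mean-teacher setup. The following is a minimal toy sketch of that consistency term, not the authors' implementation: the model names, pooling scheme, and toy detector are all illustrative assumptions.

```python
# Toy sketch of a resolution-consistency term (hypothetical, illustrative only):
# predictions on a fine-resolution spectrogram are regularized toward
# predictions on a coarser-resolution view of the same clip.
import numpy as np

def toy_model(spec, w):
    # Stand-in frame-wise event "detector": sigmoid of a linear projection.
    return 1.0 / (1.0 + np.exp(-spec @ w))

def pool_time(probs, target_frames):
    # Average-pool frame predictions to a common length so outputs from
    # different time resolutions can be compared frame-to-frame.
    idx = np.linspace(0, len(probs), target_frames + 1).astype(int)
    return np.array([probs[a:b].mean() for a, b in zip(idx[:-1], idx[1:])])

def consistency_loss(student_probs, teacher_probs, n_frames=10):
    # Mean-squared error between resolution-aligned predictions,
    # playing the role of the unsupervised (consistency) term.
    s = pool_time(student_probs, n_frames)
    t = pool_time(teacher_probs, n_frames)
    return float(np.mean((s - t) ** 2))

rng = np.random.default_rng(0)
w = rng.normal(size=(8,))
high_res = rng.normal(size=(100, 8))  # finer hop -> more frames
low_res = high_res[::2]               # coarser time resolution of same clip
loss = consistency_loss(toy_model(high_res, w), toy_model(low_res, w))
print(loss)
```

In the paper's actual setup the two views come from re-computed spectrograms (different sampling rates/window sizes) fed to student and EMA-teacher networks; here both views share one toy model purely to keep the sketch self-contained.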