Resolution Consistency Training on Time-Frequency Domain for Semi-Supervised Sound Event Detection

Full metadata record
DC Field | Value | Language
dc.contributor.author | Choi, Won-Gook | -
dc.contributor.author | Chang, Joon-Hyuk | -
dc.date.accessioned | 2023-10-10T02:35:13Z | -
dc.date.available | 2023-10-10T02:35:13Z | -
dc.date.created | 2023-10-04 | -
dc.date.issued | 2023-08 | -
dc.identifier.issn | 2308-457X | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191789 | -
dc.description.abstract | The fact that unlabeled data can be used for supervised learning is of considerable relevance concerning polyphonic sound event detection (PSED) because of the high costs of frame-wise labeling. While semi-supervised learning (SSL) for image tasks has been extensively developed, SSL for PSED has not been substantially explored due to data augmentation limitations. In this paper, we propose a novel SSL strategy for PSED called resolution consistency training (ResCT), combining unsupervised terms with the mean teacher using different resolutions of a spectrogram for data augmentation. The proposed method regularizes the consistency between the model predictions for different resolutions by controlling the sampling rate and window size. Experimental results show that ResCT outperforms other SSL methods on various evaluation metrics: event-f1 score, intersection-f1 score, and PSDSs. Finally, we report on some ablation studies for the weak and strong augmentation policies. | -
dc.language | English | -
dc.language.iso | en | -
dc.publisher | International Speech Communication Association | -
dc.title | Resolution Consistency Training on Time-Frequency Domain for Semi-Supervised Sound Event Detection | -
dc.type | Article | -
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | -
dc.identifier.doi | 10.21437/Interspeech.2023-350 | -
dc.identifier.scopusid | 2-s2.0-85171547365 | -
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2023-August, pp.286 - 290 | -
dc.relation.isPartOf | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | -
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | -
dc.citation.volume | 2023-August | -
dc.citation.startPage | 286 | -
dc.citation.endPage | 290 | -
dc.type.rims | ART | -
dc.type.docType | Conference paper | -
dc.description.journalClass | 1 | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.subject.keywordPlus | Frequency domain analysis | -
dc.subject.keywordPlus | Speech communication | -
dc.subject.keywordPlus | Supervised learning | -
dc.subject.keywordPlus | Personnel training | -
dc.subject.keywordPlus | Data augmentation | -
dc.subject.keywordPlus | Different resolutions | -
dc.subject.keywordPlus | F1 scores | -
dc.subject.keywordPlus | Multi-resolutional training | -
dc.subject.keywordPlus | Polyphonic sounds | -
dc.subject.keywordPlus | Semi-supervised | -
dc.subject.keywordPlus | Semi-supervised learning | -
dc.subject.keywordPlus | Sound event detection | -
dc.subject.keywordPlus | Time frequency domain | -
dc.subject.keywordPlus | Unlabeled data | -
dc.subject.keywordAuthor | data augmentation | -
dc.subject.keywordAuthor | multi-resolutional training | -
dc.subject.keywordAuthor | semi-supervised learning | -
dc.subject.keywordAuthor | sound event detection | -
dc.identifier.url | https://www.isca-speech.org/archive/interspeech_2023/choi23b_interspeech.html | -
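
The abstract describes regularizing the consistency between predictions made from spectrograms of different resolutions, obtained by changing the sampling rate and window size, inside a mean-teacher setup. The following is a minimal NumPy sketch of that general idea only; it is not the authors' implementation. The helper names (log_spectrogram, decimate, toy_model, align, consistency_loss, ema_update), the scalar "model", the mean-squared-error loss, and every parameter value are assumptions made for illustration.

```python
import numpy as np

def log_spectrogram(wave, win_size, hop):
    """Log-magnitude STFT with a Hann window; win_size sets the resolution."""
    window = np.hanning(win_size)
    n_frames = 1 + (len(wave) - win_size) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + win_size] * window for i in range(n_frames)]
    )
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))  # (frames, bins)

def decimate(wave, factor):
    """Crude sampling-rate reduction: keep every factor-th sample (no filtering)."""
    return wave[::factor]

def toy_model(spec, weight):
    """Hypothetical stand-in for a detector: one sigmoid score per frame."""
    return 1.0 / (1.0 + np.exp(-weight * (spec.mean(axis=1) - 1.0)))

def align(pred, n_frames):
    """Linearly interpolate frame-wise predictions onto a common time grid."""
    src = np.linspace(0.0, 1.0, len(pred))
    dst = np.linspace(0.0, 1.0, n_frames)
    return np.interp(dst, src, pred)

def consistency_loss(student_pred, teacher_pred):
    """Mean squared difference after aligning both predictions in time."""
    n = min(len(student_pred), len(teacher_pred))
    return float(np.mean((align(student_pred, n) - align(teacher_pred, n)) ** 2))

def ema_update(teacher_w, student_w, decay=0.999):
    """Mean-teacher step: teacher weights track an exponential moving average."""
    return decay * teacher_w + (1.0 - decay) * student_w

# One second of noise standing in for an unlabeled clip at 16 kHz (assumed rate).
rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)

# Two views of the same clip at different time-frequency resolutions.
spec_a = log_spectrogram(wave, win_size=512, hop=256)
spec_b = log_spectrogram(decimate(wave, 2), win_size=1024, hop=512)

student_w, teacher_w = 1.5, 1.0
student_pred = toy_model(spec_a, student_w)
teacher_pred = toy_model(spec_b, teacher_w)

loss = consistency_loss(student_pred, teacher_pred)
teacher_w = ema_update(teacher_w, student_w)
```

In an actual training loop the consistency term would be added to a supervised loss on the labeled clips and minimized with respect to the student's parameters, while the teacher is only updated through the moving average.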
Appears in Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
College of Engineering (School of Electronic Engineering)
