Resolution Consistency Training on Time-Frequency Domain for Semi-Supervised Sound Event Detection
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Choi, Won-Gook | - |
dc.contributor.author | Chang, Joon-Hyuk | - |
dc.date.accessioned | 2023-10-10T02:35:13Z | - |
dc.date.available | 2023-10-10T02:35:13Z | - |
dc.date.created | 2023-10-04 | - |
dc.date.issued | 2023-08 | - |
dc.identifier.issn | 2308-457X | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/191789 | - |
dc.description.abstract | The fact that unlabeled data can be used for supervised learning is of considerable relevance concerning polyphonic sound event detection (PSED) because of the high costs of frame-wise labeling. While semi-supervised learning (SSL) for image tasks has been extensively developed, SSL for PSED has not been substantially explored due to data augmentation limitations. In this paper, we propose a novel SSL strategy for PSED called resolution consistency training (ResCT), combining unsupervised terms with the mean teacher using different resolutions of a spectrogram for data augmentation. The proposed method regularizes the consistency between the model predictions for different resolutions by controlling the sampling rate and window size. Experimental results show that ResCT outperforms other SSL methods on various evaluation metrics: event-f1 score, intersection-f1 score, and PSDSs. Finally, we report on some ablation studies for the weak and strong augmentation policies. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | International Speech Communication Association | - |
dc.title | Resolution Consistency Training on Time-Frequency Domain for Semi-Supervised Sound Event Detection | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | - |
dc.identifier.doi | 10.21437/Interspeech.2023-350 | - |
dc.identifier.scopusid | 2-s2.0-85171547365 | - |
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2023-August, pp.286 - 290 | - |
dc.relation.isPartOf | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.volume | 2023-August | - |
dc.citation.startPage | 286 | - |
dc.citation.endPage | 290 | - |
dc.type.rims | ART | - |
dc.type.docType | Conference paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.subject.keywordPlus | Frequency domain analysis | - |
dc.subject.keywordPlus | Speech communication | - |
dc.subject.keywordPlus | Supervised learning | - |
dc.subject.keywordPlus | Personnel training | - |
dc.subject.keywordPlus | Data augmentation | - |
dc.subject.keywordPlus | Different resolutions | - |
dc.subject.keywordPlus | F1 scores | - |
dc.subject.keywordPlus | Multi-resolutional training | - |
dc.subject.keywordPlus | Polyphonic sounds | - |
dc.subject.keywordPlus | Semi-supervised | - |
dc.subject.keywordPlus | Semi-supervised learning | - |
dc.subject.keywordPlus | Sound event detection | - |
dc.subject.keywordPlus | Time frequency domain | - |
dc.subject.keywordPlus | Unlabeled data | - |
dc.subject.keywordAuthor | data augmentation | - |
dc.subject.keywordAuthor | multi-resolutional training | - |
dc.subject.keywordAuthor | semi-supervised learning | - |
dc.subject.keywordAuthor | sound event detection | - |
dc.identifier.url | https://www.isca-speech.org/archive/interspeech_2023/choi23b_interspeech.html | - |
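The abstract above describes regularizing consistency between model predictions on different time-frequency resolutions of the same audio (via sampling rate and window size) within a mean-teacher setup. The following is a minimal toy sketch of that consistency term, not the authors' implementation: the model names, pooling scheme, and toy detector are all illustrative assumptions.

```python
# Toy sketch of a resolution-consistency term (hypothetical, illustrative only):
# predictions on a fine-resolution spectrogram are regularized toward
# predictions on a coarser-resolution view of the same clip.
import numpy as np

def toy_model(spec, w):
    # Stand-in frame-wise event "detector": sigmoid of a linear projection.
    return 1.0 / (1.0 + np.exp(-spec @ w))

def pool_time(probs, target_frames):
    # Average-pool frame predictions to a common length so outputs from
    # different time resolutions can be compared frame-to-frame.
    idx = np.linspace(0, len(probs), target_frames + 1).astype(int)
    return np.array([probs[a:b].mean() for a, b in zip(idx[:-1], idx[1:])])

def consistency_loss(student_probs, teacher_probs, n_frames=10):
    # Mean-squared error between resolution-aligned predictions,
    # playing the role of the unsupervised (consistency) term.
    s = pool_time(student_probs, n_frames)
    t = pool_time(teacher_probs, n_frames)
    return float(np.mean((s - t) ** 2))

rng = np.random.default_rng(0)
w = rng.normal(size=(8,))
high_res = rng.normal(size=(100, 8))  # finer hop -> more frames
low_res = high_res[::2]               # coarser time resolution of same clip
loss = consistency_loss(toy_model(high_res, w), toy_model(low_res, w))
print(loss)
```

In the paper's actual setup the two views come from re-computed spectrograms (different sampling rates/window sizes) fed to student and EMA-teacher networks; here both views share one toy model purely to keep the sketch self-contained.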