Retrieval-Augmented Classifier Guidance for Audio Generation

Full metadata record
DC Field: Value
dc.contributor.author: Choi, Ho-Young
dc.contributor.author: Choi, Won-Gook
dc.contributor.author: Chang, Joon-Hyuk
dc.date.accessioned: 2025-02-12T08:00:30Z
dc.date.available: 2025-02-12T08:00:30Z
dc.date.issued: 2024-09
dc.identifier.issn: 1990-9772
dc.identifier.uri: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206469
dc.description.abstract: Most audio datasets used to train audio generation models are low-quality, which makes it difficult to generate high-quality, single-event audio. However, acquiring noise-free, single-event audio is costly. In this paper, we propose a simple retrieval-augmented classifier-guided sampling strategy for Foley sound synthesis. Specifically, to guide the diffusion model during sampling with classifier guidance, given an input class, we first retrieve relevant audio features using a Contrastive Language-Audio Pretraining (CLAP) model. The gradients from a classifier for the retrieved audio features are then calculated to serve as additional guidance. Our evaluation on the DCASE 2023 Challenge Task 7 dataset demonstrates that the proposed method improves the Fréchet audio distance score overall.
dc.format.extent: 5
dc.language: English
dc.language.iso: ENG
dc.title: Retrieval-Augmented Classifier Guidance for Audio Generation
dc.type: Article
dc.identifier.doi: 10.21437/Interspeech.2024-1456
dc.identifier.scopusid: 2-s2.0-85214809514
dc.identifier.wosid: 001331850103085
dc.identifier.bibliographicCitation: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 3310 - 3314
dc.citation.title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
dc.citation.startPage: 3310
dc.citation.endPage: 3314
dc.type.docType: Proceedings Paper
dc.description.isOpenAccess: N
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Computer Science
dc.relation.journalWebOfScienceCategory: Computer Science, Artificial Intelligence
dc.subject.keywordPlus: Audio features
dc.subject.keywordPlus: Audio generation
dc.subject.keywordPlus: Classifier guidance
dc.subject.keywordPlus: Dataset scarcity
dc.subject.keywordPlus: High costs
dc.subject.keywordPlus: High quality
dc.subject.keywordPlus: Low qualities
dc.subject.keywordPlus: Retrieval augmented classifier-guided sampling
dc.subject.keywordPlus: Simple++
dc.subject.keywordPlus: Single event
dc.subject.keywordAuthor: Audio generation
dc.subject.keywordAuthor: classifier guidance
dc.subject.keywordAuthor: dataset scarcity
dc.subject.keywordAuthor: retrieval augmented classifier-guided sampling
dc.identifier.url: https://www.isca-archive.org/interspeech_2024/choi24c_interspeech.html
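
The sampling strategy described in the abstract (retrieve class-relevant audio features, then use classifier gradients as guidance) can be illustrated with a toy sketch. This is not the authors' implementation: the record gives no equations, so everything below is an assumption made for illustration. The function names, the linear softmax classifier, the shared feature space for samples and retrieved features, and the way the two gradient terms are summed are all hypothetical. The retrieval step uses plain cosine similarity as a stand-in for CLAP embeddings, and the mean shift follows the standard classifier-guidance form (mean plus variance times scaled gradient of the class log-probability).

```python
import numpy as np

def retrieve_top_k(query, bank, k):
    """Return indices of the k rows of `bank` most cosine-similar to `query`.

    `query` stands in for a text/class embedding and `bank` for a matrix of
    audio embeddings (hypothetical stand-ins for CLAP outputs).
    """
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = b @ q
    return np.argsort(-sims)[:k]

def log_prob_grad(x, W, b, y):
    """Gradient of log softmax(W @ x + b)[y] with respect to x.

    A linear softmax classifier is assumed purely to keep the gradient
    analytic; a real classifier would use automatic differentiation.
    """
    logits = W @ x + b
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return W[y] - p @ W

def guided_mean(mean, var, x, retrieved, W, b, y, scale=1.0, retrieval_scale=1.0):
    """Shift a denoising mean using two classifier-gradient terms.

    g     : classifier gradient at the current sample x (standard guidance).
    g_ret : average classifier gradient evaluated at the retrieved features
            (one possible reading of "additional guidance"; an assumption).
    """
    g = log_prob_grad(x, W, b, y)
    g_ret = np.mean([log_prob_grad(r, W, b, y) for r in retrieved], axis=0)
    return mean + var * (scale * g + retrieval_scale * g_ret)

# Toy usage with made-up data: 3 classes, 4-dimensional features.
rng = np.random.default_rng(0)
bank = rng.normal(size=(10, 4))   # hypothetical audio feature bank
query = rng.normal(size=4)        # hypothetical class/text embedding
W, b = rng.normal(size=(3, 4)), np.zeros(3)
x = rng.normal(size=4)            # current noisy sample
idx = retrieve_top_k(query, bank, k=3)
new_mean = guided_mean(mean=x, var=0.1, x=x, retrieved=bank[idx], W=W, b=b, y=1)
```

Setting `retrieval_scale=0.0` reduces the sketch to ordinary classifier guidance, which makes it easy to compare the two variants on the same inputs.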
Appears in Collections
College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)