Detailed Information

CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image Generation (open access)

Authors
Oh, Hyunwoo; Cha, Seung-ju; Lee, Kwanyoung; Kim, Si-woo; Kim, Dongjin
Issue Date
Oct-2025
Publisher
Association for Computing Machinery, Inc
Keywords
audio to image generation; diffusion model; language-guided generation; multi-modal representation
Citation
MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025, pp. 9773-9782
Pages
10
Indexed
SCOPUS
Journal Title
MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Start Page
9773
End Page
9782
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209918
DOI
10.1145/3746027.3755130
Abstract
We propose CatchPhrase, a novel audio-to-image generation framework designed to mitigate semantic misalignment between audio inputs and generated images. While recent advances in multi-modal encoders have enabled progress in cross-modal generation, ambiguity stemming from homographs and auditory illusions continues to hinder accurate alignment. To address this issue, CatchPhrase generates enriched cross-modal semantic prompts from weak class labels by leveraging large language models (LLMs) and audio captioning models (ACMs) (EXPrompt Mining). To handle both class-level and instance-level misalignment, we apply multi-modal filtering and retrieval to select the most semantically aligned prompt for each audio sample (EXPrompt Selector). A lightweight mapping network is then trained to adapt pre-trained text-to-image generation models to audio input. Extensive experiments on multiple audio classification datasets demonstrate that CatchPhrase improves audio-to-image alignment and consistently enhances generation quality by mitigating semantic misalignment.
Appears in Collections
Seoul College of Engineering > ETC > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Dong Jin
COLLEGE OF ENGINEERING (DEPARTMENT OF INTELLIGENCE COMPUTING)