Sound of Vision: Audio Generation from Visual Text Embedding through Training Domain Discriminator

Authors
Kim, Jaewon; Choi, Won-Gook; Ahn, Seyun; Chang, Joon-Hyuk
Issue Date
Sep-2024
Keywords
audio generation; multi-modal; text embedding
Citation
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 3305-3309
Pages
5
Indexed
SCOPUS
Journal Title
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Start Page
3305
End Page
3309
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206467
DOI
10.21437/Interspeech.2024-1451
ISSN
1990-9772
Abstract
Recent advancements in text-to-audio (TTA) models have demonstrated their ability to generate sound that aligns with user intentions. Despite this progress, a notable limitation remains: these models cannot effectively synthesize audio from visual-domain texts. In this study, we address this challenge by utilizing a novel dataset that pairs visual-domain and acoustic-domain texts, derived using ChatGPT-3.5, and by switching encodings through a domain discriminator. This approach not only ensures computational efficiency but also enhances the model's generalization, adaptability, and flexibility. It also addresses the concern that training exclusively on visual texts might compromise the quality of audio generated from audio texts. This study presents a novel methodology for enhancing text-to-audio synthesis, demonstrating significant improvements in audio output fidelity from visual-text inputs.
Appears in Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)