Optimizing CLAP Reward with LLM Feedback for Semantically Aligned and Diverse Automated Audio Captioning

Ahn, Seyun; Byun, Pil Moo; Choi, Won-Gook; Chang, Joon-Hyuk

doi:10.21437/Interspeech.2025-1313


Authors: Ahn, Seyun; Byun, Pil Moo; Choi, Won-Gook; Chang, Joon-Hyuk

Issue Date: Aug-2025

Publisher: International Speech Communication Association

Keywords: Automated Audio Captioning; LLM; Pre-trained Model; Reinforcement Learning

Citation: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 3140-3144

Pages: 5

Indexed: SCOPUS

Journal Title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Start Page: 3140

End Page: 3144

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209224

DOI: 10.21437/Interspeech.2025-1313

ISSN: 2958-1796

Abstract: Deep learning-based automated audio captioning (AAC) systems describe audio well, yet they often overfit to reference styles. To address this, reinforcement learning (RL) techniques have been adopted to directly optimize evaluation metrics, but these methods often suffer from word repetition and contextual distortion. Embedding-based rewards, such as those derived from contrastive language-audio pretraining (CLAP), may bias the model toward specific words or phrases that human evaluators find unnatural. In this paper, we propose a novel reward system that combines a CLAP-based reward with a repetition penalty (CRRP) and a large language model (LLM) evaluator. CRRP computes rewards using CLAP similarity, applies a repetition penalty and reward clipping to stabilize training, and uses LLM feedback to enhance naturalness. Our method shows outstanding performance in semantic evaluations and both human and AI-based assessments, with results available at https://yunniya097.github.io/CRRP/.
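The reward shaping described in the abstract (CLAP similarity, a repetition penalty, and reward clipping) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the embeddings are stand-ins for CLAP audio and text embeddings, the penalty is a simple repeated-bigram ratio, and the names `repetition_ratio`, `shaped_reward`, `penalty_weight`, `clip_min`, and `clip_max` are hypothetical. The LLM evaluator is not modeled here.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def repetition_ratio(caption, n=2):
    """Fraction of n-grams in the caption that repeat an earlier n-gram.

    Assumption: a whitespace tokenizer and bigrams; the paper may define
    its repetition penalty differently.
    """
    tokens = caption.lower().split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)


def shaped_reward(audio_emb, text_emb, caption,
                  penalty_weight=1.0, clip_min=0.0, clip_max=1.0):
    """Similarity reward minus a repetition penalty, clipped to a fixed range.

    Assumption: a subtractive penalty and hard clipping; the weights and
    bounds are illustrative defaults, not values from the paper.
    """
    similarity = cosine_similarity(audio_emb, text_emb)
    penalized = similarity - penalty_weight * repetition_ratio(caption)
    return max(clip_min, min(clip_max, penalized))


if __name__ == "__main__":
    audio = [1.0, 0.0]  # placeholder for a CLAP audio embedding
    text = [1.0, 0.0]   # placeholder for a CLAP text embedding
    print(shaped_reward(audio, text, "a dog barks loudly"))
    print(shaped_reward(audio, text, "a dog barks a dog barks"))
```

In this sketch a caption that repeats phrases receives a lower reward than a non-repetitive caption with the same embedding similarity, and clipping keeps the reward inside a bounded range, which is one common way to limit the variance of policy-gradient updates.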


Appears in Collections: College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles


Related Researcher


Chang, Joon-Hyuk: College of Engineering (School of Electronic Engineering)
