Enhancing Multimodal Emotion Recognition through ASR Error Compensation and LLM Fine-Tuning

Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Kyung, Jehyun | - |
| dc.contributor.author | Heo, Serin | - |
| dc.contributor.author | Chang, Joon-Hyuk | - |
| dc.date.accessioned | 2025-02-12T07:00:42Z | - |
| dc.date.available | 2025-02-12T07:00:42Z | - |
| dc.date.issued | 2024-09 | - |
| dc.identifier.issn | 1990-9772 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206468 | - |
| dc.description.abstract | Multimodal emotion recognition (MER), particularly using speech and text, is promising for enhancing human-computer interaction. However, the efficacy of such systems is often compromised by inaccuracies introduced during the automatic speech recognition (ASR) process. Addressing this, we present a comprehensive MER system that incorporates ways to make up for errors in ASR-generated text. Our system capitalizes on the strengths of speech signals and ASR-generated text, employing a cross-modal transformer (CMT) to blend these modalities effectively. We introduce a novel error compensation technique to counteract the detrimental effects of ASR inaccuracies and employ preference learning to fine-tune a large language model (LLM), thus improving its ability to distinguish slight emotional nuances in text. Performance of our proposed MER system is evaluated on the IEMOCAP dataset, demonstrating significant advancements in emotion recognition accuracy over conventional methods. | - |
| dc.format.extent | 5 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.title | Enhancing Multimodal Emotion Recognition through ASR Error Compensation and LLM Fine-Tuning | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.21437/Interspeech.2024-2364 | - |
| dc.identifier.scopusid | 2-s2.0-85214836130 | - |
| dc.identifier.wosid | 001331850104158 | - |
| dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 4683 - 4687 | - |
| dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
| dc.citation.startPage | 4683 | - |
| dc.citation.endPage | 4687 | - |
| dc.type.docType | Proceedings Paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
| dc.subject.keywordPlus | Character recognition | - |
| dc.subject.keywordPlus | Emotion Recognition | - |
| dc.subject.keywordPlus | Speech enhancement | - |
| dc.subject.keywordPlus | Speech recognition | - |
| dc.subject.keywordAuthor | ASR error compensation | - |
| dc.subject.keywordAuthor | automatic speech recognition (ASR) | - |
| dc.subject.keywordAuthor | cross-modal transformer | - |
| dc.subject.keywordAuthor | large language model | - |
| dc.subject.keywordAuthor | multimodal emotion recognition | - |
Files in This Item
There are no files associated with this item.
Appears in Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)