Enhancing Multimodal Emotion Recognition through ASR Error Compensation and LLM Fine-Tuning

Kyung, Jehyun; Heo, Serin; Chang, Joon-Hyuk

doi:10.21437/Interspeech.2024-2364

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kyung, Jehyun	-
dc.contributor.author	Heo, Serin	-
dc.contributor.author	Chang, Joon-Hyuk	-
dc.date.accessioned	2025-02-12T07:00:42Z	-
dc.date.available	2025-02-12T07:00:42Z	-
dc.date.issued	2024-09	-
dc.identifier.issn	1990-9772	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206468	-
dc.description.abstract	Multimodal emotion recognition (MER), particularly using speech and text, is promising for enhancing human-computer interaction. However, the efficacy of such systems is often compromised by inaccuracies introduced during the automatic speech recognition (ASR) process. Addressing this, we present a comprehensive MER system that incorporates ways to make up for errors in ASR-generated text. Our system capitalizes on the strengths of speech signals and ASR-generated text, employing a cross-modal transformer (CMT) to blend these modalities effectively. We introduce a novel error compensation technique to counteract the detrimental effects of ASR inaccuracies and employ preference learning to fine-tune a large language model (LLM), thus improving its ability to distinguish slight emotional nuances in text. Performance of our proposed MER system is evaluated on the IEMOCAP dataset, demonstrating significant advancements in emotion recognition accuracy over conventional methods.	-
dc.format.extent	5	-
dc.language	English	-
dc.language.iso	ENG	-
dc.title	Enhancing Multimodal Emotion Recognition through ASR Error Compensation and LLM Fine-Tuning	-
dc.type	Article	-
dc.identifier.doi	10.21437/Interspeech.2024-2364	-
dc.identifier.scopusid	2-s2.0-85214836130	-
dc.identifier.wosid	001331850104158	-
dc.identifier.bibliographicCitation	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 4683 - 4687	-
dc.citation.title	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH	-
dc.citation.startPage	4683	-
dc.citation.endPage	4687	-
dc.type.docType	Proceedings Paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.subject.keywordPlus	Character recognition	-
dc.subject.keywordPlus	Emotion Recognition	-
dc.subject.keywordPlus	Speech enhancement	-
dc.subject.keywordPlus	Speech recognition	-
dc.subject.keywordAuthor	ASR error compensation	-
dc.subject.keywordAuthor	automatic speech recognition (ASR)	-
dc.subject.keywordAuthor	cross-modal transformer	-
dc.subject.keywordAuthor	large language model	-
dc.subject.keywordAuthor	multimodal emotion recognition	-

Files in This Item: There are no files associated with this item.

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.
