Enhancing Multimodal Emotion Recognition through ASR Error Compensation and LLM Fine-Tuning
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Kyung, Jehyun | - |
| dc.contributor.author | Heo, Serin | - |
| dc.contributor.author | Chang, Joon-Hyuk | - |
| dc.date.accessioned | 2025-02-12T07:00:42Z | - |
| dc.date.available | 2025-02-12T07:00:42Z | - |
| dc.date.issued | 2024-09 | - |
| dc.identifier.issn | 1990-9772 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206468 | - |
| dc.description.abstract | Multimodal emotion recognition (MER), particularly using speech and text, is promising for enhancing human-computer interaction. However, the efficacy of such systems is often compromised by inaccuracies introduced during the automatic speech recognition (ASR) process. Addressing this, we present a comprehensive MER system that incorporates ways to make up for errors in ASR-generated text. Our system capitalizes on the strengths of speech signals and ASR-generated text, employing a cross-modal transformer (CMT) to blend these modalities effectively. We introduce a novel error compensation technique to counteract the detrimental effects of ASR inaccuracies and employ preference learning to fine-tune a large language model (LLM), thus improving its ability to distinguish slight emotional nuances in text. Performance of our proposed MER system is evaluated on the IEMOCAP dataset, demonstrating significant advancements in emotion recognition accuracy over conventional methods. | - |
| dc.format.extent | 5 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.title | Enhancing Multimodal Emotion Recognition through ASR Error Compensation and LLM Fine-Tuning | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.21437/Interspeech.2024-2364 | - |
| dc.identifier.scopusid | 2-s2.0-85214836130 | - |
| dc.identifier.wosid | 001331850104158 | - |
| dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 4683 - 4687 | - |
| dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
| dc.citation.startPage | 4683 | - |
| dc.citation.endPage | 4687 | - |
| dc.type.docType | Proceedings Paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
| dc.subject.keywordPlus | Character recognition | - |
| dc.subject.keywordPlus | Emotion Recognition | - |
| dc.subject.keywordPlus | Speech enhancement | - |
| dc.subject.keywordPlus | Speech recognition | - |
| dc.subject.keywordAuthor | ASR error compensation | - |
| dc.subject.keywordAuthor | automatic speech recognition (ASR) | - |
| dc.subject.keywordAuthor | cross-modal transformer | - |
| dc.subject.keywordAuthor | large language model | - |
| dc.subject.keywordAuthor | multimodal emotion recognition | - |
