Multimodal Emotion Recognition with Target Speaker-Based Facial Embeddings

Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Heo, Serin | - |
| dc.contributor.author | Kyung, Jehyun | - |
| dc.contributor.author | Chang, Joon-Hyuk | - |
| dc.date.accessioned | 2025-05-28T02:00:10Z | - |
| dc.date.available | 2025-05-28T02:00:10Z | - |
| dc.date.issued | 2025-03 | - |
| dc.identifier.issn | 0736-7791 | - |
| dc.identifier.issn | 1520-6149 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/207455 | - |
| dc.description.abstract | Effectively recognizing emotions requires sophisticated approaches for interpreting diverse modalities, particularly in real-world scenarios where multiple data sources, such as speech, text, and visual cues, are often noisy and incomplete. This study proposes an advanced multimodal emotion recognition system that integrates these three modalities by adding the speaker detection and extraction algorithm within visual data. The pre-trained Q-Former used in the proposed system then captures and interprets visual signals supported with designated prompts, resulting in facial-related features that significantly improve emotion recognition performance. We then utilize a cross-modal transformer to unify the visual, speech, and text embeddings for accurate emotion classification. We achieved a 2.9% and 3.3% improvement in accuracy and F1 score, respectively, on the MELD dataset compared to the baseline. | - |
| dc.format.extent | 5 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
| dc.title | Multimodal Emotion Recognition with Target Speaker-Based Facial Embeddings | - |
| dc.type | Article | - |
| dc.publisher.location | United States | - |
| dc.identifier.doi | 10.1109/ICASSP49660.2025.10888205 | - |
| dc.identifier.scopusid | 2-s2.0-105003892031 | - |
| dc.identifier.bibliographicCitation | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp 1 - 5 | - |
| dc.citation.title | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings | - |
| dc.citation.startPage | 1 | - |
| dc.citation.endPage | 5 | - |
| dc.type.docType | Conference paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.subject.keywordAuthor | cross-modal attention | - |
| dc.subject.keywordAuthor | multimodal emotion recognition | - |
| dc.subject.keywordAuthor | query transformer | - |
| dc.subject.keywordAuthor | target speaker | - |
| dc.identifier.url | https://ieeexplore.ieee.org/document/10888205 | - |
Appears in Collections
Seoul College of Engineering > Seoul School of Convergence Electronic Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)