Multimodal Emotion Recognition with Target Speaker-Based Facial Embeddings

Heo, Serin; Kyung, Jehyun; Chang, Joon-Hyuk

doi:10.1109/ICASSP49660.2025.10888205

Detailed Information

Cited 0 times in Web of Science

Cited 0 times in Scopus


Full metadata record

DC Field	Value	Language
dc.contributor.author	Heo, Serin	-
dc.contributor.author	Kyung, Jehyun	-
dc.contributor.author	Chang, Joon-Hyuk	-
dc.date.accessioned	2025-05-28T02:00:10Z	-
dc.date.available	2025-05-28T02:00:10Z	-
dc.date.issued	2025-03	-
dc.identifier.issn	0736-7791	-
dc.identifier.issn	1520-6149	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/207455	-
dc.description.abstract	Effectively recognizing emotions requires sophisticated approaches for interpreting diverse modalities, particularly in real-world scenarios where data sources such as speech, text, and visual cues are often noisy and incomplete. This study proposes a multimodal emotion recognition system that integrates these three modalities and adds a speaker detection and extraction algorithm to the processing of the visual data. A pre-trained Q-Former then captures and interprets the visual signals, guided by designated prompts, producing facial-related features that significantly improve emotion recognition performance. A cross-modal transformer then unifies the visual, speech, and text embeddings for accurate emotion classification. On the MELD dataset, the system improves accuracy by 2.9% and F1 score by 3.3% over the baseline.	-
dc.format.extent	5	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	-
dc.title	Multimodal Emotion Recognition with Target Speaker-Based Facial Embeddings	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.doi	10.1109/ICASSP49660.2025.10888205	-
dc.identifier.scopusid	2-s2.0-105003892031	-
dc.identifier.bibliographicCitation	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp 1 - 5	-
dc.citation.title	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings	-
dc.citation.startPage	1	-
dc.citation.endPage	5	-
dc.type.docType	Conference paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordAuthor	cross-modal attention	-
dc.subject.keywordAuthor	multimodal emotion recognition	-
dc.subject.keywordAuthor	query transformer	-
dc.subject.keywordAuthor	target speaker	-
dc.identifier.url	https://ieeexplore.ieee.org/document/10888205	-
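The abstract describes unifying visual, speech, and text embeddings with a cross-modal transformer. This record does not include the authors' code, so the following is only a minimal sketch of generic cross-modal scaled dot-product attention fusion, assuming identity query/key/value projections, a shared embedding size across modalities, and mean pooling. The names `cross_modal_attention` and `fuse` are hypothetical, and the Q-Former and target-speaker extraction stages are not modeled.

```python
import math
from typing import List

Matrix = List[List[float]]  # a token sequence: one embedding row per token


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries: Matrix, keys_values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query token from one modality
    attends over the tokens of another modality. Learned Q/K/V projections
    are omitted (identity) as a simplifying assumption."""
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        # Output row is a convex combination of the key/value rows.
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                    for j in range(d)])
    return out


def mean_pool(seq: Matrix) -> List[float]:
    n = len(seq)
    return [sum(tok[j] for tok in seq) / n for j in range(len(seq[0]))]


def fuse(visual: Matrix, speech: Matrix, text: Matrix) -> List[float]:
    """Hypothetical fusion: each modality attends to the other two
    (concatenated along the token axis); the attended sequences are
    mean-pooled and concatenated into one utterance-level vector."""
    v = cross_modal_attention(visual, speech + text)
    s = cross_modal_attention(speech, visual + text)
    t = cross_modal_attention(text, visual + speech)
    return mean_pool(v) + mean_pool(s) + mean_pool(t)


# Toy 2-dimensional embeddings (made-up values, not from the paper).
visual = [[1.0, 0.0], [0.0, 1.0]]  # e.g. facial query tokens
speech = [[0.5, 0.5]]
text = [[1.0, 1.0], [0.0, 0.0], [0.2, 0.8]]
fused = fuse(visual, speech, text)
print(len(fused))  # 6: three modalities x 2 dimensions
```

In a real system the fused vector would feed a classifier over the MELD emotion labels; letting every modality query the other two is one common design choice, not necessarily the one used in this paper.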

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.