Real-Time Multimodal Turn-taking Prediction to Enhance Cooperative Dialogue during Human-Agent Interaction

Bae, Young-Ho; Bennett, Casey C.

doi:10.1109/RO-MAN57019.2023.10309569


Full metadata record

DC Field	Value	Language
dc.contributor.author	Bae, Young-Ho	-
dc.contributor.author	Bennett, Casey C.	-
dc.date.accessioned	2024-11-28T13:31:13Z	-
dc.date.available	2024-11-28T13:31:13Z	-
dc.date.issued	2023-08	-
dc.identifier.issn	1944-9445	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/196579	-
dc.description.abstract	Predicting when it is an artificial agent's turn to speak/act during human-agent interaction (HAI) poses a significant challenge due to the necessity of real-time processing, context sensitivity, capturing complex human behavior, effectively integrating multiple modalities, and addressing class imbalance. In this paper, we present a novel deep learning network-based approach for predicting turn-taking events in HAI that leverages information from multiple modalities, including text, audio, vision, and context data. Our study demonstrates that incorporating additional modalities, including in-game context data, enables a more comprehensive understanding of interaction dynamics leading to enhanced prediction accuracy for the artificial agent. The efficiency of the model also permits potential real-time applications. We evaluated our proposed model on an imbalanced dataset of both successful and failed turn-taking attempts during an HAI cooperative gameplay scenario, comprising over 125,000 instances, and employed a focal loss function to address class imbalance. Our model outperformed baseline models, such as Early Fusion LSTM (EF-LSTM), Late Fusion LSTM (LF-LSTM), and the state-of-the-art Multimodal Transformer (MulT). Additionally, we conducted an ablation study to investigate the contributions of individual modality components within our model, revealing the significant role of speech content cues. In conclusion, our proposed approach demonstrates considerable potential in predicting turn-taking events within HAI, providing a foundation for future research with physical robots during human-robot interaction (HRI).	-
dc.format.extent	8	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	IEEE	-
dc.title	Real-Time Multimodal Turn-taking Prediction to Enhance Cooperative Dialogue during Human-Agent Interaction	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.doi	10.1109/RO-MAN57019.2023.10309569	-
dc.identifier.scopusid	2-s2.0-85187014734	-
dc.identifier.wosid	001108678600265	-
dc.identifier.bibliographicCitation	2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN, pp 2037 - 2044	-
dc.citation.title	2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN	-
dc.citation.startPage	2037	-
dc.citation.endPage	2044	-
dc.type.docType	Proceedings Paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scie	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalResearchArea	Robotics	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.relation.journalWebOfScienceCategory	Computer Science, Cybernetics	-
dc.relation.journalWebOfScienceCategory	Ergonomics	-
dc.relation.journalWebOfScienceCategory	Robotics	-
dc.subject.keywordPlus	BACKCHANNELS	-
dc.subject.keywordPlus	ORGANIZATION	-
dc.subject.keywordPlus	FEATURES	-
dc.subject.keywordPlus	FACE	-
dc.identifier.url	https://ieeexplore.ieee.org/document/10309569	-
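
The abstract states that a focal loss function was used to address class imbalance, but this record gives no formula or hyperparameters. As a minimal sketch of the general technique only (the standard binary focal loss, not the authors' implementation), the function below down-weights well-classified examples so that rare, hard cases dominate the loss. The name `binary_focal_loss` and the values `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration; they are commonly used defaults and are not taken from the paper.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one binary prediction (illustrative sketch).

    p     -- predicted probability of the positive class (e.g. "take the turn")
    y     -- true label, 1 or 0
    gamma -- focusing parameter; 0 reduces to weighted cross-entropy (assumed value)
    alpha -- weight on the positive class (assumed value)
    """
    # Clamp to avoid log(0).
    p = min(max(p, eps), 1.0 - eps)
    # p_t is the probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of confident, correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over a batch of (probability, label) pairs."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

With `gamma=0` the modulating factor disappears and the function reduces to class-weighted cross-entropy, which is a convenient sanity check when tuning.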


Appears in Collections: Seoul College of Engineering > ETC > 1. Journal Articles
