Real-Time Multimodal Turn-taking Prediction to Enhance Cooperative Dialogue during Human-Agent Interaction
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Bae, Young-Ho | - |
| dc.contributor.author | Bennett, Casey C. | - |
| dc.date.accessioned | 2024-11-28T13:31:13Z | - |
| dc.date.available | 2024-11-28T13:31:13Z | - |
| dc.date.issued | 2023-08 | - |
| dc.identifier.issn | 1944-9445 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/196579 | - |
| dc.description.abstract | Predicting when it is an artificial agent's turn to speak/act during human-agent interaction (HAI) poses a significant challenge due to the necessity of real-time processing, context sensitivity, capturing complex human behavior, effectively integrating multiple modalities, and addressing class imbalance. In this paper, we present a novel deep learning network-based approach for predicting turn-taking events in HAI that leverages information from multiple modalities, including text, audio, vision, and context data. Our study demonstrates that incorporating additional modalities, including in-game context data, enables a more comprehensive understanding of interaction dynamics leading to enhanced prediction accuracy for the artificial agent. The efficiency of the model also permits potential real-time applications. We evaluated our proposed model on an imbalanced dataset of both successful and failed turn-taking attempts during an HAI cooperative gameplay scenario, comprising over 125,000 instances, and employed a focal loss function to address class imbalance. Our model outperformed baseline models, such as Early Fusion LSTM (EF-LSTM), Late Fusion LSTM (LF-LSTM), and the state-of-the-art Multimodal Transformer (Mult). Additionally, we conducted an ablation study to investigate the contributions of individual modality components within our model, revealing the significant role of speech content cues. In conclusion, our proposed approach demonstrates considerable potential in predicting turn-taking events within HAI, providing a foundation for future research with physical robots during human-robot interaction (HRI). | - |
| dc.format.extent | 8 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | IEEE | - |
| dc.title | Real-Time Multimodal Turn-taking Prediction to Enhance Cooperative Dialogue during Human-Agent Interaction | - |
| dc.type | Article | - |
| dc.publisher.location | United States | - |
| dc.identifier.doi | 10.1109/RO-MAN57019.2023.10309569 | - |
| dc.identifier.scopusid | 2-s2.0-85187014734 | - |
| dc.identifier.wosid | 001108678600265 | - |
| dc.identifier.bibliographicCitation | 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN, pp 2037 - 2044 | - |
| dc.citation.title | 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN | - |
| dc.citation.startPage | 2037 | - |
| dc.citation.endPage | 2044 | - |
| dc.type.docType | Proceedings Paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalResearchArea | Engineering | - |
| dc.relation.journalResearchArea | Robotics | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Cybernetics | - |
| dc.relation.journalWebOfScienceCategory | Ergonomics | - |
| dc.relation.journalWebOfScienceCategory | Robotics | - |
| dc.subject.keywordPlus | BACKCHANNELS | - |
| dc.subject.keywordPlus | ORGANIZATION | - |
| dc.subject.keywordPlus | FEATURES | - |
| dc.subject.keywordPlus | FACE | - |
| dc.identifier.url | https://ieeexplore.ieee.org/document/10309569 | - |
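
The abstract states that a focal loss function was used to address class imbalance, but this record does not give the formulation or hyperparameters. As a rough illustration only, the following is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration, not values taken from the paper.

```python
import math


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over predicted probabilities.

    Hypothetical helper for illustration; gamma and alpha are assumed
    defaults, not the settings used by the paper's authors.
    probs  -- predicted probability of the positive class, one per instance
    labels -- ground-truth labels (1 = positive, 0 = negative)
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one instance is required")

    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # p_t is the probability assigned to the true class.
        p_t = p if y == 1 else 1.0 - p
        # alpha_t weights the positive class by alpha, the negative by 1 - alpha.
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma shrinks the loss of well-classified instances.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

With `gamma=0` the modulating factor disappears and the expression reduces to an alpha-weighted cross-entropy. Larger `gamma` values reduce the contribution of instances the model already classifies confidently, which is why this kind of loss is commonly applied to imbalanced data.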
