X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Moradshahi, Mehrad | - |
| dc.contributor.author | Shen, Tianhao | - |
| dc.contributor.author | Bali, Kalika | - |
| dc.contributor.author | Choudhury, Monojit | - |
| dc.contributor.author | de Chalendar, Gaël | - |
| dc.contributor.author | Goel, Anmol | - |
| dc.contributor.author | Kim, Sungkyun | - |
| dc.contributor.author | Kodali, Prashant | - |
| dc.contributor.author | Kumaraguru, Ponnurangam | - |
| dc.contributor.author | Semmar, Nasredine | - |
| dc.contributor.author | Semnani, Sina J. | - |
| dc.contributor.author | Seo, Jiwon | - |
| dc.contributor.author | Seshadri, Vivek | - |
| dc.contributor.author | Shrivastava, Manish | - |
| dc.contributor.author | Sun, Michael | - |
| dc.contributor.author | Yadavalli, Aditya | - |
| dc.contributor.author | You, Chaobin | - |
| dc.contributor.author | Xiong, Deyi | - |
| dc.contributor.author | Lam, Monica S. | - |
| dc.date.accessioned | 2023-11-24T03:06:27Z | - |
| dc.date.available | 2023-11-24T03:06:27Z | - |
| dc.date.issued | 2023-07 | - |
| dc.identifier.issn | 0736-587X | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/192898 | - |
| dc.description.abstract | Task-oriented dialogue research has mainly focused on a few popular languages like English and Chinese, due to the high dataset creation cost for a new language. To reduce the cost, we apply manual editing to automatically translated data. We create a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ to 4 languages: English, French, Hindi, Korean; and a code-mixed English-Hindi language. X-RiSAWOZ has more than 18,000 human-verified dialogue utterances for each language, and unlike most multilingual prior work, is an end-to-end dataset for building fully-functioning agents. The many difficulties we encountered in creating X-RiSAWOZ led us to develop a toolset to accelerate the post-editing of a new language dataset after translation. This toolset improves machine translation with a hybrid entity alignment technique that combines neural with dictionary-based methods, along with many automated and semi-automated validation checks. We establish strong baselines for X-RiSAWOZ by training dialogue agents in the zero- and few-shot settings where limited gold data is available in the target language. Our results suggest that our translation and post-editing methodology and toolset can be used to create new high-quality multilingual dialogue agents cost-effectively. Our dataset, code, and toolkit are released open-source. | - |
| dc.format.extent | 22 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.title | X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents | - |
| dc.type | Article | - |
| dc.publisher.location | United Kingdom | - |
| dc.identifier.doi | 10.18653/v1/2023.findings-acl.174 | - |
| dc.identifier.scopusid | 2-s2.0-85175470671 | - |
| dc.identifier.bibliographicCitation | Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings, pp 2773 - 2794 | - |
| dc.citation.title | Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings | - |
| dc.citation.startPage | 2773 | - |
| dc.citation.endPage | 2794 | - |
| dc.type.docType | Conference paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.subject.keywordPlus | Alignment technique | - |
| dc.subject.keywordPlus | End to end | - |
| dc.subject.keywordPlus | High quality | - |
| dc.subject.keywordPlus | Machine translations | - |
| dc.subject.keywordPlus | New high | - |
| dc.subject.keywordPlus | Post-editing | - |
| dc.subject.keywordPlus | Target language | - |
| dc.subject.keywordPlus | Task-oriented | - |
| dc.subject.keywordPlus | Toolsets | - |
| dc.subject.keywordPlus | Validation checks | - |
| dc.identifier.url | https://aclanthology.org/2023.findings-acl.174/ | - |
