X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents

Moradshahi, Mehrad; Shen, Tianhao; Bali, Kalika; Choudhury, Monojit; de Chalendar, Gaël; Goel, Anmol; Kim, Sungkyun; Kodali, Prashant; Kumaraguru, Ponnurangam; Semmar, Nasredine; Semnani, Sina J.; Seo, Jiwon; Seshadri, Vivek; Shrivastava, Manish; Sun, Michael; Yadavalli, Aditya; You, Chaobin; Xiong, Deyi; Lam, Monica S.

doi:10.18653/v1/2023.findings-acl.174

Full metadata record

DC Field	Value	Language
dc.contributor.author	Moradshahi, Mehrad	-
dc.contributor.author	Shen, Tianhao	-
dc.contributor.author	Bali, Kalika	-
dc.contributor.author	Choudhury, Monojit	-
dc.contributor.author	de Chalendar, Gaël	-
dc.contributor.author	Goel, Anmol	-
dc.contributor.author	Kim, Sungkyun	-
dc.contributor.author	Kodali, Prashant	-
dc.contributor.author	Kumaraguru, Ponnurangam	-
dc.contributor.author	Semmar, Nasredine	-
dc.contributor.author	Semnani, Sina J.	-
dc.contributor.author	Seo, Jiwon	-
dc.contributor.author	Seshadri, Vivek	-
dc.contributor.author	Shrivastava, Manish	-
dc.contributor.author	Sun, Michael	-
dc.contributor.author	Yadavalli, Aditya	-
dc.contributor.author	You, Chaobin	-
dc.contributor.author	Xiong, Deyi	-
dc.contributor.author	Lam, Monica S.	-
dc.date.accessioned	2023-11-24T03:06:27Z	-
dc.date.available	2023-11-24T03:06:27Z	-
dc.date.issued	2023-07	-
dc.identifier.issn	0736-587X	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/192898	-
dc.description.abstract	Task-oriented dialogue research has mainly focused on a few popular languages like English and Chinese, due to the high dataset creation cost for a new language. To reduce the cost, we apply manual editing to automatically translated data. We create a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ to 4 languages: English, French, Hindi, Korean; and a code-mixed English-Hindi language. X-RiSAWOZ has more than 18,000 human-verified dialogue utterances for each language, and unlike most multilingual prior work, is an end-to-end dataset for building fully-functioning agents. The many difficulties we encountered in creating X-RiSAWOZ led us to develop a toolset to accelerate the post-editing of a new language dataset after translation. This toolset improves machine translation with a hybrid entity alignment technique that combines neural with dictionary-based methods, along with many automated and semi-automated validation checks. We establish strong baselines for X-RiSAWOZ by training dialogue agents in the zero- and few-shot settings where limited gold data is available in the target language. Our results suggest that our translation and post-editing methodology and toolset can be used to create new high-quality multilingual dialogue agents cost-effectively. Our dataset, code, and toolkit are released open-source.	-
dc.format.extent	22	-
dc.language	English	-
dc.language.iso	ENG	-
dc.title	X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents	-
dc.type	Article	-
dc.publisher.location	United Kingdom	-
dc.identifier.doi	10.18653/v1/2023.findings-acl.174	-
dc.identifier.scopusid	2-s2.0-85175470671	-
dc.identifier.bibliographicCitation	Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings, pp 2773 - 2794	-
dc.citation.title	Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings	-
dc.citation.startPage	2773	-
dc.citation.endPage	2794	-
dc.type.docType	Conference paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordPlus	Alignment technique	-
dc.subject.keywordPlus	End to end	-
dc.subject.keywordPlus	High quality	-
dc.subject.keywordPlus	Machine translations	-
dc.subject.keywordPlus	New high	-
dc.subject.keywordPlus	Post-editing	-
dc.subject.keywordPlus	Target language	-
dc.subject.keywordPlus	Task-oriented	-
dc.subject.keywordPlus	Toolsets	-
dc.subject.keywordPlus	Validation checks	-
dc.identifier.url	https://aclanthology.org/2023.findings-acl.174/	-

Appears in Collections: Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles
