Detailed Information

X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents

Full metadata record
DC Field | Value | Language
dc.contributor.author | Moradshahi, Mehrad | -
dc.contributor.author | Shen, Tianhao | -
dc.contributor.author | Bali, Kalika | -
dc.contributor.author | Choudhury, Monojit | -
dc.contributor.author | de Chalendar, Gaël | -
dc.contributor.author | Goel, Anmol | -
dc.contributor.author | Kim, Sungkyun | -
dc.contributor.author | Kodali, Prashant | -
dc.contributor.author | Kumaraguru, Ponnurangam | -
dc.contributor.author | Semmar, Nasredine | -
dc.contributor.author | Semnani, Sina J. | -
dc.contributor.author | Seo, Jiwon | -
dc.contributor.author | Seshadri, Vivek | -
dc.contributor.author | Shrivastava, Manish | -
dc.contributor.author | Sun, Michael | -
dc.contributor.author | Yadavalli, Aditya | -
dc.contributor.author | You, Chaobin | -
dc.contributor.author | Xiong, Deyi | -
dc.contributor.author | Lam, Monica S. | -
dc.date.accessioned | 2023-11-24T03:06:27Z | -
dc.date.available | 2023-11-24T03:06:27Z | -
dc.date.issued | 2023-07 | -
dc.identifier.issn | 0736-587X | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/192898 | -
dc.description.abstract | Task-oriented dialogue research has mainly focused on a few popular languages like English and Chinese, due to the high dataset creation cost for a new language. To reduce the cost, we apply manual editing to automatically translated data. We create a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ to 4 languages: English, French, Hindi, Korean; and a code-mixed English-Hindi language. X-RiSAWOZ has more than 18,000 human-verified dialogue utterances for each language, and unlike most multilingual prior work, is an end-to-end dataset for building fully-functioning agents. The many difficulties we encountered in creating X-RiSAWOZ led us to develop a toolset to accelerate the post-editing of a new language dataset after translation. This toolset improves machine translation with a hybrid entity alignment technique that combines neural with dictionary-based methods, along with many automated and semi-automated validation checks. We establish strong baselines for X-RiSAWOZ by training dialogue agents in the zero- and few-shot settings where limited gold data is available in the target language. Our results suggest that our translation and post-editing methodology and toolset can be used to create new high-quality multilingual dialogue agents cost-effectively. Our dataset, code, and toolkit are released open-source. | -
dc.format.extent | 22 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.title | X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents | -
dc.type | Article | -
dc.publisher.location | United Kingdom | -
dc.identifier.doi | 10.18653/v1/2023.findings-acl.174 | -
dc.identifier.scopusid | 2-s2.0-85175470671 | -
dc.identifier.bibliographicCitation | Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings, pp 2773 - 2794 | -
dc.citation.title | Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings | -
dc.citation.startPage | 2773 | -
dc.citation.endPage | 2794 | -
dc.type.docType | Conference paper | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.subject.keywordPlus | Alignment technique | -
dc.subject.keywordPlus | End to end | -
dc.subject.keywordPlus | High quality | -
dc.subject.keywordPlus | Machine translations | -
dc.subject.keywordPlus | New high | -
dc.subject.keywordPlus | Post-editing | -
dc.subject.keywordPlus | Target language | -
dc.subject.keywordPlus | Task-oriented | -
dc.subject.keywordPlus | Toolsets | -
dc.subject.keywordPlus | Validation checks | -
dc.identifier.url | https://aclanthology.org/2023.findings-acl.174/ | -
Appears in Collections
Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.