Beyond Reference: Evaluating High Quality Translations Better than Human References

Noh, Keonwoong; Oh, Seokjin; Jung, Woohwan

doi:10.18653/v1/2024.emnlp-main.294

Full metadata record

DC Field	Value	Language
dc.contributor.author	Noh, Keonwoong	-
dc.contributor.author	Oh, Seokjin	-
dc.contributor.author	Jung, Woohwan	-
dc.date.accessioned	2025-07-25T05:00:15Z	-
dc.date.available	2025-07-25T05:00:15Z	-
dc.date.issued	2025-06	-
dc.identifier.uri	https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/126170	-
dc.description.abstract	In Machine Translation (MT) evaluation, the conventional approach is to compare a translated sentence against its human-created reference sentence. MT metrics assign an absolute score (e.g., from 0 to 1) to a candidate sentence based on its similarity to the reference sentence. Thus, existing MT metrics give the maximum score to the reference sentence. However, this approach overlooks the potential for a candidate sentence to exceed the reference sentence in quality. In particular, recent advancements in Large Language Models (LLMs) have highlighted this issue, as LLM-generated sentences often exceed the quality of human-written sentences. To address the problem, we introduce the Residual score Metric (RESUME), which evaluates the relative quality between reference and candidate sentences. RESUME assigns a positive score to candidate sentences that outperform their reference sentences, and a negative score when they fall short. By adding the residual scores from RESUME to the absolute scores from MT metrics, it becomes possible to give candidate sentences higher scores than the reference sentences receive from MT metrics. Experimental results demonstrate that RESUME improves the alignment between MT metrics and human judgments at both the segment level and the system level.	-
dc.format.extent	17	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	ASSOC COMPUTATIONAL LINGUISTICS-ACL	-
dc.title	Beyond Reference: Evaluating High Quality Translations Better than Human References	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.doi	10.18653/v1/2024.emnlp-main.294	-
dc.identifier.scopusid	2-s2.0-85217809319	-
dc.identifier.wosid	001431695500294	-
dc.identifier.bibliographicCitation	2024 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2024, pp 5111 - 5127	-
dc.citation.title	2024 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2024	-
dc.citation.startPage	5111	-
dc.citation.endPage	5127	-
dc.type.docType	Proceedings Paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scie	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalResearchArea	Linguistics	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.relation.journalWebOfScienceCategory	Computer Science, Information Systems	-
dc.relation.journalWebOfScienceCategory	Computer Science, Interdisciplinary Applications	-
dc.relation.journalWebOfScienceCategory	Linguistics	-
dc.identifier.url	https://aclanthology.org/2024.emnlp-main.294/	-
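
The score combination described in the abstract can be illustrated with a minimal sketch. It assumes the simplest reading, a plain sum of the absolute metric score and the residual score; the function name and all numbers below are hypothetical and are not taken from the paper, which may combine the two scores differently.

```python
def combined_score(absolute: float, residual: float) -> float:
    """Add a residual (relative) score to an absolute metric score.

    Assumption: an unweighted sum, following the abstract's wording only.
    """
    return absolute + residual


# Hypothetical values, for illustration only.
reference_absolute = 1.0  # a reference-based metric gives the reference its maximum score
candidate_absolute = 0.8  # the candidate differs from the reference, so it scores lower
better_residual = 0.3     # positive: candidate judged better than the reference
worse_residual = -0.2     # negative: candidate judged worse than the reference

better = combined_score(candidate_absolute, better_residual)
worse = combined_score(candidate_absolute, worse_residual)

# With a positive residual, the candidate can exceed the reference's maximum.
print(better > reference_absolute)
# With a negative residual, the candidate drops below its absolute score.
print(worse < candidate_absolute)
```

Under these assumed values, a candidate with a positive residual ends up above the maximum that the underlying metric alone could assign, which is the effect the abstract describes.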


Appears in Collections: COLLEGE OF COMPUTING > DEPARTMENT OF ARTIFICIAL INTELLIGENCE > 1. Journal Articles

Related Researcher

Jung, Woohwan: ERICA College of Software Convergence (DEPARTMENT OF ARTIFICIAL INTELLIGENCE)
