译道

Way to Translation

ISSN Print: 2709-667X
ISSN Online: 2709-6688

机器翻译评估指标研究

Research on Machine Translation Evaluation Metrics

译道 / 2024, 4(2): 38-48 / 2024-06-18
  • Authors: 牟正帅¹, 奚瑞祺¹,², 成嘉垠¹,²
  • Affiliations:
    1. AI Translation Industry-Education Integration Innovation Practice Base, School of Foreign Languages, Zhongnan University of Economics and Law, Wuhan;
    2. School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan
  • Keywords: machine translation; pre-trained language models; translation quality; translation evaluation
  • Abstract: Evaluation metrics for machine translation, as a crucial research focus for measuring the performance of translation systems, significantly impact the judgment of translation quality and the advancement of translation technology. With the progress of deep learning, large language models have achieved remarkable advances in machine translation, showcasing strong multimodal translation capabilities. However, these new models also pose new challenges to traditional translation evaluation metrics. This study systematically analyzes machine translation evaluation metrics based on pre-trained language models, covering metrics based on word overlap, word vectors, and distance, as well as other metrics such as CIDEr and SPICE, and discusses each metric from a linguistic perspective. It also categorizes machine translation quality evaluation schemes into single-scheme evaluation, multiple-scheme comparative evaluation, and combined-scheme evaluation, analyzing their respective advantages and disadvantages. Finally, for evaluating the translation quality of large language models, the study recommends a combined-scheme evaluation to assess more comprehensively aspects such as semantic retention, cultural adaptability, and grammatical correctness, aiming to provide a theoretical reference for evaluating the translation performance of large language models.
  • DOI: https://doi.org/10.35534/wtt.0402006
  • Citation: 牟正帅, 奚瑞祺, 成嘉垠. 机器翻译评估指标研究[J]. 译道, 2024, 4(2): 38-48.
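The abstract above groups automatic metrics into word-overlap, word-vector, and distance-based families, alongside CIDEr, SPICE, and pre-trained-model-based scores. As a minimal illustration of the word-overlap family only, the sketch below computes a simplified sentence-level BLEU-style score (clipped n-gram precisions with a brevity penalty). It is not the paper's own code; the add-one smoothing and the example sentences are assumptions made here purely for demonstration.

import math
from collections import Counter

def ngram_counts(tokens, n):
    # Count every n-gram of length n in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu_like(candidate, reference, max_n=4):
    # Simplified sentence-level BLEU-style score: geometric mean of clipped
    # n-gram precisions (n = 1..max_n) times a brevity penalty.
    # Add-one smoothing (an assumption here) keeps the score defined when a
    # higher-order n-gram never matches the reference.
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngram_counts(cand, n)
        ref_ngrams = ngram_counts(ref, n)
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = sum(cand_ngrams.values())
        precisions.append((overlap + 1) / (total + 1))
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty discourages overly short candidate translations.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return brevity * geo_mean

if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat is on the mat"
    print(f"BLEU-like score: {bleu_like(hypothesis, reference):.3f}")

Surface-overlap scores of this kind do not capture semantic retention, cultural adaptability, or grammatical correctness on their own, which is the gap the paper's recommended combined-scheme evaluation is meant to address.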