译道

Way to Translation

ISSN Print: 2709-667X
ISSN Online: 2709-6688

机器翻译评估指标研究

Research on Machine Translation Evaluation Metrics

译道 / 2024, 4(2): 38-48 / 2024-06-18
  • Authors: 牟正帅¹, 奚瑞祺¹,², 成嘉垠¹,²
  • Affiliations:
    1. AI Translation Industry-Education Integration Innovation Practice Base, School of Foreign Languages, Zhongnan University of Economics and Law, Wuhan;
    2. School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan
  • Keywords: machine translation; pre-trained language models; translation quality; translation evaluation
  • Abstract: Evaluation metrics for machine translation, as a crucial research focus for measuring the performance of translation systems, significantly impact the judgment of translation quality and the advancement of translation technology. With the progress of deep learning, large language models have achieved remarkable advances in machine translation, showcasing strong multimodal translation capabilities. However, these new models also pose new challenges to traditional translation evaluation metrics. This study systematically analyzes machine translation evaluation metrics based on pre-trained language models, covering metrics based on word overlap, word vectors, and distance, as well as other metrics such as CIDEr and SPICE, and discusses each metric from a linguistic perspective. It also categorizes machine translation quality evaluation schemes into single-scheme evaluation, multiple-scheme comparative evaluation, and combined-scheme evaluation, analyzing their respective advantages and disadvantages. Finally, for evaluating the translation quality of large language models, the study recommends a combined-scheme evaluation to assess more comprehensively aspects such as semantic retention, cultural adaptability, and grammatical correctness, aiming to provide a theoretical reference for evaluating the translation performance of large language models.
  • DOI: https://doi.org/10.35534/wtt.0402006
  • Citation: 牟正帅, 奚瑞祺, 成嘉垠. 机器翻译评估指标研究[J]. 译道, 2024, 4(2): 38-48.
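The abstract above groups automatic metrics into word-overlap, word-vector, and distance-based families, alongside CIDEr, SPICE, and pre-trained-model-based scores. As a minimal illustration of the word-overlap family only, the sketch below computes a simplified sentence-level BLEU-style score (clipped n-gram precisions with a brevity penalty). It is not the paper's own code; the add-one smoothing and the example sentences are assumptions made here purely for demonstration.

import math
from collections import Counter

def ngram_counts(tokens, n):
    # Count every n-gram of length n in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu_like(candidate, reference, max_n=4):
    # Simplified sentence-level BLEU-style score: geometric mean of clipped
    # n-gram precisions (n = 1..max_n) times a brevity penalty.
    # Add-one smoothing (an assumption here) keeps the score defined when a
    # higher-order n-gram never matches the reference.
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngram_counts(cand, n)
        ref_ngrams = ngram_counts(ref, n)
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = sum(cand_ngrams.values())
        precisions.append((overlap + 1) / (total + 1))
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty discourages overly short candidate translations.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return brevity * geo_mean

if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat is on the mat"
    print(f"BLEU-like score: {bleu_like(hypothesis, reference):.3f}")

Surface-overlap scores of this kind do not capture semantic retention, cultural adaptability, or grammatical correctness on their own, which is the gap the paper's recommended combined-scheme evaluation is meant to address.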