Evaluation metrics for machine translation are a central research focus in measuring the performance of translation systems, and they directly shape how translation quality is judged and how translation technology advances. With the progress of deep learning, large language models have made remarkable advances in machine translation, demonstrating strong multimodal translation capabilities. However, these new models also pose challenges for traditional evaluation metrics. This study systematically reviews machine translation evaluation metrics, covering those based on word overlap, those based on word vectors and distance measures, and those based on pre-trained language models, together with related metrics such as CIDEr and SPICE, and discusses each metric from a linguistic perspective. It further categorizes machine translation quality evaluation schemes into single-scheme evaluation, multi-scheme comparative evaluation, and combined-scheme evaluation, analyzing the advantages and disadvantages of each. Finally, for evaluating the translation quality of large language models, the study recommends combined-scheme evaluation, which can more comprehensively assess aspects such as semantic preservation, cultural adaptability, and grammatical correctness, and thereby aims to provide a theoretical reference for evaluating the translation performance of large language models.
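
To make the contrast between the metric families mentioned above concrete, the following is a minimal illustrative sketch, not part of the study itself, that scores the same sentence pair with a word-overlap metric (sentence-level BLEU) and an embedding-based metric (BERTScore). It assumes the third-party Python packages sacrebleu and bert-score are installed; the example sentences are hypothetical.

```python
# Illustrative sketch (assumed setup, not from the study): contrast a
# word-overlap metric with an embedding-based metric on one sentence pair.
# Requires the third-party packages `sacrebleu` and `bert-score`.
import sacrebleu
from bert_score import score as bert_score

reference = ["The cat sat quietly on the warm windowsill."]   # hypothetical example
hypothesis = "A cat was sitting quietly on the warm window sill."

# Word-overlap metric: sentence-level BLEU counts surface n-gram matches,
# so paraphrases that share few n-grams with the reference score low.
bleu = sacrebleu.sentence_bleu(hypothesis, reference)
print(f"BLEU: {bleu.score:.1f}")

# Embedding-based metric: BERTScore matches contextual token embeddings,
# so it is more tolerant of wording changes that preserve the meaning.
P, R, F1 = bert_score([hypothesis], reference, lang="en")
print(f"BERTScore F1: {F1.item():.3f}")
```

A gap between the two scores on paraphrased output is exactly the kind of divergence between surface-overlap and semantics-oriented metrics that motivates the combined-scheme evaluation recommended in the study.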