Legal translation
Translate legal text with terminology consistency and format safety.
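The task description names two properties, terminology consistency and format safety. Below is a minimal sketch of how an evaluator might check both on a single sentence pair. It assumes a hypothetical bilingual glossary (`GLOSSARY`) and a hypothetical set of protected tokens: template placeholders such as `{{PARTY_A}}`, section references such as `§ 5`, and digit groups. Neither the glossary nor `check_translation` comes from this leaderboard. Both are illustrative only.

```python
import re

# Hypothetical glossary: source-language legal term -> mandated target rendering.
GLOSSARY = {
    "Vertragspartei": "contracting party",
    "Gerichtsstand": "place of jurisdiction",
}

# Hypothetical protected tokens that must survive translation:
# template placeholders, section references, and bare digit groups.
PROTECTED = re.compile(r"\{\{\w+\}\}|§\s*\d+[a-z]?|\d+")


def check_translation(source: str, target: str, glossary: dict[str, str]) -> list[str]:
    """Return human-readable problems found in `target` (empty list if none)."""
    problems = []
    # Terminology consistency: each glossary term present in the source must
    # appear in the target with its mandated rendering (case-insensitive).
    for src_term, tgt_term in glossary.items():
        if src_term in source and tgt_term.lower() not in target.lower():
            problems.append(f"missing glossary term: {src_term!r} -> {tgt_term!r}")
    # Format safety: the multiset of protected tokens must be unchanged.
    src_tokens = sorted(PROTECTED.findall(source))
    tgt_tokens = sorted(PROTECTED.findall(target))
    if src_tokens != tgt_tokens:
        problems.append(f"protected tokens differ: {src_tokens} vs {tgt_tokens}")
    return problems


src = "Die Vertragspartei {{PARTY_A}} zahlt 1.000,50 EUR gemäß § 5."
good = "The contracting party {{PARTY_A}} shall pay EUR 1,000.50 pursuant to § 5."
bad = "The party {{PARTY_A}} shall pay EUR 1,000 pursuant to § 5."
print(check_translation(src, good, GLOSSARY))
print(check_translation(src, bad, GLOSSARY))
```

The check compares digit groups rather than whole numbers. As a result, `1.000,50` and `1,000.50` pass, while a dropped amount is flagged. Dates rewritten in words would still be flagged, so a failure is best read as a prompt for review rather than a verdict.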
Provisional leader: claude-sonnet-4
Current leader based on limited benchmark evidence. Treat this ranking as directional until coverage improves.
Best benchmark score: 33.3%
Confidence: 40.6%
Ranked Models: 30
Evidence Quality: 81%
Evidence Points: 28
Top Signal: LanguageBench Translation Official (Split): translation_to:bleu
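The top signal is a BLEU score on translation. This page does not say how LanguageBench computes it (tokenization, smoothing, corpus versus sentence level). The sketch below is therefore only a simplified, single-reference, sentence-level BLEU on whitespace tokens, without smoothing. It shows what the metric rewards: clipped n-gram precision up to 4-grams, combined by geometric mean and multiplied by a brevity penalty. The names `ngrams` and `sentence_bleu` are illustrative and are not a LanguageBench API.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all contiguous n-grams of length n."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU on whitespace tokens, in [0, 1]."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: an n-gram is credited at most as often as it
        # occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(len(hyp) - n + 1, 0)
        if matches == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precision_sum += math.log(matches / total)
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision_sum / max_n)


print(sentence_bleu("the tenant shall pay rent", "the tenant shall pay rent monthly"))
```

Benchmark tooling usually aggregates n-gram counts over the whole test set and applies standardized tokenization and smoothing, as sacreBLEU does. Absolute values from this sketch are therefore not comparable to leaderboard numbers.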
All Ranked Models
| Rank | Model | Strong on | Score |
|---|---|---|---|
| #1 | claude-sonnet-4 | LanguageBench Translation Official (Split) translation_to:bleu and LanguageBench overall:mean | 33.3% |
| Top 3 | gemini-2.5-flash | LanguageBench Translation Official (Split) translation_to:bleu and LanguageBench overall:mean | 30.4% |
| #4 | gemini-2.5-pro | LEXam Leaderboard average_score_pct and Galileo Agent Leaderboard v2 Avg TSQ | 25.3% |
| #5 | gpt-4.1 | LanguageBench Translation Official (Split) translation_to:bleu and LanguageBench overall:mean | 24.7% |
| #6 | gpt-4.1-20250414 | Vals Case Law v2 overall_accuracy_pct and Vals Legal Bench overall_accuracy_pct | 23.8% |
| #7 | gpt-5-mini-2025-08-07 | Vals Case Law v2 overall_accuracy_pct and Vals Legal Bench overall_accuracy_pct | 23.6% |
| #8 | gpt-5-2025-08-07 | LEXam Leaderboard average_score_pct and Vals Legal Bench overall_accuracy_pct | 23.4% |
| #10 | gemini-2.0-flash-001 | LanguageBench Translation Official (Split) translation_to:bleu and LanguageBench overall:mean | 22.6% |
| #11 | Claude-3.5-Sonnet | LanguageBench Translation Official (Split) translation_to:bleu and LanguageBench overall:mean | 21.8% |
| #13 | deepseek-r1 | LanguageBench Translation Official (Split) translation_to:bleu and LEXam Leaderboard average_score_pct | 19.3% |
| #15 | gemini-3.1-pro-preview | Vals Legal Bench overall_accuracy_pct and Vals Case Law v2 overall_accuracy_pct | 18.8% |
| #17 | gemini-3-pro-preview | Vals Legal Bench overall_accuracy_pct and LEXam Leaderboard average_score_pct | 17.4% |
| #21 | Grok-4-0709 | Vals Legal Bench overall_accuracy_pct and Vals Case Law v2 overall_accuracy_pct | 15.7% |
| #24 | gpt-5.4-2026-03-05 | Vals Legal Bench overall_accuracy_pct and Vals Case Law v2 overall_accuracy_pct | 14.9% |
| #26 | claude-sonnet-4.6 | Vals Legal Bench overall_accuracy_pct and Vals Case Law v2 overall_accuracy_pct | 14.6% |
| #27 | gpt-4.1-mini-20250414 | Vals Legal Bench overall_accuracy_pct and OpenVLM OCRBench Official ocrbench_score_pct | 14.4% |
| #28 | gemini-3-flash-preview | Vals Legal Bench overall_accuracy_pct and Vectara HHEM Leaderboard law_hallucination_error_pct | 14.1% |
| #29 | grok-4-fast-reasoning | Vals Legal Bench overall_accuracy_pct and Vals Case Law v2 overall_accuracy_pct | 13.2% |
| #30 | Llama-3.3-70B-Instruct | LanguageBench Translation Official (Split) translation_to:bleu and LanguageBench overall:mean | 13.2% |
| #31 | gpt-5.1-2025-11-13 | Vals Case Law v2 overall_accuracy_pct and Vals Legal Bench overall_accuracy_pct | 13.1% |
| #32 | gemini-3.1-flash-lite-preview | Vals Legal Bench overall_accuracy_pct and Vectara HHEM Leaderboard law_hallucination_error_pct | 13.1% |
| #33 | Llama-3.1-70B-Instruct | LanguageBench Translation Official (Split) translation_to:bleu and LanguageBench overall:mean | 13.1% |
| #35 | claude-opus-4-5-20251101 | Vals Legal Bench overall_accuracy_pct and Vectara HHEM Leaderboard law_hallucination_error_pct | 12.8% |
| #36 | gpt-5.2-2025-12-11 | Vals Legal Bench overall_accuracy_pct and Vals Case Law v2 overall_accuracy_pct | 12.6% |
| #37 | grok-4-1-fast-reasoning | Vals Legal Bench overall_accuracy_pct and Vals Case Law v2 overall_accuracy_pct | 11.5% |
| #38 | gpt-4o | LEXam Leaderboard average_score_pct and OpenVLM OCRBench Official ocrbench_score_pct | 11.4% |
| #39 | phi-4 | LanguageBench overall:mean and Vectara HHEM Leaderboard law_hallucination_error_pct | 11.2% |
| #43 | claude-opus-4-1-20250805 | Vals Legal Bench overall_accuracy_pct and Vectara HHEM Leaderboard law_hallucination_error_pct | 10.8% |
| #44 | o3-20250416 | Vals Legal Bench overall_accuracy_pct and SimpleQA Verified simpleqa_verified_score_pct | 10.5% |
| #45 | mistral-large-2512 | Vals Legal Bench overall_accuracy_pct and Vals Case Law v2 overall_accuracy_pct | 10.4% |
Ranking diagnostics & missing models
Source lift
Ranked: 60
Sources: 8
Quality: Low
Vals Legal Bench
Vals LiveCodeBench
Vals Tax Eval v2
Vals MedQA
Missing frontier models
No obvious gaps right now.
Taxonomy & task details
Core tasks
Required modes
Domains
Related in Legal
Contract Drafting & Redlining
Drafting, reviewing, and suggesting edits to legal contracts and agreements.
Contract Q&A (RAG grounded)
Answer contract questions grounded in the actual contract text.
Regulatory summary
Summarize and compare regulatory text with conservative interpretation.
Contract redline summary
Summarize material changes between contract versions with clause refs.