Legal

Contract redline summary

Summarize material changes between contract versions, with clause references.

task.compare_docs_diff, task.summarize_legal_contract
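A redline summary with clause references presupposes knowing which clauses changed between the two versions. The Python below is a minimal sketch of that clause-level comparison. It assumes each clause starts on its own line with a numeric reference such as "1." or "4.2". CLAUSE_RE, split_clauses, clause_key and redline are hypothetical names that do not come from any benchmark or tool on this page. The sketch only locates added, removed and modified clauses and their references. Judging which changes are material, and writing the summary itself, is left to the model.

```python
import difflib
import re

# Assumption (hypothetical format): each clause begins a line with a numeric
# reference such as "1.", "4.2" or "4.2." followed by its text; lines without
# a leading number continue the previous clause. Real contracts need more care.
CLAUSE_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.*)$")


def split_clauses(text: str) -> dict[str, str]:
    """Map each clause reference (e.g. "4.2") to its full text."""
    clauses: dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = CLAUSE_RE.match(line)
        if match:
            current = match.group(1)
            clauses[current] = match.group(2)
        elif current is not None and line:
            clauses[current] += " " + line
    return clauses


def clause_key(ref: str) -> list[int]:
    """Sort "10" after "9" and "4.2" after "4.1" by comparing numeric parts."""
    return [int(part) for part in ref.split(".")]


def redline(old: str, new: str) -> list[tuple[str, str]]:
    """Return (clause_ref, description) for every added, removed or modified clause."""
    old_c, new_c = split_clauses(old), split_clauses(new)
    changes: list[tuple[str, str]] = []
    for ref in sorted(old_c.keys() | new_c.keys(), key=clause_key):
        if ref not in new_c:
            changes.append((ref, "removed"))
        elif ref not in old_c:
            changes.append((ref, "added"))
        elif old_c[ref] != new_c[ref]:
            # Word-level diff so the description shows what was replaced.
            a, b = old_c[ref].split(), new_c[ref].split()
            ops = difflib.SequenceMatcher(None, a, b).get_opcodes()
            edits = [
                f"{' '.join(a[i1:i2])!r} -> {' '.join(b[j1:j2])!r}"
                for tag, i1, i2, j1, j2 in ops
                if tag != "equal"
            ]
            changes.append((ref, "modified: " + "; ".join(edits)))
    return changes


old_text = """1. Term is 12 months.
2. Fees are $100 per month.
3. Either party may terminate on notice."""
new_text = """1. Term is 24 months.
2. Fees are $100 per month.
4. Governing law is Delaware."""

for ref, change in redline(old_text, new_text):
    print(f"Clause {ref}: {change}")
```

For the sample text this prints one line per changed clause, starting with `Clause 1: modified: '12' -> '24'`. Keying changes by clause number rather than by line keeps references stable when text is reflowed. The trade-off is that a renumbered clause shows up as a removal plus an addition.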

Evidence quality for this use case is currently limited. The rankings below are useful for exploration, but they do not establish a clear winner.

Provisional leader: gemini-2.5-pro

Current leader based on limited benchmark evidence. Treat this ranking as directional until coverage improves.

Best benchmark score: 32.8%

Confidence: 50.2%

Top 3 ranked models

🥇 gemini-2.5-pro: 32.8%
🥈 gpt-5-2025-08-07: 31.6%
🥉 gpt-5-mini-2025-08-07: 31.5%

Ranked models: 30

Evidence quality: 83%

Evidence points

Top signal: LEXam Leaderboard average_score_pct

All Ranked Models

30 of 30 models

Rank	Model	Score	Confidence	Price / 1M tokens	Evidence sources (strongest metrics)
🥇	gemini-2.5-pro	32.8%	50%	$3.44	LEXam Leaderboard average_score_pct; FACTS Benchmark Suite facts_grounding_score_pct
🥈	gpt-5-2025-08-07	31.6%	40%	n/a	LEXam Leaderboard average_score_pct; FACTS Benchmark Suite facts_grounding_score_pct
🥉	gpt-5-mini-2025-08-07	31.5%	46%	n/a	LEXam Leaderboard average_score_pct; Vals Case Law v2 overall_accuracy_pct
#4	gemini-3.1-pro-preview	30.5%	35%	$4.50	SimpleQA Verified simpleqa_verified_score_pct; FACTS Benchmark Suite facts_grounding_score_pct
#5	gemini-3-pro-preview	28.1%	39%	$4.50	LEXam Leaderboard average_score_pct; SimpleQA Verified simpleqa_verified_score_pct
#6	claude-sonnet-4	25.9%	37%	$6.00	Galileo Agent Leaderboard v2 Avg TSQ; Vals Legal Bench overall_accuracy_pct
#7	gemini-2.5-flash	23.2%	34%	$0.17	FACTS Benchmark Suite facts_grounding_score_pct; Galileo Agent Leaderboard v2 Avg TSQ
#8	gpt-4.1-20250414	23.1%	33%	n/a	Vals Case Law v2 overall_accuracy_pct; Vectara HHEM Leaderboard overall_hallucination_error_pct
#9	Grok-4-0709	22.9%	33%	n/a	Vals Legal Bench overall_accuracy_pct; Vals Case Law v2 overall_accuracy_pct
#10	gemini-3-flash-preview	22.8%	31%	$1.13	Vals Legal Bench overall_accuracy_pct; FACTS Benchmark Suite facts_grounding_score_pct
#11	claude-sonnet-4.6	22.2%	28%	$6.00	Vals Finance Agent overall_accuracy_pct; Vals Legal Bench overall_accuracy_pct
#12	gemini-3.1-flash-lite-preview	21.5%	31%	$0.56	FACTS Benchmark Suite facts_grounding_score_pct; Vals Legal Bench overall_accuracy_pct
#13	gpt-5.4-2026-03-05	21.3%	27%	n/a	Vals Legal Bench overall_accuracy_pct; Vectara HHEM Leaderboard overall_hallucination_error_pct
#14	gpt-5.2-2025-12-11	21.0%	25%	n/a	FACTS Benchmark Suite facts_grounding_score_pct; Vals Legal Bench overall_accuracy_pct
#15	claude-opus-4-5-20251101	19.9%	29%	n/a	FACTS Benchmark Suite facts_grounding_score_pct; Vals Legal Bench overall_accuracy_pct
#16	grok-4-fast-reasoning	19.4%	35%	$0.28	Vals CorpFin v2 overall_accuracy_pct; Vals Legal Bench overall_accuracy_pct
#17	gpt-5.1-2025-11-13	18.6%	27%	n/a	Vals Case Law v2 overall_accuracy_pct; Vals Legal Bench overall_accuracy_pct
#18	deepseek-r1	18.1%	29%	$0.27	LEXam Leaderboard average_score_pct; SYCON Bench (Table 2) sycon_unethical_tof_pct
#19	grok-4-1-fast-reasoning	17.1%	25%	$0.28	Vals Legal Bench overall_accuracy_pct; Vals CorpFin v2 overall_accuracy_pct
#20	o3-20250416	15.7%	25%	$3.50	Vals Legal Bench overall_accuracy_pct; SimpleQA Verified simpleqa_verified_score_pct
#22	gpt-4.1	15.4%	20%	$3.50	LEXam Leaderboard average_score_pct; LEXam Leaderboard open_question_judge_score_pct
#23	grok-3	15.0%	20%	$6.00	Vectara HHEM Leaderboard overall_hallucination_error_pct; Vals Legal Bench overall_accuracy_pct
#24	claude-opus-4-6-thinking	14.2%	16%	n/a	Vals Legal Bench overall_accuracy_pct; Vals CorpFin v2 overall_accuracy_pct
#25	mistral-large-2512	13.8%	23%	n/a	Vals Legal Bench overall_accuracy_pct; Vals CorpFin v2 overall_accuracy_pct
#26	claude-opus-4-1-20250805	13.6%	23%	n/a	Vals Legal Bench overall_accuracy_pct; FACTS Benchmark Suite facts_grounding_score_pct
#27	claude-opus-4-5-20251101-thinking	13.4%	16%	n/a	Vals Legal Bench overall_accuracy_pct; Vals Finance Agent overall_accuracy_pct
#28	claude-sonnet-4-5-20250929-thinking	12.7%	16%	n/a	Vals Legal Bench overall_accuracy_pct; Vals Finance Agent overall_accuracy_pct
#29	grok-4-1-fast-non-reasoning	12.7%	22%	$0.28	Vals Legal Bench overall_accuracy_pct; Vals Finance Agent overall_accuracy_pct
#32	glm-5-thinking	12.0%	19%	n/a	Vals Legal Bench overall_accuracy_pct; Vals CorpFin v2 overall_accuracy_pct
#33	deepseek-v3	11.8%	18%	n/a	Vectara HHEM Leaderboard overall_hallucination_error_pct; SYCON Bench (Table 2) sycon_unethical_tof_pct


Ranking diagnostics & missing models

Source lift

Ranked

Sources

Quality: Low

Vals CorpFin v2: 44 rows, 1.1% average lift
Vals Legal Bench: 44 rows, 1.6% average lift
Vals Finance Agent: 31 rows, 1.0% average lift
Vals Case Law v2: 30 rows, 1.2% average lift

Missing frontier models

No obvious gaps right now.

Taxonomy & task details

Core tasks

task.compare_docs_diff, task.summarize_legal_contract

Required modes

mode.long_context, mode.citations

Domains

domain.legal_contracts
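The mode.citations requirement implies that clause references in a generated summary should be checkable against the contract. Below is a minimal Python sketch of one such check. It assumes the summary cites clauses as "clause 4.2", "Clause 7" or "§ 4.2". CITE_RE and unsupported_citations are hypothetical names, and the sample summary and clause set are invented for illustration. The check only flags cited references that do not exist in the contract. It does not verify that a cited clause actually supports the claim attached to it.

```python
import re

# Assumption (hypothetical citation style): "clause 4.2", "Clause 7" or "§ 4.2".
# Adjust the pattern to whatever citation format the prompt asks for.
CITE_RE = re.compile(r"(?:\bclause|§)\s*(\d+(?:\.\d+)*)", re.IGNORECASE)


def unsupported_citations(summary: str, clause_refs: set[str]) -> list[str]:
    """Return cited clause references missing from the contract, in first-cited order."""
    missing: list[str] = []
    for ref in CITE_RE.findall(summary):
        if ref not in clause_refs and ref not in missing:
            missing.append(ref)
    return missing


# Hypothetical example: clause references present in the revised contract.
contract_refs = {"1", "2", "4"}
summary = (
    "Term extended from 12 to 24 months (Clause 1). "
    "Termination right removed (clause 3). "
    "New governing law in § 4; fees change in clause 9."
)
print(unsupported_citations(summary, contract_refs))  # prints ['3', '9']
```

Clause 3 existed only in the old version, so a summary describing its removal would cite the old version's numbering. In practice you would check each citation against the version it refers to.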

Related in Legal

Contract Drafting & Redlining

Drafting, reviewing, and suggesting edits to legal contracts and agreements.

Contract Q&A (RAG grounded)

Answer contract questions grounded in the actual contract text.

Regulatory summary

Summarize and compare regulatory text with conservative interpretation.

Clause playbook check

Check extracted terms against a playbook and flag deviations.