Contract Q&A (RAG grounded)
Answer contract questions grounded in the actual contract text.
Provisional leader
gemini-3.1-pro-preview
Current leader based on limited benchmark evidence. Treat this ranking as directional until coverage improves.
32.4%
Best benchmark score
37.1%
Confidence
All ranked models (top 3)
Ranked Models
30
Evidence Quality
84%
Evidence Points
24
Top Signal
SimpleQA Verified: simpleqa_verified_score_pct
All Ranked Models
| Rank | Model | Strong on | Score |
|---|---|---|---|
| 🥇 | gemini-3.1-pro-preview | SimpleQA Verified: simpleqa_verified_score_pct; FACTS Benchmark Suite: facts_grounding_score_pct | 32.4% |
| 🥈 | gemini-2.5-pro | FACTS Benchmark Suite: facts_grounding_score_pct; LEXam Leaderboard: average_score_pct | 31.6% |
| 🥉 | gpt-5-mini-2025-08-07 | Vals Case Law v2: overall_accuracy_pct; FACTS Benchmark Suite: facts_grounding_score_pct | 30.5% |
| #4 | gpt-5-2025-08-07 | FACTS Benchmark Suite: facts_grounding_score_pct; LEXam Leaderboard: average_score_pct | 30.1% |
| #5 | claude-sonnet-4 | Galileo Agent Leaderboard v2: Avg TSQ; Vals Legal Bench: overall_accuracy_pct | 27.5% |
| #6 | gemini-3-pro-preview | SimpleQA Verified: simpleqa_verified_score_pct; Vals Legal Bench: overall_accuracy_pct | 27.2% |
| #7 | gemini-2.5-flash | FACTS Benchmark Suite: facts_grounding_score_pct; Galileo Agent Leaderboard v2: Avg TSQ | 24.7% |
| #8 | gpt-4.1-20250414 | Vals Case Law v2: overall_accuracy_pct; Vectara HHEM Leaderboard: overall_hallucination_error_pct | 24.6% |
| #9 | Grok-4-0709 | Vals Legal Bench: overall_accuracy_pct; Vals Case Law v2: overall_accuracy_pct | 24.4% |
| #10 | gemini-3-flash-preview | Vals Legal Bench: overall_accuracy_pct; FACTS Benchmark Suite: facts_grounding_score_pct | 24.2% |
| #11 | claude-sonnet-4.6 | Vals Finance Agent: overall_accuracy_pct; Vals Legal Bench: overall_accuracy_pct | 23.6% |
| #12 | gemini-3.1-flash-lite-preview | FACTS Benchmark Suite: facts_grounding_score_pct; Vals Legal Bench: overall_accuracy_pct | 22.9% |
| #13 | gpt-5.4-2026-03-05 | Vals Legal Bench: overall_accuracy_pct; Vectara HHEM Leaderboard: overall_hallucination_error_pct | 22.6% |
| #14 | gpt-5.2-2025-12-11 | FACTS Benchmark Suite: facts_grounding_score_pct; Vals Legal Bench: overall_accuracy_pct | 22.4% |
| #15 | claude-opus-4-5-20251101 | FACTS Benchmark Suite: facts_grounding_score_pct; Vals Legal Bench: overall_accuracy_pct | 21.1% |
| #16 | grok-4-fast-reasoning | Vals CorpFin v2: overall_accuracy_pct; Vals Legal Bench: overall_accuracy_pct | 20.6% |
| #17 | gpt-5.1-2025-11-13 | Vals Case Law v2: overall_accuracy_pct; Vals Legal Bench: overall_accuracy_pct | 19.8% |
| #18 | grok-4-1-fast-reasoning | Vals Legal Bench: overall_accuracy_pct; Vals CorpFin v2: overall_accuracy_pct | 18.2% |
| #19 | o3-20250416 | Vals Legal Bench: overall_accuracy_pct; SimpleQA Verified: simpleqa_verified_score_pct | 16.7% |
| #20 | grok-3 | Vectara HHEM Leaderboard: overall_hallucination_error_pct; Vals Legal Bench: overall_accuracy_pct | 15.9% |
| #21 | deepseek-r1 | SYCON Bench (Table 2): sycon_unethical_tof_pct; LEXam Leaderboard: average_score_pct | 15.3% |
| #22 | claude-opus-4-6-thinking | Vals Legal Bench: overall_accuracy_pct; Vals CorpFin v2: overall_accuracy_pct | 15.1% |
| #23 | mistral-large-2512 | Vals Legal Bench: overall_accuracy_pct; Vals CorpFin v2: overall_accuracy_pct | 14.7% |
| #24 | claude-opus-4-1-20250805 | Vals Legal Bench: overall_accuracy_pct; FACTS Benchmark Suite: facts_grounding_score_pct | 14.5% |
| #25 | claude-opus-4-5-20251101-thinking | Vals Legal Bench: overall_accuracy_pct; Vals Finance Agent: overall_accuracy_pct | 14.3% |
| #26 | gpt-4.1 | LEXam Leaderboard: average_score_pct; LanguageBench: translation_to:bleu | 13.7% |
| #27 | claude-sonnet-4-5-20250929-thinking | Vals Legal Bench: overall_accuracy_pct; Vals Finance Agent: overall_accuracy_pct | 13.5% |
| #28 | grok-4-1-fast-non-reasoning | Vals Legal Bench: overall_accuracy_pct; Vals Finance Agent: overall_accuracy_pct | 13.5% |
| #31 | glm-5-thinking | Vals Legal Bench: overall_accuracy_pct; Vals CorpFin v2: overall_accuracy_pct | 12.8% |
| #32 | deepseek-v3 | Vectara HHEM Leaderboard: overall_hallucination_error_pct; SYCON Bench (Table 2): sycon_unethical_tof_pct | 12.6% |
Ranking diagnostics & missing models
Source lift
Ranked
59
Sources
8
Quality
Low
Vals CorpFin v2
Vals Legal Bench
Vals Finance Agent
Vals Case Law v2
Missing frontier models
No obvious gaps right now.
Taxonomy & task details
Core tasks
Required modes
Domains
Related in Legal
Contract Drafting & Redlining
Drafting, reviewing, and suggesting edits to legal contracts and agreements.
Regulatory summary
Summarize and compare regulatory text with conservative interpretation.
Contract redline summary
Summarize material changes between contract versions with clause refs.
Clause playbook check
Check extracted terms against a playbook and flag deviations.