Thesis red teaming
Stress-test an investment thesis with counterarguments and risk.
Best for this use case
gemini-3.1-pro-preview
Strong on Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct
48.9%
Best benchmark score
56.1%
Confidence
All ranked models: top 3
Ranked Models
30
Evidence Quality
90%
Evidence Points
28
Top Signal
Vals Finance Agent: overall_accuracy_pct
All Ranked Models
| Rank | Model | Strong on | Score |
|---|---|---|---|
| 🥇 | gemini-3.1-pro-preview | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 48.9% |
| 🥈 | Grok-4-0709 | Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct | 40.8% |
| 🥉 | gemini-2.5-pro | FACTS Benchmark Suite facts_grounding_score_pct and Vals CorpFin v2 overall_accuracy_pct | 40.6% |
| #4 | gpt-5-mini-2025-08-07 | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 39.7% |
| #5 | gpt-5-2025-08-07 | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 39.3% |
| #6 | gpt-5.2-2025-12-11 | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 38.8% |
| #7 | claude-sonnet-4.6 | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 38.7% |
| #8 | gemini-3-pro-preview | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 37.6% |
| #9 | gemini-3-flash-preview | Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct | 37.5% |
| #10 | gpt-5.4-2026-03-05 | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 36.9% |
| #11 | gemini-3.1-flash-lite-preview | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 36.0% |
| #12 | grok-4-fast-reasoning | Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct | 33.2% |
| #13 | gpt-5.1-2025-11-13 | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 33.1% |
| #14 | gpt-4.1-20250414 | Vals CorpFin v2 overall_accuracy_pct and Vals Tax Eval v2 overall_accuracy_pct | 32.9% |
| #15 | claude-sonnet-4 | Vals CorpFin v2 overall_accuracy_pct and FACTS Benchmark Suite facts_grounding_score_pct | 30.6% |
| #16 | claude-opus-4-6-thinking | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 30.5% |
| #17 | grok-4-1-fast-reasoning | Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct | 29.7% |
| #18 | claude-opus-4-5-20251101-thinking | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 29.0% |
| #19 | claude-opus-4-5-20251101 | Vals CorpFin v2 overall_accuracy_pct and FACTS Benchmark Suite facts_grounding_score_pct | 28.7% |
| #20 | kimi-k2.5-thinking | Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct | 28.6% |
| #21 | claude-sonnet-4-5-20250929-thinking | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 27.0% |
| #22 | grok-4.20-0309-reasoning | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 26.6% |
| #24 | o3-20250416 | Vals CorpFin v2 overall_accuracy_pct and Vals Tax Eval v2 overall_accuracy_pct | 25.0% |
| #25 | glm-5-thinking | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 24.8% |
| #26 | qwen3.5-flash | Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct | 24.8% |
| #28 | claude-haiku-4-5-20251001-thinking | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 23.9% |
| #29 | gpt-5.4-nano-2026-03-17 | Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct | 23.9% |
| #30 | MiniMax-M2.7 | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 23.1% |
| #31 | mistral-large-2512 | Vals CorpFin v2 overall_accuracy_pct and Vals Tax Eval v2 overall_accuracy_pct | 21.2% |
| #32 | grok-4-1-fast-non-reasoning | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct | 21.0% |
Ranking diagnostics & missing models
Source lift
Ranked
57
Sources
8
Quality
Good
Vals Tax Eval v2
Vals CorpFin v2
Vals GPQA
Vals Mortgage Tax
Missing frontier models
No obvious gaps right now.
Taxonomy & task details
Core tasks
Required modes
Domains
Related in Finance
Earnings call synthesis
Summarize earnings calls into key points, tone, and risks.
Transaction anomaly narrative
Summarize anomalies into hypotheses, evidence, and follow-up actions.
Accounts payable invoice extraction (text)
Extract structured fields from invoices/receipts for AP workflows.
AML alert triage
Triage AML alerts into severity, rationale, and next actions.