BasedAGI
Finance

Quant research code generation

Generate backtest or analysis code from trading hypotheses.

task.code_generation, task.reasoning_math

Best for this use case

gpt-5-2025-08-07

Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals Finance Agent overall_accuracy_pct

34.2%

Best benchmark score

41.7%

Confidence

Ranked Models

30

Evidence Quality

85%

Evidence Points

36

Top Signal

SWE-bench Verified Leaderboard: swe_verified_resolved_pct

All Ranked Models

30 of 30 models
| Rank | Model | Score | Strong on |
|---|---|---|---|
| 🥇 | gpt-5-2025-08-07 | 34.2% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals Finance Agent overall_accuracy_pct |
| 🥈 | claude-sonnet-4 | 32.0% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and Sonar Java Quality Leaderboard functional_skill_pct |
| 🥉 | claude-sonnet-4.6 | 29.4% | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct |
| #4 | gpt-5.2-2025-12-11 | 29.3% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals Finance Agent overall_accuracy_pct |
| #5 | gemini-2.5-pro | 29.1% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals CorpFin v2 overall_accuracy_pct |
| #6 | gemini-3-pro-preview | 28.9% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals Finance Agent overall_accuracy_pct |
| #7 | gemini-3.1-pro-preview | 28.4% | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct |
| #8 | gpt-5-mini-2025-08-07 | 27.4% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals Finance Agent overall_accuracy_pct |
| #9 | Grok-4-0709 | 27.1% | Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct |
| #10 | gpt-4.1-20250414 | 25.9% | Vals Tax Eval v2 overall_accuracy_pct and SWE-bench Verified Leaderboard swe_verified_resolved_pct |
| #11 | Kimi K2 Thinking | 24.1% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and Sonar Java Quality Leaderboard functional_skill_pct |
| #12 | gpt-5.4-2026-03-05 | 23.5% | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct |
| #13 | gemini-3-flash-preview | 22.1% | Vals CorpFin v2 overall_accuracy_pct and Vals Tax Eval v2 overall_accuracy_pct |
| #14 | o3-20250416 | 21.7% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals Tax Eval v2 overall_accuracy_pct |
| #15 | kimi-k2.5-thinking | 21.6% | Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct |
| #16 | claude-opus-4-5-20251101 | 21.5% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals Tax Eval v2 overall_accuracy_pct |
| #17 | gpt-5.1-2025-11-13 | 21.4% | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct |
| #18 | gemini-3.1-flash-lite-preview | 21.2% | Vals Finance Agent overall_accuracy_pct and Vals Tax Eval v2 overall_accuracy_pct |
| #19 | claude-opus-4-6-thinking | 21.0% | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct |
| #20 | grok-4-fast-reasoning | 20.8% | Vals CorpFin v2 overall_accuracy_pct and Vals Tax Eval v2 overall_accuracy_pct |
| #21 | grok-4-1-fast-reasoning | 20.4% | Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct |
| #22 | claude-opus-4-5-20251101-thinking | 20.3% | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct |
| #23 | glm-4.7 | 19.7% | Sonar Java Quality Leaderboard functional_skill_pct and Vals Finance Agent overall_accuracy_pct |
| #24 | minimax-m2.1 | 18.9% | Sonar Java Quality Leaderboard functional_skill_pct and Vals CorpFin v2 overall_accuracy_pct |
| #25 | claude-sonnet-4-5-20250929-thinking | 18.8% | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct |
| #26 | grok-4.20-0309-reasoning | 18.1% | Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct |
| #28 | claude-opus-4-6 | 17.7% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and OpenHands Issue Resolution issue_resolution_score_pct |
| #29 | o4-mini | 17.4% | SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals Tax Eval v2 overall_accuracy_pct |
| #30 | qwen3.5-flash | 17.1% | Vals CorpFin v2 overall_accuracy_pct and Vals Tax Eval v2 overall_accuracy_pct |
| #33 | gpt-4.1-mini-20250414 | 16.6% | Vals Tax Eval v2 overall_accuracy_pct and Galileo Agent Leaderboard v2 Banking AC |

Ranking diagnostics & missing models

Source lift

Ranked

59

Sources

8

Quality

Good

Vals Tax Eval v2

45 rows · 1.5% avg lift

Vals CorpFin v2

44 rows · 1.0% avg lift

Vals GPQA

42 rows · 0.7% avg lift

Vals Mortgage Tax

34 rows · 1.1% avg lift

Missing frontier models

No obvious gaps right now.

Taxonomy & task details

Core tasks

task.code_generation, task.reasoning_math

Required modes

none

Domains

domain.finance_equity_research
