Finance

Quant research code generation

Generate backtest or analysis code from trading hypotheses.

task.code_generation, task.reasoning_math
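
To make the use case concrete, here is a minimal sketch of the kind of backtest code it covers. The hypothesis ("a close above its trailing moving average tends to be followed by further gains"), the function names `moving_average` and `backtest_trend`, and the toy prices are all illustrative assumptions. None of them comes from the benchmarks listed on this page.

```python
from typing import List, Optional


def moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average; None until `window` prices exist."""
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out


def backtest_trend(prices: List[float], window: int = 3) -> float:
    """Cumulative return of a long/flat trend rule (hypothetical example).

    The position held over day t is decided only from data up to day t-1,
    so the rule never sees the price it is about to trade on (no lookahead).
    Costs, slippage, and position sizing are deliberately ignored.
    """
    ma = moving_average(prices, window)
    equity = 1.0
    for t in range(1, len(prices)):
        prev_ma = ma[t - 1]
        go_long = prev_ma is not None and prices[t - 1] > prev_ma
        if go_long:
            equity *= prices[t] / prices[t - 1]
    return equity - 1.0


if __name__ == "__main__":
    # Toy price series, made up for illustration only.
    toy_prices = [100.0, 101.0, 103.0, 102.0, 105.0, 107.0, 104.0]
    print(f"strategy return: {backtest_trend(toy_prices):.4f}")
```

The one design choice worth noting is the one-day lag between signal and position. Dropping it is a common source of lookahead bias in generated backtests.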

Best for this use case

gpt-5-2025-08-07

Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals Finance Agent overall_accuracy_pct

Best benchmark score: 34.2%

Confidence: 41.7%

All ranked models: top 3

🥇 gpt-5-2025-08-07: 34.2%

🥈 claude-sonnet-4: 32.0%

🥉 claude-sonnet-4.6: 29.4%

Ranked Models

Evidence Quality

85%

Evidence Points

Top Signal

SWE-bench Verified Leaderboard: swe_verified_resolved_pct

All Ranked Models

30 of 30 models

Rank	Model	Strengths	Score	Confidence	Price / 1M	Evidence sources
🥇	gpt-5-2025-08-07	Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals Finance Agent overall_accuracy_pct	34.2%	42%	—	SWE-bench Verified Leaderboard, Vals Finance Agent
🥈	claude-sonnet-4	Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and Sonar Java Quality Leaderboard functional_skill_pct	32.0%	46%	$6.00	SWE-bench Verified Leaderboard, Sonar Java Quality Leaderboard
🥉	claude-sonnet-4.6	Strong on Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct	29.4%	38%	$6.00	Vals Finance Agent, Vals CorpFin v2
#4	gpt-5.2-2025-12-11	Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals Finance Agent overall_accuracy_pct	29.3%	33%	—	SWE-bench Verified Leaderboard, Vals Finance Agent
#5	gemini-2.5-pro	Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals CorpFin v2 overall_accuracy_pct	29.1%	41%	$3.44	SWE-bench Verified Leaderboard, Vals CorpFin v2
#6	gemini-3-pro-preview	Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals Finance Agent overall_accuracy_pct	28.9%	37%	$4.50	SWE-bench Verified Leaderboard, Vals Finance Agent
#7	gemini-3.1-pro-preview	Strong on Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct	28.4%	33%	$4.50	Vals Finance Agent, Vals CorpFin v2
#8	gpt-5-mini-2025-08-07	Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals Finance Agent overall_accuracy_pct	27.4%	37%	—	SWE-bench Verified Leaderboard, Vals Finance Agent
#9	Grok-4-0709	Strong on Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct	27.1%	38%	—	Vals CorpFin v2, Vals Finance Agent
#10	gpt-4.1-20250414	Strong on Vals Tax Eval v2 overall_accuracy_pct and SWE-bench Verified Leaderboard swe_verified_resolved_pct	25.9%	37%	—	Vals Tax Eval v2, SWE-bench Verified Leaderboard
#11	Kimi K2 Thinking	Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and Sonar Java Quality Leaderboard functional_skill_pct	24.1%	45%	$1.07	SWE-bench Verified Leaderboard, Sonar Java Quality Leaderboard
#12	gpt-5.4-2026-03-05	Strong on Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct	23.5%	28%	—	Vals Finance Agent, Vals CorpFin v2
#13	gemini-3-flash-preview	Strong on Vals CorpFin v2 overall_accuracy_pct and Vals Tax Eval v2 overall_accuracy_pct	22.1%	29%	$1.13	Vals CorpFin v2, Vals Tax Eval v2
#14	o3-20250416	Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals Tax Eval v2 overall_accuracy_pct	21.7%	30%	$3.50	SWE-bench Verified Leaderboard, Vals Tax Eval v2
#15	kimi-k2.5-thinking	Strong on Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct	21.6%	34%	—	Vals CorpFin v2, Vals Finance Agent
#16	claude-opus-4-5-20251101	Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals Tax Eval v2 overall_accuracy_pct	21.5%	28%	—	SWE-bench Verified Leaderboard, Vals Tax Eval v2
#17	gpt-5.1-2025-11-13	Strong on Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct	21.4%	29%	—	Vals Finance Agent, Vals CorpFin v2
#18	gemini-3.1-flash-lite-preview	Strong on Vals Finance Agent overall_accuracy_pct and Vals Tax Eval v2 overall_accuracy_pct	21.2%	30%	$0.56	Vals Finance Agent, Vals Tax Eval v2
#19	claude-opus-4-6-thinking	Strong on Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct	21.0%	23%	—	Vals Finance Agent, Vals CorpFin v2
#20	grok-4-fast-reasoning	Strong on Vals CorpFin v2 overall_accuracy_pct and Vals Tax Eval v2 overall_accuracy_pct	20.8%	32%	$0.28	Vals CorpFin v2, Vals Tax Eval v2
#21	grok-4-1-fast-reasoning	Strong on Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct	20.4%	28%	$0.28	Vals CorpFin v2, Vals Finance Agent
#22	claude-opus-4-5-20251101-thinking	Strong on Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct	20.3%	23%	—	Vals Finance Agent, Vals CorpFin v2
#23	glm-4.7	Strong on Sonar Java Quality Leaderboard functional_skill_pct and Vals Finance Agent overall_accuracy_pct	19.7%	30%	—	Sonar Java Quality Leaderboard, Vals Finance Agent
#24	minimax-m2.1	Strong on Sonar Java Quality Leaderboard functional_skill_pct and Vals CorpFin v2 overall_accuracy_pct	18.9%	40%	$0.53	Sonar Java Quality Leaderboard, Vals CorpFin v2
#25	claude-sonnet-4-5-20250929-thinking	Strong on Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct	18.8%	23%	—	Vals Finance Agent, Vals CorpFin v2
#26	grok-4.20-0309-reasoning	Strong on Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct	18.1%	23%	—	Vals Finance Agent, Vals CorpFin v2
#28	claude-opus-4-6	Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and OpenHands Issue Resolution issue_resolution_score_pct	17.7%	21%	$10.00	SWE-bench Verified Leaderboard, OpenHands Issue Resolution
#29	o4-mini	Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals Tax Eval v2 overall_accuracy_pct	17.4%	29%	$1.93	SWE-bench Verified Leaderboard, Vals Tax Eval v2
#30	qwen3.5-flash	Strong on Vals CorpFin v2 overall_accuracy_pct and Vals Tax Eval v2 overall_accuracy_pct	17.1%	23%	—	Vals CorpFin v2, Vals Tax Eval v2
#33	gpt-4.1-mini-20250414	Strong on Vals Tax Eval v2 overall_accuracy_pct and Galileo Agent Leaderboard v2 Banking AC	16.6%	25%	—	Vals Tax Eval v2, Galileo Agent Leaderboard v2