Engineering

Component selection assistant

Recommend components under constraints with evidence and tradeoffs.

task.recommendation_matching, task.tradeoff_analysis

Best for this use case

gemini-3.1-pro-preview

Strong on Vals SWE-bench overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct

35.5%

Best benchmark score

39.5%

Confidence

All ranked models — top 3

🥇

gemini-3.1-pro-preview

35.5%

🥈

gpt-5-2025-08-07

34.9%

🥉

gemini-2.5-pro

30.4%

Ranked Models

Evidence Quality

85%

Evidence Points

Top Signal

Vals SWE-bench: overall_accuracy_pct

All Ranked Models

30 of 30 models

Rank	Model	Score	Confidence	Price / 1M	Evidence sources
🥇	gemini-3.1-pro-preview: Strong on Vals SWE-bench overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct	35.5%	40%	$4.50	Vals SWE-bench, Vals Finance Agent
🥈	gpt-5-2025-08-07: Strong on Aider Polyglot Leaderboard percent_correct_pct and SWE-bench Verified Leaderboard swe_verified_resolved_pct	34.9%	45%	—	Aider Polyglot Leaderboard, SWE-bench Verified Leaderboard
🥉	gemini-2.5-pro: Strong on FACTS Benchmark Suite facts_grounding_score_pct and SWE-bench Verified Leaderboard swe_verified_resolved_pct	30.4%	45%	$3.44	FACTS Benchmark Suite, SWE-bench Verified Leaderboard
#4	gemini-3-pro-preview: Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals SWE-bench overall_accuracy_pct	29.6%	39%	$4.50	SWE-bench Verified Leaderboard, Vals SWE-bench
#5	gpt-5-mini-2025-08-07: Strong on Vals LiveCodeBench overall_accuracy_pct and SWE-bench Verified Leaderboard swe_verified_resolved_pct	29.6%	47%	—	Vals LiveCodeBench, SWE-bench Verified Leaderboard
#6	gpt-5.2-2025-12-11: Strong on FACTS Benchmark Suite facts_grounding_score_pct and SWE-bench Verified Leaderboard swe_verified_resolved_pct	29.2%	34%	—	FACTS Benchmark Suite, SWE-bench Verified Leaderboard
#7	gemini-3-flash-preview: Strong on Vals CorpFin v2 overall_accuracy_pct and Vals SWE-bench overall_accuracy_pct	27.9%	36%	$1.13	Vals CorpFin v2, Vals SWE-bench
#8	claude-sonnet-4.6: Strong on Vals Finance Agent overall_accuracy_pct and Vals SWE-bench overall_accuracy_pct	26.2%	32%	$6.00	Vals Finance Agent, Vals SWE-bench
#9	Grok-4-0709: Strong on Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct	25.8%	38%	—	Vals CorpFin v2, Vals Finance Agent
#10	gpt-5.4-2026-03-05: Strong on Vals SWE-bench overall_accuracy_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	25.1%	30%	—	Vals SWE-bench, Vectara HHEM Leaderboard
#11	claude-opus-4-5-20251101: Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and FACTS Benchmark Suite facts_grounding_score_pct	25.0%	34%	—	SWE-bench Verified Leaderboard, FACTS Benchmark Suite
#12	claude-sonnet-4: Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and Galileo Agent Leaderboard v2 Avg AC	24.9%	39%	$6.00	SWE-bench Verified Leaderboard, Galileo Agent Leaderboard v2
#13	gemini-3.1-flash-lite-preview: Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	24.5%	36%	$0.56	FACTS Benchmark Suite, Vectara HHEM Leaderboard
#14	gpt-4.1-20250414: Strong on Vectara HHEM Leaderboard overall_hallucination_error_pct and Galileo Agent Leaderboard v2 Avg AC	23.6%	37%	—	Vectara HHEM Leaderboard, Galileo Agent Leaderboard v2
#15	o3-20250416: Strong on Aider Polyglot Leaderboard percent_correct_pct and SWE-bench Verified Leaderboard swe_verified_resolved_pct	23.4%	34%	$3.50	Aider Polyglot Leaderboard, SWE-bench Verified Leaderboard
#16	gpt-5.1-2025-11-13: Strong on Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct	22.4%	33%	—	Vals Finance Agent, Vals CorpFin v2
#17	grok-4-fast-reasoning: Strong on Vals CorpFin v2 overall_accuracy_pct and Vals LiveCodeBench overall_accuracy_pct	20.3%	39%	$0.28	Vals CorpFin v2, Vals LiveCodeBench
#18	claude-opus-4-6-thinking: Strong on Vals SWE-bench overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct	20.2%	22%	—	Vals SWE-bench, Vals CorpFin v2
#20	claude-opus-4-5-20251101-thinking: Strong on Vals SWE-bench overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct	18.9%	22%	—	Vals SWE-bench, Vals Finance Agent
#21	kimi-k2.5-thinking: Strong on Vals CorpFin v2 overall_accuracy_pct and Vals LiveCodeBench overall_accuracy_pct	18.5%	25%	—	Vals CorpFin v2, Vals LiveCodeBench
#22	glm-5-thinking: Strong on Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct	17.3%	23%	—	Vals CorpFin v2, Vals Finance Agent
#24	grok-4.20-0309-reasoning: Strong on Vals CorpFin v2 overall_accuracy_pct and Vals SWE-bench overall_accuracy_pct	17.1%	22%	—	Vals CorpFin v2, Vals SWE-bench
#25	claude-sonnet-4-5-20250929-thinking: Strong on Vals Finance Agent overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct	17.0%	22%	—	Vals Finance Agent, Vals CorpFin v2
#26	grok-4-1-fast-reasoning: Strong on Vals CorpFin v2 overall_accuracy_pct and Vals Finance Agent overall_accuracy_pct	17.0%	28%	$0.28	Vals CorpFin v2, Vals Finance Agent
#27	Kimi K2 Thinking: Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct and Vals CorpFin v2 overall_accuracy_pct	16.7%	28%	$1.07	SWE-bench Verified Leaderboard, Vals CorpFin v2
#28	o4-mini: Strong on Aider Polyglot Leaderboard percent_correct_pct and Vals LiveCodeBench overall_accuracy_pct	16.6%	33%	$1.93	Aider Polyglot Leaderboard, Vals LiveCodeBench
#29	MiniMax-M2.7: Strong on Vals SWE-bench overall_accuracy_pct and Vals CorpFin v2 overall_accuracy_pct	16.3%	21%	$0.53	Vals SWE-bench, Vals CorpFin v2
#30	gemini-2.5-flash: Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	15.5%	26%	$0.17	FACTS Benchmark Suite, Vectara HHEM Leaderboard
#31	gpt-5.4-nano-2026-03-17: Strong on Vals LiveCodeBench overall_accuracy_pct and Vals SWE-bench overall_accuracy_pct	15.5%	21%	—	Vals LiveCodeBench, Vals SWE-bench
#32	qwen3.5-flash: Strong on Vals CorpFin v2 overall_accuracy_pct and Vals LiveCodeBench overall_accuracy_pct	14.9%	21%	—	Vals CorpFin v2, Vals LiveCodeBench
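
The table lists score and price side by side but does not combine them. As a rough sketch of one way a reader might weigh the two, the snippet below picks the highest-scoring model under a price cap and computes a simple score-per-dollar ratio. The `Row` type, the `best_under_budget` and `score_per_dollar` helpers, and the idea of a price cap are illustrative assumptions, not part of the ranking shown here; only the numbers are copied from a few table rows.

```python
from typing import NamedTuple, Optional, Sequence


class Row(NamedTuple):
    """One table row (hypothetical structure; values copied from the table)."""
    model: str
    score_pct: float
    confidence_pct: float
    price_per_1m: Optional[float]  # None where the table shows a dash


# A small subset of the rows above, for illustration only.
ROWS = [
    Row("gemini-3.1-pro-preview", 35.5, 40, 4.50),
    Row("gpt-5-2025-08-07", 34.9, 45, None),
    Row("gemini-2.5-pro", 30.4, 45, 3.44),
    Row("gemini-3-flash-preview", 27.9, 36, 1.13),
    Row("grok-4-fast-reasoning", 20.3, 39, 0.28),
    Row("gemini-2.5-flash", 15.5, 26, 0.17),
]


def best_under_budget(rows: Sequence[Row], max_price: float) -> Optional[Row]:
    """Return the highest-scoring row with a listed price at or below max_price.

    Rows without a listed price are skipped, because their cost is unknown.
    """
    priced = [
        r for r in rows
        if r.price_per_1m is not None and r.price_per_1m <= max_price
    ]
    return max(priced, key=lambda r: r.score_pct, default=None)


def score_per_dollar(row: Row) -> Optional[float]:
    """Score points per dollar of listed price; None if no price is listed."""
    if row.price_per_1m is None:
        return None
    return row.score_pct / row.price_per_1m


if __name__ == "__main__":
    pick = best_under_budget(ROWS, max_price=2.00)
    print(pick.model if pick else "no priced model under the cap")
```

A ratio like this favors very cheap models, so it is probably best read alongside the Confidence column rather than on its own.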

Ranking diagnostics & missing models

Source lift

Ranked

Sources

Quality

Good

Vals CorpFin v2

44 rows · 1.2% avg lift

Vals LiveCodeBench

44 rows · 1.2% avg lift

Vals Tax Eval v2

31 rows · 0.3% avg lift

Vals Finance Agent

30 rows · 1.1% avg lift

Missing frontier models

No obvious gaps right now.

Taxonomy & task details

Core tasks

task.recommendation_matching, task.tradeoff_analysis

Required modes

none

Domains

domain.electrical_engineering

Related in Engineering

Simulation setup assistant

Turn design requirements into simulation setup checklists and boundary notes.

CAD scripting helper

Generate and debug CAD automation scripts and parametric geometry code.

Verilog/VHDL generation

Generate RTL code and testbenches from functional specs.