Marketing

Campaign brief

Draft a campaign brief with positioning, audience, and channels.

task.write_memo_brief, task.outline_generation

Evidence quality is currently limited for this use case. Rankings below are useful for exploration, not a strong winner claim.

Provisional leader

claude-sonnet-4

Current leader based on limited benchmark evidence. Treat this ranking as directional until coverage improves.

34.3%

Best benchmark score

44.7%

Confidence

All ranked models — top 3

🥇

claude-sonnet-4

34.3%

🥈

gemini-2.5-pro

33.2%

🥉

Grok-4-0709

31.4%

Ranked Models

Evidence Quality

82%

Evidence Points

Top Signal

Galileo Agent Leaderboard v2: Avg TSQ

All Ranked Models

30 of 30 models

Rank	Model	Strong on	Score	Confidence	Price / 1M	Evidence sources
🥇	claude-sonnet-4	Galileo Agent Leaderboard v2 Avg TSQ and EQ-Bench Leaderboard eq_bench_score	34.3%	45%	$6.00	Galileo Agent Leaderboard v2, EQ-Bench Leaderboard
🥈	gemini-2.5-pro	EQ-Bench Leaderboard eq_bench_score and Galileo Agent Leaderboard v2 Avg TSQ	33.2%	46%	$3.44	EQ-Bench Leaderboard, Galileo Agent Leaderboard v2
🥉	Grok-4-0709	Galileo Agent Leaderboard v2 Avg TSQ and EQ-Bench Leaderboard eq_bench_score	31.4%	41%	—	Galileo Agent Leaderboard v2, EQ-Bench Leaderboard
#4	gpt-5-2025-08-07	EQ-Bench Leaderboard eq_bench_score and UGI Leaderboard Writing ✍️	31.0%	38%	—	EQ-Bench Leaderboard, UGI Leaderboard
#5	o3-20250416	EQ-Bench Leaderboard eq_bench_score and UGI Leaderboard Writing ✍️	26.1%	32%	$3.50	EQ-Bench Leaderboard, UGI Leaderboard
#6	gemini-3.1-pro-preview	UGI Leaderboard Writing ✍️ and Vals Mortgage Tax overall_accuracy_pct	24.7%	27%	$4.50	UGI Leaderboard, Vals Mortgage Tax
#7	gpt-4.1-20250414	Galileo Agent Leaderboard v2 Avg TSQ and Galileo Agent Leaderboard v2 Avg AC	24.7%	33%	—	Galileo Agent Leaderboard v2
#8	gpt-4o	CRMArena Function Calling overall_score_pct and EQ-Bench Leaderboard eq_bench_score	20.8%	28%	$0.26	CRMArena Function Calling, EQ-Bench Leaderboard
#9	gemini-3-flash-preview	UGI Leaderboard Writing ✍️ and Vals Legal Bench overall_accuracy_pct	20.6%	26%	$1.13	UGI Leaderboard, Vals Legal Bench
#10	gemini-3-pro-preview	UGI Leaderboard Writing ✍️ and Vals Mortgage Tax overall_accuracy_pct	19.9%	26%	$4.50	UGI Leaderboard, Vals Mortgage Tax
#11	o4-mini	EQ-Bench Leaderboard eq_bench_score and UGI Leaderboard Writing ✍️	19.7%	29%	$1.93	EQ-Bench Leaderboard, UGI Leaderboard
#12	gpt-5.4-2026-03-05	UGI Leaderboard Writing ✍️ and Vectara HHEM Leaderboard overall_hallucination_error_pct	19.7%	23%	—	UGI Leaderboard, Vectara HHEM Leaderboard
#13	gpt-5.2-2025-12-11	UGI Leaderboard Writing ✍️ and FACTS Benchmark Suite facts_grounding_score_pct	19.4%	23%	—	UGI Leaderboard, FACTS Benchmark Suite
#14	gpt-5-mini-2025-08-07	Vals MedQA overall_accuracy_pct and Vals LiveCodeBench overall_accuracy_pct	19.2%	29%	—	Vals MedQA, Vals LiveCodeBench
#15	gemini-2.5-flash	Galileo Agent Leaderboard v2 Avg TSQ and Galileo Agent Leaderboard v2 Avg AC	19.1%	26%	$0.17	Galileo Agent Leaderboard v2
#16	claude-sonnet-4.6	UGI Leaderboard Writing ✍️ and Vals Tax Eval v2 overall_accuracy_pct	18.8%	23%	$6.00	UGI Leaderboard, Vals Tax Eval v2
#17	Kimi-K2-Instruct	Galileo Agent Leaderboard v2 Avg TSQ and EQ-Bench Leaderboard eq_bench_score	18.6%	21%	—	Galileo Agent Leaderboard v2, EQ-Bench Leaderboard
#18	gpt-5.1-2025-11-13	UGI Leaderboard Writing ✍️ and Vals Case Law v2 overall_accuracy_pct	18.6%	23%	—	UGI Leaderboard, Vals Case Law v2
#19	claude-opus-4	EQ-Bench Leaderboard eq_bench_score and UGI Leaderboard Writing ✍️	18.0%	22%	$10.00	EQ-Bench Leaderboard, UGI Leaderboard
#20	claude-opus-4-5-20251101	UGI Leaderboard Writing ✍️ and Vals Mortgage Tax overall_accuracy_pct	17.9%	22%	—	UGI Leaderboard, Vals Mortgage Tax
#21	qwen-2.5-72b-instruct	EQ-Bench Leaderboard eq_bench_score and Galileo Agent Leaderboard v2 Avg TSQ	17.8%	28%	—	EQ-Bench Leaderboard, Galileo Agent Leaderboard v2
#22	grok-3	EQ-Bench Leaderboard eq_bench_score and UGI Leaderboard Writing ✍️	17.0%	22%	$6.00	EQ-Bench Leaderboard, UGI Leaderboard
#23	kimi-k2.5-thinking	UGI Leaderboard Writing ✍️ and Vals CorpFin v2 overall_accuracy_pct	16.7%	22%	—	UGI Leaderboard, Vals CorpFin v2
#24	Claude-3.5-Sonnet	EQ-Bench Leaderboard eq_bench_score and CRMArena Function Calling overall_score_pct	16.4%	25%	$6.00	EQ-Bench Leaderboard, CRMArena Function Calling
#25	grok-4-fast-reasoning	UGI Leaderboard Writing ✍️ and Vals CorpFin v2 overall_accuracy_pct	15.8%	26%	$0.28	UGI Leaderboard, Vals CorpFin v2
#26	deepseek-r1	EQ-Bench Leaderboard eq_bench_score and DuckDB NSQL Leaderboard all_execution_accuracy	15.0%	32%	$0.27	EQ-Bench Leaderboard, DuckDB NSQL Leaderboard
#27	claude-opus-4-6-thinking	Vals SWE-bench overall_accuracy_pct and Vals Mortgage Tax overall_accuracy_pct	14.9%	17%	—	Vals SWE-bench, Vals Mortgage Tax
#29	gpt-4.1-mini-20250414	Galileo Agent Leaderboard v2 Avg TSQ and Galileo Agent Leaderboard v2 Avg AC	14.8%	20%	—	Galileo Agent Leaderboard v2
#30	claude-opus-4-5-20251101-thinking	Vals MedQA overall_accuracy_pct and Vals Mortgage Tax overall_accuracy_pct	14.5%	17%	—	Vals MedQA, Vals Mortgage Tax
#31	gemini-3.1-flash-lite-preview	Vals Mortgage Tax overall_accuracy_pct and Vectara HHEM Leaderboard overall_hallucination_error_pct	14.5%	21%	$0.56	Vals Mortgage Tax, Vectara HHEM Leaderboard

Ranking diagnostics & missing models

Source lift

Ranked

Sources

Quality

Low

Vals Legal Bench

45 rows · 0.7% avg lift

Vals Tax Eval v2

43 rows · 0.7% avg lift

Vals LiveCodeBench

43 rows · 0.6% avg lift

Vals GPQA

43 rows · 0.6% avg lift

Missing frontier models

No obvious gaps right now.

Taxonomy & task details

Core tasks

task.write_memo_brief, task.outline_generation

Required modes

none

Domains

domain.marketing_sales

Related in Marketing

Social listening brief

Summarize social chatter into themes, risks, and recommendations.

Product positioning and messaging

Develop positioning, value props, and message pillars with tradeoffs.

Social post generation

Generate short channel-specific social posts and variations.

Landing page copy

Draft landing pages with clear positioning and structure.