Creative

Poetry and lyrics

Generate poems and lyrics with style control and variation.

task.poetry_lyrics

Evidence quality is currently limited for this use case. The rankings below are useful for exploration but do not support a strong claim about a single winner.

Provisional leader

Grok-4-0709

Current leader based on limited benchmark evidence. Treat this ranking as directional until coverage improves.

Best benchmark score: 31.7%

Confidence: 41.4%

Top 3 ranked models

🥇 Grok-4-0709: 31.7%

🥈 gemini-2.5-pro: 31.3%

🥉 gpt-5-2025-08-07: 28.6%

Ranked Models

Evidence Quality

82%

Evidence Points

Top Signal: UGI Leaderboard: Writing ✍️

All Ranked Models

30 of 30 models

Rank	Model	Strengths	Score	Confidence	Price / 1M	Evidence sources
🥇	Grok-4-0709	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	31.7%	41%	—	UGI Leaderboard, UGI Leaderboard
🥈	gemini-2.5-pro	Strong on UGI Leaderboard Writing ✍️ and MWS Vision Bench validation_overall_score	31.3%	43%	$3.44	UGI Leaderboard, MWS Vision Bench
🥉	gpt-5-2025-08-07	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	28.6%	37%	—	UGI Leaderboard, UGI Leaderboard
#4	gemini-3-pro-preview	Strong on UGI Leaderboard Writing ✍️ and BFCL Memory Official Memory Acc	28.3%	36%	$4.50	UGI Leaderboard, BFCL Memory Official
#5	claude-sonnet-4	Strong on UGI Leaderboard Writing ✍️ and Galileo Agent Leaderboard v2 Avg AC	27.5%	38%	$6.00	UGI Leaderboard, Galileo Agent Leaderboard v2
#6	o3-20250416	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	27.2%	37%	$3.50	UGI Leaderboard, UGI Leaderboard
#7	gemini-3.1-pro-preview	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	27.1%	31%	$4.50	UGI Leaderboard, UGI Leaderboard
#8	gpt-4.1-20250414	Strong on UGI Leaderboard Writing ✍️ and Galileo Agent Leaderboard v2 Avg AC	26.7%	39%	—	UGI Leaderboard, Galileo Agent Leaderboard v2
#9	gemini-3-flash-preview	Strong on UGI Leaderboard Writing ✍️ and MWS Vision Bench validation_overall_score	26.1%	33%	$1.13	UGI Leaderboard, MWS Vision Bench
#11	gpt-5.2-2025-12-11	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	23.6%	34%	—	UGI Leaderboard, UGI Leaderboard
#12	gpt-5.4-2026-03-05	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	23.1%	27%	—	UGI Leaderboard, UGI Leaderboard
#13	claude-sonnet-4.6	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	22.7%	27%	$6.00	UGI Leaderboard, UGI Leaderboard
#14	grok-4-1-fast-reasoning	Strong on UGI Leaderboard Writing ✍️ and BFCL Memory Official Memory Acc	22.5%	33%	$0.28	UGI Leaderboard, BFCL Memory Official
#15	qwen-2.5-72b-instruct	Strong on EQ-Bench Leaderboard judgemark_score and Galileo Agent Leaderboard v2 Avg AC	22.0%	40%	—	EQ-Bench Leaderboard, Galileo Agent Leaderboard v2
#17	claude-opus-4-5-20251101	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	21.5%	33%	—	UGI Leaderboard, UGI Leaderboard
#18	kimi-k2.5-thinking	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	21.3%	27%	—	UGI Leaderboard, UGI Leaderboard
#19	gpt-5.1-2025-11-13	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	20.2%	28%	—	UGI Leaderboard, UGI Leaderboard
#20	gpt-5-mini-2025-08-07	Strong on MWS Vision Bench validation_overall_score and Vals MedQA overall_accuracy_pct	19.8%	29%	—	MWS Vision Bench, Vals MedQA
#21	o4-mini	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	19.8%	35%	$1.93	UGI Leaderboard, UGI Leaderboard
#23	gpt-4o	Strong on EQ-Bench Leaderboard judgemark_score and MEGA-Bench overall_score	19.7%	28%	$0.26	EQ-Bench Leaderboard, MEGA-Bench
#24	gemini-2.5-flash	Strong on MWS Vision Bench validation_overall_score and Galileo Agent Leaderboard v2 Avg TSQ	19.2%	29%	$0.17	MWS Vision Bench, Galileo Agent Leaderboard v2
#25	grok-4-fast-reasoning	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	18.9%	30%	$0.28	UGI Leaderboard, UGI Leaderboard
#27	Kimi-K2-Instruct	Strong on UGI Leaderboard Entertainment and UGI Leaderboard Writing ✍️	18.0%	24%	—	UGI Leaderboard, UGI Leaderboard
#28	Kimi K2 Thinking	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	18.0%	25%	$1.07	UGI Leaderboard, UGI Leaderboard
#30	grok-3	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	16.8%	22%	$6.00	UGI Leaderboard, UGI Leaderboard
#33	grok-4-1-fast-non-reasoning	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	16.3%	31%	$0.28	UGI Leaderboard, UGI Leaderboard
#34	claude-opus-4	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	16.1%	22%	$10.00	UGI Leaderboard, UGI Leaderboard
#35	claude-opus-4-1-20250805	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	16.1%	24%	—	UGI Leaderboard, UGI Leaderboard
#37	GLM-4.6	Strong on UGI Leaderboard Entertainment and UGI Leaderboard Writing ✍️	15.6%	19%	—	UGI Leaderboard, UGI Leaderboard
#38	claude-opus-4-6	Strong on UGI Leaderboard Writing ✍️ and UGI Leaderboard Entertainment	15.4%	18%	$10.00	UGI Leaderboard, UGI Leaderboard
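
Because the rankings are meant for exploration, it can help to shortlist models by confidence and price instead of reading the score column alone. The following is a minimal sketch, not part of the leaderboard itself: the `Entry` type, the `shortlist` function, and the thresholds are hypothetical names and values chosen for illustration, and the six entries are copied from the first six rows of the table above.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """One table row (hypothetical structure, for illustration only)."""
    model: str
    score: float            # benchmark score, in percent
    confidence: float       # confidence, in percent
    price: Optional[float]  # USD per 1M tokens; None where the table lists no price


# Values copied from the first six rows of the table above.
ENTRIES = [
    Entry("Grok-4-0709", 31.7, 41.0, None),
    Entry("gemini-2.5-pro", 31.3, 43.0, 3.44),
    Entry("gpt-5-2025-08-07", 28.6, 37.0, None),
    Entry("gemini-3-pro-preview", 28.3, 36.0, 4.50),
    Entry("claude-sonnet-4", 27.5, 38.0, 6.00),
    Entry("o3-20250416", 27.2, 37.0, 3.50),
]


def shortlist(
    entries: List[Entry],
    min_confidence: float,
    max_price: Optional[float] = None,
) -> List[Entry]:
    """Return entries that meet the thresholds, highest score first.

    When max_price is given, entries without a listed price are excluded,
    since they cannot be compared against the price cap.
    """
    kept = []
    for e in entries:
        if e.confidence < min_confidence:
            continue
        if max_price is not None and (e.price is None or e.price > max_price):
            continue
        kept.append(e)
    return sorted(kept, key=lambda e: e.score, reverse=True)


if __name__ == "__main__":
    # Assumed thresholds: at least 37% confidence and at most $5.00 per 1M.
    for e in shortlist(ENTRIES, min_confidence=37.0, max_price=5.00):
        print(f"{e.model}\t{e.score:.1f}%\t{e.confidence:.0f}%\t${e.price:.2f}")
```

Excluding unpriced models when a price cap is set is a design choice of this sketch; a different policy (for example, keeping them and flagging the missing price) may suit other uses.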