Creative

NPC dialogue

Low-latency in-character dialogue suitable for games.

task.dialogue_character_voice, task.persona_consistency

Best for this use case

gemini-3-pro-preview

Strong on BFCL Memory Official (Memory Acc) and BFCL Multi-turn Official (Multi Turn Acc)

Best benchmark score: 44.2%

Confidence: 53.2%

All ranked models — top 3

🥇 gemini-3-pro-preview: 44.2%

🥈 Grok-4-0709: 44.1%

🥉 grok-4-1-fast-reasoning: 40.0%

Ranked Models

Evidence Quality: 85%

Evidence Points

Top Signal: BFCL Memory Official (Memory Acc)

All Ranked Models

30 of 30 models

Rank	Model	Strengths	Score	Confidence	Price / 1M	Evidence sources
🥇	gemini-3-pro-preview	Strong on BFCL Memory Official (Memory Acc) and BFCL Multi-turn Official (Multi Turn Acc)	44.2%	53%	$4.50	BFCL Memory Official, BFCL Multi-turn Official
🥈	Grok-4-0709	Strong on BFCL Memory Official (Memory Acc) and BFCL Relevance Detection Official (Relevance Detection)	44.1%	57%	n/a	BFCL Memory Official, BFCL Relevance Detection Official
🥉	grok-4-1-fast-reasoning	Strong on BFCL Memory Official (Memory Acc) and BFCL Multi-turn Official (Multi Turn Acc)	40.0%	51%	$0.28	BFCL Memory Official, BFCL Multi-turn Official
#4	o3-20250416	Strong on BFCL Memory Official (Memory Acc) and BFCL Relevance Detection Official (Relevance Detection)	37.0%	54%	$3.50	BFCL Memory Official, BFCL Relevance Detection Official
#5	GLM-4.6	Strong on BFCL Memory Official (Memory Acc) and BFCL Multi-turn Official (Multi Turn Acc)	36.4%	42%	n/a	BFCL Memory Official, BFCL Multi-turn Official
#6	gpt-4.1-20250414	Strong on BFCL Relevance Detection Official (Relevance Detection) and BFCL Memory Official (Memory Acc)	32.0%	55%	n/a	BFCL Relevance Detection Official, BFCL Memory Official
#7	Kimi-K2-Instruct	Strong on BFCL Multi-turn Official (Multi Turn Acc) and BFCL Memory Official (Memory Acc)	30.7%	45%	n/a	BFCL Multi-turn Official, BFCL Memory Official
#8	o4-mini	Strong on BFCL Memory Official (Memory Acc) and BFCL Relevance Detection Official (Relevance Detection)	29.8%	52%	$1.93	BFCL Memory Official, BFCL Relevance Detection Official
#9	gemini-2.5-flash	Strong on BFCL Memory Official (Memory Acc) and BFCL Relevance Detection Official (Relevance Detection)	28.9%	49%	$0.17	BFCL Memory Official, BFCL Relevance Detection Official
#10	grok-4-1-fast-non-reasoning	Strong on BFCL Multi-turn Official (Multi Turn Acc) and BFCL Memory Official (Memory Acc)	28.7%	50%	$0.28	BFCL Multi-turn Official, BFCL Memory Official
#11	gpt-5.2-2025-12-11	Strong on BFCL Multi-turn Official (Multi Turn Acc) and BFCL Relevance Detection Official (Relevance Detection)	28.6%	52%	n/a	BFCL Multi-turn Official, BFCL Relevance Detection Official
#15	claude-opus-4-5-20251101	Strong on BFCL Relevance Detection Official (Relevance Detection) and UGI Leaderboard (Writing ✍️)	23.3%	51%	n/a	BFCL Relevance Detection Official, UGI Leaderboard
#18	gemini-2.5-pro	Strong on UGI Leaderboard (Writing ✍️) and MWS Vision Bench (validation_overall_score)	21.3%	29%	$3.44	UGI Leaderboard, MWS Vision Bench
#24	gpt-5-2025-08-07	Strong on UGI Leaderboard (Writing ✍️) and UGI Leaderboard (Entertainment)	19.4%	25%	n/a	UGI Leaderboard (×2)
#26	claude-sonnet-4	Strong on UGI Leaderboard (Writing ✍️) and Galileo Agent Leaderboard v2 (Avg AC)	18.7%	26%	$6.00	UGI Leaderboard, Galileo Agent Leaderboard v2
#27	gemini-3.1-pro-preview	Strong on UGI Leaderboard (Writing ✍️) and UGI Leaderboard (Entertainment)	18.4%	21%	$4.50	UGI Leaderboard (×2)
#28	Arch-Agent-32B	Strong on BFCL Multi-turn Official (Multi Turn Acc) and BFCL Relevance Detection Official (Relevance Detection)	18.3%	33%	n/a	BFCL Multi-turn Official, BFCL Relevance Detection Official
#30	Llama 3.3 70B Instruct	Strong on BFCL Relevance Detection Official (Relevance Detection) and BFCL Multi-turn Official (Multi Turn Acc)	17.9%	50%	n/a	BFCL Relevance Detection Official, BFCL Multi-turn Official
#31	gemini-3-flash-preview	Strong on UGI Leaderboard (Writing ✍️) and MWS Vision Bench (validation_overall_score)	17.7%	23%	$1.13	UGI Leaderboard, MWS Vision Bench
#39	qwen-2.5-72b-instruct	Strong on EQ-Bench Leaderboard (judgemark_score) and Galileo Agent Leaderboard v2 (Avg AC)	15.8%	30%	n/a	EQ-Bench Leaderboard, Galileo Agent Leaderboard v2
#41	gpt-5.4-2026-03-05	Strong on UGI Leaderboard (Writing ✍️) and UGI Leaderboard (Entertainment)	15.7%	18%	n/a	UGI Leaderboard (×2)
#45	claude-sonnet-4.6	Strong on UGI Leaderboard (Writing ✍️) and UGI Leaderboard (Entertainment)	15.4%	18%	$6.00	UGI Leaderboard (×2)
#52	Llama-4-Scout-17B-16E-Instruct	Strong on BFCL Relevance Detection Official (Relevance Detection) and BFCL Memory Official (Memory Acc)	14.6%	41%	n/a	BFCL Relevance Detection Official, BFCL Memory Official
#54	kimi-k2.5-thinking	Strong on UGI Leaderboard (Writing ✍️) and UGI Leaderboard (Entertainment)	14.5%	18%	n/a	UGI Leaderboard (×2)
#56	gemini-2.5-flash-lite	Strong on BFCL Relevance Detection Official (Relevance Detection) and BFCL Memory Official (Memory Acc)	14.4%	41%	$0.17	BFCL Relevance Detection Official, BFCL Memory Official
#58	gpt-4o	Strong on EQ-Bench Leaderboard (judgemark_score) and MEGA-Bench (overall_score)	14.2%	22%	$0.26	EQ-Bench Leaderboard, MEGA-Bench
#59	gpt-5.1-2025-11-13	Strong on UGI Leaderboard (Writing ✍️) and UGI Leaderboard (Entertainment)	13.7%	19%	n/a	UGI Leaderboard (×2)
#63	gpt-5-mini-2025-08-07	Strong on MWS Vision Bench (validation_overall_score) and Vals MedQA (overall_accuracy_pct)	13.5%	20%	n/a	MWS Vision Bench, Vals MedQA
#72	grok-4-fast-reasoning	Strong on UGI Leaderboard (Writing ✍️) and UGI Leaderboard (Entertainment)	12.9%	20%	$0.28	UGI Leaderboard (×2)
#73	Arch-Agent-3B	Strong on BFCL Multi-turn Official (Multi Turn Acc) and BFCL Relevance Detection Official (Relevance Detection)	12.7%	33%	n/a	BFCL Multi-turn Official, BFCL Relevance Detection Official