Companion

Mindfulness and meditation scripts

Generate calming scripts and exercises tailored to a user's context.

task.empathy_support_dialogue, task.rewrite_tone_style

Best for this use case

gemini-3-pro-preview

Strong on BFCL Memory Official: Memory Acc and BFCL Multi-turn Official: Multi Turn Acc

Best benchmark score: 48.5%

Confidence: 58.5%

Top 3 ranked models

🥇 gemini-3-pro-preview: 48.5%

🥈 Grok-4-0709: 48.1%

🥉 grok-4-1-fast-reasoning: 44.7%

Ranked Models

Evidence Quality

88%

Evidence Points

Top Signal

BFCL Memory Official: Memory Acc

All Ranked Models

30 of 30 models

Rank	Model	Strengths	Score	Confidence	Price / 1M	Evidence sources
🥇	gemini-3-pro-preview	Strong on BFCL Memory Official: Memory Acc and BFCL Multi-turn Official: Multi Turn Acc	48.5%	59%	$4.50	BFCL Memory Official, BFCL Multi-turn Official
🥈	Grok-4-0709	Strong on BFCL Memory Official: Memory Acc and BFCL Relevance Detection Official: Relevance Detection	48.1%	62%	n/a	BFCL Memory Official, BFCL Relevance Detection Official
🥉	grok-4-1-fast-reasoning	Strong on BFCL Memory Official: Memory Acc and BFCL Multi-turn Official: Multi Turn Acc	44.7%	57%	$0.28	BFCL Memory Official, BFCL Multi-turn Official
#4	GLM-4.6	Strong on BFCL Memory Official: Memory Acc and BFCL Multi-turn Official: Multi Turn Acc	41.9%	48%	n/a	BFCL Memory Official, BFCL Multi-turn Official
#5	o3-20250416	Strong on BFCL Memory Official: Memory Acc and BFCL Relevance Detection Official: Relevance Detection	40.5%	59%	$3.50	BFCL Memory Official, BFCL Relevance Detection Official
#6	gpt-4.1-20250414	Strong on BFCL Relevance Detection Official: Relevance Detection and BFCL Memory Official: Memory Acc	35.4%	60%	n/a	BFCL Relevance Detection Official, BFCL Memory Official
#7	Kimi-K2-Instruct	Strong on BFCL Multi-turn Official: Multi Turn Acc and BFCL Memory Official: Memory Acc	35.1%	51%	n/a	BFCL Multi-turn Official, BFCL Memory Official
#8	o4-mini	Strong on BFCL Memory Official: Memory Acc and BFCL Relevance Detection Official: Relevance Detection	33.5%	58%	$1.93	BFCL Memory Official, BFCL Relevance Detection Official
#9	grok-4-1-fast-non-reasoning	Strong on BFCL Relevance Detection Official: Relevance Detection and BFCL Multi-turn Official: Multi Turn Acc	32.8%	56%	$0.28	BFCL Relevance Detection Official, BFCL Multi-turn Official
#10	gpt-5.2-2025-12-11	Strong on BFCL Relevance Detection Official: Relevance Detection and BFCL Multi-turn Official: Multi Turn Acc	31.8%	57%	n/a	BFCL Relevance Detection Official, BFCL Multi-turn Official
#12	gemini-2.5-flash	Strong on BFCL Memory Official: Memory Acc and BFCL Relevance Detection Official: Relevance Detection	30.1%	52%	$0.17	BFCL Memory Official, BFCL Relevance Detection Official
#18	claude-opus-4-5-20251101	Strong on BFCL Relevance Detection Official: Relevance Detection and BFCL Relevance Detection Official: Irrelevance Detection	25.9%	57%	n/a	BFCL Relevance Detection Official
#24	Arch-Agent-32B	Strong on BFCL Multi-turn Official: Multi Turn Acc and BFCL Relevance Detection Official: Relevance Detection	23.0%	40%	n/a	BFCL Multi-turn Official, BFCL Relevance Detection Official
#28	Llama 3.3 70B Instruct	Strong on BFCL Relevance Detection Official: Relevance Detection and BFCL Multi-turn Official: Multi Turn Acc	21.6%	56%	n/a	BFCL Relevance Detection Official, BFCL Multi-turn Official
#44	gpt-5-2025-08-07	Strong on UGI Leaderboard: Writing ✍️ and UGI Leaderboard: Entertainment	18.0%	23%	n/a	UGI Leaderboard
#46	Llama-4-Scout-17B-16E-Instruct	Strong on BFCL Relevance Detection Official: Relevance Detection and BFCL Memory Official: Memory Acc	17.9%	47%	n/a	BFCL Relevance Detection Official, BFCL Memory Official
#50	gemini-2.5-pro	Strong on UGI Leaderboard: Writing ✍️ and UGI Leaderboard: Entertainment	17.4%	25%	$3.44	UGI Leaderboard
#52	claude-sonnet-4	Strong on UGI Leaderboard: Writing ✍️ and Galileo Agent Leaderboard v2: Avg AC	17.2%	24%	$6.00	UGI Leaderboard, Galileo Agent Leaderboard v2
#54	gemini-2.5-flash-lite	Strong on BFCL Relevance Detection Official: Relevance Detection and BFCL Relevance Detection Official: Irrelevance Detection	17.2%	47%	$0.17	BFCL Relevance Detection Official
#55	gemini-3.1-pro-preview	Strong on UGI Leaderboard: Writing ✍️ and UGI Leaderboard: Entertainment	17.0%	20%	$4.50	UGI Leaderboard
#61	Arch-Agent-3B	Strong on BFCL Relevance Detection Official: Relevance Detection and BFCL Multi-turn Official: Multi Turn Acc	16.5%	40%	n/a	BFCL Relevance Detection Official, BFCL Multi-turn Official
#62	Arch-Agent-1.5B	Strong on BFCL Relevance Detection Official: Relevance Detection and BFCL Multi-turn Official: Multi Turn Acc	16.1%	40%	n/a	BFCL Relevance Detection Official, BFCL Multi-turn Official
#66	gpt-5.4-2026-03-05	Strong on UGI Leaderboard: Writing ✍️ and UGI Leaderboard: Entertainment	14.6%	17%	n/a	UGI Leaderboard
#68	claude-sonnet-4.6	Strong on UGI Leaderboard: Writing ✍️ and UGI Leaderboard: Entertainment	14.3%	17%	$6.00	UGI Leaderboard
#71	gemini-3-flash-preview	Strong on UGI Leaderboard: Writing ✍️ and UGI Leaderboard: Entertainment	14.2%	19%	$1.13	UGI Leaderboard
#75	kimi-k2.5-thinking	Strong on UGI Leaderboard: Entertainment and UGI Leaderboard: Writing ✍️	13.6%	17%	n/a	UGI Leaderboard
#83	gpt-5.1-2025-11-13	Strong on UGI Leaderboard: Writing ✍️ and UGI Leaderboard: Entertainment	12.7%	18%	n/a	UGI Leaderboard
#85	qwen-2.5-72b-instruct	Strong on EQ-Bench Leaderboard: judgemark_score and Galileo Agent Leaderboard v2: Avg AC	12.4%	24%	n/a	EQ-Bench Leaderboard, Galileo Agent Leaderboard v2
#91	grok-4-fast-reasoning	Strong on UGI Leaderboard: Writing ✍️ and UGI Leaderboard: Entertainment	12.0%	19%	$0.28	UGI Leaderboard
#95	Kimi K2 Thinking	Strong on UGI Leaderboard: Writing ✍️ and UGI Leaderboard: Entertainment	11.4%	16%	$1.07	UGI Leaderboard