Support bot (RAG grounded)
Support chatbot grounded in docs with optional citations and escalation.
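As a rough illustration of the task described above (answers grounded in docs, optional citations, escalation when the docs do not cover the question), here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Doc` and `Reply` types, the keyword-overlap retriever, and the `min_score` threshold are hypothetical stand-ins, not how any ranked model or benchmark on this page works. A real system would use an embedding retriever and an LLM to compose the answer.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str  # hypothetical knowledge-base identifier
    text: str


@dataclass
class Reply:
    text: str
    citations: list = field(default_factory=list)
    escalate: bool = False


def tokenize(s):
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(".,?!").lower() for w in s.split() if w.strip(".,?!")}


def retrieve(question, docs, top_k=2):
    """Score each doc by the fraction of question words it contains."""
    q = tokenize(question)
    scored = []
    for d in docs:
        overlap = len(q & tokenize(d.text))
        scored.append((overlap / len(q) if q else 0.0, d))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def answer(question, docs, min_score=0.3, cite=True):
    """Answer only from retrieved docs; escalate when nothing is relevant."""
    hits = [(s, d) for s, d in retrieve(question, docs) if s >= min_score]
    if not hits:
        # Nothing in the docs supports an answer, so hand off to a human.
        return Reply("I could not find this in the docs; routing to a human agent.",
                     escalate=True)
    body = " ".join(d.text for _, d in hits)
    citations = [d.doc_id for _, d in hits] if cite else []
    return Reply(body, citations=citations)


docs = [
    Doc("kb-1", "To reset your password, open Settings and choose Reset password."),
    Doc("kb-2", "Refunds are processed within five business days."),
]
grounded = answer("How do I reset my password?", docs)
unsupported = answer("Can I export invoices as CSV?", docs)
```

In this sketch the first question is answered from `kb-1` with a citation, while the second has no supporting document and is escalated instead of answered.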
Provisional leader: gpt-5-2025-08-07
Current leader based on limited benchmark evidence. Treat this ranking as directional until coverage improves.
Best benchmark score: 32.0%
Confidence: 41.3%
All ranked models: top 3
Ranked Models: 30
Evidence Quality: 83%
Evidence Points: 31
Top Signal: BasedAGI Support Bot Eval: overall_score_pct
All Ranked Models
| Rank | Model | Strong on | Score |
|---|---|---|---|
| #1 | gpt-5-2025-08-07 | BasedAGI Support Bot Eval overall_score_pct; FACTS Benchmark Suite facts_grounding_score_pct | 32.0% |
| #2 | gemini-2.5-pro | FACTS Benchmark Suite facts_grounding_score_pct; Vectara HHEM Leaderboard overall_hallucination_error_pct | 30.8% |
| #3 | gemini-3.1-pro-preview | SimpleQA Verified simpleqa_verified_score_pct; Vals Finance Agent overall_accuracy_pct | 29.7% |
| #4 | gemini-3-pro-preview | BasedAGI Support Bot Eval overall_score_pct; SimpleQA Verified simpleqa_verified_score_pct | 28.6% |
| #5 | claude-sonnet-4.6 | Vals Finance Agent overall_accuracy_pct; BasedAGI Support Bot Eval overall_score_pct | 26.5% |
| #6 | gpt-5-mini-2025-08-07 | Vals Finance Agent overall_accuracy_pct; Vals Finance Agent complex_retrieval_accuracy_pct | 25.8% |
| #7 | Grok-4-0709 | Vals Finance Agent overall_accuracy_pct; SimpleQA Verified simpleqa_verified_score_pct | 25.3% |
| #8 | gemini-3-flash-preview | Vals Finance Agent overall_accuracy_pct; Vectara HHEM Leaderboard overall_answer_rate_pct | 21.3% |
| #9 | gemini-3.1-flash-lite-preview | FACTS Benchmark Suite facts_grounding_score_pct; Vectara HHEM Leaderboard overall_hallucination_error_pct | 20.2% |
| #10 | gpt-5.2-2025-12-11 | FACTS Benchmark Suite facts_grounding_score_pct; Vals Finance Agent overall_accuracy_pct | 19.7% |
| #11 | gpt-4.1-20250414 | Vectara HHEM Leaderboard overall_hallucination_error_pct; Galileo Agent Leaderboard v2 Avg AC | 19.5% |
| #12 | gpt-5.4-2026-03-05 | Vectara HHEM Leaderboard overall_hallucination_error_pct; Vals Finance Agent overall_accuracy_pct | 18.7% |
| #13 | claude-sonnet-4 | Galileo Agent Leaderboard v2 Avg AC; Vectara HHEM Leaderboard overall_hallucination_error_pct | 18.5% |
| #14 | claude-opus-4-5-20251101 | FACTS Benchmark Suite facts_grounding_score_pct; Vectara HHEM Leaderboard overall_hallucination_error_pct | 17.2% |
| #15 | gemini-2.5-flash | FACTS Benchmark Suite facts_grounding_score_pct; Vectara HHEM Leaderboard overall_hallucination_error_pct | 16.9% |
| #16 | gpt-5.1-2025-11-13 | Vals Finance Agent overall_accuracy_pct; Vals Finance Agent complex_retrieval_accuracy_pct | 16.8% |
| #17 | grok-4-fast-reasoning | Vectara HHEM Leaderboard overall_answer_rate_pct; Vals Finance Agent overall_accuracy_pct | 16.7% |
| #18 | o3-20250416 | SciArena Leaderboard rating_elo; SimpleQA Verified simpleqa_verified_score_pct | 15.5% |
| #19 | grok-4-1-fast-reasoning | Vals Finance Agent overall_accuracy_pct; Vectara HHEM Leaderboard overall_answer_rate_pct | 13.2% |
| #20 | grok-3 | Vectara HHEM Leaderboard overall_hallucination_error_pct; Vectara HHEM Leaderboard overall_answer_rate_pct | 12.8% |
| #21 | claude-opus-4-6-thinking | Vals Finance Agent overall_accuracy_pct; Vals Finance Agent complex_retrieval_accuracy_pct | 12.7% |
| #22 | kimi-k2.5-thinking | Vals Finance Agent overall_accuracy_pct; Vals CorpFin v2 overall_accuracy_pct | 12.6% |
| #25 | Qwen3-Embedding-4B | MTEB Retrieval and Rerank (Official) retrieval_score_pct; BEIR-Style Retrieval (Official MTEB Slice) beir_average_score_pct | 11.8% |
| #29 | claude-opus-4-5-20251101-thinking | Vals Finance Agent overall_accuracy_pct; Vals CorpFin v2 overall_accuracy_pct | 11.6% |
| #31 | claude-sonnet-4-5-20250929-thinking | Vals Finance Agent overall_accuracy_pct; Vals Finance Agent complex_retrieval_accuracy_pct | 11.0% |
| #34 | claude-opus-4-1-20250805 | Vectara HHEM Leaderboard overall_hallucination_error_pct; FACTS Benchmark Suite facts_grounding_score_pct | 10.9% |
| #37 | glm-5-thinking | Vals Finance Agent overall_accuracy_pct; Vals CorpFin v2 overall_accuracy_pct | 10.8% |
| #38 | grok-4.20-0309-reasoning | Vals Finance Agent overall_accuracy_pct; Vals Finance Agent complex_retrieval_accuracy_pct | 10.7% |
| #41 | deepseek-v3 | Vectara HHEM Leaderboard overall_hallucination_error_pct; Vectara HHEM Leaderboard overall_answer_rate_pct | 10.6% |
| #44 | mistral-large-2512 | Vectara HHEM Leaderboard overall_answer_rate_pct; Vectara HHEM Leaderboard overall_hallucination_error_pct | 10.4% |
Ranking diagnostics & missing models
Source lift
Ranked: 54
Sources: 8
Quality: Low
Vals CorpFin v2
Vals Finance Agent
Vals Legal Bench
Vals Tax Eval v2
Missing frontier models
No obvious gaps right now.
Taxonomy & task details
Core tasks
Required modes
Domains
Related in CX
Agent-assist reply suggestions
Draft replies for human agents with tone and policy constraints.
Support dialogue agent
Multi-turn support conversations with escalation and policy awareness.
Customer feedback theme mining
Extract themes and trends from reviews, tickets, and surveys.
Support FAQ bot
Answer common support questions with safe troubleshooting steps.