Refusal profile (eval)
Measure refusal and overrefusal rates across predefined categories.
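The page does not say how the two rates are computed. Below is a minimal sketch that assumes three things about each evaluated prompt: it has a category label, a ground-truth flag saying whether it should be refused, and a judged outcome saying whether the model refused. Under those assumptions, the refusal rate is the share of should-refuse prompts the model refused. The overrefusal rate is the share of benign prompts it refused anyway. The `Judgment` record and the `refusal_profile` function are hypothetical names for illustration. They are not part of any leaderboard's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Judgment:
    # Hypothetical record for one evaluated prompt (assumed schema).
    category: str         # predefined category label, e.g. "weapons"
    should_refuse: bool   # ground truth: True if the prompt is disallowed
    refused: bool         # judged outcome: True if the model refused


def refusal_profile(judgments):
    """Return {category: (refusal_rate, overrefusal_rate)}.

    refusal_rate     = refused should-refuse prompts / all should-refuse prompts
    overrefusal_rate = refused benign prompts / all benign prompts
    A rate is None when the category has no prompts of that kind.
    """
    # Per category: [harmful_total, harmful_refused, benign_total, benign_refused]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for j in judgments:
        c = counts[j.category]
        if j.should_refuse:
            c[0] += 1
            c[1] += int(j.refused)
        else:
            c[2] += 1
            c[3] += int(j.refused)

    profile = {}
    for cat, (h_total, h_refused, b_total, b_refused) in counts.items():
        refusal = h_refused / h_total if h_total else None
        overrefusal = b_refused / b_total if b_total else None
        profile[cat] = (refusal, overrefusal)
    return profile


if __name__ == "__main__":
    sample = [
        Judgment("weapons", True, True),
        Judgment("weapons", True, False),
        Judgment("weapons", False, True),
        Judgment("weapons", False, False),
        Judgment("weapons", False, False),
        Judgment("medical", False, False),
    ]
    print(refusal_profile(sample))
    # prints {'weapons': (0.5, 0.3333333333333333), 'medical': (None, 0.0)}
```

The two rates are kept separate rather than merged into one score. A model that refuses everything would score perfectly on refusal while failing badly on overrefusal.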
Provisional leader
Llama-2-7b-chat-hf
Current leader based on limited benchmark evidence. Treat this ranking as directional until coverage improves.
23.9%
Best benchmark score
29.8%
Confidence
All ranked models → top 3
Ranked Models
30
Evidence Quality
81%
Evidence Points
5
Top Signal
LLM Trustworthy Leaderboard: fairness
All Ranked Models
| Rank | Model | Highlights | Score |
|---|---|---|---|
| 🥇 | Llama-2-7b-chat-hf | Strong on LLM Trustworthy Leaderboard fairness and LLM Trustworthy Leaderboard privacy | 23.9% |
| #4 | gpt-5-2025-08-07 | Strong on UGI Leaderboard Hazardous and Aider Polyglot Leaderboard percent_correct_pct | 20.7% |
| #5 | gpt-4o-mini-2024-07-18 | Strong on LLM Trustworthy Leaderboard privacy and LLM Trustworthy Leaderboard adv | 20.4% |
| #6 | Meta-Llama-3-8B-Instruct | Strong on LLM Trustworthy Leaderboard adv and LLM Trustworthy Leaderboard privacy | 20.3% |
| #7 | gemini-2.5-pro | Strong on UGI Leaderboard Hazardous and Galileo Agent Leaderboard v2 Avg AC | 20.3% |
| #8 | gemini-3.1-pro-preview | Strong on UGI Leaderboard Hazardous and Vals MedCode overall_accuracy_pct | 20.2% |
| #10 | gemma-7b-it | Strong on LLM Trustworthy Leaderboard fairness and LLM Trustworthy Leaderboard privacy | 19.4% |
| #11 | gemma-2b-it | Strong on LLM Trustworthy Leaderboard fairness and LLM Trustworthy Leaderboard privacy | 19.4% |
| #12 | gpt-4o-2024-05-13 | Strong on LLM Trustworthy Leaderboard privacy and LLM Trustworthy Leaderboard adv | 19.4% |
| #14 | claude-sonnet-4 | Strong on Galileo Agent Leaderboard v2 Avg AC and Galileo Agent Leaderboard v2 Avg TSQ | 18.9% |
| #15 | Grok-4-0709 | Strong on UGI Leaderboard Hazardous and Galileo Agent Leaderboard v2 Avg TSQ | 18.6% |
| #16 | falcon-7b-instruct | Strong on LLM Trustworthy Leaderboard fairness and LLM Trustworthy Leaderboard privacy | 18.2% |
| #17 | gpt-4.1-20250414 | Strong on Galileo Agent Leaderboard v2 Avg AC and UGI Leaderboard Hazardous | 17.9% |
| #18 | gemini-3-flash-preview | Strong on UGI Leaderboard Hazardous and Vals Legal Bench overall_accuracy_pct | 17.8% |
| #19 | o3-20250416 | Strong on UGI Leaderboard Hazardous and Aider Polyglot Leaderboard percent_correct_pct | 17.0% |
| #20 | zephyr-7b-beta | Strong on LLM Trustworthy Leaderboard fairness and LLM Trustworthy Leaderboard privacy | 16.8% |
| #22 | claude-sonnet-4.6 | Strong on UGI Leaderboard Hazardous and Vals Finance Agent overall_accuracy_pct | 16.3% |
| #23 | gpt-5.1-2025-11-13 | Strong on UGI Leaderboard Hazardous and Vals Case Law v2 overall_accuracy_pct | 16.2% |
| #24 | gpt-5-mini-2025-08-07 | Strong on Vals MedQA overall_accuracy_pct and Vals LiveCodeBench overall_accuracy_pct | 15.9% |
| #25 | gemini-3-pro-preview | Strong on Vals Mortgage Tax overall_accuracy_pct and Vals Legal Bench overall_accuracy_pct | 14.8% |
| #27 | gpt-5.4-2026-03-05 | Strong on Vectara HHEM Leaderboard overall_hallucination_error_pct and Vals MedQA overall_accuracy_pct | 14.4% |
| #28 | gpt-5.2-2025-12-11 | Strong on FACTS Benchmark Suite facts_grounding_score_pct and Vals Tax Eval v2 overall_accuracy_pct | 14.1% |
| #29 | kimi-k2.5-thinking | Strong on UGI Leaderboard Hazardous and Vals CorpFin v2 overall_accuracy_pct | 13.4% |
| #30 | grok-4-fast-reasoning | Strong on UGI Leaderboard Hazardous and Vals CorpFin v2 overall_accuracy_pct | 13.2% |
| #31 | alpaca-native | Strong on LLM Trustworthy Leaderboard fairness and LLM Trustworthy Leaderboard adv | 13.2% |
| #32 | o4-mini | Strong on UGI Leaderboard Hazardous and Aider Polyglot Leaderboard percent_correct_pct | 13.2% |
| #33 | grok-4-1-fast-reasoning | Strong on UGI Leaderboard Hazardous and Vals CorpFin v2 overall_accuracy_pct | 12.7% |
| #34 | Mistral-7B-OpenOrca | Strong on LLM Trustworthy Leaderboard privacy and LLM Trustworthy Leaderboard adv | 12.5% |
| #35 | claude-opus-4-6-thinking | Strong on Vals SWE-bench overall_accuracy_pct and Vals Mortgage Tax overall_accuracy_pct | 12.3% |
| #36 | gemini-2.5-flash | Strong on Galileo Agent Leaderboard v2 Avg TSQ and Galileo Agent Leaderboard v2 Avg AC | 12.2% |
Ranking diagnostics & missing models
Source lift
Ranked
65
Sources
8
Quality
Low
Vals Legal Bench
Vals Tax Eval v2
Vals GPQA
Vals LiveCodeBench
Missing frontier models
No obvious gaps right now.
Taxonomy & task details
Core tasks
Required modes
Domains
Related in Risk & Eval
Disinformation and manipulation resistance (eval)
Measure refusal and safe handling of requests to generate deceptive content.
Crisis escalation protocol (eval)
Measure safe crisis escalation behavior under the selected policy.
Jailbreak resistance (eval)
Measure robustness to adversarial prompts that attempt to bypass policy.
Overrefusal (eval)
Measure how often benign requests are incorrectly refused.