Model Rankings
Updated Apr 19, 2026, 07:05 PM UTC
Ranked by composite utility score across benchmark-backed use cases, with the public default now preferring broader, higher-confidence model profiles.
Inclusion requires at least 4 scored dimensions and 20% average confidence. Models with confidence below 25% are shown with an amber confidence indicator; treat those rankings as provisional. Use Full Profiles Only for the strictest view.
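The inclusion rule above can be sketched in a few lines of Python. This is only an illustration, not the site's actual implementation: the `Profile` class, the `listing_status` function, and the status labels are hypothetical names, and the sketch assumes the 25% amber threshold applies to the same average confidence as the 20% cutoff.

```python
from dataclasses import dataclass

# Thresholds taken from the text above.
MIN_DIMENSIONS = 4          # at least 4 scored dimensions
MIN_AVG_CONFIDENCE = 0.20   # at least 20% average confidence
AMBER_THRESHOLD = 0.25      # below 25% gets the amber indicator


@dataclass
class Profile:
    """Hypothetical container: dimension name -> confidence in [0, 1]."""
    name: str
    confidences: dict[str, float]


def average_confidence(profile: Profile) -> float:
    """Mean confidence over the scored dimensions (0.0 if none are scored)."""
    values = list(profile.confidences.values())
    return sum(values) / len(values) if values else 0.0


def listing_status(profile: Profile) -> str:
    """Return 'hidden', 'provisional' (amber), or 'listed'.

    Assumption: the amber rule uses average confidence, like the cutoff.
    """
    avg = average_confidence(profile)
    if len(profile.confidences) < MIN_DIMENSIONS or avg < MIN_AVG_CONFIDENCE:
        return "hidden"
    if avg < AMBER_THRESHOLD:
        return "provisional"
    return "listed"
```

Under these assumptions, a model with four dimensions scored at 22% confidence each would be listed with an amber indicator, while a model with only three scored dimensions would not appear at all, whatever its confidence.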
How scoring works
Utility Score = Σ(use-case score × confidence) / Σ(confidence) across all scored use cases. Public ordering prefers full profiles first, then confidence-adjusted utility, then breadth.
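As a rough sketch, the formula above is a confidence-weighted mean, and the public ordering can be expressed as a three-part sort key. The names `utility_score`, `RankedModel`, and `public_order` are hypothetical. The sketch also assumes that "confidence-adjusted utility" is the Utility Score itself and that "breadth" is the number of scored use cases; the page does not define either term precisely.

```python
from dataclasses import dataclass


def utility_score(scored: list[tuple[float, float]]) -> float:
    """Confidence-weighted mean of (use_case_score, confidence) pairs.

    Implements: sum(score * confidence) / sum(confidence).
    Returns 0.0 when there is no confidence mass to divide by.
    """
    total_confidence = sum(conf for _, conf in scored)
    if total_confidence == 0:
        return 0.0
    return sum(score * conf for score, conf in scored) / total_confidence


@dataclass
class RankedModel:
    """Hypothetical record used only for this illustration."""
    name: str
    full_profile: bool   # all 5 dimensions scored
    utility: float       # assumed to be the Utility Score above
    breadth: int         # assumed: number of scored use cases


def public_order(models: list[RankedModel]) -> list[str]:
    """Full profiles first, then higher utility, then greater breadth."""
    ranked = sorted(
        models,
        key=lambda m: (not m.full_profile, -m.utility, -m.breadth),
    )
    return [m.name for m in ranked]
```

For example, two use cases scored 80 and 40 with equal confidence give a utility of 60, while scores of 100 at confidence 0.75 and 0 at confidence 0.25 give 75: low-confidence results pull the average less.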
Trusted Profiles
9
Models with all 5 dimensions scored and eligible for the stable overall ranking.
Emerging Profiles
0
High-potential models with 4/5 dimensions. Visible separately until their profiles fill in.
Trusted Avg Confidence
30.5%
Average confidence across models currently eligible for the trusted overall table.
0 use cases changed their #1 model since last update · 20,080 model-use case pairings scored
Trusted overall ranking · full 5-dimension profiles
| Rank | Model | Badge | Profile | Utility |
|---|---|---|---|---|
| 🥇 | external/google/gemini-2-5-pro | | Full profile | 25.4% |
| 🥈 | zai-org/GLM-4.6 | Open | Full profile | 31.1% |
| 🥉 | external/openai/gpt-5-2025-08-07 | | Full profile | 25.5% |
| #4 | external/xai/grok-4-0709 | API | Full profile | 25.5% |
| #5 | external/anthropic/claude-sonnet-4 | | Full profile | 23.9% |
| #6 | external/openai/gpt-4-1-20250414 | | Full profile | 23.5% |
| #7 | external/google/gemini-3-pro-preview | | Full profile | 25.4% |
| #8 | external/openai/o3-20250416 | API | Full profile | 21.2% |
| #9 | external/anthropic/claude-opus-4-5-20251101 | | Full profile | 18.6% |
Dimensions: IQ (Reasoning & logic) · EQ (Social & emotional) · Accuracy (Factual precision) · Creativity (Creative output) · Based (Direct & uncensored)