Chart & Data Visualization Interpretation
Reading charts, graphs, and dashboards to extract insights and answer questions.
Provisional leader: gpt-5-2025-08-07
Current leader based on limited benchmark evidence. Treat this ranking as directional until coverage improves.
Best benchmark score: 23.2%
Confidence: 31.2%
Ranked Models: 30
Evidence Quality: 81%
Evidence Points: 29
Top Signal: SciArena Leaderboard rating_elo
All Ranked Models
| Rank | Model | Strong on | Score |
|---|---|---|---|
| 🥇 | gpt-5-2025-08-07 | SciArena Leaderboard rating_elo; FACTS Benchmark Suite facts_grounding_score_pct | 23.2% |
| 🥈 | gemini-3.1-pro-preview | Vals Finance Agent overall_accuracy_pct; FACTS Benchmark Suite facts_search_score_pct | 22.2% |
| 🥉 | gemini-2.5-pro | MWS Vision Bench validation_overall_score; FACTS Benchmark Suite facts_grounding_score_pct | 22.1% |
| #4 | gpt-5-mini-2025-08-07 | MWS Vision Bench validation_overall_score; SciArena Leaderboard rating_elo | 20.5% |
| #5 | gemini-3-flash-preview | MWS Vision Bench validation_overall_score; Vals CorpFin v2 overall_accuracy_pct | 19.7% |
| #6 | o3-20250416 | SciArena Leaderboard rating_elo; LiveSQLBench success_rate_pct | 18.4% |
| #7 | gemini-3-pro-preview | SciArena Leaderboard rating_elo; Vals Finance Agent overall_accuracy_pct | 17.7% |
| #8 | claude-sonnet-4 | LiveSQLBench success_rate_pct; Vectara HHEM Leaderboard overall_hallucination_error_pct | 17.4% |
| #9 | Grok-4-0709 | Vals CorpFin v2 overall_accuracy_pct; Vals Finance Agent overall_accuracy_pct | 16.6% |
| #10 | gpt-4.1-20250414 | Vectara HHEM Leaderboard overall_hallucination_error_pct; Vals CorpFin v2 overall_accuracy_pct | 16.4% |
| #11 | gpt-5.2-2025-12-11 | FACTS Benchmark Suite facts_grounding_score_pct; Vals CorpFin v2 overall_accuracy_pct | 16.4% |
| #12 | gemini-3.1-flash-lite-preview | FACTS Benchmark Suite facts_grounding_score_pct; Vectara HHEM Leaderboard overall_hallucination_error_pct | 15.6% |
| #13 | claude-sonnet-4.6 | Vals Finance Agent overall_accuracy_pct; Vals CorpFin v2 overall_accuracy_pct | 15.4% |
| #14 | gemini-2.5-flash | MWS Vision Bench validation_overall_score; FACTS Benchmark Suite facts_grounding_score_pct | 15.1% |
| #15 | gpt-5.4-2026-03-05 | Vectara HHEM Leaderboard overall_hallucination_error_pct; Vals CorpFin v2 overall_accuracy_pct | 14.5% |
| #16 | claude-opus-4-5-20251101 | FACTS Benchmark Suite facts_grounding_score_pct; Vals CorpFin v2 overall_accuracy_pct | 14.2% |
| #17 | deepseek-r1 | DuckDB NSQL Leaderboard all_execution_accuracy; LiveSQLBench success_rate_pct | 14.1% |
| #18 | gpt-4o | DuckDB NSQL Leaderboard all_execution_accuracy; MEGA-Bench overall_score | 13.5% |
| #19 | gpt-5.1-2025-11-13 | Vals Finance Agent overall_accuracy_pct; Vals CorpFin v2 overall_accuracy_pct | 13.1% |
| #20 | grok-4-fast-reasoning | Vals CorpFin v2 overall_accuracy_pct; Vals Finance Agent overall_accuracy_pct | 13.0% |
| #21 | qwen-2.5-72b-instruct | DuckDB NSQL Leaderboard all_execution_accuracy; JSONSchemaBench Leaderboard medium_schema_compliance_pct | 12.7% |
| #22 | Claude-3.5-Sonnet | DuckDB NSQL Leaderboard all_execution_accuracy; LLM-AggreFact Leaderboard average_score_pct | 12.5% |
| #23 | grok-4-1-fast-reasoning | Vals CorpFin v2 overall_accuracy_pct; Vals Finance Agent overall_accuracy_pct | 11.7% |
| #25 | gpt-4o-20241120 | DuckDB NSQL Leaderboard all_execution_accuracy; DuckDB NSQL Leaderboard hard_execution_accuracy | 11.4% |
| #26 | o4-mini | LiveSQLBench success_rate_pct; Vals CorpFin v2 overall_accuracy_pct | 11.2% |
| #28 | claude-opus-4-6-thinking | Vals CorpFin v2 overall_accuracy_pct; Vals Finance Agent overall_accuracy_pct | 10.8% |
| #29 | grok-3 | Vectara HHEM Leaderboard overall_hallucination_error_pct; Vals CorpFin v2 overall_accuracy_pct | 10.5% |
| #30 | gpt-4o-2024-08-06 | DuckDB NSQL Leaderboard all_execution_accuracy; Vectara HHEM Leaderboard overall_hallucination_error_pct | 10.3% |
| #31 | kimi-k2.5-thinking | Vals CorpFin v2 overall_accuracy_pct; Vals Finance Agent overall_accuracy_pct | 10.3% |
| #32 | gpt-4.1 | DuckDB NSQL Leaderboard all_execution_accuracy; SciArena Leaderboard rating_elo | 10.3% |
Ranking diagnostics & missing models
Source lift
Ranked: 56
Sources: 8
Quality: Low
Vals CorpFin v2
Vals MedQA
Vals Tax Eval v2
Vals Finance Agent
Missing frontier models
No obvious gaps right now.
Taxonomy & task details
Core tasks
Required modes
Domains
Related in Data & Analytics
SQL debugging
Diagnose and fix SQL queries for correctness and performance.
Metric definition workshop
Turn ambiguous KPI definitions into precise, measurable specs.
Dashboard narratives
Generate weekly KPI narratives and investigation suggestions.
Text-to-SQL analyst assistant
Convert questions into SQL and explain the query.