Insight mining from text corpora
Extract themes and actions from large text datasets.
Provisional leader: gpt-4o
Current leader based on limited benchmark evidence. Treat this ranking as directional until coverage improves.
Best benchmark score: 22.8%
Confidence: 37.8%
All ranked models (top 3)
Ranked Models: 30
Evidence Quality: 80%
Evidence Points: 17
Top Signal: DuckDB NSQL Leaderboard: all_execution_accuracy
All Ranked Models
| Rank | Model | Strong on | Score |
|---|---|---|---|
| 🥇 | gpt-4o | DuckDB NSQL Leaderboard all_execution_accuracy and JSONSchemaBench Leaderboard medium_schema_compliance_pct | 22.8% |
| 🥈 | qwen-2.5-72b-instruct | DuckDB NSQL Leaderboard all_execution_accuracy and JSONSchemaBench Leaderboard medium_schema_compliance_pct | 21.8% |
| 🥉 | deepseek-r1 | DuckDB NSQL Leaderboard all_execution_accuracy and LiveSQLBench success_rate_pct | 20.5% |
| #4 | gpt-5-2025-08-07 | LiveSQLBench success_rate_pct and Spider2.0 Snow Text-to-SQL snow_text_to_sql_score_pct | 19.0% |
| #6 | gpt-4o-20241120 | DuckDB NSQL Leaderboard all_execution_accuracy and DuckDB NSQL Leaderboard hard_execution_accuracy | 18.3% |
| #8 | o3-20250416 | LiveSQLBench success_rate_pct and Spider2.0 Snow Text-to-SQL snow_text_to_sql_score_pct | 18.0% |
| #9 | claude-sonnet-4 | LiveSQLBench success_rate_pct and Galileo Agent Leaderboard v2 Avg AC | 17.1% |
| #10 | Claude-3.5-Sonnet | DuckDB NSQL Leaderboard all_execution_accuracy and LLM-AggreFact Leaderboard average_score_pct | 17.1% |
| #12 | gpt-4.1 | DuckDB NSQL Leaderboard all_execution_accuracy and SciArena Leaderboard rating_elo | 14.9% |
| #14 | gpt-4o-mini-2024-07-18 | DuckDB NSQL Leaderboard all_execution_accuracy and DuckDB NSQL Leaderboard hard_execution_accuracy | 13.1% |
| #16 | gpt-4o-2024-08-06 | DuckDB NSQL Leaderboard all_execution_accuracy and DuckDB NSQL Leaderboard hard_execution_accuracy | 12.5% |
| #20 | gemini-2.5-pro | Galileo Agent Leaderboard v2 Avg AC and Galileo Agent Leaderboard v2 Avg TSQ | 11.6% |
| #21 | o4-mini | LiveSQLBench success_rate_pct and SciArena Leaderboard rating_elo | 11.3% |
| #22 | gpt-4.1-20250414 | MMLongBench-Doc Leaderboard acc_score_pct and Galileo Agent Leaderboard v2 Avg AC | 11.3% |
| #23 | gemini-3-pro-preview | BFCL Multi-turn Official Multi Turn Acc and SciArena Leaderboard rating_elo | 11.2% |
| #25 | gpt-5-mini-2025-08-07 | SciArena Leaderboard rating_elo and Vals MedQA overall_accuracy_pct | 10.6% |
| #26 | Grok-4-0709 | Galileo Agent Leaderboard v2 Avg TSQ and Galileo Agent Leaderboard v2 Avg AC | 10.6% |
| #27 | gemini-3.1-pro-preview | Vals LiveCodeBench overall_accuracy_pct and Vals SWE-bench overall_accuracy_pct | 10.5% |
| #33 | gemini-2.0-flash-001 | DuckDB NSQL Leaderboard all_execution_accuracy and DuckDB NSQL Leaderboard hard_execution_accuracy | 10.3% |
| #36 | Llama-3.3-70B-Instruct | DuckDB NSQL Leaderboard all_execution_accuracy and DuckDB NSQL Leaderboard hard_execution_accuracy | 10.0% |
| #39 | gpt-5.2-2025-12-11 | BFCL Multi-turn Official Multi Turn Acc and FACTS Benchmark Suite facts_grounding_score_pct | 9.6% |
| #41 | gemini-2.5-flash | Galileo Agent Leaderboard v2 Avg TSQ and Galileo Agent Leaderboard v2 Avg AC | 9.1% |
| #43 | Qwen3-30B-A3B | DuckDB NSQL Leaderboard all_execution_accuracy and DuckDB NSQL Leaderboard hard_execution_accuracy | 9.1% |
| #44 | gemma-2-27b-it | DuckDB NSQL Leaderboard all_execution_accuracy and DuckDB NSQL Leaderboard hard_execution_accuracy | 9.0% |
| #45 | phi-4 | DuckDB NSQL Leaderboard all_execution_accuracy and Vectara HHEM Leaderboard overall_hallucination_error_pct | 8.8% |
| #46 | Qwen3-32B | DuckDB NSQL Leaderboard all_execution_accuracy and DuckDB NSQL Leaderboard hard_execution_accuracy | 8.6% |
| #48 | gemini-3-flash-preview | Vals Legal Bench overall_accuracy_pct and Vals MedQA overall_accuracy_pct | 8.5% |
| #49 | Qwen2.5-Coder-7B | DuckDB NSQL Leaderboard all_execution_accuracy and DuckDB NSQL Leaderboard hard_execution_accuracy | 8.4% |
| #50 | deepseek-v3 | LiveSQLBench success_rate_pct and Galileo Agent Leaderboard v2 Avg AC | 8.2% |
| #52 | minimax-m2.1 | LiveSQLBench success_rate_pct and Vals SWE-bench overall_accuracy_pct | 8.1% |
Ranking diagnostics & missing models
Source lift
Ranked: 46
Sources: 8
Quality: Low
DuckDB NSQL Leaderboard
Vals Legal Bench
Vals MedQA
Vals LiveCodeBench
Missing frontier models
claude-sonnet-4.6
Thin evidence after weighting
Rank #11
20.0%
Taxonomy & task details
Core tasks
Required modes
Domains
Related in Data & Analytics
SQL debugging
Diagnose and fix SQL queries for correctness and performance.
Metric definition workshop
Turn ambiguous KPI definitions into precise, measurable specs.
Dashboard narratives
Generate weekly KPI narratives and investigation suggestions.
Chart & Data Visualization Interpretation
Reading charts, graphs, and dashboards to extract insights and answer questions.