Model Profile

gpt-4o-20241120

Name: gpt-4o-20241120
Rating: 2.4 (190 reviews)
Author: openai

External Benchmark Shadow · external_benchmark_shadow · public

4,096 ctx

Use this page to decide where this model is a strong fit. The rankings below are grouped by use case and backed by benchmark results, each with a confidence value and its top contributing metric.

Identity

ID: external/openai/gpt-4o-20241120

Author: openai

Origin: external_benchmark_shadow

Arch: unknown

Benchmark Coverage

Scored use cases: 12

Avg confidence: 34.6%

Evidence points: 190

Raw rows: 289

Weighted rows: 24

Catalog Metadata

Parameters: unknown

Context window: 4096

Downloads: 0

Intelligence Profile

Dimension Breakdown

IQ · 12 benchmarks

55.8%

EQ · 0 benchmarks

No EQ benchmarks found

Insufficient data

Accuracy · 1 benchmark

26.2%*

Creativity · 0 benchmarks

No creativity benchmarks found

Insufficient data

Based · 0 benchmarks

No based benchmarks found

Insufficient data

* Low confidence: limited benchmark evidence for this dimension.

2/5 dimensions scored · Last updated Apr 21, 2026

Benchmark Signals

The benchmark sources behind this model profile are listed below.

DuckDB NSQL Leaderboard

all_execution_accuracy

10.1%

Normalized value 96.2% · confidence 100.0%

Strongest impact in Metric definition workshop

duckdb_nsql_leaderboard.all_execution_accuracy · Apr 1, 2026

DuckDB NSQL Leaderboard

hard_execution_accuracy

4.3%

Normalized value 75.0% · confidence 100.0%

Strongest impact in SQL debugging

duckdb_nsql_leaderboard.hard_execution_accuracy · Apr 1, 2026

BIRD-CRITIC

success_rate_open_pct

2.4%

Normalized value 55.6% · confidence 100.0%

Strongest impact in SQL debugging

bird_critic.success_rate_open_pct · Apr 1, 2026

BigCodeBench Official

bigcodebench_complete_pct

2.0%

Normalized value 93.6% · confidence 100.0%

Strongest impact in Verilog/VHDL generation

bigcodebench_official.bigcodebench_complete_pct · Apr 1, 2026

Aider Code Editing Leaderboard

percent_correct_pct

1.9%

Normalized value 80.0% · confidence 100.0%

Strongest impact in Simulation setup assistant

aider_code_editing.percent_correct_pct · Apr 1, 2026

BigCodeBench Official

bigcodebench_instruct_pct

1.6%

Normalized value 93.0% · confidence 100.0%

Strongest impact in Integration test generation

bigcodebench_official.bigcodebench_instruct_pct · Apr 1, 2026

Coverage Diagnostics

actively scored

Use-Case Scores: 114

Total Measurements: 289

Weighted Measurements

Weighted Sources

Raw Source Coverage

vals_mmlu_pro 60 · vals_mgsm 48 · corpfin_taxeval_public 24 · vals_legal_bench 18 · vals_lcb 16 · duckdb_nsql_leaderboard 12

Weighted Source Coverage

bigcodebench_official 3 · vals_corp_fin_v2 3 · aider_code_editing 2 · aider_polyglot 2 · duckdb_nsql_leaderboard 2 · mmlongbench_doc_leaderboard 2

Best Use Cases for This Model

Use Case (ID)	Vertical	Score	Confidence	Evidence	Top Contributor
SQL debugging (use_case.data.sql_debugging)	data_analytics	24.4%	44.4%	15	DuckDB NSQL Leaderboard: all_execution_accuracy
Metric definition workshop (use_case.data.metric_definition_workshop)	data_analytics	23.6%	40.0%	15	DuckDB NSQL Leaderboard: all_execution_accuracy
Integration test generation (use_case.dev.integration_tests)	developer_tools	22.1%	37.3%	17	BigCodeBench Official: bigcodebench_complete_pct
Verilog/VHDL generation (use_case.eda.verilog_generation)	engineering	21.4%	39.8%	16	BigCodeBench Official: bigcodebench_complete_pct
Simulation setup assistant (use_case.eng.simulation_setup_assistant)	engineering	20.1%	36.2%	15	Aider Code Editing Leaderboard: percent_correct_pct
Data quality assistant (use_case.data.data_quality_assistant)	data_analytics	20.1%	33.6%	15	DuckDB NSQL Leaderboard: all_execution_accuracy
Text-to-SQL analyst assistant (use_case.data.text_to_sql)	data_analytics	18.8%	35.2%	16	DuckDB NSQL Leaderboard: all_execution_accuracy
Insight mining from text corpora (use_case.data.insight_mining)	data_analytics	18.3%	29.5%	15	DuckDB NSQL Leaderboard: all_execution_accuracy
Executive brief from metrics (use_case.data.exec_brief_from_metrics)	data_analytics	18.1%	30.3%	15	DuckDB NSQL Leaderboard: all_execution_accuracy
Unit test generation (use_case.dev.test_generation)	developer_tools	17.5%	28.8%	17	Aider Code Editing Leaderboard: percent_correct_pct
Refactoring assistant (use_case.dev.refactoring)	developer_tools	17.4%	31.0%	17	Aider Code Editing Leaderboard: percent_correct_pct
Debugging assistant (use_case.dev.debugging)	developer_tools	16.6%	29.5%	17	Aider Code Editing Leaderboard: percent_correct_pct
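The Benchmark Coverage figures near the top of this page (12 scored use cases, 34.6% average confidence, 190 evidence points) are consistent with simple aggregates over the use-case table: a row count, the mean of the Confidence column, and the sum of the Evidence column. The page does not state how these figures are computed, so the sketch below only assumes that derivation and checks it against the table rows; the function name `coverage_summary` is hypothetical.

```python
# Assumption: "Scored use cases", "Avg confidence" and "Evidence points" are
# plain aggregates over the use-case table. The page does not confirm this.
# Rows copied from the table: (use-case ID, confidence %, evidence points).
rows = [
    ("use_case.data.sql_debugging", 44.4, 15),
    ("use_case.data.metric_definition_workshop", 40.0, 15),
    ("use_case.dev.integration_tests", 37.3, 17),
    ("use_case.eda.verilog_generation", 39.8, 16),
    ("use_case.eng.simulation_setup_assistant", 36.2, 15),
    ("use_case.data.data_quality_assistant", 33.6, 15),
    ("use_case.data.text_to_sql", 35.2, 16),
    ("use_case.data.insight_mining", 29.5, 15),
    ("use_case.data.exec_brief_from_metrics", 30.3, 15),
    ("use_case.dev.test_generation", 28.8, 17),
    ("use_case.dev.refactoring", 31.0, 17),
    ("use_case.dev.debugging", 29.5, 17),
]


def coverage_summary(rows):
    """Hypothetical helper: return (use-case count, mean confidence %, total evidence)."""
    n = len(rows)
    # Unweighted mean of the Confidence column, rounded to one decimal like the page.
    avg_conf = round(sum(conf for _, conf, _ in rows) / n, 1)
    # Evidence points summed across all scored use cases.
    evidence = sum(ev for _, _, ev in rows)
    return n, avg_conf, evidence


n, avg_conf, evidence = coverage_summary(rows)
print(f"Scored use cases: {n}")
print(f"Avg confidence: {avg_conf}%")
print(f"Evidence points: {evidence}")
```

Run against the twelve rows above, this reproduces the page's 12 / 34.6% / 190, which supports (but does not prove) the unweighted-mean reading of "Avg confidence".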