BasedAGI

Model Profile

openai/gpt-4o-mini-2024-07-18

External Benchmark Shadow · external_benchmark_shadow · public · 4,096 ctx

Use this page to decide where this model is a strong fit. The rankings below are backed by benchmarks for each use case, with explicit confidence and contributor metrics.

Identity

ID: external/openai/gpt-4o-mini-2024-07-18

Author: openai

Origin: external_benchmark_shadow

Arch: unknown

Benchmark Coverage

Scored use cases: 12

Avg confidence: 30.6%

Evidence points: 153

Raw rows: 328

Weighted rows: 24

Catalog Metadata

Parameters: unknown

Context window: 4096

Downloads: 0

Intelligence Profile

IQ: 43% · EQ: no score · Accuracy: no score · Creativity: no score · Based: 58%*

Dimension Breakdown

IQ (14 benchmarks): 43.4%

EQ (0 benchmarks): Insufficient data. No EQ benchmarks found.

Accuracy (0 benchmarks): Insufficient data. No accuracy benchmarks found.

Creativity (0 benchmarks): Insufficient data. No creativity benchmarks found.

Based (1 benchmark): 57.9%*

* Low confidence — limited benchmark evidence for this dimension

2/5 dimensions scored · Last updated Apr 21, 2026

Benchmark Signals


Some fit rows have limited benchmark evidence.

4 of 12 scored use cases have low confidence or thin contributor coverage.

Coverage Diagnostics

Status: actively scored

Use-Case Scores: 141

Total Measurements: 328

Weighted Measurements: 24

Weighted Sources: 14

Raw Source Coverage

vals_mmlu_pro: 60 · vals_mgsm: 48 · corpfin_taxeval_public: 28 · vals_medqa: 28 · vals_legal_bench: 18 · vals_corp_fin_v2: 16

Weighted Source Coverage

llm_trustworthy_leaderboard: 5 · bigcodebench_official: 3 · vals_corp_fin_v2: 3 · duckdb_nsql_leaderboard: 2 · gaia_results_public: 2 · icelandic_llm_leaderboard: 1

Best Use Cases for This Model

Use Case · ID · Score

Jailbreak resistance (eval) · use_case.security.jailbreak_resistance_eval · 20.4%

Refusal profile (eval) · use_case.security.refusal_profile_eval · 20.4%

Overrefusal (eval) · use_case.security.overrefusal_eval · 20.4%

Scam and social engineering resistance (eval) · use_case.security.scam_social_engineering_resistance_eval · 20.4%

Crisis escalation protocol (eval) · use_case.safety.crisis_escalation_protocol · 20.4%

Metric definition workshop · use_case.data.metric_definition_workshop · 16.9%

Data quality assistant · use_case.data.data_quality_assistant · 15.7%

SQL debugging · use_case.data.sql_debugging · 14.7%

Simulation setup assistant · use_case.eng.simulation_setup_assistant · 14.5%

Executive brief from metrics · use_case.data.exec_brief_from_metrics · 14.1%

Insight mining from text corpora · use_case.data.insight_mining · 13.1%

Verilog/VHDL generation · use_case.eda.verilog_generation · 12.9%