BasedAGI

Model Profile

gpt-4o-2024-05-13

External Benchmark Shadow · external_benchmark_shadow · public
4,096 ctx

Use this page to decide where this model is a strong fit. Rankings below are benchmark-backed by use case, with explicit confidence and contributor metrics.

Identity

ID: external/openai/gpt-4o-2024-05-13

Author: openai

Origin: external_benchmark_shadow

Arch: unknown

Benchmark Coverage

Scored use cases: 12

Avg confidence: 34.0%

Evidence points: 137

Raw rows: 191

Weighted rows: 21

Catalog Metadata

Parameters: unknown

Context window: 4096

Downloads: 0

Intelligence Profile

IQ: 59%* · EQ: not scored · Accuracy: 48%* · Creativity: 89%* · Based: 47%*

Dimension Breakdown

IQ: 58.9%* (3 benchmarks)

EQ: insufficient data (0 benchmarks; no EQ benchmarks found)

Accuracy: 48.2%* (1 benchmark)

Creativity: 89.0%* (2 benchmarks)

Based: 46.8%* (2 benchmarks)

* Low confidence — limited benchmark evidence for this dimension

4/5 dimensions scored · Last updated Apr 21, 2026

Benchmark Signals

Click through to the benchmark source behind this model profile.

Coverage Diagnostics

actively scored

Use-Case Scores: 113

Total Measurements: 191

Weighted Measurements: 21

Weighted Sources: 10

Raw Source Coverage

repoqa_leaderboard: 74
ugi_main: 57
llm_aggrefact_leaderboard: 12
vals_gpqa: 12
llm_trustworthy_leaderboard: 8
icelandic_llm_leaderboard: 7

Weighted Source Coverage

llm_trustworthy_leaderboard: 5
bigcodebench_official: 3
ugi_main: 3
aider_code_editing: 2
llm_aggrefact_leaderboard: 2
repoqa_leaderboard: 2
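
The per-source counts listed above can be checked against the page's totals (191 total measurements, 21 weighted measurements). The following is a minimal sketch of that arithmetic, not part of the site's tooling: the helper name `coverage_summary` is hypothetical, and treating the remainder as belonging to sources not shown in the lists is an assumption.

```python
# Per-source counts copied from the coverage lists on this page.
raw_listed = {
    "repoqa_leaderboard": 74,
    "ugi_main": 57,
    "llm_aggrefact_leaderboard": 12,
    "vals_gpqa": 12,
    "llm_trustworthy_leaderboard": 8,
    "icelandic_llm_leaderboard": 7,
}
weighted_listed = {
    "llm_trustworthy_leaderboard": 5,
    "bigcodebench_official": 3,
    "ugi_main": 3,
    "aider_code_editing": 2,
    "llm_aggrefact_leaderboard": 2,
    "repoqa_leaderboard": 2,
}

# Totals reported under Coverage Diagnostics.
RAW_TOTAL = 191
WEIGHTED_TOTAL = 21


def coverage_summary(listed, total):
    """Hypothetical helper: compare listed per-source counts with a reported total."""
    listed_sum = sum(listed.values())
    return {
        "listed": listed_sum,
        # Assumption: the remainder comes from sources not shown in the list.
        "unlisted": total - listed_sum,
        "share": listed_sum / total,
    }


raw_summary = coverage_summary(raw_listed, RAW_TOTAL)
weighted_summary = coverage_summary(weighted_listed, WEIGHTED_TOTAL)

print(f"raw: {raw_summary['listed']} of {RAW_TOTAL} listed")
print(f"weighted: {weighted_summary['listed']} of {WEIGHTED_TOTAL} listed")
```

Under these numbers, the six listed raw sources account for 170 of 191 measurements and the six listed weighted sources for 17 of 21, which is consistent with the lists showing only the largest contributors among the 10 weighted sources.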

Best Use Cases for This Model

Use Case (ID): Score

Debugging assistant (use_case.dev.debugging): 27.2%
Unit test generation (use_case.dev.test_generation): 24.2%
Refactoring assistant (use_case.dev.refactoring): 23.6%
Integration test generation (use_case.dev.integration_tests): 23.0%
Verilog/VHDL generation (use_case.eda.verilog_generation): 22.7%
Code Review Assistant (use_case.dev.code_review_assistant): 22.6%
Jailbreak resistance (eval) (use_case.security.jailbreak_resistance_eval): 19.4%
Crisis escalation protocol (eval) (use_case.safety.crisis_escalation_protocol): 19.4%
Scam and social engineering resistance (eval) (use_case.security.scam_social_engineering_resistance_eval): 19.4%
Refusal profile (eval) (use_case.security.refusal_profile_eval): 19.4%
Overrefusal (eval) (use_case.security.overrefusal_eval): 19.4%
Simulation setup assistant (use_case.eng.simulation_setup_assistant): 18.3%