Model Profile

gpt-4.1-mini-20250414

Name: gpt-4.1-mini-20250414
Rating: 2.1 (158 reviews)
Author: openai

External Benchmark Shadow · external_benchmark_shadow · public

4,096 ctx

Use this page to decide where this model is a strong fit. The rankings below are grouped by use case and backed by benchmarks, each with an explicit confidence score and contributor metrics.

Identity

ID: external/openai/gpt-4-1-mini-20250414

Author: openai

Origin: external_benchmark_shadow

Arch: unknown

Benchmark Coverage

Scored use cases: 12

Avg confidence: 26.8%

Evidence points: 158

Raw rows: 398

Weighted rows: 29

Catalog Metadata

Parameters: unknown

Context window: 4096

Downloads: 0

Intelligence Profile

Dimension Breakdown

IQ (12 benchmarks): 71.7%

EQ (0 benchmarks): Insufficient data

Accuracy (1 benchmark): 29.2%*

Creativity (0 benchmarks): Insufficient data

Based (0 benchmarks): Insufficient data

* Low confidence — limited benchmark evidence for this dimension

2/5 dimensions scored · Last updated Apr 21, 2026

Benchmark Signals

Galileo Agent Leaderboard v2

Avg AC

5.7%

Normalized value 87.0% · confidence 100.0%

Strongest impact in Config debugging

galileo_agent_v2.avg_ac · Apr 1, 2026

OpenVLM OCRBench Official

ocrbench_score_pct

3.0%

Normalized value 88.4% · confidence 100.0%

Strongest impact in Socratic tutor

openvlm_ocrbench_official.ocrbench_score_pct · Apr 1, 2026

OpenVLM TextVQA Official

textvqa_score_pct

2.8%

Normalized value 70.2% · confidence 100.0%

Strongest impact in Socratic tutor

openvlm_textvqa_official.textvqa_score_pct · Apr 1, 2026

Vals CorpFin v2

overall_accuracy_pct

2.7%

Normalized value 70.1% · confidence 100.0%

Strongest impact in Thesis red teaming

vals_corp_fin_v2.overall_accuracy_pct · Mar 31, 2026

OpenVLM MTVQA Official

mtvqa_score_pct

2.6%

Normalized value 100.0% · confidence 100.0%

Strongest impact in Socratic tutor

openvlm_mtvqa_official.mtvqa_score_pct · Apr 1, 2026

Galileo Agent Leaderboard v2

Healthcare AC

2.6%

Normalized value 95.5% · confidence 100.0%

Strongest impact in Patient-friendly explanations

galileo_agent_v2.healthcare_ac · Apr 1, 2026

Some fit rows have limited benchmark evidence.

3 of 12 scored use cases have low confidence or thin contributor coverage.

Coverage Diagnostics

actively scored

Use-Case Scores: 129

Total Measurements: 398

Weighted Measurements

Weighted Sources

Raw Source Coverage

vals_mmlu_pro 60 · vals_mgsm 48 · docvqa_leaderboard 34 · galileo_agent_v2 34 · corpfin_taxeval_public 28 · vals_medqa 28

Weighted Source Coverage

galileo_agent_v2 10 · bigcodebench_official 3 · vals_corp_fin_v2 3 · icelandic_llm_leaderboard 1 · openvlm_chartqa_human_official 1 · openvlm_mtvqa_official 1
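
The two coverage lists above pair a source identifier with a count. As a rough sketch, the snippet below splits such a list into a dictionary and totals it. It assumes the counts are rows per source and that each identifier starts with a lowercase letter; the function name `parse_coverage` and the constants are hypothetical, not part of any catalog API. The totals suggest the lists show only the largest sources, since the six raw sources account for 232 of the 398 raw rows reported under Benchmark Coverage.

```python
import re

# Coverage strings copied from this page; each item is "<source_id> <count>".
RAW_COVERAGE = (
    "vals_mmlu_pro 60 vals_mgsm 48 docvqa_leaderboard 34 "
    "galileo_agent_v2 34 corpfin_taxeval_public 28 vals_medqa 28"
)
WEIGHTED_COVERAGE = (
    "galileo_agent_v2 10 bigcodebench_official 3 vals_corp_fin_v2 3 "
    "icelandic_llm_leaderboard 1 openvlm_chartqa_human_official 1 "
    "openvlm_mtvqa_official 1"
)
RAW_ROWS = 398  # "Raw rows" value from the Benchmark Coverage section

# Assumption: an identifier is lowercase letters, digits and underscores,
# followed by whitespace and an integer count.
PAIR = re.compile(r"([a-z][a-z0-9_]*)\s+(\d+)")


def parse_coverage(text: str) -> dict[str, int]:
    """Return {source_id: count} for every pair found in the text."""
    return {name: int(count) for name, count in PAIR.findall(text)}


raw = parse_coverage(RAW_COVERAGE)
weighted = parse_coverage(WEIGHTED_COVERAGE)

listed_raw = sum(raw.values())
print(f"listed raw rows: {listed_raw} of {RAW_ROWS}")
print(f"listed weighted rows: {sum(weighted.values())}")
```

Because the pattern only needs whitespace between an identifier and its count, it also copes with lists where neighbouring items run together without a separator.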

Best Use Cases for This Model

Use Case	ID	Vertical	Score	Confidence	Evidence	Top Contributor
Accounts payable invoice extraction (text)	use_case.fin.ap_invoice_extraction	finance	20.8%	27.1%	14	Vals Tax Eval v2: overall_accuracy_pct
Thesis red teaming	use_case.fin.thesis_red_team	finance	19.8%	26.1%	14	Vals CorpFin v2: overall_accuracy_pct
Config debugging	use_case.sre.config_debugging	devops_sre	19.3%	29.5%	10	Galileo Agent Leaderboard v2: Avg AC
Terraform generation	use_case.sre.iac_terraform	devops_sre	19.3%	29.5%	10	Galileo Agent Leaderboard v2: Avg AC
Kubernetes manifest generation	use_case.sre.iac_k8s	devops_sre	19.3%	29.5%	10	Galileo Agent Leaderboard v2: Avg AC
Socratic tutor	use_case.edu.socratic_tutor	education	18.3%	28.2%	14	OpenVLM OCRBench Official: ocrbench_score_pct
Lesson plan generator	use_case.edu.lesson_plan_generator	education	18.3%	28.2%	14	OpenVLM OCRBench Official: ocrbench_score_pct
Job description drafting	use_case.hr.job_description_drafting	hr_recruiting	18.0%	27.2%	14	Galileo Agent Leaderboard v2: Avg TSQ
Earnings call synthesis	use_case.fin.earnings_call_synthesis	finance	17.9%	23.6%	14	Vals CorpFin v2: overall_accuracy_pct
Transaction anomaly narrative	use_case.fin.transaction_anomaly_narrative	finance	17.5%	23.1%	14	Vals CorpFin v2: overall_accuracy_pct
Brand voice localization	use_case.mkt.brand_voice_localization	marketing_sales	17.5%	26.4%	14	OpenVLM OCRBench Official: ocrbench_score_pct
Patient-friendly explanations	use_case.health.patient_friendly_summaries	healthcare	17.3%	23.7%	16	Galileo Agent Leaderboard v2: Healthcare AC
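
The summary figures earlier on the page can be cross-checked against this table. The sketch below is an illustration under stated assumptions, not the catalog's own method: it recomputes the mean confidence and the total evidence count from the twelve rows, and counts rows under a hypothetical 25% confidence cutoff. The page does not say how "low confidence" is defined, so `LOW_CONFIDENCE_CUTOFF` is a guess that happens to reproduce the "3 of 12" note.

```python
# (use case ID, score %, confidence %, evidence count), copied from the table.
ROWS = [
    ("use_case.fin.ap_invoice_extraction", 20.8, 27.1, 14),
    ("use_case.fin.thesis_red_team", 19.8, 26.1, 14),
    ("use_case.sre.config_debugging", 19.3, 29.5, 10),
    ("use_case.sre.iac_terraform", 19.3, 29.5, 10),
    ("use_case.sre.iac_k8s", 19.3, 29.5, 10),
    ("use_case.edu.socratic_tutor", 18.3, 28.2, 14),
    ("use_case.edu.lesson_plan_generator", 18.3, 28.2, 14),
    ("use_case.hr.job_description_drafting", 18.0, 27.2, 14),
    ("use_case.fin.earnings_call_synthesis", 17.9, 23.6, 14),
    ("use_case.fin.transaction_anomaly_narrative", 17.5, 23.1, 14),
    ("use_case.mkt.brand_voice_localization", 17.5, 26.4, 14),
    ("use_case.health.patient_friendly_summaries", 17.3, 23.7, 16),
]

# Hypothetical threshold: the page does not state its low-confidence rule.
LOW_CONFIDENCE_CUTOFF = 25.0

mean_confidence = sum(conf for _, _, conf, _ in ROWS) / len(ROWS)
total_evidence = sum(evidence for _, _, _, evidence in ROWS)
low_confidence = [uid for uid, _, conf, _ in ROWS if conf < LOW_CONFIDENCE_CUTOFF]

print(f"mean confidence: {mean_confidence:.1f}%")  # page reports 26.8%
print(f"total evidence points: {total_evidence}")  # page reports 158
print(f"rows under cutoff: {len(low_confidence)} of {len(ROWS)}")
```

Under these assumptions the recomputed mean and evidence total agree with the "Avg confidence" and "Evidence points" values listed under Benchmark Coverage, and the three rows under the cutoff are the earnings call, transaction anomaly, and patient-friendly use cases.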