Model Profile

Phi-3-mini-128k-instruct

Name: Phi-3-mini-128k-instruct
Rating: 0.9 (27 reviews)
Author: microsoft

4,096 ctx · Open weights

Use this page to decide where this model is a strong fit. The use-case rankings below are derived from benchmark results, and each row lists a confidence value and the benchmark metric that contributes most to its score.

Identity

ID: microsoft/Phi-3-mini-128k-instruct

Author: microsoft

Origin: huggingface_catalog

Arch: unknown

Benchmark Coverage

Scored use cases: 9

Avg confidence: 14.1%

Evidence points: 27

Raw rows: 86

Weighted rows: 4
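The Avg confidence figure appears to be the unweighted mean of the nine per-use-case confidence values in the Best Use Cases table, and the 27 evidence points match nine use cases with three evidence points each. The page does not state how it aggregates these values. Under that assumption, the arithmetic is:

```latex
\bar{c} = \frac{1}{9}\sum_{i=1}^{9} c_i
        = \frac{19.0 + 17.6 + 14.9 + 13.4 + 12.9 + 13.9 + 13.5 + 12.0 + 10.1}{9}
        = \frac{127.3}{9} \approx 14.1\%,
\qquad
E = \sum_{i=1}^{9} e_i = 9 \times 3 = 27
```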

Catalog Metadata

Parameters: unknown

Context window: 4096

Downloads: 92,237

Intelligence Profile

Dimension Breakdown

IQ (6 benchmarks)

40.6%*

EQ (0 benchmarks)

No EQ benchmarks found

Insufficient data

Accuracy (2 benchmarks)

66.4%*

Creativity (0 benchmarks)

No creativity benchmarks found

Insufficient data

Based (0 benchmarks)

No based benchmarks found

Insufficient data

* Low confidence — limited benchmark evidence for this dimension

2/5 dimensions scored · Last updated Apr 2, 2026

Benchmark Signals

The benchmark results behind this model profile, with their sources:

DuckDB NSQL Leaderboard

all_execution_accuracy

5.6%

Normalized value 53.8% · confidence 100.0%

Strongest impact on the Metric definition workshop use case

duckdb_nsql_leaderboard.all_execution_accuracy · Apr 1, 2026

RepoQA Official Results

overall_average_pass_at_1_pct

1.8%

Normalized value 39.6% · confidence 100.0%

Strongest impact on the Debugging assistant use case

repoqa_leaderboard.overall_average_pass_at_1_pct · Apr 1, 2026

DuckDB NSQL Leaderboard

hard_execution_accuracy

1.4%

Normalized value 25.1% · confidence 100.0%

Strongest impact on the SQL debugging use case

duckdb_nsql_leaderboard.hard_execution_accuracy · Apr 1, 2026

RepoQA Official Results

all_average_pass_at_1_pct

0.7%

Normalized value 39.6% · confidence 100.0%

Strongest impact on the Unit test generation use case

repoqa_leaderboard.all_average_pass_at_1_pct · Apr 1, 2026

All fit rows have limited benchmark evidence.

9 of 9 scored use cases have low confidence or thin contributor coverage.

Coverage Diagnostics

actively scored

Use-case scores: 9

Total measurements: 86

Weighted measurements: 4

Weighted sources: 2

Raw Source Coverage

repoqa_leaderboard: 74

duckdb_nsql_leaderboard: 12

Weighted Source Coverage

duckdb_nsql_leaderboard: 2

repoqa_leaderboard: 2
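The per-source counts appear to add up to the Raw rows (86) and Weighted rows (4) totals listed under Benchmark Coverage. The page does not describe its weighting step, so the following is only a small Python check of that arithmetic. The counts are transcribed by hand, and `check_totals` is an illustrative helper, not part of any catalog API.

```python
# Per-source row counts transcribed from the coverage lists on this page.
RAW_SOURCE_COVERAGE = {"repoqa_leaderboard": 74, "duckdb_nsql_leaderboard": 12}
WEIGHTED_SOURCE_COVERAGE = {"duckdb_nsql_leaderboard": 2, "repoqa_leaderboard": 2}

# Totals reported under "Benchmark Coverage".
REPORTED_RAW_ROWS = 86
REPORTED_WEIGHTED_ROWS = 4


def check_totals(per_source: dict[str, int], reported_total: int) -> bool:
    """Return True when the per-source counts add up to the reported total."""
    return sum(per_source.values()) == reported_total


raw_ok = check_totals(RAW_SOURCE_COVERAGE, REPORTED_RAW_ROWS)
weighted_ok = check_totals(WEIGHTED_SOURCE_COVERAGE, REPORTED_WEIGHTED_ROWS)

# Percentage of raw rows contributed by each source, rounded to one decimal.
raw_share = {
    source: round(100 * count / REPORTED_RAW_ROWS, 1)
    for source, count in RAW_SOURCE_COVERAGE.items()
}
print(raw_ok, weighted_ok, raw_share)
```

RepoQA supplies about 86% of the raw rows but only half of the weighted rows. This suggests the weighting step puts the two sources on a roughly equal footing.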

Best Use Cases for This Model

Use Case	ID	Vertical	Score	Confidence	Evidence	Top Contributor
Metric definition workshop	use_case.data.metric_definition_workshop	data_analytics	8.6%	19.0%	3	DuckDB NSQL Leaderboard: all_execution_accuracy
SQL debugging	use_case.data.sql_debugging	data_analytics	7.3%	17.6%	3	DuckDB NSQL Leaderboard: all_execution_accuracy
Data quality assistant	use_case.data.data_quality_assistant	data_analytics	6.8%	14.9%	3	DuckDB NSQL Leaderboard: all_execution_accuracy
Executive brief from metrics	use_case.data.exec_brief_from_metrics	data_analytics	6.1%	13.4%	3	DuckDB NSQL Leaderboard: all_execution_accuracy
Insight mining from text corpora	use_case.data.insight_mining	data_analytics	6.0%	12.9%	3	DuckDB NSQL Leaderboard: all_execution_accuracy
Text-to-SQL analyst assistant	use_case.data.text_to_sql	data_analytics	5.9%	13.9%	3	DuckDB NSQL Leaderboard: all_execution_accuracy
Debugging assistant	use_case.dev.debugging	developer_tools	5.4%	13.5%	3	RepoQA Official Results: overall_average_pass_at_1_pct
Unit test generation	use_case.dev.test_generation	developer_tools	4.8%	12.0%	3	RepoQA Official Results: overall_average_pass_at_1_pct
Code Review Assistant	use_case.dev.code_review_assistant	developer_tools	4.1%	10.1%	3	RepoQA Official Results: overall_average_pass_at_1_pct
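The ranking and summary figures can be reproduced from the table rows. Below is a minimal Python sketch, assuming the rows are transcribed by hand (no catalog API is assumed). It ranks use cases by score, averages the confidence column, totals the evidence points, and counts rows below a low-confidence cutoff. The 50% cutoff is a hypothetical value, because the page does not state the threshold it uses. `UseCaseFit`, `summarize`, and `LOW_CONFIDENCE_THRESHOLD` are illustrative names.

```python
from dataclasses import dataclass
from statistics import mean

# Rows transcribed from the "Best Use Cases for This Model" table:
# (use case, vertical, score %, confidence %, evidence count)
ROWS = [
    ("Metric definition workshop", "data_analytics", 8.6, 19.0, 3),
    ("SQL debugging", "data_analytics", 7.3, 17.6, 3),
    ("Data quality assistant", "data_analytics", 6.8, 14.9, 3),
    ("Executive brief from metrics", "data_analytics", 6.1, 13.4, 3),
    ("Insight mining from text corpora", "data_analytics", 6.0, 12.9, 3),
    ("Text-to-SQL analyst assistant", "data_analytics", 5.9, 13.9, 3),
    ("Debugging assistant", "developer_tools", 5.4, 13.5, 3),
    ("Unit test generation", "developer_tools", 4.8, 12.0, 3),
    ("Code Review Assistant", "developer_tools", 4.1, 10.1, 3),
]

# Hypothetical cutoff: the page does not publish its low-confidence threshold.
LOW_CONFIDENCE_THRESHOLD = 50.0


@dataclass(frozen=True)
class UseCaseFit:
    name: str
    vertical: str
    score: float
    confidence: float
    evidence: int


def summarize(rows, threshold=LOW_CONFIDENCE_THRESHOLD):
    """Rank rows by score and aggregate confidence and evidence."""
    fits = [UseCaseFit(*row) for row in rows]
    ranked = sorted(fits, key=lambda f: f.score, reverse=True)
    return {
        "top": ranked[0].name,
        "avg_confidence": round(mean(f.confidence for f in fits), 1),
        "evidence_points": sum(f.evidence for f in fits),
        # Booleans sum as 0/1, so this counts rows below the cutoff.
        "low_confidence": sum(f.confidence < threshold for f in fits),
        "scored": len(fits),
    }


summary = summarize(ROWS)
print(summary)
```

With this cutoff every row is flagged, which agrees with the page's note that 9 of 9 scored use cases have low confidence. A lower cutoff, such as anything at or below 19.0%, would flag fewer rows.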