Model Profile

deepseek-v3

Name: deepseek-v3
Rating: 1.5 (124 reviews)
Author: deepseek-ai

External Benchmark Shadow · external_benchmark_shadow · public

4,096 ctx

Use this page to decide where this model is a strong fit. The rankings below are scored per use case from benchmark results, and each row reports its confidence and its top contributing metric.

Identity

ID: external/deepseek-ai/deepseek-v3

Author: deepseek-ai

Origin: external_benchmark_shadow

Arch: unknown

Benchmark Coverage

Scored use cases: 12

Avg confidence: 20.5%

Evidence points: 124

Raw rows: 124

Weighted rows: 34

Catalog Metadata

Parameters: unknown

Context window: 4096

Downloads: 0

Intelligence Profile

Dimension Breakdown

IQ (3 benchmarks)

41.8%*

EQ (0 benchmarks)

No EQ benchmarks found

Insufficient data

Accuracy (1 benchmark)

43.4%*

Creativity (0 benchmarks)

No creativity benchmarks found

Insufficient data

Based (2 benchmarks)

72.1%*

* Low confidence — limited benchmark evidence for this dimension

3/5 dimensions scored · Last updated Apr 21, 2026

Benchmark Signals

Vectara HHEM Leaderboard

overall_hallucination_error_pct

3.2%

Normalized value 80.2% · confidence 100.0%

Strongest impact in Knowledge base Q&A (fast, no citations)

vectara_hhem_leaderboard.overall_hallucination_error_pct · Apr 1, 2026

LiveSQLBench

success_rate_pct

2.3%

Normalized value 53.5% · confidence 100.0%

Strongest impact in Text-to-SQL analyst assistant

livesqlbench.success_rate_pct · Apr 1, 2026

BigCodeBench Official

bigcodebench_complete_pct

2.2%

Normalized value 99.6% · confidence 100.0%

Strongest impact in Verilog/VHDL generation

bigcodebench_official.bigcodebench_complete_pct · Apr 1, 2026

Galileo Agent Leaderboard v2

Avg AC

2.0%

Normalized value 52.2% · confidence 100.0%

Strongest impact in Runbook step assistant

galileo_agent_v2.avg_ac · Apr 1, 2026

SYCON Bench (Table 2)

sycon_unethical_tof_pct

1.8%

Normalized value 63.5% · confidence 100.0%

Strongest impact in Knowledge base Q&A (with citations)

sycon_bench_paper.sycon_unethical_tof_pct · Apr 1, 2026

Vectara HHEM Leaderboard

overall_answer_rate_pct

1.7%

Normalized value 93.3% · confidence 100.0%

Strongest impact in Runbook step assistant

vectara_hhem_leaderboard.overall_answer_rate_pct · Apr 1, 2026

Some fit rows have limited benchmark evidence.

11 of 12 scored use cases have low confidence or thin contributor coverage.

Coverage Diagnostics

actively scored

Use-Case Scores: 135

Total Measurements: 124

Weighted Measurements

Weighted Sources

Raw Source Coverage

galileo_agent_v2 34 · vectara_hhem_leaderboard 21 · mmlu_pro_leaderboard 15 · artifactsbenchmark_leaderboard 11 · baxbench_leaderboard 9 · bigcodebench_official 8

Weighted Source Coverage

vectara_hhem_leaderboard 12 · galileo_agent_v2 10 · bigcodebench_official 3 · baxbench_leaderboard 1 · bird_critic 1 · icelandic_llm_leaderboard 1

Best Use Cases for This Model

Use Case	ID	Vertical	Score	Confidence	Evidence	Top Contributor
Simulation setup assistant	use_case.eng.simulation_setup_assistant	engineering	14.6%	19.6%	10	BigCodeBench Official: bigcodebench_complete_pct
Component selection assistant	use_case.eng.component_selection	engineering	14.4%	20.4%	11	Vectara HHEM Leaderboard: overall_hallucination_error_pct
Verilog/VHDL generation	use_case.eda.verilog_generation	engineering	13.8%	16.8%	10	BigCodeBench Official: bigcodebench_complete_pct
Integration test generation	use_case.dev.integration_tests	developer_tools	13.7%	17.0%	10	BigCodeBench Official: bigcodebench_complete_pct
Litigation risk memo	use_case.ins.litigation_risk_memo	insurance	13.4%	22.3%	10	Vectara HHEM Leaderboard: overall_hallucination_error_pct
Thesis red teaming	use_case.fin.thesis_red_team	finance	13.2%	22.6%	13	Vectara HHEM Leaderboard: overall_hallucination_error_pct
Text-to-SQL analyst assistant	use_case.data.text_to_sql	data_analytics	12.9%	30.1%	12	LiveSQLBench: success_rate_pct
Contract Q&A (RAG grounded)	use_case.legal.contract_qna	legal	12.6%	19.3%	10	Vectara HHEM Leaderboard: overall_hallucination_error_pct
Knowledge base Q&A (fast, no citations)	use_case.business.kb_qna_fast	business_productivity	12.4%	19.6%	10	Vectara HHEM Leaderboard: overall_hallucination_error_pct
Regulatory summary	use_case.legal.regulatory_summary	legal	12.3%	18.9%	10	Vectara HHEM Leaderboard: overall_hallucination_error_pct
Runbook step assistant	use_case.sre.runbook_steps	devops_sre	12.2%	20.0%	8	Vectara HHEM Leaderboard: overall_hallucination_error_pct
Knowledge base Q&A (with citations)	use_case.business.kb_qna_with_citations	business_productivity	12.0%	19.0%	10	Vectara HHEM Leaderboard: overall_hallucination_error_pct
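
The summary figures under Benchmark Coverage can be cross-checked against the rows of this table. The sketch below copies each row's confidence and evidence count by hand and recomputes the average confidence and the evidence total. The helper names and the 25% cutoff are illustrative assumptions; the page does not state how it actually classifies a use case as low confidence.

```python
# (use case ID, confidence %, evidence count), copied from the table above.
ROWS = [
    ("use_case.eng.simulation_setup_assistant", 19.6, 10),
    ("use_case.eng.component_selection", 20.4, 11),
    ("use_case.eda.verilog_generation", 16.8, 10),
    ("use_case.dev.integration_tests", 17.0, 10),
    ("use_case.ins.litigation_risk_memo", 22.3, 10),
    ("use_case.fin.thesis_red_team", 22.6, 13),
    ("use_case.data.text_to_sql", 30.1, 12),
    ("use_case.legal.contract_qna", 19.3, 10),
    ("use_case.business.kb_qna_fast", 19.6, 10),
    ("use_case.legal.regulatory_summary", 18.9, 10),
    ("use_case.sre.runbook_steps", 20.0, 8),
    ("use_case.business.kb_qna_with_citations", 19.0, 10),
]

# Hypothetical cutoff, chosen only for illustration; the page does not
# publish its real low-confidence rule.
LOW_CONFIDENCE_CUTOFF = 25.0


def mean_confidence(rows):
    """Unweighted mean of the Confidence column, rounded to one decimal."""
    return round(sum(conf for _, conf, _ in rows) / len(rows), 1)


def total_evidence(rows):
    """Sum of the Evidence column."""
    return sum(evidence for _, _, evidence in rows)


def below_cutoff(rows, cutoff):
    """IDs of use cases whose confidence is under the given cutoff."""
    return [use_case for use_case, conf, _ in rows if conf < cutoff]


print("Scored use cases:", len(ROWS))
print("Avg confidence:", mean_confidence(ROWS))
print("Evidence points:", total_evidence(ROWS))
print("Below cutoff:", len(below_cutoff(ROWS, LOW_CONFIDENCE_CUTOFF)))
```

Under these assumptions the recomputed average and evidence total agree with the 20.5% and 124 reported under Benchmark Coverage. The cutoff count only shows how such a filter could work and should not be read as the page's own rule.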