Find the right model for any task
100+ use cases ranked by benchmark evidence, not opinions. Filter by domain, drill into any use case to see which model ranks first and why.
Featured Use Cases
15 curated
Earnings call synthesis
Summarize earnings calls into key points, tone, and risks.
NPC dialogue
Low-latency in-character dialogue suitable for games.
Adult ERP roleplay (explicit)
Explicit adult roleplay with boundary adherence and persona memory.
Log triage
Interpret logs and propose safe diagnostic steps.
Knowledge base Q&A (with citations)
Answer questions grounded in an internal KB, with evidence.
Document summarization
Summarize long business documents into scannable outputs.
Contract term extraction
Extract key terms into structured fields with clause references.
Support bot (RAG grounded)
Support chatbot grounded in docs with optional citations and escalation.
Code generation
Generate correct, secure code from requirements.
Ad copy variants
Generate diverse headline/CTA variants under strict constraints.
Long-form story co-author
Generate and refine long-form fiction with continuity.
Debugging assistant
Localize bugs and propose fixes with explanations.
Prompt injection resistance (eval)
Measure resistance to prompt injection in RAG and tool settings.
Text-to-SQL analyst assistant
Convert questions into SQL and explain the query.
Clinical note drafting
Summarize encounters into structured notes for clinician review.
All Use Cases
100 indexed · 19 high-confidence

| Use Case | Description | Best Score |
|---|---|---|
| Thesis red teaming | Stress-test an investment thesis with counterarguments and risk. | 48.9% |
| Earnings call synthesis | Summarize earnings calls into key points, tone, and risks. | 44.1% |
| Transaction anomaly narrative | Summarize anomalies into hypotheses, evidence, and follow-up actions. | 43.3% |
| Casual chat companion | Engaging conversation with consistent tone and context. | 50.1% |
| Life coaching and goal planning | Goal setting, habit planning, and accountability check-ins. | 50.1% |
| Tarot-style reading | Symbolic, personalized readings with consistent persona. | 50.1% |
| Mindfulness and meditation scripts | Generate calming scripts and exercises tailored to a user's context. | 48.5% |
| Empathetic support chat | Supportive conversation with strong boundaries and safe escalation. | 48.7% |
| Accounts payable invoice extraction (text) | Extract structured fields from invoices/receipts for AP workflows. | 38.8% |
| AML alert triage | Triage AML alerts into severity, rationale, and next actions. | 41.5% |
| KYC profile synthesis | Turn identity docs and notes into a structured KYC profile. | 41.5% |
| Filings summarization (10-K/10-Q) | Summarize filings with conservative factuality and risk highlights. | 38.5% |
| Adult erotica (long-form, explicit) | Long-form explicit erotica with controllable style and strict boundaries. | 43.6% |
| Component selection assistant | Recommend components under constraints with evidence and tradeoffs. | 35.5% |
| Quant research code generation | Generate backtest or analysis code from trading hypotheses. | 34.2% |
| Interactive fiction / DM | Run interactive fiction with state tracking and user agency. | 44.2% |
| NPC dialogue | Low-latency in-character dialogue suitable for games. | 44.2% |
| SFW roleplay and simulation | Roleplay/simulations for learning or entertainment with state tracking. | 45.9% |
| Adult ERP roleplay (explicit) | Explicit adult roleplay with boundary adherence and persona memory. | 49.5% |
| Cross-paper contradiction analysis | Identify contradictions and uncertainty across papers with citations. | Provisional |
| Literature synthesis with citations | Synthesize papers and guidelines with citations and uncertainty. | Provisional |
| Knowledge base Q&A (fast, no citations) | Answer KB questions grounded in retrieved text without citations. | Provisional |
| Runbook step assistant | Suggest safe runbook steps and escalation points grounded in docs. | Provisional |
| Contract Drafting & Redlining | Drafting, reviewing, and suggesting edits to legal contracts and agreements. | Provisional |
| Litigation risk memo | Summarize a claim into litigation risk drivers and mitigation steps. | Provisional |
| Simulation setup assistant | Turn design requirements into simulation setup checklists and boundary notes. | Provisional |
| Log triage | Interpret logs and propose safe diagnostic steps. | Provisional |
| Contract Q&A (RAG grounded) | Answer contract questions grounded in the actual contract text. | Provisional |
| Knowledge base Q&A (with citations) | Answer questions grounded in an internal KB, with evidence. | Provisional |
| Regulatory summary | Summarize and compare regulatory text with conservative interpretation. | Provisional |
| HR policy Q&A | Answer HR policy questions grounded in authoritative text. | Provisional |
| Disinformation and manipulation resistance (eval) | Measure refusal and safe handling of deceptive content generation requests. | Provisional |
| Document summarization | Summarize long business documents into scannable outputs. | Provisional |
| Political risk brief | Summarize key developments into risks, scenarios, and actions. | Provisional |
| Agent-assist reply suggestions | Draft replies for human agents with tone and policy constraints. | Provisional |
| Decision memo | Recommend a decision with options, constraints, and risks. | Provisional |
| Executive briefing | Turn raw notes into a short executive brief with risks and actions. | Provisional |
| Search query rewriting | Rewrite queries into higher-recall search queries and filters. | Provisional |
| Contract redline summary | Summarize material changes between contract versions with clause refs. | Provisional |
| Support dialogue agent | Multi-turn support conversations with escalation and policy awareness. | Provisional |
| Clause playbook check | Check extracted terms against a playbook and flag deviations. | Provisional |
| Contract term extraction | Extract key terms into structured fields with clause references. | Provisional |
| Support bot (RAG grounded) | Support chatbot grounded in docs with optional citations and escalation. | Provisional |
| SQL debugging | Diagnose and fix SQL queries for correctness and performance. | Provisional |
| Policy wording comparison | Compare policy wording against a standard and flag material differences. | Provisional |
| Operator support chat | Real-time operator assistant with grounded troubleshooting and escalation. | Provisional |
| Maintenance RCA memo | Turn logs and notes into a maintenance root cause analysis. | Provisional |
| Manuals Q&A (RAG grounded) | Answer operator questions grounded in technical manuals and runbooks. | Provisional |
| Social listening brief | Summarize social chatter into themes, risks, and recommendations. | Provisional |
| Codebase onboarding brief | Summarize a repository's architecture, modules, and conventions. | Provisional |
| Patient education bot (RAG grounded) | Answer patient FAQ using trusted sources with cautious wording. | Provisional |
| Disruption monitoring brief | Summarize disruptions into risk, options, and recommendations. | Provisional |
| Supplier risk monitoring | Track supplier risk signals from multi-source text and summarize actions. | Provisional |
| Narrative tracking | Track narratives across multi-lingual sources and flag contradictions. | Provisional |
| Campaign brief | Draft a campaign brief with positioning, audience, and channels. | Provisional |
| Product positioning and messaging | Develop positioning, value props, and message pillars with tradeoffs. | Provisional |
| Social post generation | Generate short channel-specific social posts and variations. | Provisional |
| Landing page copy | Draft landing pages with clear positioning and structure. | Provisional |
| Autonomous Coding Agent | End-to-end autonomous software engineering: reading issues, writing code, running tests, submitting PRs. | Provisional |
| Language conversation partner | Conversational practice with gentle corrections and explanations. | Provisional |
| Medical coding support (suggestions) | Extract coding-relevant facts and suggest codes for human review. | Provisional |
| Poetry and lyrics | Generate poems and lyrics with style control and variation. | Provisional |
| Screenplay scene writing | Write screenplay scenes with formatting, pacing, and strong dialogue. | Provisional |
| Code generation | Generate correct, secure code from requirements. | Provisional |
| CAD scripting helper | Generate and debug CAD automation scripts and parametric geometry code. | Provisional |
| PR crisis response draft | Draft a conservative public statement and internal talking points. | Provisional |
| Customer feedback theme mining | Extract themes and trends from reviews, tickets, and surveys. | Provisional |
| Title document search assistant (RAG grounded) | Navigate and answer questions across a corpus of property documents. | Provisional |
| Config debugging | Diagnose and patch YAML/JSON/TOML configs with minimal diffs. | Provisional |
| Kubernetes manifest generation | Generate K8s manifests with safe defaults and probes. | Provisional |
| Terraform generation | Generate Terraform IaC with correct resources and safe defaults. | Provisional |
| Metric definition workshop | Turn ambiguous KPI definitions into precise, measurable specs. | Provisional |
| Archaic and historical translation | Translate older or domain-specific language into modern equivalents. | Provisional |
| Refactoring assistant | Refactor code safely while preserving behavior and improving clarity. | Provisional |
| Ad copy variants | Generate diverse headline/CTA variants under strict constraints. | Provisional |
| Personalized sales outreach | Draft outbound emails/DMs personalized to a prospect persona. | Provisional |
| Dashboard narratives | Generate weekly KPI narratives and investigation suggestions. | Provisional |
| Grammar and writing coach | Correct grammar and explain fixes at the learner's level. | Provisional |
| Security incident triage | Triage security incidents from alerts/logs into impact and next steps. | Provisional |
| Support FAQ bot | Answer common support questions with safe troubleshooting steps. | Provisional |
| IDE code completion | Fast local-context code completion and small snippet generation. | Provisional |
| Vendor contract summary (procurement) | Summarize vendor contracts into key terms, risks, and deviations. | Provisional |
| Long-form story co-author | Generate and refine long-form fiction with continuity. | Provisional |
| Verilog/VHDL generation | Generate RTL code and testbenches from functional specs. | Provisional |
| Fraud signal summary | Summarize potential fraud indicators with conservative evidence framing. | Provisional |
| Malware analysis report (defensive) | Explain suspicious code and produce a defensive analysis report. | Provisional |
| PR review agent | Review diffs for correctness, security, and maintainability. | Provisional |
| Crisis escalation protocol (eval) | Measure safe crisis escalation behavior under the selected policy. | Provisional |
| Jailbreak resistance (eval) | Measure robustness to adversarial prompts that attempt to bypass policy. | Provisional |
| Overrefusal (eval) | Measure how often benign requests are incorrectly refused. | Provisional |
| Refusal profile (eval) | Measure refusal/overrefusal rates across predefined categories. | Provisional |
| Scam and social engineering resistance (eval) | Measure refusal and safe handling of deception/scam requests. | Provisional |
| Agentic bug fixing | Agentic loop that reproduces, fixes, and validates with tests. | Provisional |
| Debugging assistant | Localize bugs and propose fixes with explanations. | Provisional |
| Cross-lingual summary | Summarize a document in one language into another language. | Provisional |
| Prompt injection resistance (eval) | Measure resistance to prompt injection in RAG and tool settings. | Provisional |
| Spam filtering and classification | Detect spam and low-quality messages for routing and moderation. | Provisional |
| Toxicity moderation routing | Classify abusive content for moderation and escalation. | Provisional |
| Legal translation | Translate legal text with terminology consistency and format safety. | Provisional |
| Agentic incident response | Agentic tool-using workflow for incident triage and remediation planning. | Provisional |