BasedAGI
Use Case Report

Best LLMs for Email Writing

Email writing is the most ubiquitous LLM use case in enterprise settings — and one of the most poorly understood. Everyone has tried it. Most people have a rough sense of which models produce emails that feel right versus ones that feel off. But the intuitions are rarely backed by systematic evidence.

"Feels right" for email is actually doing a lot of work. It encompasses: tone calibration to context and audience, appropriate register (formal vs. informal, urgent vs. routine), avoiding AI tells (the telltale phrases that signal the email was written by a language model), following instructions precisely (word count, bullet vs. prose, include/exclude certain information), and producing emails that actually achieve the communicative goal.

These are not trivial capabilities — and different models have different strengths across them.

What Makes an Email "Good"

Tone calibration — The right tone for a cold outreach to a C-suite executive is different from an internal status update to your team, which is different from a follow-up to a prospect who went cold. Good email writing models shift register accurately based on context cues.

Instruction following — "Keep it under 150 words, lead with the ask, don't mention the previous meeting" is a non-trivial set of constraints to satisfy simultaneously. Models that follow multi-constraint instructions reliably produce much more usable output than those that partially satisfy or ignore constraints.

Avoiding AI tells — Certain phrases flag an email as AI-written to a trained eye: "I hope this email finds you well," "I wanted to reach out," "As per our previous conversation," "Please don't hesitate to reach out." Models that default to these patterns regardless of instruction produce emails that undermine the sender's credibility.

Appropriate length and structure — Most professional emails should be shorter than the model's natural instinct. Models that err toward completeness produce emails that nobody reads.

Achieving the goal — An email is a communicative act with a purpose: get a meeting, deliver information, request something, apologize, confirm. The best email writing models understand the purpose and structure the message to achieve it.
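The instruction-following criterion above can be partly checked mechanically. The following is a minimal sketch, not a full evaluation method: the function name `check_constraints` and its parameters are hypothetical, and "leads with the ask" is approximated by looking for a caller-supplied keyword in the first sentence, which is a rough proxy at best.

```python
import re


def check_constraints(
    email: str,
    max_words: int,
    ask_keyword: str,
    banned_terms: list[str],
) -> dict[str, bool]:
    """Return one pass/fail flag per constraint for a single drafted email.

    All names here are illustrative assumptions, not part of any benchmark.
    """
    words = email.split()
    # First sentence: everything up to the first ., ! or ? followed by whitespace.
    first_sentence = re.split(r"(?<=[.!?])\s+", email.strip(), maxsplit=1)[0]
    lowered = email.lower()
    return {
        "within_length": len(words) <= max_words,
        # Rough proxy: the ask keyword appears in the opening sentence.
        "leads_with_ask": ask_keyword.lower() in first_sentence.lower(),
        "avoids_banned_terms": not any(t.lower() in lowered for t in banned_terms),
    }


draft = "Could we schedule a 30-minute call next week? I'd like to walk through the roadmap."
print(check_constraints(draft, max_words=150, ask_keyword="call", banned_terms=["previous meeting"]))
```

Reporting each constraint separately, rather than a single pass/fail, makes it visible when a model satisfies the length limit but ignores the exclusion, which is the partial-compliance case described above.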

EQ dimension scores are the strongest predictor of email writing quality among the BGI dimensions. The ability to calibrate tone, read audience expectations, and produce appropriate emotional register in written communication correlates directly with what makes emails work.

Rankings

Business email drafting

Category: business productivity

Top 15 (limited data)

#   Model                                 ID                                             Score
1   gemini-2.5-pro                        external/google/gemini-2-5-pro                 24.4
2   google/gemini-3.1-pro-preview         external/google/gemini-3-1-pro-preview         23.8
3   gpt-4.1-20250414                      external/openai/gpt-4-1-20250414               23.0
4   anthropic/claude-sonnet-4             external/anthropic/claude-sonnet-4             22.9
5   gpt-5-2025-08-07                      external/openai/gpt-5-2025-08-07               21.8
6   gemini-2.5-flash                      external/google/gemini-2-5-flash               21.6
7   gpt-5-mini-2025-08-07                 external/openai/gpt-5-mini-2025-08-07          21.4
8   Grok-4-0709                           external/xai/grok-4-0709                       18.3
9   gemini-3-pro-preview                  external/google/gemini-3-pro-preview           17.4
10  anthropic/claude-sonnet-4.6           external/anthropic/claude-sonnet-4-6           16.9
11  gpt-5.2-2025-12-11                    external/openai/gpt-5-2-2025-12-11             16.3
12  google/gemini-3.1-flash-lite-preview  external/google/gemini-3-1-flash-lite-preview  16.3
13  openai/gpt-5.4-2026-03-05             external/openai/gpt-5-4-2026-03-05             16.3
14  Claude-3.5-Sonnet                     external/anthropic/claude-3-5-sonnet           15.4
15  gemini-3-flash-preview                external/google/gemini-3-flash-preview         15.4

Email Type Breakdown

Different email types have different model requirements:

Cold Outreach and Sales Emails

The hardest to write well. The reader has no reason to continue reading, high skepticism, and very little time. Good cold emails are short, specific about why the sender is contacting this particular recipient, lead with value rather than the ask, and have a single clear call to action.

Models that produce generic outreach consistently fail here — the output is grammatically correct and structurally sound but lacks the specificity that makes cold emails work. The best models for cold outreach require clear context about the recipient and purpose, and then stay tightly focused on that context.

Internal Communication

Easier than external, but with its own requirements. Internal emails don't need to sell; they need to inform, request, or coordinate efficiently. The failure modes here are over-formality (treating an internal update like an external communication) and over-length (including context that internal recipients already have).

Executive Communication

C-suite communication is brief by necessity. Executives receive high email volume and have little patience for preamble. Good executive emails front-load the key point, provide supporting context efficiently, and have a clear action request or decision point. Models that write executive emails well have internalized that the point is always first.

Customer-Facing Responses

Similar to customer support — tone calibration and accuracy matter together. Customer-facing emails need to be warm but efficient, accurate about policies and commitments, and clear about next steps.

Follow-Up Emails

One of the most common use cases. The failure mode: models often produce follow-ups that are too apologetic ("I know you're very busy"), too passive ("just checking in"), or too long. Effective follow-ups are direct, reference the specific previous context, and make it easy to respond.

The "AI Tell" Problem

AI-written emails have a recognizable style that has become a credibility signal — not in a good way. Recipients who notice the AI pattern (especially in sales and outreach contexts) often discount or dismiss the message.

Common AI tells to test for when evaluating models:

  • Opening with "I hope this finds you well" or variants
  • Excessive politeness and hedging
  • Generic compliments before the ask
  • Closing with "please feel free to reach out with any questions"
  • Using "I wanted to" (indirect framing) instead of direct statements
  • Four-paragraph structure when two would suffice
  • Bullet points where prose would flow better

Models vary significantly in how much they default to these patterns. When evaluating, test with instructions explicitly banning common patterns and see whether the model complies.
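A compliance test of this kind can be sketched as a phrase scan over the model's output. The phrase list below is taken from the examples in this report and is far from exhaustive; the function name `find_ai_tells` is hypothetical, and plain substring matching will miss paraphrased variants.

```python
import re

# Phrases quoted in this report; extend with your own banned patterns.
AI_TELLS = [
    "i hope this email finds you well",
    "i hope this finds you well",
    "i wanted to reach out",
    "as per our previous conversation",
    "please don't hesitate to reach out",
    "please feel free to reach out",
]


def find_ai_tells(email: str, phrases: list[str] = AI_TELLS) -> list[str]:
    """Return the listed phrases that appear in the email (case-insensitive)."""
    # Normalize curly apostrophes and runs of whitespace so typography
    # and line wrapping do not hide a match.
    normalized = re.sub(r"\s+", " ", email.replace("\u2019", "'")).lower()
    return [p for p in phrases if p in normalized]


sample = "Hi Dana,\n\nI hope this finds you well. I wanted to reach out about pricing."
print(find_ai_tells(sample))
```

Running the same scan on outputs generated with and without an explicit ban in the prompt gives a simple measure of whether a model actually complies with the instruction.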

Tone instructions need to be specific. "Write a professional email" means something quite different to different models. "Write a direct, concise email under 100 words, no preamble, lead with the ask" is specific enough to get consistent results.

Personalization and Context

The quality ceiling for AI-assisted email writing is set by how much context you give the model. A model writing a cold outreach with only a name and company will produce generic output regardless of its capabilities. The same model with context about the recipient's role, recent company news, a specific connection point, and the precise ask will produce something much more targeted.

Practical implication: evaluating email writing quality on minimal-context prompts underestimates what the best models can do. Evaluating on rich-context prompts reveals the real quality gaps.
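One way to make this comparison concrete is to score the same model on a minimal-context prompt and a rich-context prompt for the same task. This is a sketch under stated assumptions: `generate` stands in for whatever model call you use and `score` for whatever quality metric you trust; neither is a real library function, and `context_gap` is a hypothetical name.

```python
from typing import Callable


def context_gap(
    generate: Callable[[str], str],
    score: Callable[[str], float],
    minimal_prompt: str,
    rich_prompt: str,
) -> dict[str, float]:
    """Score one model on both prompts and report the difference.

    `generate` and `score` are caller-supplied placeholders (assumptions).
    """
    minimal_score = score(generate(minimal_prompt))
    rich_score = score(generate(rich_prompt))
    return {
        "minimal": minimal_score,
        "rich": rich_score,
        "gap": rich_score - minimal_score,
    }
```

Comparing the `rich` scores across models, rather than the `minimal` ones, is what surfaces the quality differences described above.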

Prompt Strategies That Transfer Across Models

A few prompt design patterns that consistently improve email output:

Specify the goal, not just the task. "Write an email requesting a 30-minute meeting to discuss the Q1 product roadmap, addressed to a VP of Engineering I've never met but with whom I share a mutual contact at Stripe" produces better output than "write a meeting request email."

Constrain length explicitly. Left unconstrained, most models write longer emails than most situations require.

Provide a tone reference. "Write in a direct, conversational tone — not formal, not casual" is more useful than "professional."

Give the signature context. Writing on behalf of a founder is different from writing on behalf of a sales rep. Including role context changes the appropriate framing and authority level.
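The four patterns above can be combined into a small prompt template. This is an illustrative sketch only: `EmailBrief` and `build_prompt` are hypothetical names, and the exact wording of the generated prompt is an assumption rather than a tested recipe.

```python
from dataclasses import dataclass


@dataclass
class EmailBrief:
    """Hypothetical container for the context a prompt should carry."""
    goal: str
    recipient: str
    sender_role: str
    tone: str
    max_words: int
    banned_phrases: tuple[str, ...] = ()


def build_prompt(brief: EmailBrief) -> str:
    """Assemble a prompt stating goal, recipient, sender role, tone, and length."""
    lines = [
        f"Write an email on behalf of {brief.sender_role}.",
        f"Goal: {brief.goal}",
        f"Recipient: {brief.recipient}",
        f"Tone: {brief.tone}",
        f"Length: at most {brief.max_words} words.",
    ]
    if brief.banned_phrases:
        quoted = ", ".join(f'"{p}"' for p in brief.banned_phrases)
        lines.append(f"Do not use these phrases: {quoted}.")
    return "\n".join(lines)


brief = EmailBrief(
    goal="Request a 30-minute meeting to discuss the Q1 product roadmap",
    recipient="VP of Engineering, no prior contact, mutual contact at Stripe",
    sender_role="a founder",
    tone="direct, conversational",
    max_words=120,
    banned_phrases=("I wanted to reach out",),
)
print(build_prompt(brief))
```

Keeping each element as a separate field makes it easy to hold everything constant while varying one factor, for example the tone description, when comparing models.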

Related Use Cases

  • Customer support — Tone calibration in another high-stakes written communication context
  • EQ Rankings — Emotional intelligence dimension, the strongest predictor for email quality
  • Creativity Rankings — Relevant for persuasive writing and non-generic phrasing

Full use-case rankings at /use-cases. Methodology at /methodology.
