Claude Fable 5 vs Opus vs Sonnet vs Haiku: How to Pick the Right Model

Anthropic now publishes a four-model Claude line, and the most common mistake teams make is reaching for the most capable model by default. The interesting question in 2026 is not which Claude model is best — it is which model is the cheapest one that still clears the bar for a given job. Get that right and you can run sophisticated AI features at a fraction of the naive cost. Get it wrong and you either burn money on overpowered inference or ship quality that does not hold up. This is the framework we use when building AI integrations for clients across the USA, UK, Canada, Europe and Australia.

The four models at a glance

As of June 2026, per Anthropic, here is the line you are choosing from. Prices are per million tokens, input then output.

Claude Fable 5 — the most capable model, built for the hardest long-horizon agentic and reasoning work. 1M context window, 128K max output, $10 input / $50 output. Thinking is always on. It carries safety classifiers that can refuse a request, requires 30-day data retention, and routes cyber, bio, chem and distillation requests to Opus 4.8.

Claude Opus 4.8 — the highly autonomous default for most complex work. 1M context, 128K output, $5 / $25. Supports adaptive thinking, where the model decides when and how much to reason.

Claude Sonnet 4.6 — the best speed and intelligence balance, and the production workhorse. 1M context, 64K output, $3 / $15. Also supports adaptive thinking.

Claude Haiku 4.5 — the fastest and cheapest, for simple, high-volume tasks. 200K context, 64K output, $1 / $5. Haiku does not support thinking, which is part of why it is so quick and inexpensive.

One discount applies across all four: prompt caching cuts the price of repeated input context by 90%. That materially changes the economics of anything that re-sends a large, fixed prompt.
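Caching's effect is easiest to see with a little arithmetic. The sketch below prices one request at the list prices above. It makes three assumptions. Cached input tokens are billed at 10% of the base input price, which is our reading of the 90% discount. Any premium for writing the cache on the first request is ignored. The model keys and the request shape are hypothetical. Treat it as a back-of-envelope calculator, not a billing reference.

```python
# List prices quoted above, in USD per million tokens: (input, output).
# Model keys are illustrative labels, not API model IDs.
PRICES = {
    "fable-5": (10.0, 50.0),
    "opus-4.8": (5.0, 25.0),
    "sonnet-4.6": (3.0, 15.0),
    "haiku-4.5": (1.0, 5.0),
}

# Assumption: a 90% discount means cached input is billed at 10% of the input price.
CACHE_READ_FACTOR = 0.10


def request_cost(model: str, cached_in: int, fresh_in: int, out: int) -> float:
    """USD cost of one request, with input split into cached and uncached tokens.

    Simplification: cache-write pricing on the first request is ignored.
    """
    price_in, price_out = PRICES[model]
    per_token_in = price_in / 1_000_000
    per_token_out = price_out / 1_000_000
    return (
        cached_in * per_token_in * CACHE_READ_FACTOR  # repeated prefix, discounted
        + fresh_in * per_token_in                     # new input, full price
        + out * per_token_out                         # output is never discounted
    )


if __name__ == "__main__":
    # Hypothetical request: 20K-token fixed system prompt, 500-token user turn, 300-token reply.
    for model in PRICES:
        uncached = request_cost(model, cached_in=0, fresh_in=20_500, out=300)
        cached = request_cost(model, cached_in=20_000, fresh_in=500, out=300)
        print(f"{model:11s} uncached ${uncached:.4f}  cached ${cached:.4f}")
```

On these assumptions the Sonnet 4.6 request drops from about $0.066 to $0.012 once the prefix is cached, because the fixed prompt dominates the input bill.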

A decision framework, not a leaderboard

Start by writing down the job. Then, beginning with the cheapest model and working up, run each candidate through four questions in order. The first model that passes all four is your answer.

1. What is the quality bar, and how do you measure it? You cannot pick a model rationally without an evaluation set — a representative sample of inputs with known-good outputs you can score. The bar is a pass rate on that set, not a feeling. Without it, every model choice is a guess, and teams default to the expensive model out of anxiety.

2. How hard is the task, really? Classification, routing, tagging and short extraction are simple. Summarisation, retrieval-augmented answering and most customer-facing chat are moderate. Multi-step reasoning, long-horizon agents that plan and execute across many tools, and large code refactors are hard. Be honest here — most production volume is simple or moderate, not hard.

3. What does a wrong answer cost? A mislabelled support ticket is cheap to fix; a wrong answer in a financial workflow is not. High cost-of-error pushes you up a tier even when the task looks moderate.

4. What are the latency and context constraints? Speed-critical paths favour Haiku and Sonnet. A context requirement above 200K tokens rules out Haiku. Always-on thinking on Fable 5 means longer, more deliberate turns: excellent for hard autonomous work, wrong for a snappy classifier.

The cheapest model that clears the bar wins on cost at scale. That is the whole game. You do not earn points for using a more capable model than the job requires.
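The four questions can be sketched as a filter that walks the tiers cheapest-first. This is a minimal illustration under stated assumptions. The pass rates are numbers you would measure on your own evaluation set; the ones in the demo are made up. The cost of a wrong answer is reduced to a hypothetical minimum tier. The latency rule only excludes always-on thinking. Question 2 has no separate check, because task difficulty already shows up in the measured pass rates.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str            # illustrative label, not an API model ID
    context_tokens: int  # context window quoted above
    thinking: str        # "none", "adaptive" or "always"


# Cheapest first, matching the line-up above.
TIERS = [
    Model("haiku-4.5", 200_000, "none"),
    Model("sonnet-4.6", 1_000_000, "adaptive"),
    Model("opus-4.8", 1_000_000, "adaptive"),
    Model("fable-5", 1_000_000, "always"),
]


@dataclass(frozen=True)
class Job:
    pass_rates: dict[str, float]  # measured per model on your evaluation set
    quality_bar: float            # required pass rate (question 1)
    min_tier: int                 # hypothetical floor, 0-3, raised by cost of error (question 3)
    context_needed: int           # tokens the job must fit (question 4)
    latency_critical: bool        # True rules out always-on thinking (question 4)


def pick_model(job: Job) -> str | None:
    """Return the cheapest model that clears every check, or None if none does."""
    for tier, model in enumerate(TIERS):
        if tier < job.min_tier:
            continue  # a wrong answer is too costly at this tier
        if model.context_tokens < job.context_needed:
            continue  # the job does not fit in the context window
        if job.latency_critical and model.thinking == "always":
            continue  # always-on thinking is too slow for this path
        if job.pass_rates.get(model.name, 0.0) >= job.quality_bar:
            return model.name
    return None


if __name__ == "__main__":
    # Made-up pass rates, for illustration only.
    job = Job(
        pass_rates={"haiku-4.5": 0.82, "sonnet-4.6": 0.95, "opus-4.8": 0.97, "fable-5": 0.98},
        quality_bar=0.93,
        min_tier=0,
        context_needed=50_000,
        latency_critical=True,
    )
    print(pick_model(job))  # -> sonnet-4.6
```

Returning None rather than defaulting to the top tier is deliberate. If nothing clears the bar, the answer is to revisit the prompt or the bar, not to silently pay for Fable 5.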

Where each model earns its place

Haiku 4.5 is where you put the millions of cheap calls: classifying tickets, routing requests, tagging content, quick yes/no extraction. It clears the bar on simple tasks at a third of Sonnet's price on both input and output ($1 / $5 versus $3 / $15). That saving is what lets you afford the more capable models on the work that matters.

Sonnet 4.6 is the smart default to reach for first on most production features: support assistants, summarisation and extraction at volume, retrieval over your documents, and most general-purpose work. It gives you most of Opus's quality at lower cost and latency, with a full 1M-token context window. Upgrade only where your evaluation set shows Sonnet falling short.

Opus 4.8 is for hard work that genuinely stretches a model's intelligence but does not need the absolute ceiling: complex multi-step reasoning, long-horizon agents, deep code work, high-stakes analysis. At $5 / $25 it sits comfortably between the production workhorse and the frontier, and for most complex jobs it is the right autonomous default.

Fable 5 is reserved for the hardest, highest-value autonomous work: tasks where the quality ceiling is worth $50 per million output tokens and a more deliberate, longer-running turn. If you are weighing it against Opus, our deeper comparison of Fable 5 versus Opus 4.8 walks through exactly where the line falls.

Is Fable 5's edge real? Yes — but it is narrow

The benchmarks confirm Fable 5 is genuinely more capable, and they also explain why most workloads do not need it. On published third-party results, Fable 5 scores roughly 95% on SWE-bench Verified against 88.6% for Opus 4.8. On the harder SWE-bench Pro it posts a top score of 80.3% versus 69.2%. On FrontierCode Diamond, one of the most demanding coding evaluations, the gap widens to 29.3% versus 13.4%.

Read those numbers carefully. On SWE-bench Verified, the easiest of the three, the two models are about six points apart. On the brutally hard FrontierCode Diamond, Fable 5 more than doubles Opus's score. The pattern is clear: Fable 5's advantage compounds as tasks get harder and shrinks to a few points on tasks already well within Opus's range. If your hardest real workload looks like SWE-bench Verified rather than FrontierCode Diamond, Opus 4.8 is likely to clear your bar at half the price, and your evaluation set is what confirms it.

That is the practical reading of these benchmarks. It is not "Fable 5 is better, use it everywhere," but "Fable 5 is better exactly where the work is hardest, and that is where you should spend on it." We dig into the economics in our look at whether Fable 5 is worth it on pricing and ROI, and into the model itself in what Claude Fable 5 actually is.
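The claim that the edge compounds as tasks get harder is easy to check with the three score pairs quoted in this section. This snippet only computes the absolute gap and the ratio for each benchmark. The scores are the third-party figures cited above, with 95.0 standing in for "roughly 95%".

```python
from __future__ import annotations

# Third-party scores quoted above, in percent: (Fable 5, Opus 4.8).
SCORES = {
    "SWE-bench Verified": (95.0, 88.6),  # 95.0 approximates "roughly 95%"
    "SWE-bench Pro": (80.3, 69.2),
    "FrontierCode Diamond": (29.3, 13.4),
}


def gap(fable: float, opus: float) -> tuple[float, float]:
    """Return (absolute gap in points, Fable-to-Opus score ratio), rounded."""
    return round(fable - opus, 1), round(fable / opus, 2)


for bench, (fable, opus) in SCORES.items():
    points, ratio = gap(fable, opus)
    print(f"{bench:22s} +{points} pts  {ratio}x")
```

The gap grows from 6.4 points to 15.9, and the ratio from about 1.07x to 2.19x, which is the pattern described above.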

The pro move: route by difficulty instead of standardising

The most cost-effective production systems we build do not pick one model — they route. A request comes in, Haiku triages or classifies it, Sonnet handles the bulk of the work, Opus 4.8 takes the genuinely hard cases, and Fable 5 is held back for the rare task that nothing cheaper can clear. Combine that routing with the 90% prompt-caching discount on repeated context and you get sophisticated behaviour at a fraction of what a single-model deployment would cost.
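Here is a minimal sketch of that routing pattern. It assumes you supply three hypothetical callables. `triage` is a cheap classifier, for example a Haiku 4.5 prompt. `run` is your wrapper around the model call. `passes` is your own acceptance check. The triage labels and starting tiers are illustrative. Escalating to the next tier when `passes` rejects an answer is one way to hold Fable 5 back for the tasks nothing cheaper can clear; it is our design choice, not the only option. The model names are placeholder strings rather than API model IDs.

```python
from __future__ import annotations

from typing import Callable

# Cheapest to most capable; escalation only ever moves to the right.
LADDER = ["haiku-4.5", "sonnet-4.6", "opus-4.8", "fable-5"]

# Hypothetical triage labels mapped to a starting index in LADDER.
START_TIER = {"simple": 0, "moderate": 1, "hard": 2}


def route(
    request: str,
    triage: Callable[[str], str],
    run: Callable[[str, str], str],
    passes: Callable[[str, str], bool],
) -> tuple[str, str]:
    """Send a request to the tier its triage label suggests, escalating on failure.

    triage: cheap classifier returning a difficulty label.
    run:    calls the named model on the request and returns its answer.
    passes: your own check (schema validation, a grader, or similar).
    Fable 5 is never a starting tier; it is reached only by escalation.
    """
    start = START_TIER.get(triage(request), 1)  # unknown label: start at Sonnet
    for model in LADDER[start:]:
        answer = run(model, request)
        if passes(request, answer):
            return model, answer
    raise RuntimeError("no tier produced an acceptable answer")
```

Because Fable 5 is never a starting tier here, it only sees traffic that Opus 4.8 has already failed, which keeps its share of volume small.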

Routing also lets you respect Fable 5's constraints cleanly. Because it requires 30-day data retention and can refuse requests in restricted domains, you keep it off any path that handles data you cannot retain, and let the router fall through to Opus 4.8 — which is where Fable 5's own safety routing sends cyber, bio, chem and distillation requests anyway. Designed well, the router becomes the single place where cost, capability, latency and compliance decisions all live.
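The compliance rule can sit in the same router as a final guard. This sketch makes two assumptions. Your application already knows whether a request must stay under zero data retention. You have some way, hypothetical here, of tagging a request's domain. Checking up front means you do not rely on Fable 5's own classifiers to refuse or reroute.

```python
from __future__ import annotations

# Domains that, per the constraints above, Fable 5 routes to Opus 4.8.
RESTRICTED_DOMAINS = frozenset({"cyber", "bio", "chem", "distillation"})


def enforce_constraints(model: str, *, zero_retention: bool, domain: str | None) -> str:
    """Demote Fable 5 to Opus 4.8 when a request cannot or should not go to it.

    zero_retention: True if this data must not be retained (Fable 5 requires 30 days).
    domain: label from your own (hypothetical) domain tagger, or None if untagged.
    Other models pass through unchanged.
    """
    if model != "fable-5":
        return model
    if zero_retention or domain in RESTRICTED_DOMAINS:
        return "opus-4.8"
    return model
```

Running this guard after the difficulty decision keeps compliance in one place. Cost and capability pick a tier, and this check can only move a request down from Fable 5, never up.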

How we approach it for clients

We treat model selection as an engineering decision with a measurable answer, not a brand preference. First we build the evaluation set, because without it nothing else is rigorous. Then we start every feature on the cheapest plausible tier and only climb when the evaluation forces us to. We design routing in from the beginning rather than bolting it on, and we lean hard on prompt caching wherever a stable prefix exists. The result for clients in the USA, UK, Canada, Europe and Australia is the same: the quality their users need, at the lowest defensible cost, with a clear paper trail for why each model sits where it does.

Augmentation-first is the honest frame here. None of these models replaces good engineering judgement — they make a well-designed system cheaper and more capable. The teams that win in 2026 are not the ones paying for Fable 5 on every call. They are the ones who know exactly which jobs deserve it and route everything else to the model that already clears the bar.

Frequently Asked Questions

Do most workloads need Claude Fable 5?

No. Most workloads do not need Fable 5. It is Anthropic's most capable model for the hardest long-horizon agentic and reasoning work, but its edge only matters on the most demanding tasks. Haiku 4.5 and Sonnet 4.6 handle most volume, and Opus 4.8 covers hard work that does not need the absolute ceiling.

How do the four Claude models compare on price and context?

Per million tokens: Fable 5 is $10 input / $50 output with a 1M context window and 128K output. Opus 4.8 is $5 / $25, 1M context, 128K output. Sonnet 4.6 is $3 / $15, 1M context, 64K output. Haiku 4.5 is $1 / $5, 200K context, 64K output. A 90% prompt-caching discount applies across all of them.

When should you actually reach for Fable 5?

Reserve Fable 5 for the hardest, highest-value autonomous work where the quality ceiling justifies the price. Published third-party benchmarks show its real edge: roughly 95% on SWE-bench Verified versus 88.6% for Opus 4.8, and a top SWE-bench Pro score of 80.3% versus 69.2%. That gap only changes outcomes on genuinely hard tasks.

What is the "cheapest model that clears the bar" rule?

At scale, the cheapest model that clears your quality bar wins on cost. Define the bar with an evaluation set, then pick the lowest tier that passes it. Haiku for simple high-volume tasks, Sonnet for the balanced production workload, Opus 4.8 for hard work, and Fable 5 only where nothing else clears the bar.

Which models support thinking?

Fable 5 has thinking always on. Opus 4.8 and Sonnet 4.6 support adaptive thinking, where the model decides when and how much to reason. Haiku 4.5 does not support thinking, which is part of why it is the fastest and cheapest of the four.

Are there constraints unique to Fable 5?

Yes. Fable 5 has safety classifiers that can refuse a request, and it requires 30-day data retention, so it is not available under zero data retention. Cyber, bio, chem and distillation requests are routed to Opus 4.8. Factor these into your design before standardising on it.

Not sure which model fits your use case?

We'll help you pick the right Claude model and design cost-efficient routing. Book a free 30-minute strategy call.
