Anthropic ships Claude in three named tiers: Opus, Sonnet, and Haiku. The naming scheme, running from the largest form to the smallest, encodes the quality-cost gradient, and the tiers have held their shape across several generations of the Claude family. But the differences between them are more subtle than "bigger is better." Picking the right tier for each task is one of the most consequential optimisations in any Claude-centric AI stack, and it is the difference between bills that scale gracefully and bills that explode. This guide explains what each tier is, where each genuinely wins, how to route traffic intelligently between them, and how to think about tier selection for production systems in 2026.
The naming convention and what it signals
Anthropic's tier naming is deliberately literary. An opus is a large musical work; a sonnet is a structured fourteen-line poem; a haiku is a compact three-line poem. The names encode a gradient: quality and cost scale with the size of the form.
Across generations (Claude 3, Claude 3.5, Claude 4, and beyond), the three-tier structure has persisted even as the models themselves have improved. The naming is sticky because it conveys the right mental model: there is a flagship for hard tasks, a workhorse for most tasks, and a compact variant for bulk work.
Version numbers flow through the tiers — Claude 4.7 Opus, 4.6 Sonnet, 4.5 Haiku — with each tier getting periodic upgrades. The specific current version at any moment is worth checking on Anthropic's site, because capabilities shift meaningfully with each release.
Claude Opus: the flagship
Opus is Claude's highest-quality tier. It is the model to reach for when you need the ceiling of what Claude can do — hard reasoning, complex writing, nuanced judgment calls, and high-stakes agentic work.
Opus shines on tasks where subtle quality differences matter. A Sonnet response to a difficult strategic question might be good; an Opus response is often notably better, with deeper analysis, more thoughtful qualifications, and a higher ceiling of insight. For once-a-day or once-a-week high-stakes work, Opus is worth the premium.
It is also the tier to use for agentic tasks with long horizons. Multi-step coding work, complex research syntheses, and agent loops where each decision matters benefit from Opus's deeper reasoning. In Claude Code and similar agent systems, routing harder subtasks to Opus often pays off in fewer retries and higher overall success rate.
The downsides are real. Opus is the slowest of the three — responses take noticeably longer. It is also the most expensive by a significant margin, typically 5-10x the cost of Sonnet per token. Running high-volume production traffic on Opus is economically painful.
Claude Sonnet: the workhorse
Sonnet is where most production Claude traffic lives in 2026. The quality is competitive with other frontier models (GPT-5, Gemini Pro) for the vast majority of tasks. The price is mid-tier. The latency is reasonable. For everyday assistant work, coding tasks, writing drafts, and general reasoning, Sonnet is usually the right answer.
The sweet-spot use case for Sonnet: you need frontier-quality output, but you do not need the absolute top-of-tier performance. Most day-to-day work fits this pattern. Writing a blog post, answering customer-support questions with RAG context, generating code for a moderately complex function, analysing a document — all Sonnet territory.
A practical habit: start every new Claude project on Sonnet. Only upgrade to Opus if evaluation shows Sonnet is failing on hard cases. Only downgrade to Haiku if cost is becoming a primary concern at scale. This "Sonnet by default" heuristic is a major efficiency gain for most teams.
Sonnet is also the tier most developers build against first via the API. Its balance of capability, speed, and price is well-matched to iterative product development.
Claude Haiku: the speed and cost tier
Haiku is the smallest and fastest Claude model. It is designed for high-volume, latency-sensitive, cost-sensitive tasks. Think triage, classification, routing, bulk summarisation, and the kind of backend work where the individual quality of each response matters less than the throughput and cost.
Haiku's quality is lower than Sonnet on genuinely hard tasks, but for focused, well-defined jobs it is often indistinguishable. Classifying tickets into categories, extracting structured fields from documents, generating brief summaries of conversations — all Haiku territory and all at a fraction of Sonnet's cost.
For production applications with meaningful volume, Haiku can be the difference between a profitable product and an unprofitable one. A pipeline handling a million queries per day at Sonnet rates is far more expensive than the same pipeline at Haiku rates. If the task fits Haiku's capability, the economics dramatically favour it.
Latency matters too. Haiku responds faster than Sonnet, and Sonnet responds faster than Opus. For user-facing applications where latency is a feature, Haiku's speed can noticeably improve the experience.
Pricing: the gradient that shapes routing decisions
Exact prices change, but the ratio is consistent: Haiku costs a small fraction of Sonnet, which costs a small fraction of Opus. Rough benchmarks in 2026: Opus around $15/$75 per million tokens (input/output), Sonnet around $3/$15, Haiku around $0.80/$4.
The price ratio between tiers — particularly the 5-10x gap between Opus and Sonnet, and the 3-5x gap between Sonnet and Haiku — is what makes smart routing economically decisive. An application that routes intelligently between tiers can be 3-10x cheaper than one that uses a single tier for everything, with no meaningful quality loss.
Prompt caching applies to all three tiers equally. A 5,000-token cached system prompt costs one-tenth the regular price on cache hits, across all tiers. This is the biggest cost-saving lever available and should be used whenever your prompts include repeated context.
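The per-request arithmetic can be sketched as below. The prices are the rough 2026 figures quoted above and the one-tenth cache-read rate is the article's figure; both are assumptions to verify against Anthropic's current price list, and the tier keys are illustrative.

```python
# Rough 2026 list prices from the article, in USD per million tokens.
PRICES = {
    "opus":   {"in": 15.00, "out": 75.00},
    "sonnet": {"in": 3.00,  "out": 15.00},
    "haiku":  {"in": 0.80,  "out": 4.00},
}
CACHE_READ_DISCOUNT = 0.1  # cached input tokens bill at one-tenth the input rate

def request_cost(tier, in_tokens, out_tokens, cached_tokens=0):
    """Estimate the USD cost of one request, with cached tokens billed at a discount."""
    p = PRICES[tier]
    uncached = in_tokens - cached_tokens
    return (
        uncached * p["in"]
        + cached_tokens * p["in"] * CACHE_READ_DISCOUNT
        + out_tokens * p["out"]
    ) / 1_000_000
```

For example, a Sonnet request with a 5,000-token cached system prompt, 1,000 tokens of fresh input, and 500 output tokens costs about $0.012, versus $0.0255 with no cache hit, which is why caching is the first lever to pull.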
Extended thinking: a feature across all tiers
Extended thinking — the reasoning mode where Claude spends extra compute on internal deliberation — is available on all three tiers, with different characteristics.
Opus with extended thinking is the most capable reasoning configuration Anthropic offers. For the hardest problems, it is the choice. Cost and latency are both significant.
Sonnet with extended thinking punches above its weight for moderately hard problems. The quality uplift from extended thinking on Sonnet often rivals baseline Opus, at lower cost. A frequently-used pattern.
Haiku with extended thinking is available but less commonly used. For simple tasks, the overhead of extended thinking is usually not worth it.
Thinking budgets (maximum tokens spent thinking) are configurable per request. Allocating a generous thinking budget for hard queries and a tight one for easy queries is another lever for optimising cost.
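A per-request budget policy might look like the sketch below. The `thinking` request shape follows the Anthropic Messages API's extended-thinking parameter; the difficulty labels and token budgets are illustrative assumptions, not recommended values.

```python
def thinking_config(difficulty):
    """Map a difficulty label to extended-thinking request options (budgets are illustrative)."""
    budgets = {"easy": 0, "medium": 4_000, "hard": 16_000}
    tokens = budgets[difficulty]
    if tokens == 0:
        # Easy queries skip extended thinking entirely.
        return {}
    # Shape follows the Messages API's extended-thinking parameter.
    return {"thinking": {"type": "enabled", "budget_tokens": tokens}}
```

The returned dict is merged into the request, so a generous budget is spent only on queries classified as hard.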
Routing strategies: the art of picking the right tier
For any Claude-heavy application handling meaningful traffic, routing between tiers is one of the highest-leverage optimisations available.
Task-based routing is the simplest pattern. Classification and extraction go to Haiku. Chat and general generation go to Sonnet. Reasoning-heavy and high-stakes work go to Opus. Each task type has a tier that fits it.
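A minimal sketch of task-based routing, assuming hypothetical model identifiers (check Anthropic's docs for the current strings):

```python
# Hypothetical model identifiers; substitute the current names from Anthropic's docs.
TASK_TIER = {
    "classification": "claude-haiku",
    "extraction": "claude-haiku",
    "chat": "claude-sonnet",
    "generation": "claude-sonnet",
    "reasoning": "claude-opus",
    "high_stakes": "claude-opus",
}

def route_by_task(task_type, default="claude-sonnet"):
    """Return the model for a known task type; unknown types fall back to the default."""
    return TASK_TIER.get(task_type, default)
```

The fallback default encodes the "Sonnet by default" heuristic from earlier.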
Complexity-based routing uses a lightweight classifier (or even a Haiku call) to evaluate each query's difficulty before routing. Easy queries go to Haiku, hard ones to Sonnet or Opus. This takes more engineering to build but can produce better economics.
Fallback routing tries the cheaper tier first and escalates to a more capable tier if the cheaper tier fails or is uncertain. Works well when quality can be evaluated quickly (structured output validation, explicit confidence signals).
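The escalation pattern can be sketched generically. Here `call_model` and `validate` are caller-supplied stand-ins (for the actual API call and a structured-output check), not real library functions:

```python
def with_fallback(call_model, validate,
                  tiers=("claude-haiku", "claude-sonnet", "claude-opus")):
    """Try cheaper tiers first and escalate when the output fails validation.

    call_model(model) performs the request; validate(result) returns True when
    the output is acceptable. Both are supplied by the caller.
    """
    result = None
    for model in tiers:
        result = call_model(model)
        if validate(result):
            return model, result
    # Every tier failed validation: surface the most capable tier's attempt.
    return tiers[-1], result
```

The pattern pays for itself when validation is cheap relative to the price gap between tiers.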
User-tier routing assigns model tier based on who is asking. Premium users get Opus; free users get Haiku. A classic SaaS pricing-tier pattern.
Most serious production stacks combine multiple routing strategies. The payoff is dramatic: 3-10x cost reduction with similar or better quality than naive single-tier deployments.
A worked example: a customer-support chatbot
To make routing concrete, consider a support chatbot handling 100,000 conversations per day.
Incoming messages are first classified by intent (question, complaint, status update) using Haiku — fast and cheap, roughly $20/day in API cost.
Simple status queries ("where is my order?") are answered by Haiku with RAG context — good enough quality, very low cost.
General product questions go to Sonnet with RAG context — frontier quality at moderate cost.
Complex complaint resolution or refund decisions escalate to Opus with extended thinking for careful judgment, with human review for the highest-stakes cases.
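The pipeline above reduces to a small routing policy. The intent labels and model identifiers here are illustrative, with the intent itself produced by the upstream Haiku classifier:

```python
def route_support_message(intent):
    """Map a classified intent to the tier policy from the worked example (labels illustrative)."""
    if intent == "status_query":
        return {"model": "claude-haiku", "extended_thinking": False, "human_review": False}
    if intent == "product_question":
        return {"model": "claude-sonnet", "extended_thinking": False, "human_review": False}
    if intent in ("complaint", "refund"):
        # High-stakes: Opus with extended thinking, flagged for human review.
        return {"model": "claude-opus", "extended_thinking": True, "human_review": True}
    # Unrecognised intents fall back to the Sonnet workhorse.
    return {"model": "claude-sonnet", "extended_thinking": False, "human_review": False}
```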
Total cost at this scale: perhaps $500-$1,000/day depending on mix, versus several thousand dollars/day if everything went through Sonnet or Opus. The quality on each tier matches the task. This is what mature Claude deployment looks like in 2026.
When to pick each tier: a decision guide
A compressed rule of thumb.
Pick Opus when: the task is genuinely hard (complex reasoning, nuanced writing, high-stakes decision), you need the quality ceiling, cost and latency are acceptable, and volume is low-to-moderate.
Pick Sonnet when: you need frontier quality but not the absolute peak, cost matters, latency matters, and volume is moderate. This covers most production use cases.
Pick Haiku when: the task is focused and well-defined, volume is high, cost and latency are primary concerns, and quality requirements are met by Haiku's capability.
Route between them when: your application has mixed workloads with varying complexity. Routing infrastructure pays off quickly.
Cross-tier consistency
A useful property: the three tiers share an API surface and a prompting style. Switching between them is primarily a model-name change in the API call. This makes routing, fallback, and A/B testing between tiers operationally easy.
Prompts that work well on Sonnet typically work well on Opus (with higher quality) and acceptably on Haiku (with lower quality on complex tasks). This consistency lets you develop against Sonnet and then re-evaluate which tier best fits each task without rewriting prompts.
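In practice the request really is identical apart from the model string. A sketch (model identifiers hypothetical, request shape per the Messages API):

```python
def build_request(model, prompt):
    """Build a Messages-API-style request body; only the model string varies by tier."""
    return {
        "model": model,
        "max_tokens": 1024,
        "system": "You are a concise support assistant.",
        "messages": [{"role": "user", "content": prompt}],
    }

sonnet_req = build_request("claude-sonnet", "Summarise this ticket.")
haiku_req = build_request("claude-haiku", "Summarise this ticket.")
```

Because everything except `model` is shared, A/B tests and fallbacks need no prompt rewrites.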
Comparing tiers against competitors
A useful mental map: Anthropic's tiers line up roughly with parallel tiers from OpenAI and Google.
Opus is comparable in positioning to OpenAI's GPT-5 flagship (especially with reasoning mode) and Gemini Ultra. All three are the most-capable tier, priced at a premium, for the hardest tasks.
Sonnet sits alongside GPT-5 base (non-reasoning), Gemini Pro, and similar mid-frontier models. These are the workhorses of production AI in 2026.
Haiku is in the same competitive zone as GPT-5-mini, GPT-4o-mini, Gemini Flash, and Mistral Small. These are the cost-effective options for high-volume or latency-sensitive applications.
When planning a multi-vendor AI stack, matching tiers across vendors lets you compare like-for-like. A Sonnet query should be benchmarked against GPT-5 base and Gemini Pro, not against GPT-5 with extended reasoning or Opus. Getting this tier-parity right is essential for meaningful evaluation.
Where Opus genuinely shines
Tasks where the premium is earned.
Legal drafting where subtle word choices matter. Financial analysis where numerical reasoning must be precise. Scientific reasoning with multi-step logical chains. Long creative writing where voice consistency over thousands of words is critical. Agentic tasks where an individual wrong decision compounds into cascading failures. High-stakes customer communications where tone and judgment matter. Complex coding refactors touching many files.
For any of these, Opus is usually worth the cost. For simpler versions of the same task types, Sonnet is often indistinguishable.
Where Haiku surprises people
A few places where Haiku punches above expectations.
Classification, extraction, and structured output generation often work just as well on Haiku as on Sonnet, because the task is well-defined and the output is constrained.
First-pass summarisation of long content often yields similar quality across tiers when the summary needs are basic.
Simple coding tasks — utility functions, one-file scripts, straightforward transformations — are often handled well by Haiku.
Routine chatbot interactions, when grounded with good RAG context, are often indistinguishable between Haiku and Sonnet from the user's perspective.
The implication: test Haiku before assuming you need Sonnet. On many tasks, the quality gap is smaller than the price gap, which is a clear economic win.
Common mistakes in tier selection
Patterns to avoid.
Using Opus as the default. The cost is rarely justified for everyday work. Sonnet is the right default for most applications.
Using Sonnet for bulk backend tasks. If a task is narrow, well-defined, and high-volume, Haiku is usually sufficient and dramatically cheaper.
Assuming tier differences are uniform. The quality gap between Opus and Sonnet varies significantly by task. On some tasks, Sonnet is essentially Opus-quality. On others, Opus is notably better. Evaluate on your own tasks.
Not using prompt caching. Across all tiers, prompt caching is a meaningful cost saver. Skipping it for convenience is a common and expensive mistake.
Over-routing. If your traffic is mostly uniform in complexity, routing infrastructure may not pay off. Start simple and add routing when volume justifies the engineering cost.
How the tiers evolve
A pattern worth knowing: each generational update typically upgrades all three tiers together, with each tier roughly inheriting the quality of the previous generation's tier above it. In a typical generation, the new Opus raises the overall ceiling; the new Sonnet delivers roughly the previous Opus's quality at a lower cost; the new Haiku delivers roughly the previous Sonnet's quality at the Haiku price.
The practical implication: every 6-12 months, re-evaluate which tier you should be on. A task that required Opus last year may be handled perfectly by Sonnet today. A task that required Sonnet may now be handled by Haiku. This rolling improvement is one of the quiet benefits of staying on the Claude family.
A worked A-B test between tiers
For any application serving meaningful traffic, an actual measurement is worth more than intuition. A reasonable A-B test: route 50% of traffic to Sonnet and 50% to Haiku for a week. Collect user feedback, response quality scores (human or LLM-judge), and latency data. Compare.
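A simple way to run that split is deterministic hash-based assignment, so each user stays in one arm for the whole week (arm names hypothetical):

```python
import hashlib

def ab_assign(user_id, split=0.5, arms=("claude-sonnet", "claude-haiku")):
    """Assign a user to an A/B arm by hashing their ID; assignment is stable across requests."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return arms[0] if bucket < split * 10_000 else arms[1]
```

Logging the assigned arm alongside quality scores, latency, and cost per request gives you the comparison table at the end of the week.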
Typical results: for well-scoped tasks with good RAG context, Haiku matches Sonnet on user satisfaction within measurement error, at roughly one-quarter the cost. For more open-ended or complex tasks, Sonnet pulls ahead measurably.
This kind of experiment is worth doing quarterly. As tier capabilities evolve (each generation pushes smaller tiers upward in capability), the right default for your specific task may shift. A task that needed Sonnet in early 2025 may be handled just as well by Haiku in mid-2026.
The meta-point: tier selection should be a data-driven decision, not a gut call. Build the evaluation infrastructure once, and it pays back every time you make a tier-change decision.
Use Haiku for bulk routing, Sonnet for most real work, and Opus only for hard reasoning or high-stakes writing. Always try Sonnet first, and upgrade or downgrade with evaluation data.
A quick checklist for choosing a tier today
For any new task or product, run through these questions before picking a tier.
Is this task reasoning-heavy or high-stakes? If yes, start with Opus or Sonnet with extended thinking.
Is this task well-scoped with a clear output structure? If yes, try Haiku and only upgrade if quality fails.
Is volume more than 100,000 queries a day? If yes, cost optimisation is critical; run A-B tests across tiers.
Does the task benefit from long context? If yes, Sonnet and Opus have the strongest long-context performance.
These four questions cover most tier-selection decisions. Answering them honestly before writing code saves considerable backtracking later.
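The checklist compresses into a first-pass decision function. The thresholds and the ordering of checks are illustrative (the long-context question is left to per-task judgment), and the output is a starting point to confirm with evaluation data, not a final answer:

```python
def first_pass_tier(reasoning_heavy, well_scoped, daily_volume):
    """Suggest a starting tier from the checklist; confirm with evaluation before shipping."""
    if reasoning_heavy:
        # Hard or high-stakes work: start at the top (or Sonnet with extended thinking).
        return "opus"
    if well_scoped or daily_volume > 100_000:
        # Narrow tasks and bulk traffic: try Haiku first, upgrade only on failure.
        return "haiku"
    return "sonnet"  # the default workhorse
```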
The short version
Claude ships in three tiers: Opus (flagship), Sonnet (workhorse), Haiku (fast and cheap). Sonnet is the right default for most production use. Opus earns its premium on genuinely hard tasks; Haiku wins on high-volume, well-defined work. Routing intelligently between tiers is a 3-10x cost optimisation for any application at scale. Prompt caching applies to all tiers and is essential for economics. Re-evaluate tier choice every 6-12 months as new generations ship. Used well, the three-tier structure is one of the most elegant quality-cost designs in 2026 AI, and teams that master routing across tiers unlock significant economic and quality advantages over teams that do not.