On June 12, 2026, Anthropic suspended Claude Fable 5 and Mythos 5 globally following a US Commerce Department export control directive. The suspension affects all users on all plans — consumer subscriptions and direct API access alike. Claude Opus 4.8 is now Anthropic's top publicly available model, and it remains fully available across all Claude plans and API tiers.
This guide covers everything developers need to know about building and running applications on Claude Opus 4.8 in the current environment.
What Is Claude Opus 4.8?
Opus 4.8 is Anthropic's premier frontier reasoning model in the Claude 4.x family. It was the primary model for professional and production use prior to Fable 5's launch and suspension, and it represents the most capable Claude model developers can reliably access today.
Opus 4.8 capabilities:
- 200,000-token context window: Sufficient for large codebases, lengthy legal documents, long-form content, and multi-session conversation history.
- Extended thinking mode: Enables structured, step-by-step reasoning for complex technical problems, code architecture decisions, and multi-step analysis.
- Native vision: Understands and analyzes images, diagrams, screenshots, and charts within the context window.
- Tool use and function calling: Full support for multi-step tool workflows, API integrations, and agent architectures.
- Computer use (beta): Can interact with browser interfaces and desktop GUIs for automated workflows.
Claude API Pricing: Opus 4.8
| Usage Type | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Standard input | $5.00 | $25.00 |
| Cached input (90% discount) | $0.50 | — |
| Cache write | $6.25 | — |
Anthropic's prompt caching is one of the most aggressive in the industry: a 90% discount on cached input tokens. For applications with repeating system prompts or large static context (knowledge bases, code repositories, product documentation), caching is the single most impactful cost optimization available.
Cost Examples
Example 1: Code Review Tool
Architecture: 50,000-token repository context (cached), 500-token user request, 800-token response.
- Cache write (first call): 50,000 tokens × $6.25/M = $0.31
- Cached input (subsequent calls): 50,000 × $0.50/M = $0.025
- Uncached input (user request): 500 × $5.00/M = $0.0025
- Output: 800 × $25.00/M = $0.02
Cost per code review (after initial cache load): approximately $0.05 per request.
At 10,000 reviews per month: ~$500 in model costs (plus negligible embedding/compute overhead).
Example 2: Legal Document Analysis
Architecture: 10,000-token user document (no caching, varies per document), 1,500-token analysis response.
- Input: 10,000 × $5.00/M = $0.05
- Output: 1,500 × $25.00/M = $0.0375
Cost per analysis: ~$0.09 per document. At 5,000 documents/month: ~$450.
Opus 4.8 vs Claude Sonnet 4.5: When to Use Each
The most important architectural decision in a Claude application is which model tier to route requests to. Opus 4.8 is not always the right choice.
Use Claude Sonnet 4.5 when:
- The task is standard summarization, extraction, or question answering
- Response speed matters more than maximum reasoning depth
- You're processing high volumes of short-to-medium documents
- The task is a well-defined classification or labeling job
Claude Sonnet 4.5 costs $3/M input, $15/M output — significantly cheaper than Opus 4.8. For tasks where Sonnet 4.5 performs equivalently (and it does for most standard tasks), routing to Sonnet saves 40–50% on model costs.
Use Claude Opus 4.8 when:
- The task requires multi-step code architecture reasoning
- The response needs the highest level of professional polish and accuracy
- You're handling high-stakes outputs where an error has significant downstream cost
- Extended thinking mode provides measurable quality improvement on the specific task
- The context window exceeds what Sonnet can comfortably handle accurately
Subscription Plans: Which Plans Include Opus 4.8
On the consumer side:
- Claude Free: Claude Sonnet 4.5 only (daily message limits)
- Claude Pro ($20/month): Claude Sonnet 4.5 and Opus 4.8 with daily usage limits
- Claude Max 5x ($100/month): Opus 4.8 with 5× Pro's usage limits
- Claude Max 20x ($200/month): Opus 4.8 with 20× Pro's usage limits, full parallel workflow capability
Note: Claude Fable 5 is listed as suspended on all plans as of June 12, 2026. The suspension is global and affects all users regardless of plan tier.
Extended Thinking: How It Works and What It Costs
Extended thinking allows Opus 4.8 to reason step-by-step through a problem before generating a final answer. The thinking tokens are generated internally and are not included in the standard output token count in the same way — but they do consume additional compute and add latency.
Extended thinking is worth enabling for: * Complex mathematical derivations where intermediate steps matter * Multi-file code refactoring decisions where trade-offs must be weighed * Detailed professional analysis where the reasoning chain validates the conclusion
For standard tasks, extended thinking adds latency and cost without proportional quality improvement. Disable it at the API level for tasks that don't require deep reasoning chains.
Context Window Strategy for 200k
While 200k tokens sounds large, production applications need to be deliberate about context management:
- 200k tokens is approximately 150,000 words — roughly one and a half long novels, or a very large codebase
- Sending 200k tokens on every call at $5/M input costs $1.00 per call before any caching. With caching at $0.50/M, it drops to $0.10 — a 10× reduction
- For applications that reuse large contexts (the same codebase, the same product documentation, the same customer history), caching the stable prefix is a prerequisite for production cost management
The practical guidance: structure your context as a stable prefix (large context, cached) plus a dynamic suffix (user request, not cached). Cache the prefix. The cost profile becomes substantially more predictable and cost-efficient.