Claude Opus 4.8: Pricing, Capabilities, and Developer Guide After the Fable 5 Suspension

With Claude Fable 5 suspended by US export directive on June 12, Claude Opus 4.8 is Anthropic's current top model. Here's the complete developer guide: API pricing, context window, caching, and when to use Opus vs Sonnet.

On June 12, 2026, Anthropic suspended Claude Fable 5 and Mythos 5 globally following a US Commerce Department export control directive. The suspension affects all users on all plans — consumer subscriptions and direct API access alike. Claude Opus 4.8 is now Anthropic's top publicly available model, and it remains fully available across all Claude plans and API tiers.

This guide covers everything developers need to know about building and running applications on Claude Opus 4.8 in the current environment.

What Is Claude Opus 4.8?

Opus 4.8 is Anthropic's premier frontier reasoning model in the Claude 4.x family. It was the primary model for professional and production use prior to Fable 5's launch and suspension, and it represents the most capable Claude model developers can reliably access today.

Opus 4.8 capabilities:

200,000-token context window: Sufficient for large codebases, lengthy legal documents, long-form content, and multi-session conversation history.
Extended thinking mode: Enables structured, step-by-step reasoning for complex technical problems, code architecture decisions, and multi-step analysis.
Native vision: Understands and analyzes images, diagrams, screenshots, and charts within the context window.
Tool use and function calling: Full support for multi-step tool workflows, API integrations, and agent architectures.
Computer use (beta): Can interact with browser interfaces and desktop GUIs for automated workflows.

Claude API Pricing: Opus 4.8

Usage Type	Input (per 1M tokens)	Output (per 1M tokens)
Standard input	$5.00	$25.00
Cached input (90% discount)	$0.50	—
Cache write	$6.25	—

Anthropic's prompt caching is one of the most aggressive in the industry: a 90% discount on cached input tokens. For applications with repeating system prompts or large static context (knowledge bases, code repositories, product documentation), caching is the single most impactful cost optimization available.

Cost Examples

Example 1: Code Review Tool

Architecture: 50,000-token repository context (cached), 500-token user request, 800-token response.

Cache write (first call): 50,000 tokens × $6.25/M = $0.31
Cached input (subsequent calls): 50,000 × $0.50/M = $0.025
Uncached input (user request): 500 × $5.00/M = $0.0025
Output: 800 × $25.00/M = $0.02

Cost per code review (after initial cache load): approximately $0.05 per request.

At 10,000 reviews per month: ~$500 in model costs (plus negligible embedding/compute overhead).

Example 2: Legal Document Analysis

Architecture: 10,000-token user document (no caching, varies per document), 1,500-token analysis response.

Input: 10,000 × $5.00/M = $0.05
Output: 1,500 × $25.00/M = $0.0375

Cost per analysis: ~$0.09 per document. At 5,000 documents/month: ~$450.

Opus 4.8 vs Claude Sonnet 4.5: When to Use Each

The most important architectural decision in a Claude application is which model tier to route requests to. Opus 4.8 is not always the right choice.

Use Claude Sonnet 4.5 when:

The task is standard summarization, extraction, or question answering
Response speed matters more than maximum reasoning depth
You're processing high volumes of short-to-medium documents
The task is a well-defined classification or labeling job

Claude Sonnet 4.5 costs $3/M input, $15/M output — significantly cheaper than Opus 4.8. For tasks where Sonnet 4.5 performs equivalently (and it does for most standard tasks), routing to Sonnet saves 40–50% on model costs.

Use Claude Opus 4.8 when:

The task requires multi-step code architecture reasoning
The response needs the highest level of professional polish and accuracy
You're handling high-stakes outputs where an error has significant downstream cost
Extended thinking mode provides measurable quality improvement on the specific task
The context window exceeds what Sonnet can comfortably handle accurately

Subscription Plans: Which Plans Include Opus 4.8

On the consumer side:

Claude Free: Claude Sonnet 4.5 only (daily message limits)
Claude Pro ($20/month): Claude Sonnet 4.5 and Opus 4.8 with daily usage limits
Claude Max 5x ($100/month): Opus 4.8 with 5× Pro's usage limits
Claude Max 20x ($200/month): Opus 4.8 with 20× Pro's usage limits, full parallel workflow capability

Note: Claude Fable 5 is listed as suspended on all plans as of June 12, 2026. The suspension is global and affects all users regardless of plan tier.

Extended Thinking: How It Works and What It Costs

Extended thinking allows Opus 4.8 to reason step-by-step through a problem before generating a final answer. The thinking tokens are generated internally and are not included in the standard output token count in the same way — but they do consume additional compute and add latency.

Extended thinking is worth enabling for: * Complex mathematical derivations where intermediate steps matter * Multi-file code refactoring decisions where trade-offs must be weighed * Detailed professional analysis where the reasoning chain validates the conclusion

For standard tasks, extended thinking adds latency and cost without proportional quality improvement. Disable it at the API level for tasks that don't require deep reasoning chains.

Context Window Strategy for 200k

While 200k tokens sounds large, production applications need to be deliberate about context management:

200k tokens is approximately 150,000 words — roughly one and a half long novels, or a very large codebase
Sending 200k tokens on every call at $5/M input costs $1.00 per call before any caching. With caching at $0.50/M, it drops to $0.10 — a 10× reduction
For applications that reuse large contexts (the same codebase, the same product documentation, the same customer history), caching the stable prefix is a prerequisite for production cost management

The practical guidance: structure your context as a stable prefix (large context, cached) plus a dynamic suffix (user request, not cached). Cache the prefix. The cost profile becomes substantially more predictable and cost-efficient.