← Back to all guides
APIs

Claude Opus 4.8: Pricing, Capabilities, and Developer Guide After the Fable 5 Suspension

AIPricely Editorial TeamPrincipal AI Infrastructure Analyst
PublishedJune 16, 2026
Read Time7 min read

With Claude Fable 5 suspended by US export directive on June 12, Claude Opus 4.8 is Anthropic's current top model. Here's the complete developer guide: API pricing, context window, caching, and when to use Opus vs Sonnet.

On June 12, 2026, Anthropic suspended Claude Fable 5 and Mythos 5 globally following a US Commerce Department export control directive. The suspension affects all users on all plans — consumer subscriptions and direct API access alike. Claude Opus 4.8 is now Anthropic's top publicly available model, and it remains fully available across all Claude plans and API tiers.

This guide covers everything developers need to know about building and running applications on Claude Opus 4.8 in the current environment.


What Is Claude Opus 4.8?

Opus 4.8 is Anthropic's premier frontier reasoning model in the Claude 4.x family. It was the primary model for professional and production use prior to Fable 5's launch and suspension, and it represents the most capable Claude model developers can reliably access today.

Opus 4.8 capabilities:

  • 200,000-token context window: Sufficient for large codebases, lengthy legal documents, long-form content, and multi-session conversation history.
  • Extended thinking mode: Enables structured, step-by-step reasoning for complex technical problems, code architecture decisions, and multi-step analysis.
  • Native vision: Understands and analyzes images, diagrams, screenshots, and charts within the context window.
  • Tool use and function calling: Full support for multi-step tool workflows, API integrations, and agent architectures.
  • Computer use (beta): Can interact with browser interfaces and desktop GUIs for automated workflows.

Claude API Pricing: Opus 4.8

Usage TypeInput (per 1M tokens)Output (per 1M tokens)
Standard input$5.00$25.00
Cached input (90% discount)$0.50
Cache write$6.25

Anthropic's prompt caching is one of the most aggressive in the industry: a 90% discount on cached input tokens. For applications with repeating system prompts or large static context (knowledge bases, code repositories, product documentation), caching is the single most impactful cost optimization available.


Cost Examples

Example 1: Code Review Tool

Architecture: 50,000-token repository context (cached), 500-token user request, 800-token response.

  • Cache write (first call): 50,000 tokens × $6.25/M = $0.31
  • Cached input (subsequent calls): 50,000 × $0.50/M = $0.025
  • Uncached input (user request): 500 × $5.00/M = $0.0025
  • Output: 800 × $25.00/M = $0.02

Cost per code review (after initial cache load): approximately $0.05 per request.

At 10,000 reviews per month: ~$500 in model costs (plus negligible embedding/compute overhead).

Example 2: Legal Document Analysis

Architecture: 10,000-token user document (no caching, varies per document), 1,500-token analysis response.

  • Input: 10,000 × $5.00/M = $0.05
  • Output: 1,500 × $25.00/M = $0.0375

Cost per analysis: ~$0.09 per document. At 5,000 documents/month: ~$450.


Opus 4.8 vs Claude Sonnet 4.5: When to Use Each

The most important architectural decision in a Claude application is which model tier to route requests to. Opus 4.8 is not always the right choice.

Use Claude Sonnet 4.5 when:

  • The task is standard summarization, extraction, or question answering
  • Response speed matters more than maximum reasoning depth
  • You're processing high volumes of short-to-medium documents
  • The task is a well-defined classification or labeling job

Claude Sonnet 4.5 costs $3/M input, $15/M output — significantly cheaper than Opus 4.8. For tasks where Sonnet 4.5 performs equivalently (and it does for most standard tasks), routing to Sonnet saves 40–50% on model costs.

Use Claude Opus 4.8 when:

  • The task requires multi-step code architecture reasoning
  • The response needs the highest level of professional polish and accuracy
  • You're handling high-stakes outputs where an error has significant downstream cost
  • Extended thinking mode provides measurable quality improvement on the specific task
  • The context window exceeds what Sonnet can comfortably handle accurately

Subscription Plans: Which Plans Include Opus 4.8

On the consumer side:

  • Claude Free: Claude Sonnet 4.5 only (daily message limits)
  • Claude Pro ($20/month): Claude Sonnet 4.5 and Opus 4.8 with daily usage limits
  • Claude Max 5x ($100/month): Opus 4.8 with 5× Pro's usage limits
  • Claude Max 20x ($200/month): Opus 4.8 with 20× Pro's usage limits, full parallel workflow capability

Note: Claude Fable 5 is listed as suspended on all plans as of June 12, 2026. The suspension is global and affects all users regardless of plan tier.


Extended Thinking: How It Works and What It Costs

Extended thinking allows Opus 4.8 to reason step-by-step through a problem before generating a final answer. The thinking tokens are generated internally and are not included in the standard output token count in the same way — but they do consume additional compute and add latency.

Extended thinking is worth enabling for: * Complex mathematical derivations where intermediate steps matter * Multi-file code refactoring decisions where trade-offs must be weighed * Detailed professional analysis where the reasoning chain validates the conclusion

For standard tasks, extended thinking adds latency and cost without proportional quality improvement. Disable it at the API level for tasks that don't require deep reasoning chains.


Context Window Strategy for 200k

While 200k tokens sounds large, production applications need to be deliberate about context management:

  • 200k tokens is approximately 150,000 words — roughly one and a half long novels, or a very large codebase
  • Sending 200k tokens on every call at $5/M input costs $1.00 per call before any caching. With caching at $0.50/M, it drops to $0.10 — a 10× reduction
  • For applications that reuse large contexts (the same codebase, the same product documentation, the same customer history), caching the stable prefix is a prerequisite for production cost management

The practical guidance: structure your context as a stable prefix (large context, cached) plus a dynamic suffix (user request, not cached). Cache the prefix. The cost profile becomes substantially more predictable and cost-efficient.

AI

AIPricely Editorial Team

Principal AI Infrastructure Analyst

The AIPricely Editorial Team researches and tracks AI product launches, subscription pricing changes, and model benchmarks across the industry. We publish independent, data-backed guides to help developers, freelancers, and businesses make informed decisions about their AI tooling spend. Learn about our editorial process →

Want to compare these tools side-by-side?

Our dynamic compare tool lets you place ChatGPT, Claude, Gemini, and 20+ other leading platforms side-by-side with full pricing tiers and limits.

Open Comparison EngineBrowse All Tools