On June 12, 2026, at 5:21 PM ET, Anthropic received an order from the US Commerce Department's Bureau of Industry and Security telling it to cut off access to its most capable model for every foreign national on earth. Because consumer cloud APIs can't verify citizenship in real time, the practical effect was a full global blackout.
Fable 5 went dark 72 hours after launch.
Eighteen days later, it's back. On June 30, Commerce lifted the suspension, and starting Wednesday July 1, Fable 5 and Mythos 5 return to the API, claude.ai, and Claude Code. Anthropic is including up to 50% of weekly limits through July 7 to smooth the re-entry.
The comeback matters less than what came back. Fable 5 is the first model built primarily for long-horizon autonomous work, and the reinstatement arrived bundled with compliance changes that shape how you should deploy it.
What "Fable 5 is back" means for a power user: you again have access to a Mythos-class model that can run unattended for hours across hundreds of tool calls without losing the plot, but it now ships with widened safety classifiers and mandatory 30-day data retention that you have to engineer around.
TL;DR
Fable 5 returned July 1, 2026 after an 18-day export-control suspension. It leads SWE-bench Pro at 80.3% and is designed for multi-hour agentic autonomy that Opus 4.8 and GPT-5.5 can't sustain.
It costs double Opus ($10/$50 per million tokens), so the winning pattern is orchestration: Fable 5 plans and verifies, cheaper models execute. The reinstatement came with widened classifiers and 30-day retention, so build model-agnostic routing and treat its output as untrusted.
Key takeaways
- Fable 5 is a planner, not a doer. Route mechanical steps to Opus 4.8 or Sonnet 5 and save 70-90% of token cost.
- The 917-task audit shows a 73% price premium for a 0.9-point accuracy gain on routine work. Don't run everything through it.
- Its real edge is sustained state across long runs: full-repo targeted migrations, error recovery, and self-validation.
- Safety refusals return HTTP 200 with stop_reason: "refusal". Handle them with server-side fallback to Opus, or your agents will break mid-run.
- The government risk is real and unresolved by a clean technical fix. Assume the model can be pulled again with under two hours' notice.
What actually happened, and what Fable 5 is
Fable 5 launched June 9, 2026 as a new "Mythos-class" tier, a shift away from chat-style prompt-and-response toward long-horizon autonomous operation. Mythos 5, the restricted sibling, went only to vetted cybersecurity and infrastructure orgs under Project Glasswing.
Three days later came the BIS directive. It invoked the "deemed export" doctrine (15 CFR § 734.13), treating access by a foreign national as an export of controlled technology.
Anthropic's own words: "the net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance." The full timeline is documented in Anthropic's statement and a Cloud Security Alliance analysis.
Here's the technical shape you need to configure against.
Fable 5 ships with a 1,000,000-token default context and up to 128,000 tokens of output. Adaptive thinking is always on and cannot be disabled. Pricing is exactly double Opus 4.8: $10 per million input tokens, $50 per million output, versus Opus at $5/$25 and Sonnet 5 at $3/$15, per Anthropic's model docs and Finout's pricing breakdown.
Two constraints deserve early attention. First, Fable 5 mandates 30-day data retention; unlike Opus, you can't run it under zero data retention by default. Second, its cyber, bio, and chem safety classifiers route restricted requests to Opus 4.8, which means Fable's general-use dangerous-capability risk is deliberately capped near Opus's level.
That fallback is a feature you'll interact with constantly, so we'll wire it up below.
The hype holds up, and here's the evidence
Start with the numbers, because they're unusually decisive.
On SWE-bench Pro, Fable 5 scored 80.3% against Opus 4.8's 69.2% and GPT-5.5's 58.6%. On Cognition's FrontierCode, it posted the highest scores among frontier models even at medium deliberation. An 11-point gap over the next-best frontier model is not noise.
But the benchmark that matters for power users isn't a single-shot score. It's coherence over time. Opus 4.8 and GPT-5.5 both degrade as context fills and tool calls stack up. Fable 5 is engineered to hold state across multi-hour, sometimes multi-day execution, and that's the capability leap.
Concretely, that buys you four things you couldn't reliably get before. Multi-hour unattended autonomy. Persistent state across hundreds of tool calls. Error recovery, where the model reads a failed test, diagnoses it, and self-corrects instead of looping. And full-repo targeted migrations.
The Stripe example is the headline, and it's worth stating precisely. In a 50-million-line Ruby codebase, Fable 5 performed one codebase-wide migration in a day that a team would have spent over two months doing by hand.
One correction, because accuracy is the point. This was not a 50-million-line rewrite. As a Hacker News commenter clarified: "in a 50M LOC codebase, one specific codebase-wide migration was done. Very impressive, but obviously not on the order of a whole-codebase migration." Frame it as a targeted transformation across a giant repo, which is exactly the workload Fable 5 is built for.
The power-user playbook
This is where the value lives. Everything below assumes you're running Fable 5 through the API or Claude Code, not just chatting with it.
Treat Fable 5 as an orchestrator, not a worker
The single highest-leverage move is architectural. Fable 5's strength is planning, deep logical analysis, and self-validation. As the LushBinary long-horizon guide puts it, "most sub-agent work, running tests, editing files, searching a codebase, summarizing a document, does not need the premium model."
So build a hierarchy. Fable 5 sits at the top and produces the plan. Cheaper specialized models, Opus 4.8 or Sonnet 5 or open weights, execute bounded tasks: file edits, code search, test runs, summarization.
Route by complexity. High-deliberation planning and architecture go to Fable 5. Mechanical steps route down. A custom wrapper classifies each task into mechanical, standard, or frontier tiers; mechanical and standard route to Sonnet 5 or Opus 4.8, and only ambiguity or regressions escalate back up.
Teams running this pattern report 70-90% token savings versus running everything through Fable 5.
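The routing rule can be sketched as a small lookup. Treat this as a minimal illustration, not the wrapper any particular team runs: the three tier names come from the text, while `Task`, `classify`, the task kinds, and the `claude-sonnet-5` model ID are hypothetical placeholders.

```python
from dataclasses import dataclass

# "claude-fable-5" and "claude-opus-4-8" appear in this article;
# "claude-sonnet-5" is an assumed placeholder ID.
TIER_TO_MODEL = {
    "mechanical": "claude-sonnet-5",
    "standard": "claude-opus-4-8",
    "frontier": "claude-fable-5",
}

# Hypothetical task kinds that count as mechanical work.
MECHANICAL_KINDS = {"file_edit", "code_search", "test_run", "summarize"}
FRONTIER_KINDS = {"plan", "architecture"}


@dataclass
class Task:
    kind: str                # e.g. "file_edit", "plan", "bugfix"
    ambiguous: bool = False  # the spec is unclear or underspecified
    regressed: bool = False  # a cheaper model already failed on this task


def classify(task: Task) -> str:
    """Return the tier for a task using a deliberately simple heuristic."""
    # Ambiguity or a regression escalates regardless of task kind.
    if task.ambiguous or task.regressed:
        return "frontier"
    if task.kind in FRONTIER_KINDS:
        return "frontier"
    if task.kind in MECHANICAL_KINDS:
        return "mechanical"
    return "standard"


def route(task: Task) -> str:
    """Map a task to the model ID that should handle it."""
    return TIER_TO_MODEL[classify(task)]
```

A real classifier would likely look at more than the task kind, but keeping the mapping in one table makes it easy to reprice tiers later.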
Verify in two layers, and give the critic amnesia
A long-horizon agent that can't verify its own work compounds errors for hours. Build checkpoints so each phase produces an output you can check before the next begins, a discipline the MindStudio writeup makes central.
Layer one is deterministic and zero-token. Run unit tests, type-checkers (Mypy, Sorbet), linters, and compiler dry-runs before you spend a single secondary model call. If the code doesn't compile, no LLM needs to look at it.
Layer two is adversarial. Spawn a fresh-context subagent that sees only the code diff and the original spec, with none of the planning transcript. Stripped of the reasoning that produced the code, it critiques without confirmation bias. This catches the plausible-but-wrong output that a same-context reviewer rubber-stamps.
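Here is one way the two layers could compose. It is a sketch under assumptions: the deterministic checks and the critic are passed in as plain callables (in practice they would shell out to your test runner or call a model), and `verify`, `Check`, and `Critic` are hypothetical names.

```python
from typing import Callable, List, Tuple

# A deterministic check takes the diff and returns (passed, log).
Check = Callable[[str], Tuple[bool, str]]
# The critic receives ONLY the diff and the spec, never the planning transcript.
Critic = Callable[[str, str], Tuple[bool, str]]


def verify(diff: str, spec: str, checks: List[Check], critic: Critic) -> Tuple[bool, str]:
    """Run zero-token checks first, then a fresh-context critic."""
    # Layer one: stop at the first deterministic failure so no model call
    # is spent on code that does not compile or pass tests.
    for check in checks:
        passed, log = check(diff)
        if not passed:
            return False, f"deterministic check failed: {log}"

    # Layer two: adversarial review with no access to the author's reasoning.
    approved, critique = critic(diff, spec)
    if not approved:
        return False, f"critic rejected: {critique}"
    return True, "ok"
```

The signature does the enforcing here: because the critic only ever receives `diff` and `spec`, there is no path for the planning transcript to leak into its context.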
The four-gate migration playbook
For codebase-wide migrations, the mental model from the Stripe work is a planning loop wrapped around a patch loop wrapped around a verification loop. Four gates run it.
Gate 1, scope and seam. Define the initial state, desired state, file globs, invariants, and success criteria (compile plus test). Without a clear before and after, the verifier has nothing to check and the agent drifts.
Gate 2, agent init and memory. Boot claude-fable-5 and mount the Fable 5 memory tool to persist style guides, invariants, and scope. Set effort to medium for routine changes and high for core data models.
Gate 3, the automated verification loop. Write patches to a sandbox branch. After each commit, run syntax, lint, static analysis, and unit tests, then inject stderr and test failures back as tool feedback so the model corrects.
Gate 4, grouped review staging. Group commits by architectural intent into separate bite-sized PRs, each with a summary and CI logs, so human review isn't the bottleneck.
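Gate 1 is easier to enforce when the scope is a structured object rather than a paragraph in a prompt. A minimal sketch, assuming a hypothetical `MigrationScope` record whose fields mirror the list in Gate 1:

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import List


@dataclass(frozen=True)
class MigrationScope:
    initial_state: str
    desired_state: str
    file_globs: List[str]
    invariants: List[str] = field(default_factory=list)
    # Gate 1 names compile plus test as the success criteria.
    success_criteria: List[str] = field(default_factory=lambda: ["compile", "test"])

    def in_scope(self, path: str) -> bool:
        """True if the path matches any declared glob."""
        return any(fnmatch(path, pattern) for pattern in self.file_globs)

    def problems(self) -> List[str]:
        """Names of required fields that are empty; an empty list means ready."""
        missing = []
        if not self.initial_state.strip():
            missing.append("initial_state")
        if not self.desired_state.strip():
            missing.append("desired_state")
        if not self.file_globs:
            missing.append("file_globs")
        if not self.success_criteria:
            missing.append("success_criteria")
        return missing
```

Refusing to start a run while `problems()` is non-empty gives the verifier a concrete before and after to check against.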
Three hard rules make this safe. Restrict active file loading to the immediate dependency graph, never the whole repo. Never operate on main; use ephemeral sandbox branches only. And after 3 to 5 failed self-corrections, roll back deterministically to the last green commit and pause for a human.
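The Gate 3 feedback loop and the rollback rule fit in a few lines. This is a sketch, not a harness: `propose_patch`, `run_checks`, and `rollback` are hypothetical callables you would back with the model call, your CI commands, and a reset to the last green commit.

```python
from typing import Callable, Optional, Tuple

MAX_ATTEMPTS = 3  # the rule above allows 3 to 5 failed self-corrections


def correction_loop(
    propose_patch: Callable[[Optional[str]], str],
    run_checks: Callable[[str], Tuple[bool, str]],
    rollback: Callable[[], None],
    max_attempts: int = MAX_ATTEMPTS,
) -> Tuple[str, Optional[str]]:
    """Retry with tool feedback, then roll back and hand off to a human."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        # The previous failure output (stderr, test failures) is fed back in.
        patch = propose_patch(feedback)
        passed, output = run_checks(patch)
        if passed:
            return "green", patch
        feedback = output

    # Budget exhausted: restore the last green commit and pause.
    rollback()
    return "needs_human", None
```

Keeping the attempt budget outside the model means a confused agent cannot talk its way into a sixth try.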
Prompt patterns for unattended runs
Fable 5's deliberation is powerful and occasionally its own enemy. The official prompting guide gives four patterns that keep long runs on track.
Cut deliberation latency:
When you have enough information to act, act. Do not re-derive facts already established. Give a recommendation, not an exhaustive survey.
Suppress unrequested refactoring:
Don't add features, refactor, or introduce abstractions beyond what the task requires. Don't add error handling, fallbacks, or validation for scenarios that cannot happen.
Avert empty planning loops, the failure where the model narrates a plan and stops:
You are operating autonomously. The user is not watching. Before ending your turn, check your last paragraph. If it is a plan, an analysis, a question, or a promise about work you have not done, do that work now with tool calls.
Ground progress claims against reality:
Before reporting progress, audit each claim against a tool result from this session. If tests fail, say so with the output.
For async status, give the agent a custom send_to_user tool so it can report without ending its turn.
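A send_to_user tool might look like the following. The definition uses the standard name, description, and input_schema shape for Messages API tools; the `severity` field, the `outbox` list, and the `handle_tool_use` helper are assumptions made for illustration.

```python
from typing import Any, Dict, List

SEND_TO_USER_TOOL: Dict[str, Any] = {
    "name": "send_to_user",
    "description": "Post a status update to the user without ending the turn.",
    "input_schema": {
        "type": "object",
        "properties": {
            "message": {"type": "string"},
            # Hypothetical field so a dashboard can highlight blockers.
            "severity": {"type": "string", "enum": ["info", "warning", "blocked"]},
        },
        "required": ["message"],
    },
}


def handle_tool_use(block: Dict[str, Any], outbox: List[Dict[str, str]]) -> Dict[str, Any]:
    """Deliver a send_to_user call and build the tool_result to send back."""
    if block.get("name") != "send_to_user":
        raise ValueError(f"unexpected tool: {block.get('name')}")
    payload = block["input"]
    outbox.append({
        "message": payload["message"],
        "severity": payload.get("severity", "info"),
    })
    # Returning a result lets the agent keep working in the same turn.
    return {"type": "tool_result", "tool_use_id": block["id"], "content": "delivered"}
```

In a real deployment the outbox would be a queue, webhook, or chat channel rather than an in-memory list.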
Context and memory engineering
Long runs overflow context. Fable 5 supports server-side compaction (beta header compact-2026-01-12): when input crosses a threshold, the API pauses, writes a structured summary into a compaction_block at the start of context, discards verbose history, and preserves prompt caching. The compaction docs cover it.
One sharp edge to fix. During summarization the model may try to call a tool and return a null compaction block. Prevent it with explicit compaction_instructions:
Summarize the transcript inside <summary></summary> tags.
Do not call any tools while writing this summary; respond with text only.
For durability across crashes, persist two files. A plan.json holds the roadmap, current task index, and checkpoints. A state.db (SQLite) holds variables, subagent PIDs, and verified invariants. Update both each turn so a killed thread resumes from the last checkpoint.
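A bare-bones version of the two-file checkpoint, using only the Python standard library. The file names come from the text; the key-value table layout and the helper names are assumptions, and a production harness would want schema versioning on top.

```python
import json
import os
import sqlite3
import tempfile
from typing import Any, Dict


def save_plan(path: str, plan: Dict[str, Any]) -> None:
    """Write plan.json atomically so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(plan, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)  # atomic rename over the old file


def load_plan(path: str) -> Dict[str, Any]:
    with open(path) as handle:
        return json.load(handle)


def open_state(path: str) -> sqlite3.Connection:
    """Open state.db and create the key-value table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value TEXT)")
    conn.commit()
    return conn


def set_state(conn: sqlite3.Connection, key: str, value: Any) -> None:
    """Store a JSON-serializable value, replacing any previous one."""
    conn.execute(
        "INSERT OR REPLACE INTO state (key, value) VALUES (?, ?)",
        (key, json.dumps(value)),
    )
    conn.commit()


def get_state(conn: sqlite3.Connection, key: str, default: Any = None) -> Any:
    row = conn.execute("SELECT value FROM state WHERE key = ?", (key,)).fetchone()
    return default if row is None else json.loads(row[0])
```

On restart, `load_plan` gives the current task index and `get_state` restores the variables and invariants the run had already verified.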
For cross-session memory, Mem0 handles turn ingestion, fact extraction to a vector DB, and context-aware retrieval.
Handle the safety classifiers before they break your agent
This is the part most teams get wrong on day one. Fable 5 runs four classifiers: cyber, bio, frontier_llm (distillation), and reasoning_extraction. When one triggers, the Messages API returns HTTP 200, not a 4xx or 5xx, with stop_reason: "refusal" and a stop_details.category.
If your error handling only watches for non-200 status codes, a refusal sails through as a "successful" empty response and your agent acts on nothing.
Wire up server-side fallback. Send the header server-side-fallback-2026-06-01 and set fallbacks=[{"model":"claude-opus-4-8"}] so a refused request completes on Opus in the same call. Preserve the injected {"type":"fallback","from":...,"to":...} block in later turns so context stays coherent.
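The status-code trap is worth guarding against explicitly. The sketch below works on an already-parsed response body and uses only the field names quoted above (stop_reason, stop_details.category, and the fallback block); the assumption that fallback blocks arrive inside the `content` list, and the helper names themselves, are mine.

```python
from typing import Any, Dict, List, Optional


def classify_response(status_code: int, body: Dict[str, Any]) -> str:
    """Separate transport errors, refusals, and real completions."""
    if status_code != 200:
        return "http_error"
    # A refusal is a 200, so the status code alone is not enough.
    if body.get("stop_reason") == "refusal":
        return "refusal"
    return "ok"


def refusal_category(body: Dict[str, Any]) -> Optional[str]:
    """Return stop_details.category if present, else None."""
    return (body.get("stop_details") or {}).get("category")


def fallback_blocks(body: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect fallback blocks so they can be replayed in later turns.

    Assumes the blocks appear in the response's content list.
    """
    return [block for block in body.get("content", []) if block.get("type") == "fallback"]
```

Logging the category for every refusal also gives you the data to see which prompts trip the classifiers most often.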
One gotcha for regulated shops: organizations on zero data retention (ZDR) get a 400 invalid_request_error, because Fable mandates 30-day retention. Override it per workspace in Console under Settings, Workspaces, Privacy, then toggle 30-day retention. The migration guide has the full path.
Cost control that actually moves the bill
You can't turn off adaptive thinking, but you have three levers.
First, the effort parameter (low/medium/high/xhigh/max). Medium or low cuts thinking tokens hard for routine work. Second, prompt-level steering: "Answer directly without deliberating" for the simple stuff. Third, and biggest, prompt caching.
Put stable prefixes (rules, tool schemas, system prompt) first to earn a 90% discount on cached input; a prefix needs at least 512 tokens to be cacheable. That drops effective input cost from $10 per million toward roughly $1 per million.
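How close you get to $1 depends on what share of your input is actually served from cache. A quick blended-price sketch, using the $10 input price and 90% discount from the text and, as a simplifying assumption, ignoring any surcharge for writing the cache:

```python
INPUT_PRICE_PER_MTOK = 10.00  # Fable 5 input price, from the text
CACHE_DISCOUNT = 0.90         # discount on cached input tokens, from the text


def effective_input_price(
    cached_fraction: float,
    price: float = INPUT_PRICE_PER_MTOK,
    discount: float = CACHE_DISCOUNT,
) -> float:
    """Blended dollars per million input tokens for a given cache hit share."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached_cost = cached_fraction * price * (1.0 - discount)
    uncached_cost = (1.0 - cached_fraction) * price
    return cached_cost + uncached_cost
```

With 80% of input tokens cached, the blended price is about $2.80 per million; only a fully cached prompt reaches the $1 floor.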
Stack these with orchestration and the premium price stops being the story.
What the ROI actually looks like
Now the honest accounting, because the price premium is steep and you should route deliberately.
The most useful data point is a 917-task audit. Fable 5 hit 92.9% success at $1.25 per task. Opus 4.8 hit 92.0% at $0.74 per task. That's a 73% cost premium for a 0.9-point accuracy gain.
On routine tasks, Opus wins on value, full stop. Reserve Fable 5 for work where the 0.9 points, or the long-horizon coherence, actually pays for itself.
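It helps to turn those audit figures into a marginal cost per extra success. The helper names below are hypothetical and the inputs are the rounded numbers quoted above. Note that the rounded per-task costs give a ratio nearer 69%, so the quoted 73% may rest on unrounded figures.

```python
def cost_premium(cost_a: float, cost_b: float) -> float:
    """Fractional premium of option A over option B (0.5 means 50% more)."""
    return cost_a / cost_b - 1.0


def cost_per_extra_success(
    cost_a: float, success_pct_a: float, cost_b: float, success_pct_b: float
) -> float:
    """Extra dollars spent per additional successful task when choosing A."""
    extra_cost_per_task = cost_a - cost_b
    extra_success_rate = (success_pct_a - success_pct_b) / 100.0
    if extra_success_rate <= 0:
        raise ValueError("option A must have the higher success rate")
    return extra_cost_per_task / extra_success_rate


# Rounded figures quoted from the 917-task audit.
premium = cost_premium(1.25, 0.74)
marginal = cost_per_extra_success(1.25, 92.9, 0.74, 92.0)
```

On these inputs each additional success costs roughly $57, which is the number to weigh against what a failed routine task costs you.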
There's a token-efficiency counterpoint worth knowing. In some internal agentic harnesses, Fable 5 achieved better results with about half the tokens, making it cost roughly the same as Opus because it produces more surgical, targeted diffs.
So measure your own harness before assuming the sticker price. The premium can evaporate when the model does the job in one clean pass instead of three messy ones.
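The break-even is simple arithmetic. Assuming both models see the same input and output mix, so one price multiple applies to the whole bill, a sketch with hypothetical helper names:

```python
def relative_cost(price_multiple: float, token_ratio: float) -> float:
    """Cost of the premium model relative to the baseline.

    price_multiple: premium price divided by baseline price (2.0 for Fable 5 vs Opus 4.8)
    token_ratio: premium tokens used divided by baseline tokens used
    """
    return price_multiple * token_ratio


def break_even_token_ratio(price_multiple: float) -> float:
    """Token ratio at which both models cost the same."""
    return 1.0 / price_multiple
```

At double the price, Fable 5 has to finish in half the tokens to match Opus on cost; anything above that ratio and the premium is real.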
And the ceiling is real. Stripe's roughly 60x time compression on a codebase migration is the kind of outcome that justifies the tier by itself.
Two challenges to run this week. Take a real migration you've been dreading, a framework bump or an API deprecation across dozens of files, scope it with the four gates, and let Fable 5 orchestrate it on a sandbox branch. Then, separately, take one of your existing Opus agents, add the two-layer verification with a fresh-context critic, and measure how many bad diffs it catches before they reach review.
Limitations, risks, and the part nobody has resolved
Steelman both sides here, because the honest read is genuinely unsettled.
Start with the technical caveats. Fable 5's extended thinking causes timeouts: Endor Labs' Agent Security League ran it on 200 vulnerability-fix tasks and reported "more per-instance timeouts than any model-and-harness combination we have ever tested," with 15 runs blowing a 40-minute limit and scores of 59.8% FuncPass and 19.0% SecPass.
Over-helpfulness is a live risk too: a Salesforce CRM agent executed a planted query_opportunities call from untrusted record data, because the model that refuses to leak a credential will still fire an attacker's tool call framed as "the next step." Endor also found 38 of 200 tasks showed confirmed memorization of public CVE patches, so validate on private regression tests, not public benchmarks.
And the widened classifiers over-refuse: benign psychology, neuroscience, and DB-schema prompts trip the bio and cyber gates and fall back to Opus.
Then the bigger question. The system card classified Fable 5 as ASL-3 for CBRN, at the "CB-1" threshold (assisting synthesis of non-novel biological agents). Anthropic called it "the most capable model we have ever evaluated on cyber tasks," one that found zero-days in hardened OSes like OpenBSD pre-release, and conceded that "perfect jailbreak resistance does not appear to be possible today."
The national-security case: Commerce Secretary Howard Lutnick framed oversight as strengthening US AI leadership, and White House AI advisor David Sacks argued the codebase-remediation bypass was a severe exposure the government acted on because leadership initially refused to fix it.
The industry and civil-liberties case: OpenAI's Sam Altman said "I just don't like the idea of the government picking the customers," CTO Alireza Rezvani argued the passport-based restriction "just sorted builders by passport" since the bypass is commodity-level on GPT-5.5, and Snyk's Stephen Thoemmes warned that "defense cannot be improved if the tools defense requires are forbidden."
Here's the neutral core. The reinstatement came through legal pressure and a compliance package: a classifier that blocks the bypass in over 99% of tested cases, deliberately widened false-positive-heavy safety margins, 30-day retention monitoring, and voluntary government pre-release access for future models.
If the government's danger assessment was accurate and the fix was mostly administrative rather than a hard technical guarantee, meaningful dual-use risk remains, and Commerce reserves the right to "reevaluate and adjust the scope" with under two hours' notice.
The responsible posture follows directly. Run a model-agnostic routing gateway with no hardcoded claude-fable-5 and dynamic failover to open weights or GPT-5.5. Treat AI-generated code as untrusted and run local SAST before every commit.
Use ephemeral sandboxed containers with zero default network access. And frame prompts around defense, audit, and structure to avoid tripping the cyber classifier unnecessarily.
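The failover half of that posture can be sketched without any provider SDK. Everything here is illustrative: `ModelUnavailable`, the backend callables, and the two stub backends are hypothetical, and the ordered chain would normally come from config so no model name is hardcoded in agent code.

```python
from typing import Callable, List, Tuple


class ModelUnavailable(Exception):
    """Raised by a backend when its model is suspended or unreachable."""


# A backend takes a prompt and returns a completion, or raises ModelUnavailable.
Backend = Callable[[str], str]


def complete_with_failover(prompt: str, chain: List[Tuple[str, Backend]]) -> Tuple[str, str]:
    """Try each backend in order; return (backend_name, completion)."""
    errors = []
    for name, backend in chain:
        try:
            return name, backend(prompt)
        except ModelUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends unavailable: " + "; ".join(errors))


def suspended_backend(prompt: str) -> str:
    """Stub standing in for a model that has been pulled."""
    raise ModelUnavailable("model suspended")


def echo_backend(prompt: str) -> str:
    """Stub standing in for a healthy secondary provider."""
    return f"echo: {prompt}"
```

Calling `complete_with_failover("hi", [("primary", suspended_backend), ("secondary", echo_backend)])` returns the secondary's answer, which is the behavior you want rehearsed before the next suspension rather than during it.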
What to run Wednesday morning
Don't rip out your Opus pipelines to route everything through Fable 5. That's how you pay a 73% premium for 0.9 points.
Instead, ship a small, sharp change on day one. Stand up the orchestrator pattern: Fable 5 plans, cheaper models execute, and a fresh-context critic verifies. Add the server-side-fallback-2026-06-01 header and handle stop_reason: "refusal" before a real workload hits a classifier.
Cache your stable prefixes to pull effective input cost toward $1 per million. Then point it at one migration you've been avoiding and watch it run unattended on a sandbox branch.
And keep the exit ramp built. Abstract the model name, keep failover to a second provider warm, and assume the tap can be turned off again. The capability is worth adopting. The dependency is worth hedging.
We'll follow this with a hands-on teardown of the orchestrator gateway, with real routing code and cost traces from a week of production runs. Subscribe if you want it the day it ships.
Sources
- Introducing Claude Fable 5 and Claude Mythos 5
- Statement on the US directive to suspend Fable 5 and Mythos 5
- Redeploying Claude Fable 5
- Claude Fable 5 & Mythos 5 System Card
- Prompting Claude Fable 5
- Refusals and fallback
- Compaction
- Model migration guide
- Stripe 50M-line Ruby migration
- Hacker News migration correction
- LushBinary: build long-horizon agents
- MindStudio: real-world agentic results
- 917-task audit (Reddit)
- Endor Labs red-team
- Reco AI red-team findings
- Snyk suspension takeaways
- The Guardian: export controls lifted
- FT: ban lifted
