
Claude Fable 5 First Look: What Actually Changes for Coding Agents

The 80.3% SWE-Bench Pro headline is vendor-stated; the mandatory 30-day retention and silent safety classifier are contractual facts, and they should drive your architecture decisions this week.

June 11, 2026 · 10 min read

One day after Anthropic launched Claude Fable 5, Microsoft removed it from the model picker its own employees use in internal GitHub Copilot. That decision, reported by The Verge on June 10, wasn't about capability.

It was about a single clause: Fable 5 requires 30-day data retention on all traffic, with no opt-out, even for customers holding zero-data-retention contracts.

That's the real story of this Claude Fable 5 first look. The benchmark headline will get the attention. The terms of use will decide your architecture.

TL;DR

  • Claude Fable 5 shipped June 9, 2026 at a vendor-stated 80.3% on SWE-Bench Pro, with no independent replication and no system card posted yet.
  • Pricing is $10/$50 per million tokens, exactly 2x Opus 4.8, free on paid Claude plans through June 22.
  • A live safety classifier silently reroutes under 5% of sessions to Opus 4.8, with documented false positives on benign security and biology prompts.
  • Mandatory 30-day data retention overrides existing ZDR contracts on every surface, including AWS Bedrock and the Anthropic API.
  • The right week-one move: measure on the free window, keep Opus 4.8 as your default, route only long-horizon work to Fable 5.

Here's the one-line answer for anyone arriving from search: Claude Fable 5 is the public version of Claude Mythos 5, with the same weights plus a safety classifier and a mandatory 30-day retention policy. Those two additions matter more to coding-agent architecture than its benchmark scores do.

What did Anthropic actually ship?

Anthropic released two models with the same weights and different wrappers. Claude Mythos 5 is the unrestricted model, gated to 52 Project Glasswing partners in cyber-defense, biosecurity, and frontier research. Claude Fable 5 is the public surface, with a live classifier that routes flagged prompts to Opus 4.8.

The launch post is explicit that Fable 5 is "a version of Mythos the public can access today," a framing TechCrunch picked up on launch day.

The verified facts: API model ID claude-fable-5, a 1M-token context window, $10/$50 per million tokens, free on Pro, Max, Team, and Enterprise plans through June 22, 2026. It's live on the Anthropic API, AWS Bedrock, GitHub Copilot, and Microsoft Foundry.

Is the 80.3% SWE-Bench Pro score real?

The 80.3% is Anthropic-stated and has not been independently replicated. The launch post names the eval suites Anthropic ran (Cognition FrontierCode, Hebbia Finance, CursorBench, FrontierBench) but the model page doesn't publish raw numbers or a harness spec. The system card is listed on Anthropic's index but not yet posted.

Anthropic's stated comparison set: Fable 5 at 80.3%, Opus 4.8 at 69.2%, GPT-5.5 at 58.6%, Gemini 3.1 Pro at 54.2%. All vendor-stated.

The independent signals point the same direction but at different magnitudes. Vellum reports 95% on SWE-Bench, roughly 15 points above Anthropic's own number, a gap Vellum attributes to scaffolding: multi-shot agentic harnesses with retries versus a single-eval pass. Artificial Analysis ranks Fable 5 #1 on its GDPval-AA benchmark at 1932. BenchLM puts it #2 of 123 models.

The lesson for harness owners is that "SWE-Bench Pro" without a fixed harness spec is no longer one number. Treat 80.3% as the conservative floor and run your own evals.
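To see why a retry-based harness and a single-pass harness can't share one number, here is a minimal sketch of how retries inflate a pass rate. It assumes every attempt succeeds independently with the same probability, which real agent runs do not satisfy, so read it as an illustration of direction rather than a model of Vellum's or Anthropic's harness. The function name `pass_at_k` is ours.

```python
def pass_at_k(single_pass_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` tries succeeds.

    Assumption: attempts are independent and equally likely to succeed.
    Real retries are correlated, so the true gain is usually smaller.
    """
    if not 0.0 <= single_pass_rate <= 1.0:
        raise ValueError("single_pass_rate must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return 1.0 - (1.0 - single_pass_rate) ** attempts


if __name__ == "__main__":
    vendor_single_pass = 0.803  # Anthropic-stated SWE-Bench Pro score
    for k in (1, 2, 3):
        print(f"attempts={k}: {pass_at_k(vendor_single_pass, k):.1%}")
```

Under that idealized assumption, a single retry is already enough to move a score from the low 80s into the mid 90s. That is why your own eval report should record the attempt count alongside the score.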

Codersera's launch review said it best: "Wait for the system card before betting your roadmap on these numbers." No Epoch AI, HELM, Aider polyglot, or LiveCodeBench result exists yet. That's normal at 48 hours. It's also exactly why fleet-wide migration this week is premature.

How does the Fable 5 safety classifier work?

Fable 5 ships with a live classifier that reroutes three prompt categories to Opus 4.8: cybersecurity, biology and chemistry, and capability distillation. Anthropic states the fallback fires in under 5% of sessions. Artificial Analysis measured 2% on its own eval. The cyber and bio reroutes show a visible "switched to Opus 4.8" notice; the distillation reroute is silent.

There's a second layer most coverage missed. Per Interconnects' reading of the system card, queries about frontier LLM development (pretraining, distributed training, accelerator design) get deliberately degraded output via prompt rewriting and weight interventions, covering roughly 0.03% of traffic.

Under this intervention, Fable 5 scores at Sonnet 4.6 level on PostTrainBench. That detail never appeared in the consumer launch post.

False positives are already documented. Business Insider's reporter got rerouted asking how cancer misinformation spreads online. The Verge found it won't answer basic biology questions. And GitHub issue #66697 confirms false positives on authorized defensive security audits.

For an autonomous coding agent, this is a product risk, not a curiosity. A refactor touching code that mentions "vulnerability" or "exploit" can get silently answered by a model that costs half as much and performs worse on the task you priced for.

Anthropic has published no false-positive rate; the prior-generation Constitutional Classifiers paper measured a 0.38% refusal increase on benign queries, and a silent reroute is harder to detect than a refusal.
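If the reroute isn't announced, cost is one place it can still show up, because Opus 4.8 lists at half the Fable 5 price. The sketch below flags a request whose billed cost sits near the Opus 4.8 list price even though Fable 5 was requested. It assumes your harness can obtain a per-request billed amount (`billed_usd`) from a usage export. The `opus-4.8` key is a placeholder label rather than a confirmed API ID, and the 15% tolerance is arbitrary.

```python
# List prices in USD per million tokens (input, output), from the launch pricing.
PRICES = {
    "claude-fable-5": (10.0, 50.0),
    "opus-4.8": (5.0, 25.0),  # placeholder label, not a confirmed API model ID
}


def list_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Uncached list-price cost of one request in USD."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def likely_rerouted(billed_usd: float, input_tokens: int, output_tokens: int,
                    tolerance: float = 0.15) -> bool:
    """True when a Fable 5 request was billed close to the Opus 4.8 list price.

    Assumption: billing reflects the model that actually answered. Cache hits
    also lower cost, so treat a positive as a signal to investigate, not proof.
    """
    fallback_cost = list_cost("opus-4.8", input_tokens, output_tokens)
    return abs(billed_usd - fallback_cost) <= tolerance * fallback_cost


if __name__ == "__main__":
    print(likely_rerouted(7.50, 500_000, 200_000))   # billed at the Opus 4.8 rate
    print(likely_rerouted(15.00, 500_000, 200_000))  # billed at the Fable 5 rate
```

Logging this flag next to the prompt category lets you estimate your own false-positive rate instead of waiting for a published one.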

The 30-day retention is the actual architecture decision

Fable 5's mandatory 30-day retention overrides existing zero-data-retention contracts on every surface. That single clause changes coding-agent architecture more than any benchmark in the launch post.

Anthropic requires 30-day retention on all Fable 5 and Mythos 5 traffic, with no opt-out, to operate its safety classifiers. The override of prior ZDR terms is verified across Anthropic's data-usage docs, The Register, and the GitHub Copilot changelog. Anthropic says the data isn't used for training and auto-deletes at day 30.

The enforcement detail that matters most: on AWS Bedrock, enabling Fable 5 requires a provider_data_sharing toggle that moves prompt data out of the AWS security boundary into Anthropic's. If you chose Bedrock specifically for residency (FedRAMP, C5, IRAP), that's a categorical change, and it's what drove the 260-point Hacker News thread on the policy.

On GitHub Copilot, Fable 5 is the only Claude model that breaks ZDR; Opus 4.8, Sonnet 4.5, and Haiku 4.5 keep it. The model ships off by default for Copilot admins.

Two more compliance flags. Fable 5's HIPAA eligibility is not confirmed in any publicly reviewed material as of June 11, so healthcare workloads stay on Opus 4.8 under existing ZDR, full stop.

And for EU/UK customers, 30-day retention on prompts is a new processing purpose: expect a DPA amendment, sub-processor list update, and possibly a fresh DPIA before legal signs off. That review clock runs in weeks, not days, which is itself an argument against a week-one default flip.

Claude Fable 5 pricing: the 2x math

At list price, Fable 5 costs exactly double Opus 4.8, and the premium only pays back on tasks where it finishes in fewer turns.

| Model | Input ($/1M) | Output ($/1M) | 500K-in / 200K-out session | 100 sessions/day, annualized |
|---|---|---|---|---|
| Claude Opus 4.8 | $5 | $25 | $7.50 | $273,750 |
| Claude Fable 5 | $10 | $50 | $15.00 | $547,500 |

Anthropic's justification is that Fable 5 completes equivalent work with fewer tool calls and fewer total tokens. That claim is vendor-stated and unmeasured by anyone independent. The 2x figure is your uncached anchor; cache hits in multi-turn loops can pull realized cost below it.
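The session and annualized figures follow directly from the list prices, and the same arithmetic shows how caching moves the realized number. In the sketch below, `cached_fraction` and `cache_read_multiplier` are illustrative parameters. The 0.1 default is an assumption, not a published Fable 5 cache rate, so substitute whatever your contract specifies.

```python
def session_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float,
                 cached_fraction: float = 0.0,
                 cache_read_multiplier: float = 0.1) -> float:
    """Cost of one session in USD, with prices given per million tokens.

    cached_fraction: share of input tokens served from cache (0.0 to 1.0).
    cache_read_multiplier: ASSUMED discount factor for cached input tokens.
    """
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    input_cost = (uncached + cached * cache_read_multiplier) * in_price / 1_000_000
    output_cost = output_tokens * out_price / 1_000_000
    return input_cost + output_cost


def annualized(cost_per_session: float, sessions_per_day: int = 100) -> float:
    """Yearly spend at a fixed daily session count, using a 365-day year."""
    return cost_per_session * sessions_per_day * 365


if __name__ == "__main__":
    opus = session_cost(500_000, 200_000, in_price=5.0, out_price=25.0)
    fable = session_cost(500_000, 200_000, in_price=10.0, out_price=50.0)
    print(f"Opus 4.8: ${opus:.2f}/session, ${annualized(opus):,.0f}/year")
    print(f"Fable 5:  ${fable:.2f}/session, ${annualized(fable):,.0f}/year")

    # Hypothetical multi-turn loop where 60% of input tokens hit the cache.
    cached = session_cost(500_000, 200_000, 10.0, 50.0, cached_fraction=0.6)
    print(f"Fable 5 with 60% cached input: ${cached:.2f}/session")
```

Because output tokens dominate this session profile, even heavy input caching trims the bill only modestly. The fewer-turns claim matters more to break-even than the cache rate does.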

The widely shared Stripe anecdote deserves a flag too. The "50-million-line Ruby migration in one day" story originates on Anthropic's own product page, has been repeated by a dozen outlets, and is not confirmed by Stripe's engineering blog.

Stripe's last self-disclosed Ruby codebase size was 25 million lines in 2024. The figure in the anecdote is exactly double that. Treat it as a marketing-grade customer story.

What this means for you

If you run Opus 4.8 in a coding-agent harness today, here's the week-one playbook:

  1. Keep Opus 4.8 as the default. Short, well-scoped tasks can't justify 2x.
  2. Use the free window (through June 22) for evals only. Run SWE-Bench Pro plus your top three internal tasks. Don't ship production traffic through a window Anthropic is using to calibrate classifier thresholds.
  3. Pre-classify on your side. A cheap keyword filter (vuln, exploit, CVE, payload, pathogen, RCE) routing matches straight to Opus 4.8 skips the classifier churn and the Fable 5 price for prompts that would get rerouted anyway.
  4. Detect the reroute. The fallback isn't surfaced in the response payload. Log model_requested versus model_returned, inferring the latter from cost and token profile; a 50% cost drop on a non-sensitive prompt is your strongest reroute signal.
  5. Pin regulated workloads to Opus 4.8. HIPAA eligibility unconfirmed, ZDR broken, DPA updates pending. There's no version of this that clears compliance review in a week.
  6. Keep Sonnet and Haiku on autocomplete. Fable 5 is wasted there.
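The pre-classification step in the playbook can be sketched as a small router. This is a starting point under stated assumptions, not a tested production filter: the keyword list comes from the playbook above, the `long_horizon` flag stands in for whatever task-sizing signal your harness already has, and `opus-4.8` is a placeholder label rather than a confirmed API ID.

```python
import re

FABLE = "claude-fable-5"
OPUS = "opus-4.8"  # placeholder label, not a confirmed API model ID

# Word-boundary matching so "rce" does not fire on "source" or "resource".
_SENSITIVE = re.compile(
    r"\b(?:vuln\w*|exploit\w*|cve(?:-\d{4}-\d+)?|payload\w*|pathogen\w*|rce)\b",
    re.IGNORECASE,
)


def choose_model(prompt: str, long_horizon: bool) -> str:
    """Pick a model before the request leaves your harness.

    Prompts that match the keyword filter go straight to Opus 4.8, since
    they would likely be rerouted anyway. Everything else uses Fable 5
    only when the task is long-horizon; Opus 4.8 stays the default.
    """
    if _SENSITIVE.search(prompt):
        return OPUS
    return FABLE if long_horizon else OPUS


if __name__ == "__main__":
    print(choose_model("Patch the CVE-2024-1234 payload parser", long_horizon=True))
    print(choose_model("Migrate the resource loader to async", long_horizon=True))
    print(choose_model("Rename a variable", long_horizon=False))
```

A keyword filter will over-match, for example on a payments codebase where "payload" is everyday vocabulary, so tune the list against your own prompt logs before relying on it.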

Migrate the default in two to three weeks, when three things land: the system card, at least one independent SWE-Bench Pro replication, and clarity on HIPAA and enterprise retention terms.

Fable 5 is plausibly the strongest public coding model available today. But "plausibly" is doing real work in that sentence, while the retention clause isn't a matter of plausibility; it's contractual. Microsoft read the terms and waited. So should your default route.

Frequently asked questions

Is Claude Fable 5's 80.3% SWE-Bench Pro score independently verified?

No. The 80.3% figure is Anthropic-stated, and as of June 11, 2026 the system card has not been posted. Independent trackers like Vellum report 95% on SWE-Bench using multi-shot agentic harnesses, a gap driven by scaffolding rather than the model. Treat 80.3% as the vendor's conservative floor, not community consensus.

Does Claude Fable 5 really require 30-day data retention with no opt-out?

Yes. Anthropic's launch materials state that Fable 5 and Mythos 5 require 30-day retention on all traffic to operate safety classifiers, overriding prior zero-data-retention contracts. This is confirmed by Anthropic's data-usage docs, The Register, and the GitHub Copilot changelog. Anthropic says the retained data is not used for training and auto-deletes at day 30.

What does the Fable 5 safety classifier do to coding-agent traffic?

It routes prompts flagged for cybersecurity, biology/chemistry, or capability distillation to Claude Opus 4.8 instead of Fable 5. Anthropic states this affects under 5% of sessions. False positives are documented, including a GitHub issue on authorized defensive security audits, so harnesses should detect and log the reroute.

How much does Claude Fable 5 cost compared to Opus 4.8?

Fable 5 lists at $10 per million input tokens and $50 per million output tokens, exactly double Opus 4.8's $5/$25. A typical 500K-in/200K-out agent session costs $15 on Fable 5 versus $7.50 on Opus 4.8. Fable 5 is free on paid Claude plans through June 22, 2026.

Should I switch my coding agent's default model to Fable 5 now?

Not as a wholesale default. The defensible week-one move is to keep Opus 4.8 as the production default, use the free window through June 22 for internal evals, and route only long-horizon engineering tasks to Fable 5. Pin security, biology, and regulated workloads to Opus 4.8 directly.