Model file · Fable 5 / Mythos 5

Fable 5 use cases — what it's for, with receipts.

Anthropic's launch page ships six customer receipts. All six are vendor-curated — the vendor picked the names, the vendor reported the numbers. That doesn't make them worthless; it makes them signals. This page reads each one honestly, then translates them into the patterns this book already runs.

Receipts sourced from Anthropic's announcement. The model file lives on the Fable 5 hub; the full table on the benchmarks page.


The 30-second answer

Fable 5 is built for complex, long-running work: huge codebase migrations (Stripe: 50 million lines of Ruby in one day), long-horizon agents, knowledge work (GDPval-AA 1932), and science. Every launch receipt is vendor-curated — treat them as signals, then run your own eval in the free June 9–22 plan window.

The launch receipts, read honestly

Six receipts shipped with the announcement. Here's each one, with the read Anthropic's copy won't give you. Ch 24's rule applies to customer quotes the same as it applies to benchmark rows: a receipt is a signal, a private eval is the proof.

  1. Stripe — 50 million lines of Ruby, migrated in one day. Scoped at two months by hand.

    The read: the strongest cost-per-task receipt any model launch has shipped — not a demo, a migration with a by-hand baseline attached. Also: one company, one codebase, vendor-reported. The Ch 29 math on this lives on the pricing page.

  2. Cursor + Cognition — the coding-tool vendors agree. Cursor calls Fable 5 the "state of the art model on CursorBench". On Cognition's FrontierCode, it scores highest among frontier models at medium effort.

    The read: two companies whose product economics depend on picking the best model picked the same one. "At medium effort" is the detail worth keeping — the top score didn't need the max dial.

  3. GitHub — the long-horizon quote. Praised complex, long-horizon coding tasks with "a level of autonomy and reliability that exceeded previous benchmarks".

    The read: "long-horizon" is the load-bearing phrase. It matches the Claude Code banner positioning, and it's the axis where the table gap is widest — FrontierCode Diamond (xhigh effort) at 29.3% vs Opus 4.8's 13.4%.

  4. IMC — the trading desk. Fable 5 "aced their trading-analysis evaluations nearly across the board".

    The read: the first non-coding receipt, and the most instructive one — IMC ran private evals, which is the Ch 25 discipline. "Nearly" is the honest word in the quote; keep it.

  5. Drug design — around 10x. Anthropic-internal protein experts accelerated aspects of the drug-design process by around 10x.

    The read: the most caveated receipt on the page — internal experts, "aspects of", "around". Anthropic wrote those hedges themselves. A real signal for science workloads, not a number to put in a deck.

  6. Genomics — autonomous research, but read the model name. Mythos 5 conducted autonomous research identifying cell roles across 138 animal species and trained a custom ML model 100x smaller than a recent Science-journal publication's. Scientists preferred its hypotheses ~80% of the time in blinded comparisons against Opus-class models.

    The read: the most impressive receipt of the launch — and it isn't Fable 5.
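Every receipt above is someone else's result on someone else's workload. As a minimal sketch of the private eval that rule points to, assume you have already run your own task set through two models and recorded pass/fail per task. The comparison itself can then be this small. The function name, the task IDs, and the sample results are illustrative assumptions, not data from the announcement.

```python
from typing import Mapping


def compare_runs(candidate: Mapping[str, bool], baseline: Mapping[str, bool]) -> dict:
    """Compare pass/fail results for two models on the tasks both attempted."""
    shared = sorted(candidate.keys() & baseline.keys())
    if not shared:
        raise ValueError("no tasks in common; nothing to compare")

    n = len(shared)
    return {
        "tasks": n,
        "candidate_rate": sum(candidate[t] for t in shared) / n,
        "baseline_rate": sum(baseline[t] for t in shared) / n,
        # Tasks where the models disagree are the ones worth reading by hand.
        "only_candidate_passed": [t for t in shared if candidate[t] and not baseline[t]],
        "only_baseline_passed": [t for t in shared if baseline[t] and not candidate[t]],
    }


if __name__ == "__main__":
    # Hypothetical results from your own task set, not launch data.
    fable = {"migrate-auth": True, "fix-flaky-test": True, "summarize-contract": False}
    opus = {"migrate-auth": True, "fix-flaky-test": False, "summarize-contract": False}
    print(compare_runs(fable, opus))
```

The two disagreement lists matter more than the headline rates on a small task set: a handful of tasks cannot support a percentage claim, but they can show you which workloads actually change hands.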

The two demos — watch the long horizon, don't take it on faith

Benchmarks can be reward-hacked; a timelapse anyone can watch end to end can't. Anthropic shipped two with the launch, and they're the most honest evidence on this page — you can verify them with your own eyes.

Pokémon FireRed, vision only. Previous Claude models couldn't finish FireRed even with harnesses handing them maps and navigation tools. Fable 5 beat the game on raw screenshots — no maps, no navigation aids, vision only. That's the long-horizon claim in its purest form: an entire playthrough's game state held across a single run with the thinnest possible harness.

Claude Fable 5 beats Pokémon FireRed using only vision — full timelapse

The solar system, simulated. The second demo has Fable 5 simulate the solar system and predict a solar eclipse — the science-workload counterpart to the genomics receipts above, except this one you can watch.

Claude Fable 5 simulates the solar system and predicts a solar eclipse

The operator's map — receipts to patterns

Every receipt above maps onto a pattern this book already documents. That's the useful translation: not "Stripe did a cool thing" but "which of my workloads has the Stripe shape".

  1. Long-horizon refactors and migrations. The Stripe shape. This book ran the same pattern one size down: Ch 43's run deleted a net 91,874 lines across 718 files behind layered verification. Stripe's day is that move at 50M-line scale. If you have a migration scoped in months, this is the workload Fable 5 was priced for.
  2. Run-until-done loops. "Complex, long-running work" is Anthropic's own banner copy, and it is the Ch 38 pattern verbatim: goal in, evaluator on, agent loops until the condition proves true. A model that holds the thread longer is worth more per loop, not per token.
  3. Swarm orchestration: the conductor seat. Put the expensive model where the judgment is and fan cheap ones out below it (/swarms, /dynamic-workflows). Fable 5 takes the conductor seat; Haiku and Sonnet stay in the sections. The 2× sticker buys one seat, not the whole orchestra.
  4. Knowledge work: the quiet headline. GDPval-AA reads 1932 vs Opus 4.8's 1890. GDP.pdf (vision knowledge work, no tools) reads 29.8% vs 22.5%. The launch is sold on code; the table says the gain shows up in contracts, decks, and documents too. If you're a non-developer reading this, these two rows are yours.
  5. Computer use: eval before believing. OSWorld-Verified: 85.0% vs Opus 4.8's 83.4%. An incremental gain, not a leap. If browser-driving agents are your workload, the delta may not survive a 2× price. Measure it.
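Whether a delta survives a 2× price, the question item 5 raises, is arithmetic you can run before you spend anything. A rough sketch under loud assumptions: benchmark pass rates stand in for your own success rates, both models use the same number of tokens per attempt, failed attempts are retried until one succeeds, and the 2× sticker is read as Fable 5 costing twice Opus 4.8 per token. None of those is guaranteed to hold on your workload.

```python
PRICE_MULTIPLIER = 2.0  # assumption: the "2x sticker" read as Fable 5 vs Opus 4.8 per token


def relative_cost_per_success(new_rate: float, base_rate: float, price_multiplier: float) -> float:
    """Cost per successful task on the new model, relative to the baseline model.

    With retry-until-success, expected attempts are 1 / pass_rate, so
    cost per success is proportional to price / pass_rate.
    """
    if not (0 < new_rate <= 1 and 0 < base_rate <= 1):
        raise ValueError("pass rates must be in (0, 1]")
    return price_multiplier * base_rate / new_rate


# Percentage rows quoted on this page: (Fable 5, Opus 4.8).
# GDPval-AA is left out because it is a score, not a pass rate.
ROWS = {
    "OSWorld-Verified": (0.850, 0.834),
    "SWE-Bench Pro": (0.803, 0.692),
    "GDP.pdf": (0.298, 0.225),
    "FrontierCode Diamond": (0.293, 0.134),
}

if __name__ == "__main__":
    for name, (fable, opus) in ROWS.items():
        ratio = relative_cost_per_success(fable, opus, PRICE_MULTIPLIER)
        print(f"{name}: {ratio:.2f}x the baseline cost per success")
```

Under these assumptions OSWorld-Verified lands near 1.96× the baseline cost per success, while FrontierCode Diamond lands near 0.91×, the only row on this page where the higher pass rate more than pays for the doubled price. Swap in pass rates from your own eval before trusting either number.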

Where this runs day to day for most operators: Fable 5 in Claude Code — the banner, /model, and the routing call.

What NOT to run on it

  • Cyber- and bio-adjacent work. The safeguard classifiers trip: in the Claude apps the response comes from Opus 4.8 instead; on the API the request errors, unbilled, unless you opted into Opus-priced fallback. Best case, you asked for Fable and got Opus. Route those workloads to Opus 4.8 directly.
  • Bulk cheap work. Classification, extraction, routine drafts — Haiku 4.5 at $1/$5 and Sonnet 4.6 at $3/$15 didn't get worse on June 9. The tier list stays the live source of truth on routing.
  • Anything you haven't evaled. Fable 5 is included in paid-plan limits June 9–22, 2026 — a free private-eval window (Ch 25). After June 22 the meter runs on usage credits. Eval first, route second.
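The three bullets above amount to a routing rule. A sketch of that rule as code, with everything named here hypothetical: the Task fields, the set of bulk task kinds, and the model labels are plain strings for illustration, not API model identifiers.

```python
from dataclasses import dataclass

# Illustrative labels only; these are not real API model identifiers.
FABLE = "fable-5"
OPUS = "opus-4.8"
CHEAP_TIER = "haiku-4.5"  # or sonnet-4.6; the page lists both for bulk work

# Assumption: your own taxonomy of bulk task kinds will differ.
BULK_KINDS = frozenset({"classification", "extraction", "routine_draft"})


@dataclass(frozen=True)
class Task:
    kind: str
    safeguard_sensitive: bool = False  # cyber- or bio-adjacent
    evaled_on_fable: bool = False      # passed your private eval


def route(task: Task, incumbent: str = OPUS) -> str:
    """Pick a model label for a task, following the three exclusions in order."""
    if task.safeguard_sensitive:
        return OPUS          # classifiers would fall back or error anyway
    if task.kind in BULK_KINDS:
        return CHEAP_TIER    # cheap work stays on the cheap tier
    if not task.evaled_on_fable:
        return incumbent     # eval first, route second
    return FABLE


if __name__ == "__main__":
    print(route(Task("migration", evaled_on_fable=True)))
    print(route(Task("classification", evaled_on_fable=True)))
    print(route(Task("protein_analysis", safeguard_sensitive=True)))
```

The order of the checks is the design choice: the safeguard test comes first because no eval result changes what the classifiers do, and the eval test comes last so that an unevaluated workload never reaches the expensive model by default.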

The receipts say what Fable 5 did for six teams that aren't yours. The free window says what it does for you. One of those is marketing.

FAQ

What is Fable 5 best at?

Complex, long-running work — that is Anthropic's own positioning and the launch receipts back it: Stripe migrated a 50-million-line Ruby codebase in one day (scoped at two months by hand), GitHub praised long-horizon coding autonomy, Cursor calls it state of the art on CursorBench. On the table, SWE-Bench Pro reads 80.3% vs Opus 4.8's 69.2%. All launch numbers are vendor-reported — treat them as signals and run a private eval.

Can Fable 5 do scientific research?

Partially, and the distinction matters. Anthropic-internal protein experts report accelerating aspects of drug design by around 10x. But the genomics receipts (autonomous research across 138 animal species, a custom ML model 100x smaller than a recent Science publication's, hypotheses preferred ~80% of the time in blinded comparisons) belong to Mythos 5, the gated twin whose safeguards are lifted area by area: today for Project Glasswing partners (cyber), next for vetted biology researchers (bio/chem). The Fable 5 you can buy blocks most bio/chem requests. In the Claude apps the response falls back to Opus 4.8 and you're told; on the API a blocked request errors, unbilled, and fallback is opt-in.

Is Fable 5 good for non-coders?

Yes — the quiet headline of the launch table. GDPval-AA (knowledge work) reads 1932 vs Opus 4.8's 1890 and GPT 5.5's 1769. GDP.pdf, the vision knowledge-work benchmark with no tools, reads 29.8% vs Opus 4.8's 22.5%. The launch is marketed on code; the table says the gain shows up in documents too.

What did Stripe do with Fable 5?

Per Anthropic's announcement, Stripe used Fable 5 to perform a 50-million-line Ruby codebase migration in one day — work scoped at two months by hand. It is vendor-curated like every launch receipt, but it ships with a by-hand baseline attached, which makes it the strongest cost-per-task receipt of the launch.

Where does Fable 5 underperform?

Three places. Cyber- and bio-adjacent work: the safeguard classifiers fall back to Opus 4.8, so on starred benchmark rows Fable 5 performs closer to Opus 4.8 by design. The opt-in API fallback bills at Opus pricing, so you don't pay the 2× sticker for Opus-grade answers, but you don't get Fable-grade ones either. Computer use: OSWorld-Verified 85.0% vs Opus 4.8's 83.4% is incremental, and Mythos Preview reported 85.4%, higher than the shipped product. Bulk cheap work: Haiku 4.5 ($1/$5) and Sonnet 4.6 ($3/$15) win on cost.
