Dynamic Workflows

Claude writes the plan, runs the swarm, and checks its own work.

The headline of Opus 4.8 isn't the weights — it's a new way of working. You hand Claude a task too big for one conversation. It writes a script that breaks the work into subtasks, fans out up to 16 agents at a time, has other agents try to break what the first ones built, fixes what they find, and comes back with one answer. The proof point already on the table: Jarred Sumner ported Bun from Zig to Rust — 750,000 lines, 11 days, 99.8% of the test suite still green.

The Playbook already has the swarm you run by hand. This is the swarm that runs itself: the plan moves into code, the orchestration happens outside your context, and work that used to be scoped in quarters lands in days. This page covers what it is, how the generator→validator loop actually works, how you turn it on, where I point it across the portfolio, and the part nobody writes down: when not to.


What it is

A dynamic workflow is a JavaScript script Claude writes to orchestrate subagents at scale. You describe the task; Claude writes the script; a runtime executes it in the background while your session stays responsive. That last part is the whole trick — the plan moves into code. The script holds the loop, the branches, and the intermediate results, so your context window only ever sees the final answer instead of every agent's scratch pad.
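To make "the plan moves into code" concrete, here is a minimal sketch of the shape such a script might take. It is not the real runtime API, which this page doesn't show: `runAgent`, the rule that `.legacy.js` files fail the first pass, and the summary fields are all hypothetical stand-ins. The point is only where things live: the loop and the branch are ordinary code, every agent's verbose output sits in a script variable, and a compact summary is the only thing returned.

```javascript
// Hypothetical stand-in for a subagent call. A real runtime would dispatch
// a model here; this stub just fabricates a verbose result.
async function runAgent(role, file) {
  return {
    role,
    file,
    ok: !file.endsWith(".legacy.js"), // assumption: legacy files fail the first pass
    scratchPad: `long reasoning trace from ${role} about ${file}`,
  };
}

async function workflow(files) {
  const intermediate = []; // script variables hold every agent's full output

  for (const file of files) { // the loop lives in code
    let result = await runAgent("implementer", file);
    if (!result.ok) { // and so does the branch
      // Assumption for the sketch: the fixer pass always succeeds.
      result = { ...(await runAgent("fixer", file)), ok: true, retried: true };
    }
    intermediate.push(result);
  }

  // Only this compact summary is handed back; the scratch pads stay behind.
  return {
    filesProcessed: intermediate.length,
    retried: intermediate.filter((r) => r.retried).map((r) => r.file),
  };
}

workflow(["a.js", "b.legacy.js", "c.js"]).then((answer) => console.log(answer));
```

Note that the returned object carries no `scratchPad` field. In this sketch, that omission is the whole context-saving mechanism.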

That's the line between this and what came before. With subagents and skills, Claude is the orchestrator: it decides turn by turn what to spawn, and every result lands back in its context, eating the window. A workflow is a script the runtime executes: dozens to hundreds of agents per run, resumable mid-session, with the coordination happening outside the conversation so the plan stays on track no matter how big the task gets. It shipped with Opus 4.8 as a research preview; it needs Claude Code v2.1.154+ and runs on every paid plan (on Pro you flip it on in /config).

Agent teams vs dynamic workflows

Anthropic shipped two shapes of parallelism, and the difference is who writes the org chart. Agent teams are a roster you define up front — one session as lead, then named roles: Frontend Specialist, Backend Engineer, Quality Engineer, each with a brief. That's the right tool when the work decomposes cleanly into domains you can name before you start.

Dynamic workflows are for when you can't name the decomposition yet. Claude writes the org chart itself: how many agents to spawn, how to split the work, when to run an adversarial verification pass, and when the results have converged enough to stop. Per task it's an implementer, then verifiers, then a fixer — fanned out across as many tasks as the job needs.

Two shapes of parallelism: an agent team is a roster you define; a dynamic workflow is a decomposition Claude writes — implementer → verifiers → fixer, fanned across N tasks. House rendering of Anthropic's announcement diagram. N can be in the hundreds.
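The implementer → verifiers → fixer shape, fanned across N tasks under a concurrency cap, can be sketched in plain JavaScript. Everything here is an assumption for illustration: the agent stubs, the two verifier angles, and the choice to apply the cap of 16 per agent call rather than per task. The limiter is an ordinary hand-rolled semaphore, not something the runtime is known to expose.

```javascript
const MAX_CONCURRENT = 16; // the concurrency cap quoted in this article

// A small semaphore: at most `limit` wrapped calls run at once.
function createLimiter(limit) {
  let active = 0;
  const waiting = [];
  return async function limited(fn) {
    if (active >= limit) {
      // Wait in line; the slot is handed to us directly when we are woken.
      await new Promise((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await fn();
    } finally {
      const nextInLine = waiting.shift();
      if (nextInLine) nextInLine(); // pass our slot straight to a waiter
      else active--;
    }
  };
}

const limited = createLimiter(MAX_CONCURRENT);
const stats = { inFlight: 0, peak: 0 };

// Hypothetical agent call: records concurrency, simulates async work,
// then returns whatever the stub produces.
function callAgent(produce) {
  return limited(async () => {
    stats.inFlight++;
    stats.peak = Math.max(stats.peak, stats.inFlight);
    await new Promise((resolve) => setTimeout(resolve, 0));
    stats.inFlight--;
    return produce();
  });
}

const implement = (task) =>
  callAgent(() => ({ id: task.id, hasBug: task.tricky, fixed: false }));
const verify = (patch, angle) =>
  callAgent(() => (patch.hasBug ? [`${angle}: problem in task ${patch.id}`] : []));
const fix = (patch, issues) =>
  callAgent(() => ({ ...patch, hasBug: false, fixed: true, issuesResolved: issues.length }));

// One task's pipeline: implementer, then verifiers in parallel, then a fixer if needed.
async function runTask(task) {
  const patch = await implement(task);
  const reports = await Promise.all(
    ["diff-review", "edge-cases"].map((angle) => verify(patch, angle))
  );
  const issues = reports.flat();
  return issues.length > 0 ? fix(patch, issues) : patch;
}

// Start every task; the limiter keeps agent calls under the cap.
const runAll = (tasks) => Promise.all(tasks.map(runTask));

const tasks = Array.from({ length: 40 }, (_, i) => ({ id: i, tricky: i % 5 === 0 }));
runAll(tasks).then((results) => {
  const fixedCount = results.filter((r) => r.fixed).length;
  console.log(`${results.length} tasks, ${fixedCount} needed a fixer, peak ${stats.peak} agents`);
});
```

The design choice worth noticing is that the cap wraps each agent call, not each task. Capping at the task level would let sixteen pipelines each run two verifiers at once, which is thirty-two agents in flight.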

The validator loop — the part that matters

Here's the part that took me a beat to feel: the value isn't that a pile of parallel agents burns more tokens. It's the loop. One set of agents makes the change — code, refactor, tests. Another set tries to break it — reads the diff, hunts for errors, thinks through the edge cases. Generate, then validate, then fix what validation caught. It's a little like a GAN, only pointed at an engineering workflow instead of pixels.

Because the plan lives in the script, a workflow can apply a repeatable quality pattern instead of just running more agents: have independent agents adversarially review each other's findings before anything is reported, or draft a plan from several angles and weigh them against each other. One request can become several workflows in a row — one to understand the code, one to make the change, one to verify it. The model stops being a single answer and becomes an orchestration that has already argued with itself before it reaches you.
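One way to picture "independent agents adversarially review each other's findings before anything is reported" is a quorum filter: a finding is reported only if enough reviewers other than its author can confirm it. The sketch below assumes hypothetical data shapes (a `reportedBy` field, a `confirms` predicate standing in for an agent's independent check) and a quorum of two. None of this comes from the runtime itself.

```javascript
// Keep only findings that at least `quorum` other reviewers confirm.
function crossExamine(findings, reviewers, quorum) {
  return findings.filter((finding) => {
    // The reporter never gets to vote on its own finding.
    const challengers = reviewers.filter((r) => r.name !== finding.reportedBy);
    const confirmations = challengers.filter((r) => r.confirms(finding)).length;
    return confirmations >= quorum;
  });
}

// Hypothetical reviewers: plain predicates in place of real agent checks.
const reviewers = [
  { name: "A", confirms: (f) => f.reproducible },
  { name: "B", confirms: (f) => f.reproducible && f.severity !== "nit" },
  { name: "C", confirms: (f) => f.reproducible },
];

const findings = [
  { id: "null-deref", reportedBy: "B", reproducible: true, severity: "high" },
  { id: "style-quibble", reportedBy: "A", reproducible: true, severity: "nit" },
  { id: "phantom-race", reportedBy: "C", reproducible: false, severity: "high" },
];

const reported = crossExamine(findings, reviewers, 2);
console.log(reported.map((f) => f.id)); // [ 'null-deref' ]
```

Here the nit dies because one of its two challengers rejects it, and the phantom race dies because nobody else can reproduce it. Only the finding that survives both challengers reaches you.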

The mechanics of orchestrating parallel agents by hand (the wave pattern, the between-wave audit, the skill shelf) live in The Swarm. A dynamic workflow is that, automated: you get the chain of command without writing the dispatch prompts yourself.

Turning it on — the /effort dial

Three ways in. Drop the word workflow anywhere in a prompt and Claude writes one for that single task. Run the bundled /deep-research to watch one work end to end. Or set the dial: /effort runs low → medium → high (the 4.8 default) → xhigh → max, and then a separate notch, ultracode — xhigh reasoning plus standing permission to orchestrate workflows. With ultracode on, Claude plans a workflow for every substantive task in the session instead of waiting for you to ask.

So ultracode is the on-ramp: it's how you tell Claude "you decide when this job is big enough to fan out." It's session-only — it resets when you start fresh, and you drop back with /effort high when you return to routine work. (The full notch-by-notch table is in the reference below.)

The /effort slider, set to high — the default on Opus 4.8. low → max sets reasoning depth; ultracode adds workflow orchestration on top.

The full /effort menu — low through max, then ultracode (xhigh + workflows) at the smart end. ultracode is session-only. It's the switch that lets Claude plan a workflow per task.

The dial in motion — dragging up to ultracode. Past this point, a single request can turn into several workflows in a row.

My read, after actually running it

I gave it a real task and it went away for forty minutes — wrote code, fixed its own errors, checked itself, kept going. It's one of the first "agent swarm" features that reads like a working tool instead of a demo. You stop typing prompts and start handing off jobs.

The cherry, for me, isn't the parallelism — it's the generator→validator cycle I described above. One half builds, the other half tries to prove it wrong. The tests it writes aren't always ones I'd trust yet, but it already catches its own mistakes at a rate that changes how much I have to babysit. That's the unlock.

My one complaint going in was the planning stage — it felt like "here's the task, go," when what I wanted was: Claude proposes a plan, I edit it, I add constraints and success criteria and the files that matter, then it runs. Turns out that's mostly already there. Before a run, Claude Code shows the planned phases and lets you View raw script, hit Tab to adjust the prompt, or Ctrl+G to open the script in your editor before you approve it. The gap between "go do it" and "let me shape the plan first" is narrower than it felt on day one — you just have to know the approval screen is the seam.

Verdict: the direction is very strong. This is the first version of "let the swarm run itself" I'd actually put on a real codebase.

Where I point it

A workflow earns its cost on a specific shape of job: one with a plan worth writing and branches worth running in parallel. Here's where it goes across the portfolio.

Belkins · the gnarly refactor

The multi-file refactor that used to eat a sprint — the kind where the risk isn't writing the code, it's holding twelve files in your head at once. A workflow plans it, fans out across the files, runs the verifier pass, and comes back with a diff I can read top to bottom. I still read every line. It does the holding-in-its-head part I'm bad at after the third coffee.

Folderly · the audit sweep

Point a workflow at a batch of sending-domain findings and it organizes the collection, runs the analysis in parallel, cross-checks the results, and synthesizes a client-ready report — a fan-out over many items with a reconcile at the end, exactly what the script form is built for. A day of copy-paste-and-format becomes a review pass over something already assembled.

The Newsletter · research, then I take the pen

A workflow for the research-and-structure pass — pull the sources, cross-check them against each other, argue the angles, lay out the skeleton. Then I take the voice pass myself, because the one thing the swarm can't do is sound like me. It does the work that parallelizes; I do the work that doesn't.

The pattern under all three: a workflow pays off when the job has a plan worth writing and branches worth running in parallel. A one-file fix doesn't. A twelve-file decision — or a 750,000-line port — does.

What Anthropic says

I run it the way an operator runs it. Anthropic ships it with their own framing — worth reading next to mine.

The short version: they built it for hard, multi-step, verify-as-you-go work at a scale one conversation can't hold. That's the same place I land — I just have the receipts and a "when not to" section they'd never write.

When not to run a workflow

The fastest way to look like you don't understand the feature is to fire a workflow at everything. It spawns many agents, so a single run uses meaningfully more tokens than working the same task in conversation — and most of the day doesn't need it.

Don't run it on simple tasks. A rename, a one-line fix, "what's the syntax for X" — a workflow will plan, fan out, and reconcile a job that wanted a single edit. You pay for a swarm to convene on an answer a single turn would have streamed back before you finished reading the question.

Mind the seams. A run takes no mid-flight input — only agent permission prompts can pause it, so if you need sign-off between stages, run each stage as its own workflow. And add the shell commands and MCP tools the agents need to your allowlist before you start, or a long run stalls on a prompt mid-flight.
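If a run can't take input mid-flight, the sign-off has to live between runs. A minimal sketch of that staging idea follows, assuming hypothetical stage objects and an `approve` callback standing in for the human. Each stage runs uninterrupted, and the gate sits at the seam. How you actually chain workflows in a session is not shown on this page, so treat this as the logic only.

```javascript
// Run stages one at a time; ask for sign-off after each before continuing.
async function runStaged(stages, approve) {
  const log = [];
  let input = null;
  for (const stage of stages) {
    const output = await stage.run(input); // one uninterrupted run per stage
    log.push({ stage: stage.name, output });
    const approved = await approve(stage.name, output); // the seam
    if (!approved) return { completed: false, stoppedAfter: stage.name, log };
    input = output; // the next stage starts from the approved result
  }
  return { completed: true, stoppedAfter: null, log };
}

// Hypothetical stages mirroring understand → change → verify.
const stages = [
  { name: "understand", run: async () => ({ filesToTouch: 12 }) },
  { name: "change", run: async (prev) => ({ filesChanged: prev.filesToTouch }) },
  { name: "verify", run: async (prev) => ({ filesVerified: prev.filesChanged }) },
];

// Stand-in for a human reviewer: reject any plan touching more than 20 files.
const approve = async (name, output) =>
  !(name === "understand" && output.filesToTouch > 20);

runStaged(stages, approve).then((result) =>
  console.log(result.completed, result.log.map((entry) => entry.stage))
);
```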

The operator move: default to high effort, reach for a workflow when you can name the plan and the branches. If you can't name them, you don't need one yet.

Reference

The lookup tail — who holds the plan, the /effort notches, and the limits — so you can come back and skip the prose.

Subagents vs skills vs workflows

| | Subagents | Skills | Workflows |
| --- | --- | --- | --- |
| What it is | a worker Claude spawns | instructions Claude follows | a script the runtime executes |
| Who decides next | Claude, turn by turn | Claude, per the prompt | the script |
| Intermediate results | Claude's context | Claude's context | script variables |
| Scale | a few per turn | a few per turn | dozens to hundreds per run |
| Interruption | restarts the turn | restarts the turn | resumable in-session |

The /effort dial

| Notch | What it's for | Persists? |
| --- | --- | --- |
| `low` | throwaway one-liners, syntax lookups, subagents | saved |
| `medium` | light edits, quick questions, cost-sensitive work | saved |
| `high` (default) | the daily driver: edits, reviews, focused features | saved |
| `xhigh` | traps: multi-file refactors, risky migrations, >30-min agentic work | saved |
| `max` | one model thinking as hard as it can; prone to overthinking | session only |
| `ultracode` | xhigh + workflows: Claude plans a workflow per substantive task | session only |

Limits: up to 16 concurrent agents, 1,000 per run, no mid-run input, resumable in-session. Research preview — requires Claude Code v2.1.154+, on all paid plans (Pro: enable in /config). Opus 4.8 itself: $5/$25 per M tokens, 1M context auto on Max/Team/Enterprise, agentic coding 64.3% → 69.2%, and 4× less likely than 4.7 to let code flaws pass. A workflow run uses substantially more tokens than the same task in conversation.
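The two caps above turn into quick arithmetic before you launch anything big. The sketch below makes loud assumptions: every implementer, verifier, and fixer invocation counts as one agent toward the 1,000 cap, every task needs a fixer (the worst case), and all agents take about the same time, so the wave count is only a lower bound. I don't know how the runtime actually counts agents.

```javascript
const MAX_AGENTS_PER_RUN = 1000; // per-run cap quoted above
const MAX_CONCURRENT = 16; // concurrency cap quoted above

// Estimate the agent budget for a run with a fixed number of verifiers per task.
function planBudget(taskCount, verifiersPerTask) {
  // Assumption: one implementer + N verifiers + one fixer in the worst case.
  const agentsPerTask = 1 + verifiersPerTask + 1;
  const totalAgents = taskCount * agentsPerTask;
  return {
    agentsPerTask,
    totalAgents,
    fitsInOneRun: totalAgents <= MAX_AGENTS_PER_RUN,
    maxTasksPerRun: Math.floor(MAX_AGENTS_PER_RUN / agentsPerTask),
    minWaves: Math.ceil(totalAgents / MAX_CONCURRENT), // lower bound only
  };
}

console.log(planBudget(150, 3));
// 150 tasks × 5 agents = 750 agents: fits in one run, at least 47 waves of 16
console.log(planBudget(250, 3));
// 250 tasks × 5 agents = 1,250 agents: over the cap, so split into two runs
```

Under these assumptions, three verifiers per task puts the ceiling at 200 tasks per run.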

Do this Monday

Don't take my use cases. Take the feature. On Monday, pick the gnarliest thing on your list (the migration, the codebase-wide audit, the spec you've been avoiding) and start the prompt with the word workflow. When the approval screen comes up, don't just hit yes. Read the planned phases. Open the script. Adjust the prompt. Then let it run, and go do something else for forty minutes.

You're not learning a command. You're learning where the seam is between "go do it" and "let me shape the plan first" — and once you've felt the validator loop catch its own mistake, you won't want to refactor a big thing by hand again.