Tuesday afternoon, the Belkins deal-research workflow finally cracked. Five .claude/scratch/ — had a 40% race-condition rate. Two agents would write to qualified.md at the same time, last-write-wins, half the prospects vanished from the queue. Same prompts, same model, same MCPs. The difference was orchestration.
CC’s subagent system is built for one repo, one task, one human supervising. The moment you need five agents with strict handoff contracts running on a cron, you’re not in a swarm anymore — you’re in a graph. The graph wants a framework. That’s the chapter.
The threshold — when to leave CC#
Three signals. If you hit one, look at a framework. If you hit two, you’re already late.
Signal one: five-plus agents with strict handoff contracts. Up to four agents, CC’s pattern of “each subagent writes to its own file, the parent reads them all” is fine. Past that, the contention isn’t a hypothetical — it’s a Tuesday. The deal-research workflow above hit this; the Belkins onboarding pipeline hit it before that. Five is the hinge.
Signal two: persistent state across days. A CC session is a process. It dies when the terminal closes. If your workflow needs to remember that prospect #47 was qualified on Tuesday, drafted on Wednesday, reviewed Thursday, sent Friday — and if the Wednesday step depends on Tuesday’s output existing somewhere durable — CC isn’t the right shape. You need a state store the agents read from and write to, and you need an orchestrator that survives Ctrl-C.
Signal three: deterministic orchestration that survives a process restart. Cron fires at 3 AM, the runner crashes at 3:04, you come in at 9 — what happens? In CC, the answer is “you start over.” In a framework with a state machine and a checkpoint, the answer is “the workflow resumes at the last completed node.” That’s not a luxury when you’re running customer-facing work overnight; that’s the whole reason you’re doing this.
When you hit two of three, stop adding agents to CC and start drawing the graph.
CrewAI — the handoff pattern#
CrewAI is what I reach for when the workflow shape is “team of specialists, each with one job, results passed down a chain.” It’s good at sequential and hierarchical patterns, weaker at branching state machines. The mental model is a relay race — each agent runs its leg, hands the baton, sits down.
The deal-research workflow, in roughly forty lines:
from crewai import Agent, Task, Crew, Process
researcher = Agent(
role="Company researcher",
goal="Pull recent funding, hiring, and product signal for the target company",
backstory="You read Crunchbase, the company blog, and recent LinkedIn posts.",
tools=[crunchbase_tool, linkedin_tool, blog_scraper_tool],
)
qualifier = Agent(
role="ICP qualifier",
goal="Score the company against the Belkins ICP and decide go/no-go",
backstory="You know the ICP cold — 50-500 employees, B2B SaaS, US/UK, post-Series-A.",
tools=[icp_scorer_tool],
)
drafter = Agent(
role="Outreach drafter",
goal="Write a 3-paragraph outreach email tied to the research signal",
backstory="You write in Vlad's voice — punchy, lowercase, no corporate hedging.",
)
reviewer = Agent(
role="Voice reviewer",
goal="Rewrite paragraph 2 of the draft to land sharper, kill any adverbs",
)
sender = Agent(
role="Send orchestrator",
goal="Queue the approved email through Customer.io with a Tuesday 9 AM send",
tools=[customerio_tool],
)
research_task = Task(description="Research {company}", agent=researcher, expected_output="3-bullet signal brief")
qualify_task = Task(description="Qualify against ICP", agent=qualifier, context=[research_task])
draft_task = Task(description="Draft email", agent=drafter, context=[qualify_task, research_task])
review_task = Task(description="Sharpen voice", agent=reviewer, context=[draft_task])
send_task = Task(description="Queue send", agent=sender, context=[review_task])
crew = Crew(
agents=[researcher, qualifier, drafter, reviewer, sender],
tasks=[research_task, qualify_task, draft_task, review_task, send_task],
process=Process.sequential,
)
result = crew.kickoff(inputs={"company": "Acme Corp"})
The context=[...] parameter is the whole game. Each task declares what it depends on. The framework wires the handoff. There’s no shared scratch file because there’s no shared scratch file — the researcher’s output gets passed to the qualifier’s prompt as a structured field, not as a file the qualifier has to remember to read. That’s the contract CC’s subagent system doesn’t enforce.
What CrewAI is bad at: anything with a loop (“review until score > 8”), anything with conditional branching (“if qualified, draft; if not, log and skip”), anything where the same agent runs multiple times against different inputs in the same workflow. The moment you need that, you’re in LangGraph territory.
LangGraph — the state machine pattern#
LangGraph is what I reach for when the workflow has branches, loops, or conditional routing. It’s a state graph — nodes are agents (or pure functions), edges are transitions, the state is a typed object every node reads and writes. Verbose, more boilerplate, but it survives complex shapes.
The Folderly deliverability-triage workflow, sketched in about thirty lines:
from typing import TypedDict, Literal
from langgraph.graph import StateGraph, END
class TriageState(TypedDict):
domain: str
spam_score: float
blacklist_hits: list[str]
fix_plan: str
severity: Literal["low", "medium", "high"]
def measure_spam(state: TriageState) -> TriageState:
state["spam_score"] = call_postmark_score(state["domain"])
return state
def check_blacklists(state: TriageState) -> TriageState:
state["blacklist_hits"] = call_blacklist_scanner(state["domain"])
return state
def classify_severity(state: TriageState) -> TriageState:
state["severity"] = "high" if state["spam_score"] > 7 or len(state["blacklist_hits"]) > 2 else "medium" if state["spam_score"] > 4 else "low"
return state
def draft_fix_plan(state: TriageState) -> TriageState:
state["fix_plan"] = call_claude_with_state(state)
return state
graph = StateGraph(TriageState)
graph.add_node("measure", measure_spam)
graph.add_node("blacklist", check_blacklists)
graph.add_node("classify", classify_severity)
graph.add_node("plan", draft_fix_plan)
graph.set_entry_point("measure")
graph.add_edge("measure", "blacklist")
graph.add_edge("blacklist", "classify")
graph.add_conditional_edges(
"classify",
lambda s: "plan" if s["severity"] != "low" else END,
)
graph.add_edge("plan", END)
app = graph.compile()
result = app.invoke({"domain": "acme.com", "spam_score": 0, "blacklist_hits": [], "fix_plan": "", "severity": "low"})
The add_conditional_edges line is the move. If severity classifies as low, the graph terminates without burning a draft step. If medium or high, it routes to the planner. That conditional is impossible to express cleanly in CrewAI’s sequential or hierarchical shape — you’d end up with a wrapper script that calls the crew twice with different configs, and now you’re maintaining a wrapper script.
The state object is the second move. Every node sees the same TriageState. There’s no implicit context being passed along — every field is typed and visible. When something breaks at 3 AM, the state object is the first thing you log, and it tells you exactly where the workflow was. That’s the durability story CC subagents can’t tell.
The Anthropic SDK as the floor#
Underneath all of this is the SDK. CrewAI calls the model. LangGraph calls the model. CC calls the model. The model is talking to anthropic.messages.create or openai.chat.completions.create no matter what wraps it. The frameworks are buying you orchestration, not inference.
When the framework gets in the way — when CrewAI’s abstractions don’t fit your shape, when LangGraph’s verbosity costs more than it saves — drop to the SDK direct. See Chapter 30 for the deep dive on anthropic SDK direct, where the patterns and the receipts live. The SDK is the floor. Everything else is a building you put on top of it.
I drop to the SDK about 20% of the time. The other 80% the frameworks are worth their weight, but the 20% is the workflows that didn’t fit any framework’s mental model and were cheaper to write as 200 lines of explicit Python than to bend a framework around.
AutoGen — research-strong, prototype-friendly#
One paragraph because that’s what AutoGen earns at this point. Microsoft Research’s framework, conversational multi-agent shape, strong for prototypes and research demos, weaker for production. It has some of the best abstractions for “agents talking to each other in a structured conversation” — agent-to-agent debate, tool-use loops with human-in-the-loop checkpoints — but I haven’t shipped a customer-facing AutoGen workflow that survived more than a month. The patterns drift, the API churns, the docs lag. Useful as a thinking tool. Reach for CrewAI or LangGraph when you ship.
Build-vs-buy table#
| Framework | What it’s worth | What it costs you | When to leave |
|---|---|---|---|
| Claude Code subagents | One repo, one task, fast prototype, day-driver work | Filesystem races at 5+ agents, no persistence across sessions | Five-plus agents with handoffs, or persistent state |
| CrewAI | Sequential/hierarchical teams, clean handoff contracts, fast to write | No conditional branching, no loops, weak state model | Workflow has branches, loops, or restart needs |
| LangGraph | State machines, branches, durable workflows, restart-safe | Verbose, more boilerplate, steeper ramp | Graph becomes a DAG-of-DAGs, or you need cross-runtime orchestration |
| AutoGen | Research, prototypes, agent-to-agent conversation patterns | API churn, weak prod story, hard to operate | Anywhere you ship to customers |
| Anthropic SDK direct | Full control, no abstraction tax, easiest to debug | You write the orchestration yourself | Pattern is repeatable enough that a framework saves real lines |
The graduation pattern is the move. Don’t pick the framework on day one. Prototype in CC. Promote to CrewAI when the handoff contracts sharpen. Promote to LangGraph when the graph branches. Drop to the SDK when the framework fights the workflow. Each promotion costs roughly a day of refactor — the prompts and the agents survive the transition; the orchestration layer is what gets rewritten. That’s a fair trade because the orchestration layer is the thing you’re optimizing.
Update — May 2026#
This chapter shipped six months before this update and every framework version in it is stale. Here’s what moved, what’s new, and what’s actually worth picking up.
AutoGen → Microsoft Agent Framework 1.0 GA. Microsoft shipped MAF 1.0 on 2026-04-03 and explicitly migrated AutoGen + Semantic Kernel users to it. AutoGen the original repo is now in maintenance mode. The “research-strong, prototype-friendly” paragraph above still describes a useful tool — but if you’re starting today, start on MAF, not AutoGen. The trap to flag up front: MAF is worth adopting if you’re a .NET shop, an Azure shop, or already invested in Semantic Kernel. For everyone else — TypeScript teams, Python teams not on Azure, Anthropic-platform operators — the gravitational pull of MAF will burn weeks on Azure-specific patterns that don’t transfer back to your stack. Skip it. The Anthropic-stack-direct path from Chapter 30 is still the highest leverage per line of code.
OpenAI Agents SDK 0.14 (released 2026-04-15) added a model-native sandbox and a model-native harness — the agent loop moved closer to the model. Subagents and code-mode are documented as coming soon (verify against your version/source). This is OpenAI converging toward the same shape Anthropic has been shipping, which means the SDK-direct thesis in Chapter 30 gets a free second confirmation.
Anthropic Managed Agents went into public beta on 2026-04-08. Anthropic hosts the runtime, error recovery, and execution; you keep writing against the standard anthropic SDK. Memory files followed on 2026-04-23 — persistent memory mounted as /mnt/memory/ inside the agent’s container, readable and writable with the bash + file tools the agent already has, exportable and editable in Console. That solves the “stateless agent” problem for Anthropic-platform shops without bringing in a framework. Pair with adaptive thinking on Opus 4.6 / Sonnet 4.6 (and Mythos, when it ships) — budget_tokens is deprecated; the model decides when and how much to think, with interleaved thinking enabled by default and preserved across turns. The Anthropic stack in May 2026 is materially more capable than the version this chapter described in November.
Vercel AI SDK 6 is the other entry that didn’t exist when I wrote this chapter. 20M+ monthly downloads — that’s not a startup framework; that’s infrastructure. The new piece worth knowing is Workflow DevKit with DurableAgent — a drop-in replacement for the Agent class that gives you pause/resume, crash-safe execution, retries, and step-based observability. For TypeScript product teams running agents inside a Next app, this is the cleanest “agent that survives a process restart” story available. Closer in shape to LangGraph than to CrewAI, but lives natively in a Vercel-shaped deployment.
Mastra 1.0 hit January 2026 — entirely new entry, didn’t exist when the chapter shipped. 22k GitHub stars, ~300k weekly npm downloads, YC W25 graduate. TypeScript-only, agents + workflows + RAG in one stack. The Mastra sweet spot is “TS team that wants the LangGraph state-machine model without the Python ergonomics.” Worth a real look if your team has already standardized on TS and you want one framework instead of three. The reason it earned a row above CrewAI for some teams: opinionated single-language footprint, less abstraction sprawl.
Inngest AgentKit deserves a mention for one specific shape — event-driven shops who already run Inngest for durable jobs and want deterministic agent routing on top. If you’re already on Inngest, the migration cost is near zero. If you’re not, this isn’t the reason to adopt Inngest.
The orchestration consensus#
The hardest-won lesson of the May 2026 ecosystem isn’t a framework choice — it’s a topology choice. Hub-and-spoke wins production, roughly 70% of deployments per multiple framework docs and case studies (Microsoft’s MAF guidance, Anthropic’s research-style multi-agent system, the gurusup orchestration writeup, augmentcode’s guide). Swarm patterns win demos and Twitter threads. Hub-and-spoke wins customer-facing work.
The shape: one orchestrator decomposes the task and dispatches to specialist workers; workers don’t talk to each other; one verifier/critic checks the output before it ships. The reason it wins is debuggability — one control flow to trace, one place where the state lives, one log to read at 3 AM. Microsoft’s own migration guide spells it out: start centralized, decentralize only when a concrete scalability bottleneck appears. Swarm-style peer-to-peer handoff (the OpenAI Swarm pattern, now folded into Agents SDK 0.14) is powerful for parallelism but observability is brutal. My own portfolio rule — 3 to 4 parallel agents per wave, 5+ invites filesystem contention — is the operator-scale version of the same lesson.
Updated framework count#
This chapter previously walked through roughly five frameworks. The May 2026 menu, with what’s actually worth knowing per use case:
| Framework | Best for | Headline number |
|---|---|---|
| Anthropic Claude Agent SDK + Managed Agents | Anthropic-platform builds, agents that need memory and adaptive thinking | Vlad’s primary platform; Memory files GA-beta 2026-04-23 |
| CrewAI | Role-based sequential or hierarchical teams; fastest path from idea to shipped crew | ~47.8k GitHub stars, 12M daily executions, 150+ enterprise customers |
| LangGraph | Known DAGs, durable state, branches and loops, restart-safe production | 1.1.10 + prebuilt 1.0.12 as of April 2026; Klarna, Replit, LinkedIn in prod |
| OpenAI Agents SDK | OpenAI-native code agents with sandbox + native harness | 0.14 released 2026-04-15 |
| Microsoft Agent Framework | .NET / Azure shops, Semantic Kernel and AutoGen migrators | 1.0 GA 2026-04-03 — trap for non-.NET teams |
| Vercel AI SDK 6 + Workflow DevKit | TS product teams running agents inside a Next app | 20M+ monthly downloads, DurableAgent drop-in |
| Mastra | TS-only teams wanting agents + workflows + RAG in one stack | 22k stars, 300k weekly npm, YC W25 |
| Inngest AgentKit | Event-driven shops already on Inngest | Built on durable Inngest infra |
| Anthropic SDK direct | Everything that doesn’t earn the framework tax | Still the floor — Chapter 30 |
The graduation pattern at the top of this chapter still holds — start in CC, leave for CrewAI or Mastra when the contract sharpens, leave for LangGraph or Workflow DevKit when the graph branches, drop to the SDK when the framework fights the workflow. The menu just got longer. Most operators only need three of these in their head: Anthropic SDK direct as the floor, CrewAI for fast crews, LangGraph or Workflow DevKit for state-machines that survive restarts. See the Mythos entry in /research-notes for why the SDK-direct floor matters more, not less, as the next model wave hits.
The mistake I see most often — and made myself, twice — is picking the heaviest framework on day one because it’ll “scale later.” LangGraph for a three-agent linear workflow is a punishment. CrewAI for a single-agent script is a punishment. Claude Code for a five-agent stateful pipeline running on a cron is a punishment. The right framework is the one that matches the shape of the work today, with a one-day promotion path to the next one when the shape changes. The orchestrator you don’t have to maintain is the orchestrator the framework gives you. The orchestrator you wrote yourself is the one you’ll be debugging on a Saturday.