What is agentic coding?
The vendor pages define it in marketing. The conference talks define it in vibes. This is the operator's definition — with the receipts that earned it: a 91,874-line production refactor, an 18-minute migration that ran while the coffee brewed, and the $1,847 bill that taught me where the fences go.
Everything here is sourced from this book's chapter files — the builds, the loops, and the failures are documented runs, not hypotheticals. Start at the learning page if the words "context window" are new to you.
The one-sentence definition
Agentic coding is delegating a coding goal to an AI model that plans, edits files, runs commands, and verifies its own work across many steps — while you review outcomes, not keystrokes.
The load-bearing clause is the last one. Plenty of tools edit files. The line that separates agentic coding from everything before it is who holds the loop: the model runs the tests, reads the failure, fixes the code, and runs them again — and you show up at the outcome boundary, not at every keystroke. Your unit of approval moves from "this line" to "this result."
The second load-bearing clause is "verifies." An agent that edits without checking its own work isn't doing agentic coding; it's autocomplete with a longer leash. The whole discipline, and most of this page, is about what "verified" means and who defines it.
Agentic coding vs vibe coding vs autocomplete
Three terms, routinely blended, naming three different contracts between you and the machine.
| | Autocomplete | Vibe coding | Agentic coding |
|---|---|---|---|
| Unit of work | the next line | a session you steer | a goal you define |
| Your role | typist | driver, in the loop | reviewer of outcomes |
| Feedback loop | per keystroke | per prompt | per verification gate |
| Where it lives | your editor | chat + terminal | terminal, CI, cron, loops |
| When it fails | you ignore a suggestion | you lose a Saturday | it loops — and bills you |
Vibe coding isn't the opposite of agentic coding — it's the supervised mode of it. Ch 23 logs a real one hour by hour: eight hours, $72 in tokens, six bugs hit and fixed, one shipped pipeline, the operator present the whole day. The agent did the editing and the running; the human did the steering and the scope-cutting. Same machinery as the autonomous runs below, different hands on the wheel.
Autocomplete is the term people upgrade from. The mental model that breaks first: with autocomplete, a wrong suggestion costs you a glance. With an agent, a wrong assumption compounds across forty turns before you read any of them. That asymmetry is why the rest of this page is about verification, not about prompting.
What it looks like in practice
The big one — a 90k-line production refactor. Ch 43 documents an agentic run pointed at a shipping product with one constraint: simplify, follow the design system. It deleted a net 91,874 lines across 718 files — 243 commits, 106 of them beginning with "Simplify," 27 of them hardening commits — behind real build, CI, browser, and API checks. The detail that matters more than the line count: mid-run, watching the diff balloon, the operator typed a checkpoint order — test the build, commit, push if no regressions. Autonomous does not mean unsupervised. The loop was good; it still needed a hand on the wheel.
The small one — an 18-minute migration. Ch 38 is the same pattern at desk scale. One command — a goal with a measurable finish line and a turn cap — and the agent swept a deprecated model id out of a repo: twelve turns, eighteen minutes, branch pushed, tests green, $3.12 of model spend plus $0.04 of evaluator spend judging "done" after every turn. The operator made coffee. That's the contract in miniature — the human defined done, a model checked done, the agent ran until done.
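The contract in that run (a human defines done, a check judges done after every turn, the agent runs until done or the turn cap) can be sketched as a small loop. This is a minimal illustration, not the command Ch 38 used: `run_until_done`, `agent_step`, and `is_done` are hypothetical names, and the "agent" is a toy function that rewrites one entry per turn.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunResult:
    done: bool   # True if the finish-line check passed
    turns: int   # number of agent turns actually taken


def run_until_done(
    agent_step: Callable[[int], None],
    is_done: Callable[[], bool],
    max_turns: int,
) -> RunResult:
    """Run agent turns until the check passes or the turn cap is hit."""
    for turn in range(1, max_turns + 1):
        agent_step(turn)   # hypothetical: one plan/edit/run cycle
        if is_done():      # measurable check, evaluated after every turn
            return RunResult(done=True, turns=turn)
    return RunResult(done=False, turns=max_turns)


# Toy stand-in for the migration: replace one deprecated id per turn.
files = {"a.py": "old-model", "b.py": "old-model", "c.py": "new-model"}


def fake_agent_step(turn: int) -> None:
    for name, content in files.items():
        if content == "old-model":
            files[name] = "new-model"
            return


def no_deprecated_ids_left() -> bool:
    return "old-model" not in files.values()


result = run_until_done(fake_agent_step, no_deprecated_ids_left, max_turns=12)
print(result)  # RunResult(done=True, turns=2)
```

The turn cap is what keeps an unmeasurable goal from running forever: if `is_done` never returns `True`, the loop stops at `max_turns` and reports `done=False` instead of continuing to spend.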
The frontier-scale one. Anthropic's Fable 5 launch carried the receipt that shows where this ceiling is moving: Stripe ran a 50-million-line Ruby codebase migration in one day — scoped at two months by hand. Vendor-curated, so apply the usual discount, but the shape matches what the two runs above show at smaller scale: the constraint on agentic coding is no longer the model's stamina, it's your verification. More on the Fable 5 use-cases page.
What you need to do it
Four things. None of them is a prompt technique.
- 1 — A harness with tool access. An agent that can't run your tests can't verify anything. Claude Code is the one this book runs — the sous chef of the five-tool stack in Ch 2 — but the requirement is generic: file edits, shell, git. The honest comparison with the editor-shaped alternative lives at Claude Code vs Cursor.
- 2 — A context file. The agent reads it before every edit: stack, conventions, and — highest leverage per line — what not to build. The Ch 23 build credits its "Don'ts" section with killing rabbit holes before they started. The rules that survived contact are at CLAUDE.md rules.
- 3 — A verification gate. The finish line has to be checkable without you: tests pass, lint clean, build green, a grep that returns 0. Ch 38's rule is the whole game — if you can't measure done, you can't run until done. Vague goals don't converge, they bill.
- 4 — A scale plan. One agent is the unit; the leverage compounds when you fan out. The Ch 43 refactor wasn't one agent — it was a bench of named explorers, each reading one slice of the surface (Ch 6 is the reference). The standalone guide is /swarms.
Workflow discipline — plan first, small verified layers, checkpoint commits — is its own page: Claude Code best practices.
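A verification gate of the kind item 3 describes can be sketched as a function that returns a single yes or no without a human in the loop. This is a sketch under assumptions: `gate`, `command_passes`, and `count_matches` are hypothetical helpers, and the demo command is a placeholder for whatever test, lint, and build commands your project actually uses.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def command_passes(cmd: list[str]) -> bool:
    """A command check passes when the process exits with status 0."""
    return subprocess.run(cmd, capture_output=True).returncode == 0


def count_matches(root: Path, needle: str) -> int:
    """Count .py files under root that still contain needle."""
    return sum(
        1
        for path in root.rglob("*.py")
        if needle in path.read_text(encoding="utf-8")
    )


def gate(root: Path, commands: list[list[str]], banned: str) -> bool:
    """Done means every command exits 0 and no file contains the banned string."""
    return all(command_passes(c) for c in commands) and count_matches(root, banned) == 0


# Demo in a throwaway directory; the single command is a placeholder that
# always exits 0, standing in for a real test or lint command.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.py").write_text("MODEL = 'deprecated-id'\n", encoding="utf-8")
    checks = [[sys.executable, "-c", "raise SystemExit(0)"]]

    before = gate(root, checks, banned="deprecated-id")
    (root / "app.py").write_text("MODEL = 'current-id'\n", encoding="utf-8")
    after = gate(root, checks, banned="deprecated-id")

print(before, after)  # False True
```

The design choice worth noting is that the gate returns a boolean and nothing else. Anything that needs a person to read output and form an opinion is not a gate in the sense used here.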
Where it breaks
A definition page that skips the failure modes is an ad. Ch 28 is six documented failures with six bills; here is the one that defines the category's sharp edge. A subagent hit an "ambiguous" tool result and its decision tree said retry with a longer prompt. The longer prompt produced a longer ambiguous result. 1,400 calls in a tight cycle, each a few cents bigger than the last, no per-call threshold tripped — $1,847 in eleven hours, masked from every heartbeat check. The aggregate didn't have a threshold. That's the agentic failure shape: not wrong code, but a loop nobody was watching.
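The arithmetic behind that bill: $1,847 over 1,400 calls averages about $1.32 per call, small enough that a per-call limit never fires. The missing piece was a cap on the running total. The sketch below shows one way to add it; `SpendGuard` and `BudgetExceeded` are hypothetical names, the limits are illustrative, and the flat average cost is a simplification of calls that actually grew a few cents each time.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a single call or the running total crosses its limit."""


class SpendGuard:
    """Tracks cumulative spend and stops the run when the total crosses a cap."""

    def __init__(self, per_call_limit: float, total_limit: float) -> None:
        self.per_call_limit = per_call_limit
        self.total_limit = total_limit
        self.total = 0.0
        self.calls = 0

    def record(self, cost: float) -> None:
        self.calls += 1
        self.total += cost
        if cost > self.per_call_limit:
            raise BudgetExceeded(f"call {self.calls} cost ${cost:.2f}")
        if self.total > self.total_limit:
            raise BudgetExceeded(
                f"total ${self.total:.2f} after {self.calls} calls"
            )


# Average cost per call in the incident described above.
average = 1847 / 1400
print(f"{average:.2f}")  # 1.32

# Illustrative limits (assumptions): every call stays under the per-call
# limit, so only the aggregate cap can stop the loop.
guard = SpendGuard(per_call_limit=5.00, total_limit=50.00)
stopped_at = None
for _ in range(1400):
    try:
        guard.record(average)
    except BudgetExceeded:
        stopped_at = guard.calls
        break

print(stopped_at)  # 38
```

With these example limits the loop is cut off at call 38 instead of call 1,400. The exact numbers matter less than the fact that the total has a threshold at all.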
The cheaper version of the same lesson, from Ch 38: a goal of "investigate until you find the cause" burned $11 across forty-one turns and learned nothing, because the finish line wasn't measurable and the evaluator kept saying "not yet." The successful run and the failed run used the same command. The difference was whether done was checkable.
FAQ
Is agentic coding the same as vibe coding?
No. Vibe coding is one way to drive agentic tools — you stay in the loop hour by hour, steering a conversational build (a real one: eight hours, one shipped pipeline, operator present the whole day). Agentic coding is the broader category: the model works across many steps toward a goal, and the loop can run with you present (vibe coding) or without you (run-until-done loops, swarms, CI agents).
Is agentic coding production-ready?
With fences, yes. A real agentic run deleted a net 91,874 lines across 718 files of a shipping product and pushed to production — behind build, CI, browser, and API checks, with a human steering mid-run. Without fences, the same machinery produced a $1,847 bill in eleven hours from one undetected loop. Production-ready is a property of your verification harness, not of the model.
Do I need Claude Code for agentic coding?
No. Claude Code is the reference implementation in this playbook (the sous chef of a five-tool stack), but Codex, Cursor, and other harnesses run the same pattern. The requirements don't change with the vendor: terminal-level tool access, a context file the agent reads before every edit, and a verification gate that decides "done" without you.
What's the difference between agentic coding and autocomplete?
Autocomplete predicts the next tokens inside your editor while you type — you remain the unit of work. Agentic coding takes a goal and works the repo: it plans, edits files, runs tests, reads the failures, and iterates. The unit of work is the finished task, not the next line. Different tool, different contract, different failure modes.
What model is best for agentic coding?
On Anthropic's June 2026 launch table, Claude Fable 5 leads the agentic-coding rows — SWE-Bench Pro 80.3% vs Opus 4.8's 69.2%. But a benchmark is a signal, not proof: this playbook's standing rule is to discount public scores 10-15 points and pair every external benchmark with a private eval on your own workload. Run that eval before you commit a budget.
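Applied to the two scores quoted above, the 10-15 point discount works out as follows. This is a small worked example only; `discounted_range` is a hypothetical helper, and the result is a planning range, not a prediction of what a private eval will show.

```python
def discounted_range(
    score: float, low: float = 10.0, high: float = 15.0
) -> tuple[float, float]:
    """Return (pessimistic, optimistic) after subtracting the discount band."""
    return (round(score - high, 1), round(score - low, 1))


print(discounted_range(80.3))  # (65.3, 70.3)
print(discounted_range(69.2))  # (54.2, 59.2)
```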