The 12-rule CLAUDE.md

The 12-rule CLAUDE.md.

Karpathy's four rules cut Claude's mistake rate from 41% to 11%. Eight more rules — added by community operators running agent loops, not single-shot autocomplete — pushed it to 3%. Across 30 codebases over six weeks. Community claim, not first-party data, but the structure holds up to the operators I've watched paste it.

Each rule below has the verbatim text and an operator overlay — the receipt where it earned its slot in my own CLAUDE.md, or where its absence cost me. Companion to Ch 37 — Context Files and the research note on the 41% → 11% → 3% claim. The paste-ready file lives at the bottom.

Skip to the paste-ready file → The 3 rules to lead with Read Ch 37 first

11:47 AM Wednesday, February. An engineer in Slack.

I'd already typed the lesson myself the week before, near-verbatim. An engineer on my team pings me:

"Claude stopped formatting commits as conventional commits. I'll spell the convention out in CLAUDE.md."

The agent had been doing the right thing for weeks. Then it stopped. Not because the rule changed. Because the rule was never written down in the layer the model actually reads. It lived in a Slack thread, a PR review comment, somebody's head. The fix — spell it out in CLAUDE.md — is the whole point. The convention has to live in the file the agent reads at session start, not in the team's accumulated muscle memory. Rule 8 (read before you write) is what should have prevented this, but Rule 8 only fires if the convention is in the file. If CLAUDE.md is silent, the agent isn't reading anything to inherit.

That's the wedge. CLAUDE.md is the kitchen rules taped to the wall (see Ch 37 for the four-layer architecture — CLAUDE.md, memory, skills, session-scoped). If your rules aren't in it, the model defaults to whatever it was trained to do, and any operator working in the same repo writes their own version of the convention in their own session. Six months later you have a codebase where no two PRs follow the same pattern, and you're not sure how it happened.

The Karpathy peg — four rules, then eight more

Andrej Karpathy's original CLAUDE.md framing circulated as a public post in May 2026. Four rules: Think Before Coding, Simplicity First, Surgical Changes, Goal-Driven Execution. The headline claim: across 30 codebases over six weeks, mistake rates dropped from 41% to 11%. Community operators then added eight more rules and the rate dropped further — 11% to 3%.

Honest disclosure on the numbers. This is a community claim, not a first-party Anthropic study with a published methodology. Treat the 41% → 11% → 3% as informal evidence of shape, not as a peer-reviewed benchmark. Confidence level is medium. The shape is what matters: rules in the file the model reads on every turn change behavior in measurable ways, and the marginal rule still pays after the first four.

Here's the wedge most operators miss. The original four rules are autocomplete-flavored — they assume a single-turn completion where the model writes some code and a human reviews. They cover the inner loop of writing. The eight added rules cover what operators actually run in 2026 — agent loops, multi-step refactors, token budgets, silent failures, codebase-convention drift across long sessions. That's not a slight on the original four. It's the difference between IDE-assisted coding and full-loop delegation via /goal (Ch 38). If you're running Plan → Auto → /goal you need all twelve. If you're hitting tab-tab inside Cursor, the four will hold.

The 12 rules, with operator overlay

Each rule below: verbatim text from the source, then a one-to-three-sentence overlay with my specific receipt — where it earned its slot, or where its absence cost me. Pasting rules without the receipts is how they rot. Read the receipts first.

Rule 1 — Think Before Coding

State assumptions explicitly. Ask rather than guess. Push back when a simpler approach exists. Stop when confused.

Operator overlay

The operators I've watched skip this are the ones who say "just have Claude do it" on a feature they haven't architected. The agent guesses an architecture, ships it, and they spend three days reverse-engineering what their own repo became. Ch 37 says CLAUDE.md is for stable rules, not workflows — Rule 1 is the meta-rule that buys you the right to have stable rules in the first place. Without it, you don't have conventions; you have a sequence of unique guesses.

Rule 2 — Simplicity First

Minimum code that solves the problem. Nothing speculative. No abstractions for single-use code.

Operator overlay

Abstractions are expensive when you're shipping a weekend build in six hours (Ch 19). The agent's default is to factor everything — class hierarchies, config layers, retry middlewares. For a one-off product you'll ship, screenshot, and either kill or scale, every abstraction is debt before it earns its slot. Rule 2 saves the second weekend you'd otherwise burn unwinding a "reusable" helper that got used once.

Rule 3 — Surgical Changes

Touch only what you must. Don't improve adjacent code. Match existing style. Don't refactor what isn't broken.

Operator overlay

Six failures, six bills (Ch 28) — half of those receipts traced back to a "while I'm in there" refactor the agent slipped into an unrelated PR. The hook-on-keystroke disaster started as "improve the autosave behavior while you're editing the linter config." Surgical changes is the rule that keeps a one-line fix from becoming a 200-line PR. If the agent's diff is twice the size of your ask, the diff is wrong even when each line looks right.

Rule 4 — Goal-Driven Execution

Define success criteria. Loop until verified. Strong success criteria let Claude loop independently.

Operator overlay

This rule is literally the /goal primitive from Ch 38, written into the rule-file layer. /goal shipped in Claude Code v2.1.139 on May 11, 2026; my first run with it cleared a model-bump refactor in 12 turns and $3.16, my second run looped 41 turns on a vibe-eval and burned $11 to learn nothing. The difference was a measurable success criterion. Rule 4 is what makes the agent capable of running until done; without it, every session ends in "is this what you wanted?" and you're back to per-turn approval.

Rule 5 — Use the model only for judgment calls

Use for: classification, drafting, summarization, extraction. Do NOT use for: routing, retries, status-code handling, deterministic transforms. If code can answer, code answers.

Operator overlay

The cost-economics chapter (Ch 29) and the skills chapter (Ch 5) both land on the same line: code routes, model writes. The skill is the runbook — the deterministic glue. The model is the judgment call inside the runbook. Letting the model decide whether to retry on a 503, what timeout to use, or which branch of an if statement to take is how a stable workflow becomes a random-walk workflow. The origin story in the source — "the model read the request body as context for the retry decision; the policy became random" — is exactly the pattern I've watched eat three production skills before I learned to wall it off.

Rule 6 — Token budgets are not advisory

Per-task: 4,000 tokens. Per-session: 30,000 tokens. If approaching budget, summarize and start fresh. Surface the breach. Do not silently overrun.

Operator overlay

Last winter I put my entire vault-update workflow into CLAUDE.md. 340 lines. The agent read all 340 lines on every turn of every session — including unrelated TypeScript debugging in a different repo. The token bill on the project doubled in a week (Ch 37 has the receipt). Pair this with the OPS-204 research note on long-edit-chain drift (research-notes) — both are the same underlying claim: token budgets aren't just a cost question, they're a quality question. Past the budget, attention triages badly and the relevant rule gets buried.

Rule 7 — Surface conflicts, don't average them

If two patterns contradict, pick one (more recent / more tested). Explain why. Flag the other for cleanup.

Operator overlay

Ch 37 walks the hierarchy of authority — explicit user message beats skill, skill beats CLAUDE.md, CLAUDE.md beats memory, memory beats model defaults. Rule 7 is what happens inside the CLAUDE.md layer when two of your own rules disagree. The origin story — "Claude blended two error-handling patterns; errors got swallowed twice" — is the rule-file version of the conflicting-skill-vs-CLAUDE.md trap. Pick the more recent pattern, name the one you're deprecating, leave a one-line breadcrumb so the next operator doesn't re-add it.

Rule 8 — Read before you write

Before adding code, read exports, immediate callers, shared utilities. If unsure why existing code is structured a certain way, ask.

Operator overlay

This is the regression in the cold open above. The agent stopped formatting commits as conventional commits because it hadn't read what the convention was — there was no convention in the file the agent reads. Rule 8 is the most load-bearing of the 12 in my experience: roughly half of every regression I've debugged in portfolio repos traces back to the agent writing without reading the existing pattern first. If you only adopt three rules from the 12, this is one of them. (See the 3-rules section below.)

Rule 9 — Tests verify intent, not just behavior

Tests must encode WHY behavior matters, not just WHAT it does. A test that can't fail when business logic changes is wrong.

Operator overlay

The origin story — "12 tests passed, auth was broken in production, the tests verified the function returned something, the function returned a constant" — is the same shape as my friday-wrapup skill silently shipping a $0-pipeline canvas for nine days (Ch 25). The skill ran. The tests would have "passed." The artifact was broken. Rule 9 is evals-or-hope written into the rule-file layer: a test that can't fail when the business logic changes is decoration, not protection. Pair this with Rule 12.

Rule 10 — Checkpoint after every significant step

Summarize what was done, what's verified, what's left. Don't continue from a state you can't describe back. If you lose track, stop and restate.

Operator overlay

Rule 10 is the multi-step pair to Rule 4 — Rule 4 defines what done looks like; Rule 10 keeps the agent honest about where it is on the way there. The /goal primitive in Ch 38 gives you a turn counter and a token meter, but the meter is structural — Rule 10 is what keeps the per-turn output describing the state in plain words. The origin story — "a 6-step refactor went wrong on step 4; Claude completed steps 5 and 6 on top of the broken state" — is exactly why the per-turn checkpoint is non-negotiable in any session longer than 10 turns.

Rule 11 — Match the codebase's conventions, even if you disagree

Conformance > taste inside the codebase. If you think a convention is harmful, surface it. Don't fork it silently.

Operator overlay

The team-adoption chapter (Ch 26) covers what happens when one operator's CLAUDE.md silently disagrees with the team's pattern — the agent ends up forking the convention in PRs nobody reviews carefully, and three sprints later half the repo is one shape and half is the other. The origin story — "Claude introduced React hooks into a class-component codebase; they worked, they broke the testing patterns" — is the team-adoption failure at the file level. Conformance is the price of working in someone else's repo. If you have a better idea, write it in the PR description, not the diff.

Rule 12 — Fail loud

"Completed" is wrong if anything was skipped silently. "Tests pass" is wrong if any were skipped. Default to surfacing uncertainty, not hiding it.

Operator overlay

"Completed" is the most expensive word in operator-Claude. The origin story — "database migration reported completed successfully; had skipped 14% of records on constraint violations; found 11 days later" — is the same shape as Anthropic's 81k-respondent study landing unreliability as the #1 concern at 26.7% (the highest single number in the whole study). The friday-wrapup canvas, the OPS-204 doc-drift, the migration that ate 14% of records — they're all the same failure: silent success. Rule 12 is the eval-thesis of Ch 25 written into the rule-file. If you adopt one rule from the 12, this is it.

The 3 rules Vlad would lead with

If a reader can adopt only three of the 12 today, which? The other nine amplify if these three land. The other nine are also strictly less load-bearing if these three don't.

Rule 8 (Read before you write). Roughly half the regressions I've debugged across portfolio repos trace back to the agent writing without reading the existing pattern first. That conventional-commits regression is one example; I have a folder of others. Rule 8 is the cheapest possible defense against "the agent wrote a function next to an identical one it hadn't read." Cost to enforce: 60 seconds of agent latency per task. ROI: most of the silent regressions you'd otherwise debug at 11 PM.
Rule 12 (Fail loud). The silent-failure category is the single most expensive class of bug in operator-Claude — friday-wrapup at $0 pipeline for 9 days, the wrong-vault-write for 9 days, the migration that skipped 14% of records. All of them shipped "completed" messages. Rule 12 doesn't eliminate the failure mode; it eliminates the silence around it. Cost to enforce: occasional false-positive surfacing. ROI: catch the failure on day zero instead of day nine.
Rule 4 (Goal-driven execution). Pairs with /goal in Ch 38 — same primitive, different surface. Without measurable success criteria, every session ends in "is this what you wanted?" and the autonomy ladder collapses to per-turn approval. Cost to enforce: a few minutes thinking about what "done" looks like before you start. ROI: the difference between 12 turns to a clean diff and 41 turns to a vibe-eval that never lands.

Adopt these three this week. Add the other nine over the next month, one rule per Monday, with a real receipt attached. Rules without receipts rot.

Role-specific layering — which rules are load-bearing for which role

The 12 rules are universal. The four role-specific CLAUDE.md skeletons in /resources (CLAUDE_MD_SOLO, CLAUDE_MD_B2B_SALES, CLAUDE_MD_NEWSLETTER, CLAUDE_MD_PORTFOLIO_CEO) layer the role-specific conventions on top. Different roles lean on different rules from the 12:

Solo operator (CLAUDE_MD_SOLO). Lean on Rule 2 (Simplicity First) and Rule 4 (Goal-Driven). You're shipping fast across multiple weekend builds. Abstractions are the enemy. Measurable success criteria are how you avoid building the wrong thing for three days before noticing.
B2B sales lead (CLAUDE_MD_B2B_SALES). Lean on Rule 5 (Model only for judgment) and Rule 12 (Fail loud). Routing, stage filters, CRM writes are deterministic — they belong in code, not the model. And every report the model produces about pipeline lands in front of leadership; "completed" with an empty deal list is the failure mode that costs the most.
Newsletter operator (CLAUDE_MD_NEWSLETTER). Lean on Rule 6 (Token budgets) and Rule 11 (Match conventions). Your voice is the convention; drift is the failure. Long sessions on a published draft are exactly the OPS-204 drift case (research-note) — token budgets keep the prose from corrupting silently.
Portfolio CEO (CLAUDE_MD_PORTFOLIO_CEO). Lean on Rule 8 (Read before write) and Rule 10 (Checkpoint). Across five companies, the agent's job is mostly synthesis across an inbox you can't read end-to-end — reading before writing is the difference between a useful brief and a confident hallucination. Checkpoint per step is how you trust the multi-step rollup.

The paste-ready file

The full 12-rule CLAUDE.md, copy-pasteable. Replace the bracketed sections with your project's identity, commands, and never-do list. Keep the rule bodies as-is until you've personally seen one of them rot.

CLAUDE.md — paste me

# CLAUDE.md — the 12-rule baseline

# Identity (one paragraph — what this project is, who it serves, the stack)
[fill in: 2-3 sentences. no marketing copy. the model doesn't care that it's exciting.]

# Rule 1 — Think Before Coding
- State assumptions explicitly. Ask rather than guess.
- Push back when a simpler approach exists. Stop when confused.

# Rule 2 — Simplicity First
- Minimum code that solves the problem. Nothing speculative.
- No abstractions for single-use code.

# Rule 3 — Surgical Changes
- Touch only what you must. Don't improve adjacent code.
- Match existing style. Don't refactor what isn't broken.

# Rule 4 — Goal-Driven Execution
- Define success criteria. Loop until verified.
- Strong success criteria let the agent loop independently.

# Rule 5 — Use the model only for judgment calls
- Use for: classification, drafting, summarization, extraction.
- Do NOT use for: routing, retries, status-code handling, deterministic transforms.
- If code can answer, code answers.

# Rule 6 — Token budgets are not advisory
- Per-task: 4,000 tokens. Per-session: 30,000 tokens.
- If approaching budget, summarize and start fresh.
- Surface the breach. Do not silently overrun.

# Rule 7 — Surface conflicts, don't average them
- If two patterns contradict, pick one (more recent / more tested).
- Explain why. Flag the other for cleanup.

# Rule 8 — Read before you write
- Before adding code, read exports, immediate callers, shared utilities.
- If unsure why existing code is structured a certain way, ask.

# Rule 9 — Tests verify intent, not just behavior
- Tests must encode WHY behavior matters, not just WHAT it does.
- A test that can't fail when business logic changes is wrong.

# Rule 10 — Checkpoint after every significant step
- Summarize what was done, what's verified, what's left.
- Don't continue from a state you can't describe back.
- If you lose track, stop and restate.

# Rule 11 — Match the codebase's conventions, even if you disagree
- Conformance > taste inside the codebase.
- If you think a convention is harmful, surface it. Don't fork it silently.

# Rule 12 — Fail loud
- "Completed" is wrong if anything was skipped silently.
- "Tests pass" is wrong if any were skipped.
- Default to surfacing uncertainty, not hiding it.

# Commands (the actual ones, copy-pasteable)
- [pnpm dev / pnpm lint / pnpm test / pnpm typecheck]

# Never-do list (with receipts — replace with your own)
- Don't [X] — broke [Y] on [date].
- Don't [X] — same reason.

# Linked deep docs (the agent has Read — it can pull these)
- docs/[your-arch].md
- docs/[your-roadmap].md

Total length: ~70 lines including the identity paragraph and the commands block. Under the 100-line ceiling Ch 37 recommends. Past 150 lines, the prompt cache gets voided more often and the relevant rule gets buried in noise — both of which are exactly the failure modes Rule 6 (token budgets) was added to prevent.

The rule I violated most recently

The 340-line CLAUDE.md that doubled my token bill (Ch 37 origin story) violated Rule 6 — token budgets are not advisory. I knew the rule. I'd typed it into other CLAUDE.md files. I had a budget. I overran it silently because there was no breach-surfacing mechanism — just my own discipline, which was busy elsewhere that week. The fix was the extraction-into-skills move documented in Ch 37, plus a hook that warns when CLAUDE.md crosses 120 lines.

The lesson the rule didn't teach me is the one the rule can't teach you. Rules are necessary and not sufficient. Receipts are how rules earn their slot. The 12 rules above survived the community filter for that reason — 30 codebases is a real filter, even if the 41% → 11% → 3% numbers are informal. Whether they survive in your repo is whether you can name a receipt for each one before you paste. If you can't, you don't have a CLAUDE.md yet. You have a draft. Most operators have a draft they call a CLAUDE.md and don't realize the difference.

Conventions don't break because the agent forgot them. They break because they were never in the layer the agent actually reads, or they were in the layer with no receipt that made them sticky. Pick the layer. Then pick the receipt. Then write the rule.

Read Ch 37 — Context Files Ch 38 — /goal Ch 25 — Evals or Hope The Karpathy research note Role-specific skeletons

The 12-rule CLAUDE.md.

11:47 AM Wednesday, February. An engineer in Slack.

The Karpathy peg — four rules, then eight more

The 12 rules, with operator overlay

Rule 1 — Think Before Coding

Rule 2 — Simplicity First

Rule 3 — Surgical Changes

Rule 4 — Goal-Driven Execution

Rule 5 — Use the model only for judgment calls

Rule 6 — Token budgets are not advisory

Rule 7 — Surface conflicts, don't average them

Rule 8 — Read before you write

Rule 9 — Tests verify intent, not just behavior

Rule 10 — Checkpoint after every significant step

Rule 11 — Match the codebase's conventions, even if you disagree

Rule 12 — Fail loud

The 3 rules Vlad would lead with

Role-specific layering — which rules are load-bearing for which role

The paste-ready file

The rule I violated most recently

The next edition lands when this list says it does.