The reason most operators have no idea what AI should cost
You sign up for ChatGPT Plus. $20/month, one seat, fine. You add Cursor for the engineers — $20/seat. You add Claude.ai for yourself — $20. The mental model anchors on $20-per-thing. Then you start running agents on a schedule, the morning briefing pulls in 40K tokens of context every weekday, the swarm fans out into 6 parallel subagents, the codex bug-watcher runs against Sentry 24/7, and one Tuesday in May the finance card gets a transaction that doesn't fit the anchor.
The anchor is wrong. Seat pricing scales like SaaS. Agentic AI scales like compute. They're different units of measure. Treating them as the same number is how operators end up surprised by a bill they could have predicted, and how CFOs end up freezing AI spend the day after the platform finally started compounding.
What follows is a real receipt, line by line, from running a portfolio across Belkins, Folderly, LinguaLive, and the newsletter. Some lines have exact numbers. Some are bands. The ones I'd have to fabricate to make exact are flagged so you can see what's a placeholder.
The receipt
| Line item | Monthly | What it does |
|---|---|---|
| Anthropic API (Claude — direct + agent stacks, swarm fan-out, skills) | $1,000 soft cap actual: $1,108-$4,312/wk band | The whole agentic surface — morning briefing, friday-wrapup, deal alerts, mentee prep, newsletter drafts, deliverability evals, competitive scans. Finance card issued specifically to ring-fence this line. |
| Claude.ai Cowork (Claude with MCP, seat × team) | $X — Vlad to fill in ~$200-$400 band for 4-8 seats | Interactive Claude in a managed environment with MCP connectors (Notion, Slack, Stripe, Gmail, calendar, HubSpot). Where the operators actually do work. |
| OpenAI API (Codex, embeddings, edge cases) | $X — Vlad to fill in ~$200 ballpark | Codex running 24/7 against Sentry and GitHub for the bug-watch loop. Embeddings for semantic search across the vault. |
| Hosting + compute (Vercel, Railway, Supabase, edge) | $X — Vlad to fill in ~$300-$600 across 5 projects | Where the custom agents and dashboards actually run. Not the model spend — the wrapper around it. |
| Voice stack (ElevenLabs + STT for transcripts) | $X — Vlad to fill in ~$50-$150 | Voice generation for the newsletter audio, transcript-to-summary for meetings the team couldn't attend. |
| Image + video (Nano Banana, SeeDance, Suno, Higgsfield) | $X — Vlad to fill in ~$80-$300 | Newsletter visuals, agency content, social cuts. Mostly batched, mostly used. |
| Cursor seats (engineering) | $X — Vlad to fill in ~$40-$200 | The editor with model access for the team that ships features. Distinct from the agentic API spend — this is interactive coding. |
| Perplexity Pro (research) | $X — Vlad to fill in ~$20-$40 | Sourced research with citations, used when the question needs current web data and a paper trail. |
| GitHub Copilot (legacy, kept for one workflow) | $X — Vlad to fill in ~$20-$80 | Inline completion in IDEs that don't have Cursor. Mostly a backstop, not the primary tool anymore. |
The honest band: somewhere between $2,000 and $6,500 per month across the whole portfolio in a normal month. Spikes to $8K when a cache breaks (see the $4,312-week incident in Ch 29) or when a new agent ships before its eval gate is tight.
That's the all-in number for the AI side of running five companies. It's roughly what one mid-senior engineer costs fully loaded in a single month. The output is closer to a small team. We'll get to that math.
The token math behind "3 to 5 billion tokens last month"
I burned somewhere between 3 and 5 billion tokens across the Anthropic side of the stack last month. That number sounds insane until you do the conversion.
A chat conversation averages roughly 10,000 tokens end-to-end — system prompt, your messages, the model's responses, a couple of follow-ups. Three billion tokens is ~300,000 chat-conversations-equivalent. I did not have 300,000 conversations last month. Nobody did.
The actual shape of the burn:
- Sub-agent fan-out. One operator-level instruction becomes 4-8 parallel subagents, each carrying the full context. A single "audit this repo" call can spend 800K tokens before it writes a line.
- Repeated context loading. A scheduled job that fires every 15 minutes and pulls in 40K tokens of portfolio context burns 2.5M tokens a day just on the prefix, before the agent does any work.
- Long-running agentic loops. A Codex bug-watcher that iterates 30 times to land a fix is 30 round-trips of context — easily 1-2M tokens per fix.
- Eval batches. Running a 500-item eval against the morning briefing twice a week is ~100M tokens you'll never see in a UI.
The token-per-conversation anchor is the wrong unit. The right unit is tokens per workflow output — a finished morning briefing, a deployed bug fix, a shipped newsletter draft. By that measure, 3-5B tokens a month is dozens of finished operator-grade outputs every day, not 300,000 conversations.
What's not on the bill but should be
The line items above are the API and SaaS spend. They're not the whole cost. Three things show up on no invoice but cost real money:
- The engineer's time tuning prompts and skills. Every skill in the stack got 30-90 minutes of iteration before it started returning useful output. Across ~50 skills, that's 25-75 hours of human time. At a $150/hr operator rate, $4-11K of unbilled work behind the line items.
- The cache-busting CLAUDE.md edits. Every time the system prompt changes, the cache resets and the next ~5 minutes of work pays full price. Most operators eat this without noticing. The $4,312 spike in Ch 29 was one badly-placed paragraph.
- Failed experiments. Maybe one in three agent prototypes ships. The other two cost $20-200 of API spend and a day of engineering each before getting cut. That's the R&D line nobody draws.
Three ways operators burn 10x more than they should
1. No prompt caching.
If your repeated agent calls aren't hitting the cache, you're paying full input price on every prefix, every time. The cache-read rate is roughly 1/10th of the input rate — meaning a workflow that should cost $100/week costs $1,000/week with no caching, for identical work. See Ch 29 for the mechanism.
2. No batch.
Anthropic's Message Batches API runs the same model at 50% off if you can wait up to 24 hours. Eval runs, content generation, overnight summarization, deliverability checks — none of these need to return in 60 seconds. Operators who run these interactively are paying double for the privilege of seeing results immediately, every time, even when nobody's watching the screen.
3. No eval gate before headless runs.
A new agent ships, runs unattended for a weekend, and turns out to be calling Sonnet 60 times in a row for a task Haiku could do once. By Monday you've burned $400 on a workflow that should have cost $8. The fix is a tiny eval that runs the agent against 20 known cases before you let it run headless on real data. The fix takes an hour. The bug, ungated, runs forever.
Operator move: if your AI bill is bigger than you expected, the answer is almost never "use less AI". It's "fix caching, move async work to batch, gate headless runs with evals". Same workload, ~40% lower bill, in under a week of engineering.
The honest comparison
For a portfolio of five companies, the AI bill is roughly $2,000-$6,500/month. What it replaces:
- Human labor: ~$X — Vlad to fill in /mo of analyst, ops, and engineering hours that no longer get spent on triage, summarization, briefing prep, and the long tail of "someone needs to read this and tell me what matters". Honest ballpark: somewhere in the $15K-$30K/month band of work that would otherwise need hiring.
- Cancelled SaaS: ~$X — Vlad to fill in /mo of point tools we stopped renewing once the AI stack absorbed their job. The dashboard tool, the brief-generation tool, the social-scheduling assist tool. Probably $800-$1,500/month.
The math the vendor decks won't draw: AI isn't cheap. It's targeted. A $5,000/month AI bill that deletes $20,000/month of labor and $1,500/month of SaaS is not a cost — it's a 4x return that the finance team can see in a spreadsheet, with line items they can audit. The number on the invoice goes up. The number across the rest of the budget goes down by more.
The mistake is treating AI spend as a category to minimize instead of a category to optimize. You don't minimize sales spend. You optimize CAC. Same principle applies here. The number to watch isn't the bill, it's the ratio of bill to displaced cost. Ours runs roughly 1:4. Yours should too, or you're either underspending or running the wrong workflows.
The line nobody writes in the proposal
The reason most "AI strategy" decks land badly is they show the cost and skip the displaced cost. The reason most CFO conversations go sideways is the bill is real and the savings are anecdotal. The fix isn't to argue the bill down. The fix is to put both numbers on the same chart and let the ratio do the talking. If the ratio is wrong, the answer isn't "spend less" — it's "this workflow shouldn't be on the AI stack yet". If the ratio is right, the bill is the cheapest line you're paying.
For the technical layer on caching, batch, and model routing — the levers that drag this bill down 30-50% without changing the workload — see Ch 29 — Why Is My Bill So High?. For the conversation about whether to keep paying for the public stack or build your own, see Build vs Buy. For the 600-word case to forward to your CFO, see The case for the AI spend.