Thursday, June 4, 2026 ~5 min

The week the cheap assumptions about AI infrastructure all expired at once

Anthropic killed the third-party Claude subsidy, frontier models cleared full network takeover in UK government tests, and your ingress layer has two pre-auth RCEs. Three unrelated stories with the same shape.

ServiceNow burned its full-year Anthropic budget by May. Its CDIO said so on the record, and could not tell you which users or workloads drove the spend, because Anthropic ships no per-user telemetry and no SLAs. The same week, Anthropic confirmed it grew 80x against a 10x capacity plan and leased Colossus 1 — 220,000 GPUs — from xAI, run by a CEO who four months ago called Anthropic misanthropic and evil. On June 15, every paid Claude subscription becomes a dollar-matched API credit pool, which closes the 70–90% implicit discount that Cursor, Cline, Zed, and OpenCode users had been quietly running on. ServiceNow is not an outlier. It is a leading indicator.

In the same week, the UK AI Security Institute confirmed that Anthropic's Mythos cleared both of its hardest simulated attack ranges end-to-end. GPT-5.5-cyber cleared one. The prior generation topped out at "advanced persistence." Mozilla wrapped Mythos in a custom harness and surfaced 271 Firefox bugs. Daniel Stenberg pointed the same model at curl with a generic scan and got 1 CVE plus 4 false positives. Same weights, 271:1 yield.

And the ingress layer broke. NGINX's rewrite module — present in roughly 90% of production configs — has an unauthenticated RCE that has been sitting in the tree for 18 years. Traefik shipped a CVSS 10.0 auth bypass that makes ForwardAuth and BasicAuth decorative. Both are pre-auth and internet-facing. PraisonAI was weaponized four hours after disclosure. LiteLLM landed on CISA's Known Exploited Vulnerabilities list.

Three unrelated stories. One shape.

The assumption that broke

Every one of these stories is the same kind of failure: a quietly load-bearing assumption — about cost, about adversary speed, about how long a defensive abstraction stays opaque — that held for years and stopped holding this quarter. The numbers people had pencilled in were calibrated against a world that does not exist anymore, and the calibration error is not 20%. It is roughly an order of magnitude in every direction.

The AI cost model assumed third-party harness traffic was structurally subsidized. It was. It isn't. Teams running 10 engineers on Pro plans through Zed eight hours a day are about to discover what their workload actually costs at API rates, and the answer is 3–10x what the budget says.
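The 3x to 10x range is just the 70–90% implicit discount read backwards: a workload that was effectively paying (1 - d) of API rates pays 1 / (1 - d) times as much once the discount closes. A minimal sketch of that arithmetic, where `repricing_multiplier` is a hypothetical helper name and the two discount figures are the only inputs taken from the text:

```python
def repricing_multiplier(implicit_discount: float) -> float:
    """Cost multiplier once an implicit discount disappears.

    A workload paying (1 - d) of list price pays 1 / (1 - d) times
    as much when it moves to full list price.
    """
    if not 0.0 <= implicit_discount < 1.0:
        raise ValueError("implicit_discount must be in [0, 1)")
    return 1.0 / (1.0 - implicit_discount)


# The two ends of the discount range quoted in the piece.
for discount in (0.70, 0.90):
    print(f"{discount:.0%} discount -> {repricing_multiplier(discount):.1f}x")
# 70% discount -> 3.3x
# 90% discount -> 10.0x
```

Where a given team lands inside that range depends on how heavily it actually used the subscription, which is exactly the number most teams have not measured.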

The defensive security model assumed that understanding an EDR product well enough to bypass it cost more than most attackers would spend. TrustedSec ran LLMs against five commercial EDRs and reduced weeks of skilled reverse engineering to days of automated work. All five share the same architectural furniture (YARA, Lua, local ML classifiers), which means the bypass is not specific to one vendor's mistake. It is general.

The patch-cycle model assumed adversary tempo was human tempo. Disclosure-to-exploit windows were 30–90 days for years. They are now four hours for anything an AI can chain. A 30-day critical-CVE SLA is not conservative. It is, on internet-facing assets, a window of known exploitability with a corporate signature on it.

What the agentic data actually says

Vercel's AI Gateway covers about 200,000 teams and seven months of production traffic. 59% of token volume is now agentic — multi-turn tool loops, not chat completion. Anthropic captures 61% of spend on the expensive reasoning end. Google captures 38% of volume on the cheap throughput end. Two different businesses sit inside the phrase "foundation models" and they should not trade at the same multiple.

This is the number that retires the most decks. If a cost model was built on 3:1 input-to-output ratios and single-turn evals, it is wrong by about 5x on agentic traffic, and the error is not symmetric across providers. If an eval harness only scores final-answer correctness, it sees the end of the trajectory and nothing before it: the planner can burn 40,000 tokens arguing with itself inside the 59% of token volume that is agentic, and the score will not move. Pass@1 was the right instrument in 2023. It is the wrong instrument now.
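To make the gap between final-answer scoring and trajectory-level accounting concrete, here is a toy sketch. The `Turn` schema, the `simulate_tool_loop` helper, and every token count in it are illustrative assumptions, not figures from Vercel or any provider. The only point is that a loop which re-sends its context on every call bills far more input tokens than the last call alone suggests.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One model call inside an agent trajectory (hypothetical schema)."""
    input_tokens: int   # prompt plus all context re-sent on this call
    output_tokens: int  # planner text, tool-call arguments, or final answer


def trajectory_tokens(turns: list[Turn]) -> tuple[int, int]:
    """Total (input, output) tokens billed across the whole trajectory."""
    return (sum(t.input_tokens for t in turns),
            sum(t.output_tokens for t in turns))


def simulate_tool_loop(prompt: int, per_turn_output: int,
                       tool_result: int, n_turns: int) -> list[Turn]:
    """Toy loop: each call re-sends the prompt plus everything so far."""
    turns, context = [], prompt
    for _ in range(n_turns):
        turns.append(Turn(input_tokens=context, output_tokens=per_turn_output))
        # The model's output and the tool result both join the next call's context.
        context += per_turn_output + tool_result
    return turns


# Illustrative numbers only, not measurements.
turns = simulate_tool_loop(prompt=3000, per_turn_output=1000,
                           tool_result=500, n_turns=6)
total_in, total_out = trajectory_tokens(turns)
last = turns[-1]

print("whole trajectory:", total_in, "in /", total_out, "out")
print("final call only: ", last.input_tokens, "in /", last.output_tokens, "out")
# whole trajectory: 40500 in / 6000 out
# final call only:  10500 in / 1000 out
```

In this made-up run the final call accounts for roughly a quarter of the input tokens and a sixth of the output tokens. A metric that only looks at the last turn cannot see the rest.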

The Mozilla-versus-curl result generalizes the same point. The harness dominates the model. Anyone investing in AI capability — offensive, defensive, productivity, anything — and treating the model as the moat is funding the wrong layer. The moat is the scaffolding around it: the prior bug corpus, the triage pipeline, the reproducible test cases, the trajectory-level metrics, the cost attribution at the gateway.

The October IPO is the forcing function

Anthropic hired a CFO. The IPO is reportedly targeted for October. Margin-per-token is now a board-level metric and the subsidy regime is structurally over, not paused. Expect at least one more pricing adjustment before the S-1 narrative locks. The lease from xAI is not a strategic choice. It is a confession that capacity is the binding constraint, and a CEO renting from a declared enemy has run out of better options.

Meanwhile OpenAI is offering two months of free Codex to enterprise teams that switch within 30 days, with the deadline timed to land mid-July. That is displacement pricing aimed at the exact window where Anthropic customers are recalculating their bills. Ramp's data has Anthropic at 34.4% of business spend versus OpenAI's 32.3%, the first time Anthropic has held the lead. Both companies know how reversible that lead is.

What to do this week

One thing, not five. Stand up gateway-level cost and trajectory telemetry in front of every LLM call before June 15. Per-tenant, per-user, per-feature tagging. Input and output tokens logged per call, not per day. Tool-call sequences captured. Daily budget alerts wired to whoever owns the line item. If LiteLLM is what you reach for, patch it first — it is on KEV.
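What that looks like in the smallest possible form is sketched below. It is an in-memory ledger, not a gateway: `CallRecord`, `CostLedger`, the tenant and feature names, and the per-million-token prices are all hypothetical placeholders, and a real deployment would write these records from whatever proxy already sits in front of the model calls.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CallRecord:
    """One LLM call as logged at the gateway (hypothetical schema)."""
    day: date
    tenant: str
    user: str
    feature: str
    input_tokens: int
    output_tokens: int
    tool_calls: list[str] = field(default_factory=list)  # ordered tool names


class CostLedger:
    def __init__(self, usd_per_mtok_in: float, usd_per_mtok_out: float,
                 daily_budget_usd: dict[str, float]):
        # Prices and budgets are supplied by the caller; nothing is built in.
        self.price_in = usd_per_mtok_in
        self.price_out = usd_per_mtok_out
        self.daily_budget_usd = daily_budget_usd
        self.records: list[CallRecord] = []

    def cost(self, r: CallRecord) -> float:
        """Dollar cost of a single call at the configured prices."""
        return (r.input_tokens * self.price_in
                + r.output_tokens * self.price_out) / 1_000_000

    def log(self, r: CallRecord) -> None:
        self.records.append(r)

    def spend_by(self, day: date, key: str) -> dict[str, float]:
        """Spend for one day grouped by 'tenant', 'user', or 'feature'."""
        totals: dict[str, float] = defaultdict(float)
        for r in self.records:
            if r.day == day:
                totals[getattr(r, key)] += self.cost(r)
        return dict(totals)

    def over_budget(self, day: date) -> list[str]:
        """Tenants whose spend for the day exceeds their daily budget."""
        spend = self.spend_by(day, "tenant")
        return [t for t, usd in spend.items()
                if usd > self.daily_budget_usd.get(t, float("inf"))]


# Placeholder prices (USD per million tokens) and a placeholder budget.
ledger = CostLedger(usd_per_mtok_in=3.0, usd_per_mtok_out=15.0,
                    daily_budget_usd={"acme": 1.00})
today = date(2026, 6, 4)
ledger.log(CallRecord(today, "acme", "alice", "code-review",
                      200_000, 40_000, ["read_file", "run_tests"]))
ledger.log(CallRecord(today, "acme", "bob", "chat", 10_000, 2_000))

print(ledger.spend_by(today, "user"))
print(ledger.over_budget(today))  # tenants to alert on
```

The design choice that matters is the grain: one record per call, tagged at write time. Per-user and per-feature rollups, budget alerts, and tool-sequence review can all be derived from that later. None of them can be reconstructed from a daily total.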

This is not the most exciting recommendation in the document. It is the one that makes every other decision this quarter possible. Without it, the Anthropic re-pricing is invisible until finance forwards the invoice. The agentic-versus-single-turn cost split is invisible until margin compresses. The Codex evaluation is unfalsifiable. The case for a multi-provider routing layer is hand-waving. The trajectory anomalies that look like agent compromise are indistinguishable from legitimate tool use.

The operators who wrote down their token budgets, their patch SLAs, and their AI-vendor sub-processor lists before this week are moving now. Everyone else is in a Slack thread about it.

◆ Behind the synthesis

Six specialist takes that fed this piece.

The piece above is one stream in my voice. Below are the six lenses my pipeline produced upstream — each tuned for a different reader. Use them when you want the angle that matters most to your role.

The NGINX rewrite module has an 18-year-old unauthenticated RCE in a code path that runs before auth middleware in roughly 90% of production configs.

Lead item is the NGINX rewrite module: an unauthenticated RCE, eighteen years old, disclosed today.

Anthropic killed the flat-rate Claude subscription this week.

Anthropic's June 15 pricing change eliminates the 70-90% implicit discount on Claude usage through third-party tools (Cursor, Cline, Zed, OpenCode).

Your EDR became structurally transparent this week.

ServiceNow burned its full-year Anthropic budget by May, with no SLAs, no per-user telemetry, no enterprise dashboard.