The AI stack repriced this week — audit your infrastructure before Friday

Six things broke at once: code review, token economics, CVE enrichment, model containment, IR trust, and GPU compatibility. The bottleneck moved and most teams haven't noticed.

Shopify's CTO Mikhail Parakhin disclosed the number that reframes every AI engineering budget on the planet: PR merge volume growing 30% month-over-month, with no off-the-shelf tool adequate for review. Shopify evaluated Greptile, CodeRabbit, Devin Reviews — rejected all of them — and built custom using GPT 5.4 Pro and Deep Think for critique. Cloudflare independently arrived at the same architecture and published the receipts: 131,246 AI reviews in month one, 120 billion tokens, $1.19 per review, 3m39s median latency, 0.6% engineer override rate. The 85.7% cache hit rate is the only reason the unit economics work — without it, they'd be paying roughly $8 per review.

Code generation is solved. Code review is the binding constraint. If your AI engineering spend is still 80% pointed at generation, you are optimizing a bottleneck that moved.

Yes, but — the counter-reading is that Shopify and Cloudflare are outliers at the far end of the adoption curve, and most teams still get real leverage from Copilot-class generation tooling. Fair. The signal isn't that generation stopped mattering; it's that the marginal dollar now belongs somewhere else, and the two most sophisticated engineering orgs on earth both built custom because nothing on the market cleared their bar. That's a category-formation event even if you're not ready to buy yet.

The token economy fragmented while you weren't looking

A single LLM call now bills across six-plus token types — input, output, reasoning, cached, tool-use, vision — each at different rates, with no cross-provider standardization. The trap is reasoning tokens: a 200-token answer routinely generates 3,000 internal thinking tokens billed at output rates. That's a 15x multiplier your cost model almost certainly doesn't account for.

Opus 4.7 shipped this cycle with three breaking changes that will silently degrade or loudly break existing pipelines: budget_tokens deprecated in favor of adaptive thinking, prefilled assistant responses returning HTTP 400, and per-turn reasoning overhead that inflates every multi-turn agentic loop. The optimal prompting pattern shifted from pair-programming to delegation — one detailed prompt with explicit constraints, no hand-holding. Meanwhile Moonshot's K2.6 landed under Modified MIT at $0.95/M input tokens versus Opus's $5/M, with SWE-bench Pro at 58.6 versus Opus 4.6's 53.4. Vendor-published benchmarks, so verify on your workload — but at 5x price parity, K2.6 only needs to hit 80% of Opus quality to win on cost-sensitive batch work.

Audit token spend by type this sprint. If you can't decompose a bill into input, output, reasoning, cached, and tool-use, you cannot optimize what you cannot see, and the CFO conversation is coming whether you initiate it or not.

Two structural breaks in security infrastructure

NIST stopped enriching non-priority CVEs on April 15 — permanently. No CVSS, no CWE, no CPE for the vast majority of new vulnerabilities. Every scanner, SIEM correlation rule, and SLA that assumes NVD enrichment is now operating with incomplete data. Eight new CISA KEV entries landed today, including three coordinated Cisco SD-WAN Manager CVEs that read like a targeted campaign against network management planes. Mean time-to-exploit has collapsed from 2.3 years in 2018 to roughly 20 hours today. A 12-day patch window is now a suicide note.

Patch protobuf.js to 8.0.1 or 7.5.5 today — the CVSS 9.4 RCE is transitively included via @grpc/proto-loader, Firebase, and Google Cloud SDKs, and most teams don't know they're exposed until they run npm ls protobufjs.

The second break is worse because there's no patch. A former DigitalMint ransomware negotiator, Angelo Martino, pleaded guilty to feeding BlackCat/ALPHV affiliates his clients' insurance limits, negotiation posture, and willingness to pay — during active engagements where he was the trusted adviser. He conspired with other IR professionals to deploy ransomware against additional firms. Roughly $10M in assets seized. If your IR retainer sees your insurance coverage, negotiation strategy, and technical remediation in a single information stream, you are handing an adversary a cheat sheet. Compartmentalize: financial data to CFO plus outside counsel only, technical remediation to the IR firm, negotiation strategy to counsel plus one designated exec. No external party should see across all four streams.

Anthropic's restricted Mythos model — withheld as too dangerous to release — was accessed by unauthorized users on announcement day through credentials chained from the Mercor breach into a partner development environment. The same model found 271 zero-days in Firefox 150. The containment failure and the capability demonstration landed in the same news cycle. Your threat model now has to assume adversaries have equivalent tools, because the access path is proven and repeatable.

The GPU trap in Gemma 4

One narrower item worth naming because it's an immediate procurement decision: Gemma 4's 512-dimension global attention heads exceed FlashAttention-2's 256-dim hard limit. On every pre-Blackwell GPU — H100, A100, RTX 4090 — throughput falls from roughly 124 tok/s to about 9 tok/s. Fourteen times slower. The vLLM per-layer dispatch fix is still an open issue.

If your team is benchmarking Gemma 4 on H100s this week, stop. You are measuring the wrong thing. Google shipped an architecturally brilliant model that is effectively Blackwell-only for production serving, and pretending otherwise wastes eval cycles.

What to do this week

One action per lens, ranked by urgency. First: run npm ls protobufjs across every Node.js service and patch. This is the highest-ratio move on the list — CVSS 9.4, no breaking API changes, done in an afternoon. Second: instrument per-token-type cost observability in your LLM gateway. Third: review your IR retainer contract for information compartmentalization and personnel vetting clauses before the next quarterly renewal, not after your next incident. Fourth: measure your generation-to-review token ratio. If it's above 80/20, rebalance.

The infrastructure beneath the models now determines your cost, speed, and security more than the model choice does. The teams that internalize that this quarter compound; the teams that keep chasing benchmarks pay the tax.

◆ Behind the synthesis

Six specialist takes that fed this piece.

The piece above is one stream in my voice. Below are the six lenses my pipeline produced upstream — each tuned for a different reader. Use them when you want the angle that matters most to your role.

The AI stack repriced this week — audit your infrastructure before Friday

The token economy fragmented while you weren't looking

Two structural breaks in security infrastructure

The GPU trap in Gemma 4

What to do this week

Six specialist takes that fed this piece.

Code Review Is the New Bottleneck as Shopify PRs Surge 30%

NIST Halts CVE Enrichment as Exploit Window Drops to 20 Hours

Gemma 4's 512-Dim Heads Trigger 14x Slowdown Pre-Blackwell

GPT-Image-2 Ships With 242 Elo Lead and Figma Integration

Shopify CTO: AI Bottleneck Has Shifted from Code to Review

Physical AI, Model Security, Code Review: 3 Uncrowded Bets