Synthesis

~4 min

The infrastructure beneath your models just repriced everything

NIST went dark on most CVEs, a ransomware negotiator was caught feeding victim intelligence to BlackCat, Shopify's CTO confirmed code review is the real bottleneck, and Anthropic's restricted offensive model leaked on day one. The model layer isn't where the action is anymore.

Six different specialist briefs landed today and they kept circling the same point from different directions: the interesting part of the AI stack has moved. Not up to bigger models, not down to better chips — sideways, into the unglamorous infrastructure that decides whether any of this actually works in production.

Start with the cost layer, because it's the one most teams are quietly getting wrong. A single API call now bills across six to eight token categories (input, output, reasoning, cached, tool-use, vision, structured, speculative), with no standardization between providers. Reasoning tokens are the silent killer: a 200-token answer can generate 3,000 internal thinking tokens billed at output rates. That's 15 hidden tokens for every visible one, so the output side of the bill runs 16x what the answer length implies, on bills your finance team thinks they're modeling correctly. Cloudflare ran 131,246 AI code reviews last month at $1.19 each, but only because they hit an 85.7% semantic cache rate. Without the cache, that's roughly $8 per review ($1.19 / (1 - 0.857) ≈ $8.32, treating cache hits as free). The difference between viable and unaffordable is one architectural decision most teams haven't made yet.
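The two calculations in that paragraph are worth having as code you can rerun with your own numbers. This is a minimal sketch: the function names are made up for illustration, and the cache calculation assumes a cache hit costs nothing, which is a simplification rather than anything Cloudflare has stated.

```python
def billed_output_tokens(visible_tokens: int, reasoning_tokens: int) -> int:
    """Tokens charged at the output rate: the visible answer plus hidden reasoning."""
    return visible_tokens + reasoning_tokens


def uncached_cost(cost_with_cache: float, cache_hit_rate: float) -> float:
    """Per-request cost with the cache removed, ASSUMING cache hits are free."""
    if not 0.0 <= cache_hit_rate < 1.0:
        raise ValueError("cache_hit_rate must be in [0, 1)")
    return cost_with_cache / (1.0 - cache_hit_rate)


visible, reasoning = 200, 3_000
billed = billed_output_tokens(visible, reasoning)

print(f"hidden-to-visible ratio: {reasoning / visible:.0f}x")
print(f"billed output vs. visible answer: {billed / visible:.0f}x")
print(f"cost per review without cache: ${uncached_cost(1.19, 0.857):.2f}")
```

The first ratio is the 15x from the text. The second is what the invoice reflects once the visible answer is counted too.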

Opus 4.7 picked the same week to break three integration patterns: budget_tokens is gone, prefilled assistant responses return 400 errors, and every multi-turn message now triggers reasoning overhead. If your harness was built around any of those, it's already failing on Mythos Preview. Meanwhile Moonshot's Kimi K2.6 ships open-weight at $0.95 per million input tokens against Opus at $5, claiming parity on four of six agentic benchmarks. Three of those four are within run-to-run noise — but the SWE-bench Pro delta is real, and the pricing gap is unambiguous. K2.6 only needs to be 80% as good as Opus to win on cost.

Code review is the bottleneck nobody is selling

Shopify's CTO Mikhail Parakhin disclosed numbers worth dwelling on: PR volume up 30% month-over-month, near-100% daily AI tool adoption, unlimited Opus 4.6-floor token budgets for engineers. He evaluated every commercial AI code review product on the market (Greptile, CodeRabbit, Devin Reviews) and rejected them all. He built a custom pipeline instead, using GPT 5.4 Pro and Gemini Deep Think for the critique step. Cloudflare independently arrived at the same conclusion: seven specialized agents with circuit breakers and model failback chains, because no vendor product was adequate.
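For readers who haven't built one, here is roughly what "circuit breakers and model failback chains" can look like. This is a generic sketch of the pattern, not Cloudflare's implementation: `CircuitBreaker`, `review_with_chain`, and the callables passed in are all hypothetical, and a production breaker would also need a cooldown so a tripped model gets retried later.

```python
from typing import Callable, Sequence


class CircuitBreaker:
    """Marks a model as unavailable after `max_failures` consecutive errors."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.max_failures

    def record(self, ok: bool) -> None:
        # Any success resets the streak; any failure extends it.
        self.failures = 0 if ok else self.failures + 1


def review_with_chain(
    diff: str,
    models: Sequence[tuple[str, Callable[[str], str]]],
    breakers: dict[str, CircuitBreaker],
) -> tuple[str, str]:
    """Try each model in order, skipping tripped ones; return (model_name, review)."""
    for name, call in models:
        breaker = breakers.setdefault(name, CircuitBreaker())
        if breaker.open:
            continue  # stop spending latency on a model that keeps failing
        try:
            review = call(diff)
        except Exception:
            breaker.record(False)
            continue
        breaker.record(True)
        return name, review
    raise RuntimeError("every model in the chain failed or is tripped")
```

Keep the `breakers` dict alive across requests. That is what lets the fourth call skip a primary that failed the previous three times.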

When two of the most sophisticated engineering organizations on earth independently reject the entire commercial category and build their own, the category has a problem. Generation is solved. AI writes fewer bugs per line than humans, but it writes so many more lines that absolute production bug count is rising. The bottleneck moved, the spending didn't follow, and the gap is wide enough to drive a unicorn through.

The corollary is just as important: parallel non-communicating agent swarms are an anti-pattern. Parakhin's word is "useless." What works is critique loops — one model generates, a different model from a different family critiques, the first regenerates. Heterogeneous models prevent correlated failures. The ratio of cheap generation tokens to expensive review tokens is the new architectural metric.
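The critique loop is simple enough to sketch in a few lines. Everything named here is hypothetical: `generate` and `critique` stand in for calls to two models from different families, and the convention that an empty critique means "accept" is an assumption made for the example.

```python
from typing import Callable

Generate = Callable[[str, str], str]  # (task, feedback) -> draft
Critique = Callable[[str, str], str]  # (task, draft) -> feedback, "" when acceptable


def critique_loop(
    task: str,
    generate: Generate,
    critique: Critique,
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Generate, critique, regenerate until the critic has no objections.

    Returns the last draft and the number of rounds used.
    """
    feedback = ""
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, feedback)   # first model writes or rewrites
        feedback = critique(task, draft)   # second model reviews
        if not feedback:
            return draft, round_no
    return draft, max_rounds               # out of budget; caller decides what to do
```

The code cannot enforce the part that matters most, which is that the two callables are backed by different model families. That is a wiring decision at the call site. The `max_rounds` cap exists because review tokens are the expensive ones.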

The security floor cracked in three places

NIST permanently stopped enriching non-priority CVEs on April 15. No CVSS scores, no CWE mappings, no CPE data for the vast majority of new vulnerabilities; only KEV-listed, federal, or EO 14028 critical CVEs get the full treatment. Every Dependabot rule, every SLA, every executive risk dashboard built on NVD enrichment is now operating with incomplete data. Eight new KEV entries dropped today, including three coordinated Cisco SD-WAN Manager CVEs that smell like a targeted campaign against management planes. Mean time-to-exploit is 20 hours. Average patch time is 12 days. That leaves roughly eleven days in which the exploit is live and the patch is not applied.

The protobuf.js RCE (CVSS 9.4, GHSA-xq3m-2v4x-88gg) is eval() with extra steps, hiding in your transitive dependencies via @grpc/proto-loader, Firebase, and Google Cloud SDKs. Run npm ls protobufjs today and pin to 8.0.1 or 7.5.5. SGLang has an unpatched RCE via malicious GGUF files with no vendor response. Model files are code. Schema files are code. Treating either as benign is shipping exploitable infrastructure.
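If you'd rather not eyeball the tree by hand, the JSON form of that npm command can be walked with a short script. This is a sketch under stated assumptions: it treats the two versions named above as the floor for their major lines, flags any other major line for manual review instead of guessing, and expects plain `major.minor.patch` version strings. The function names are illustrative.

```python
import json

# Patched releases named in the text, keyed by major version.
PATCHED = {7: (7, 5, 5), 8: (8, 0, 1)}


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'major.minor.patch', ignoring any prerelease or build suffix."""
    core = version.split("-")[0].split("+")[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def find_protobufjs(tree: dict, path: tuple[str, ...] = ()) -> list[tuple[str, str, str]]:
    """Walk an `npm ls --json` tree; return (dependency path, version, status)."""
    findings = []
    for name, info in tree.get("dependencies", {}).items():
        here = path + (name,)
        version = info.get("version")
        if name == "protobufjs" and version:
            parsed = parse_version(version)
            floor = PATCHED.get(parsed[0])
            if floor is None:
                status = "review"          # major line the text says nothing about
            elif parsed >= floor:
                status = "ok"
            else:
                status = "below-patched"
            findings.append((" > ".join(here), version, status))
        findings.extend(find_protobufjs(info, here))
    return findings


def audit(npm_ls_json: str) -> list[tuple[str, str, str]]:
    """Convenience wrapper for the raw text printed by npm."""
    return find_protobufjs(json.loads(npm_ls_json))
```

Feed it the output of `npm ls protobufjs --all --json`. The dependency path on each finding tells you which direct dependency is pulling the package in.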

Then there's the Martino case at DigitalMint. A ransomware negotiator pleaded guilty to feeding BlackCat affiliates exactly the intelligence that maximizes ransom — insurance limits, negotiating posture, willingness to pay — while posing as the victim's adviser. He conspired with other IR professionals to deploy ransomware against additional firms. $10M seized. If your IR retainer has visibility into your insurance policy, your payment authorization, and your technical scope all at once, you're handing the adversary a cheat sheet. Compartmentalize: technical remediation, business continuity, financial/insurance, and negotiation strategy should be four separate streams, and no external party should sit across all four.

Anthropic's Mythos — the model they explicitly held back as too dangerous — was accessed on announcement day via credentials chained through a Mercor breach into a partner dev environment. The same model found 271 zero-days in Firefox 150. The containment failed and the capability is real. Update your threat model accordingly: assume your adversaries have AI-augmented vulnerability discovery, because the supply chain to get there is proven.

What to do this week

If you ship anything that calls an LLM in production, instrument token-type observability before Friday. Parse reasoning_tokens, cached_tokens, and completion_tokens separately from every response and put them on a per-feature dashboard. You cannot optimize, price, or defend a margin you cannot see. Every other recommendation in this piece — model routing, semantic caching, effort-tier dispatch, K2.6 evaluation — depends on this one.
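A starting point for that instrumentation is sketched below. The nested field layout (`prompt_tokens_details.cached_tokens`, `completion_tokens_details.reasoning_tokens`) follows one common response shape and is an assumption here. Providers differ, so check where your provider actually reports each category and whether its completion count already includes reasoning tokens. `FeatureTokenLedger` is a made-up in-memory stand-in for whatever metrics backend feeds your dashboard.

```python
from collections import defaultdict


def split_usage(usage: dict) -> dict[str, int]:
    """Flatten one response's usage block into separate token categories."""
    prompt_details = usage.get("prompt_tokens_details") or {}
    completion_details = usage.get("completion_tokens_details") or {}
    return {
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "cached_tokens": prompt_details.get("cached_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "reasoning_tokens": completion_details.get("reasoning_tokens", 0),
    }


class FeatureTokenLedger:
    """Accumulates token counts per product feature."""

    def __init__(self) -> None:
        self.totals: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def record(self, feature: str, usage: dict) -> None:
        for kind, count in split_usage(usage).items():
            self.totals[feature][kind] += count

    def reasoning_share(self, feature: str) -> float:
        """Fraction of completion tokens spent on hidden reasoning.

        Assumes completion_tokens already includes reasoning_tokens.
        """
        totals = self.totals[feature]
        completion = totals["completion_tokens"]
        return totals["reasoning_tokens"] / completion if completion else 0.0


ledger = FeatureTokenLedger()
ledger.record("code_review", {
    "prompt_tokens": 1000,
    "completion_tokens": 3200,
    "prompt_tokens_details": {"cached_tokens": 800},
    "completion_tokens_details": {"reasoning_tokens": 3000},
})
print(f"{ledger.reasoning_share('code_review'):.1%} of completion tokens were reasoning")
```

Tag every call with the feature that triggered it when you make the request. Reconstructing that mapping from logs afterwards is much harder.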

The model layer will keep churning. The infrastructure beneath it is where the next year of advantage gets built or lost.

◆ Behind the synthesis

Six specialist takes that fed this piece.

The piece above is one stream in my voice. Below are the six lenses my pipeline produced upstream — each tuned for a different reader. Use them when you want the angle that matters most to your role.

  1. Code generation is solved — code review is now the bottleneck, and nobody has an answer yet.

    The code generation problem is solved — the code review problem is not, and it's now the binding constraint at companies like Shopify (30% MoM PR growth) and Cloudflare (131K AI re…

    35 sources · 9 min
  2. NIST permanently stopped enriching non-priority CVEs on April 15 — no CVSS scores, no CWE mappings, no CPE data for the vast majority of new vulnerabilities.

    NIST permanently stopped enriching most CVEs the same week a ransomware negotiator was convicted of feeding victim intelligence to BlackCat and Anthropic's restricted offensive AI…

    35 sources · 7 min
  3. Google's Gemma 4 ships the most aggressive KV cache engineering in any open model — 83% memory reduction, 128K context on 8GB phones — but its 512-dimension global attention heads exceed FlashAttention-2's hard limit of 256, causing a confirmed 14x throughput penalty on every pre-Blackwell GPU (H100, A100, RTX 4090).

    Gemma 4 shipped the most sophisticated KV cache engineering in any open model — 83% memory reduction, five stacked compression techniques, 128K context on phones — but broke FlashA…

    35 sources · 9 min
  4. OpenAI's GPT-Image-2 launched with API access, a +242 Elo lead over every competitor, and day-one integrations from Figma, Canva, and Adobe — if your product roadmap includes any visual generation (UI mockups, marketing assets, data visualization), your build-vs-buy calculus just flipped to 'call this API.' The image-to-code pipeline — generate a visual spec, then have Codex implement against it — is the new prototyping primitive your fastest competitors will adopt this quarter.

    GPT-Image-2 just made visual AI a one-API-call commodity (with a +242 Elo gap nobody else is close to closing), three agent platforms launched in the same week but none solved cost…

    35 sources · 7 min
  5. Shopify's CTO just disclosed the most detailed enterprise AI transformation data available: near-100% daily AI tool adoption, 30% month-over-month PR volume growth — and a critical revelation that the bottleneck has permanently shifted from code generation to review, testing, and CI/CD infrastructure, which no off-the-shelf tool solves.

    The AI engineering economy repriced this week across three dimensions simultaneously: Shopify proved the bottleneck has permanently shifted from code generation to review infrastru…

    35 sources · 10 min
  6. While the market obsesses over $60B AI coding tool valuations, three category-formation events landed in the same week that most investors haven't priced: Bezos's Project Prometheus hit $38B in 5 months with a separate $100B manufacturing holdco behind it (physical AI is now a funded category), Anthropic's 'too dangerous' Mythos model was breached on its announcement day while Congress moves to classify ransomware as terrorism (AI security just got its SolarWinds moment), and Shopify's CTO revealed that no commercial AI code review product meets enterprise needs despite 30% month-over-month PR volume growth (a $5-10B infrastructure gap with zero winner).

    AI security just got its SolarWinds moment — Mythos breached, ransomware going terrorism-class, NIST exiting the CVE market, and the Fed convening emergency meetings — while the co…

    35 sources · 8 min