Nvidia paid $20B to admit GPUs lost the inference war
The same week Nvidia licensed Groq's LPU into its own racks, Amazon's AI-generated code took AWS down for 13 hours. Both stories are about the same thing: the inference era has different physics, and most teams aren't ready.
Nvidia just licensed Groq's LPU for roughly $20 billion and is shipping racks with 256 of someone else's chips inside them. OpenAI is the named first buyer, specifically for coding agents. AWS announced a parallel deal with Cerebras the same week. Intel processors handle the inter-chip communication because NVLink doesn't yet integrate with the LPU — a detail that tells you more than the press release does.
This is the first time Nvidia has put another company's silicon into its own server architecture. When the company that defined AI compute pays $20B to bolt a competitor's chip onto its rack, the message is unambiguous: GPUs are not the right shape for inference at scale, and the gap was wide enough that building an answer in-house would have taken too long. So Nvidia bought one.
That's the headline. The more interesting story is what happened in the same news cycle on the other side of the stack.
Amazon ran the experiment for you
Amazon confirmed that its Kiro AI coding tool caused a roughly 6-hour retail outage and a 13-hour AWS disruption. Both were described internally as high-blast-radius incidents. The response was swift and revealing: mandatory senior sign-off on all AI-assisted code changes from junior and mid-level engineers. That is the world's most operationally disciplined cloud provider pulling a manual brake on its own tooling.
METR's study of 296 real pull requests, published the same week, put a number on the gap between benchmark scores and mergeable code: about half of AI patches that pass SWE-bench would be rejected by the maintainers of scikit-learn, Sphinx, and pytest. Claude Opus 4.5 scoring 92% on Stripe's 11-task benchmark and AWS going dark for 13 hours are not contradictory data points. They are the same data point. Benchmarks measure the model. Outages measure the system.
The New York Times shipped the counter-template in the same window: constrain AI to test generation only, deny it write access to source code, keep coverage reports read-only, and require human review on every merge. The result: test coverage across six web projects rose from 28% to 83%, with an estimated 70% reduction in effort. The intervention wasn't a better model. It was a narrower scope, three guardrails, and a smaller blast radius.
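The NYT setup is, at its core, a deny-by-default write policy. Here is a minimal sketch of one as a path check an agent harness could call before every file write. The repo layout, the pattern lists, and the `ai_may_write` helper are hypothetical and are not the NYT's actual tooling. Only the coverage file names are coverage.py's real defaults.

```python
from __future__ import annotations

import posixpath
from fnmatch import fnmatchcase

# Hypothetical repo layout: tests live under tests/ or are named test_*.py.
# Note: in fnmatch patterns "*" also matches "/", so "tests/*" covers subdirectories.
AI_WRITABLE = ["tests/*", "*/tests/*", "test_*.py", "*/test_*.py"]

# coverage.py's default outputs: the agent may read them but never write them.
READ_ONLY = [".coverage", "coverage.xml", "htmlcov/*"]


def ai_may_write(path: str) -> bool:
    """Deny by default: allow a write only when it targets a test file."""
    norm = posixpath.normpath(path)
    # Reject absolute paths and anything that climbs out of the repo root.
    if norm.startswith("/") or norm == ".." or norm.startswith("../"):
        return False
    # Coverage artifacts stay read-only even if they sit next to tests.
    if any(fnmatchcase(norm, p) for p in READ_ONLY):
        return False
    return any(fnmatchcase(norm, p) for p in AI_WRITABLE)
```

Normalizing the path first matters. Without it, `tests/../src/app.py` would match `tests/*` and slip a source edit past the guardrail. Human review on every merge remains a separate control and is not something this check replaces.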
The pattern across both stories is the same. Inference is becoming its own market with its own silicon because the workload demands different physics than training. AI-generated code is becoming its own deployment surface because it has different failure modes than human code. Treating either as a continuation of the previous regime is how you end up explaining a 13-hour outage to your board.
What this means for the inventory you already own
If you're running inference on GPU-only infrastructure and you signed multi-year capacity contracts in 2024 or 2025, you are now the one holding the bag on cost-per-token economics that are about to reset. The Nvidia-Groq racks ship in H2 2026. AWS-Cerebras is available now. The Feynman generation, planned for 2027 and beyond, is supposed to fuse GPU and LPU on a single die. That is also Nvidia telling you that the bolted-on bridge architecture it ships in H2 2026 is a transitional product. Samsung is the foundry for the first generation, and Samsung's advanced-node yields have historically lagged TSMC's. Treat any H2 2026 supply assumption as upside, not baseline.
The action item isn't to switch silicon. It's to make switching possible. If your serving stack has hard CUDA dependencies, custom TensorRT kernels, or batching logic that assumes GPU cluster topology, you've accumulated lock-in against a market that's about to fragment in your favor. The teams that win the next 24 months will be the ones who can route a workload to whichever chip is cheapest that quarter without rewriting application code.
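What "make switching possible" can look like in practice: application code depends on a narrow interface, and a router picks whichever backend is cheapest per token. This is a minimal sketch. `InferenceBackend`, `PricedBackend`, and `cheapest` are hypothetical names, and the prices and throughputs are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class InferenceBackend(Protocol):
    """The only surface application code sees: no CUDA, no cluster-topology assumptions."""

    name: str

    def cost_per_million_tokens(self) -> float: ...

    def generate(self, prompt: str, max_tokens: int) -> str: ...


@dataclass
class PricedBackend:
    """Hypothetical adapter priced by hourly rate and measured output throughput."""

    name: str
    dollars_per_hour: float
    tokens_per_second: float

    def cost_per_million_tokens(self) -> float:
        # cost per token = hourly price / tokens produced per hour
        tokens_per_hour = self.tokens_per_second * 3600
        return self.dollars_per_hour / tokens_per_hour * 1_000_000

    def generate(self, prompt: str, max_tokens: int) -> str:
        # Placeholder: a real adapter would call its vendor's SDK here.
        return f"[{self.name}] up to {max_tokens} tokens for {prompt!r}"


def cheapest(backends: list[InferenceBackend]) -> InferenceBackend:
    """Pick whichever backend is cheapest per token right now."""
    return min(backends, key=lambda b: b.cost_per_million_tokens())


# Made-up numbers, for illustration only.
gpu = PricedBackend("gpu-pool", dollars_per_hour=40.0, tokens_per_second=5_000)
lpu = PricedBackend("lpu-pool", dollars_per_hour=60.0, tokens_per_second=12_000)
print(cheapest([gpu, lpu]).name)  # lpu-pool: ~$1.39 vs ~$2.22 per million tokens
```

The pricier-per-hour backend wins here because it produces more tokens per dollar. Application code calls `cheapest(pool).generate(...)` and never imports a vendor SDK directly. Re-measure throughput each quarter and the routing follows the price.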
Meanwhile, the cost optimizations that work on every architecture — request batching, KV-cache reuse, speculative decoding, quantization — are still where the immediate money is. Do those first. They pay back regardless of which silicon you settle on.
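Request batching pays back on any silicon because each forward pass serves several requests at once. The usual knob is batch size versus added latency. Below is a minimal sketch of a size-or-deadline batching policy, simulated offline over arrival timestamps. The `Request` type and `form_batches` helper are hypothetical. A live server would close a batch on a timer rather than waiting for the next arrival.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Request:
    request_id: str
    arrival_ms: float


def form_batches(
    requests: list[Request], max_batch_size: int, max_wait_ms: float
) -> list[list[str]]:
    """Offline simulation of a size-or-deadline batching policy.

    A batch closes when it holds max_batch_size requests, or when the next
    request arrives more than max_wait_ms after the batch's first request.
    """
    batches: list[list[str]] = []
    current: list[Request] = []
    for req in sorted(requests, key=lambda r: r.arrival_ms):
        if current and (
            len(current) >= max_batch_size
            or req.arrival_ms - current[0].arrival_ms > max_wait_ms
        ):
            batches.append([r.request_id for r in current])
            current = []
        current.append(req)
    if current:
        batches.append([r.request_id for r in current])
    return batches


reqs = [Request("a", 0), Request("b", 1), Request("c", 2), Request("d", 3), Request("e", 50)]
print(form_batches(reqs, max_batch_size=3, max_wait_ms=10))
# [['a', 'b', 'c'], ['d'], ['e']]
```

Raising `max_wait_ms` yields fuller batches and better cost per token at the price of tail latency. That tradeoff is worth tuning before any silicon decision.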
The thing to do this week
Classify your AI coding use cases by blast radius, in writing, before Friday. Not as a policy document — as a one-page table. Test generation, documentation, internal tooling, migration scripts: low blast radius, ship aggressively, NYT-style read-only constraints. Code review augmentation, non-critical service code: medium, staged rollout, automated scope detection in CI. Auth flows, payment paths, infrastructure-as-code, data pipelines, anything touching secrets: high, mandatory senior review, deny-by-default file permissions, no exceptions for velocity.
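The one-page table can also live as data a CI job reads. This is a minimal sketch assuming a hypothetical repo layout. Every path pattern below is illustrative and should be mapped to your own tree. Unclassified paths fall into the strictest tier, in keeping with deny-by-default.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# The one-page table as data. Patterns are hypothetical; map them to your repo.
# Tiers are checked from highest to lowest, so the strictest match wins.
# Substring patterns over-match on purpose ("*auth*" also hits "author.md"):
# a false positive costs one review, a false negative can cost an outage.
TIERS = [
    ("high", ["*auth*", "*payment*", "*.tf", "infra/*", "pipelines/*", "*secret*"]),
    ("medium", ["services/*"]),
    ("low", ["tests/*", "docs/*", "tools/*", "migrations/*"]),
]

POLICY = {
    "low": "ship aggressively, read-only constraints on source",
    "medium": "staged rollout, automated scope detection in CI",
    "high": "mandatory senior review, deny-by-default file permissions",
}

RANK = {"low": 0, "medium": 1, "high": 2}


def tier_for(path: str) -> str:
    for tier, patterns in TIERS:
        if any(fnmatchcase(path, p) for p in patterns):
            return tier
    return "high"  # unclassified paths default to the strictest tier


def tier_for_change(paths: list[str]) -> str:
    """A change is as risky as its riskiest file."""
    return max((tier_for(p) for p in paths), key=RANK.__getitem__, default="low")
```

`tier_for_change(changed_files)` gives CI one word to act on, and `POLICY` spells out what that word means for the review path.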
Then instrument it. Tag AI-assisted commits in metadata. Track the percentage of AI-generated code per repo as a leading indicator. Add a blast-radius estimator to CI that counts files changed and services affected, and routes anything above threshold to a senior reviewer automatically. Amazon's senior-sign-off mandate is unsustainable as a manual process; the only version of it that scales is the one your CI enforces.
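Here is a minimal sketch of the instrumentation, assuming AI-assisted commits carry a git trailer. Trailers are a real git convention, but the `AI-Assisted: true` trailer name is hypothetical. So are the `services/<name>/` monorepo layout and the default thresholds. The estimator counts files and services, and it flags AI-tagged changes above threshold for a senior reviewer.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical trailer your AI tooling appends, e.g. a final line "AI-Assisted: true".
AI_TRAILER = re.compile(r"^AI-Assisted:[ \t]*true[ \t]*$", re.IGNORECASE | re.MULTILINE)


@dataclass
class Change:
    message: str       # commit or PR message, trailers included
    files: list[str]   # changed paths, e.g. the output of `git diff --name-only`


def is_ai_assisted(message: str) -> bool:
    return AI_TRAILER.search(message) is not None


def services_touched(files: list[str]) -> set[str]:
    """Assumes a monorepo layout of services/<name>/...; anything else counts as 'shared'."""
    touched = set()
    for f in files:
        parts = f.split("/")
        touched.add(parts[1] if len(parts) > 2 and parts[0] == "services" else "shared")
    return touched


def needs_senior_review(change: Change, max_files: int = 10, max_services: int = 1) -> bool:
    """Route AI-assisted changes above the blast-radius threshold to a senior reviewer."""
    if not is_ai_assisted(change.message):
        return False  # this sketch gates only AI-tagged changes
    return len(change.files) > max_files or len(services_touched(change.files)) > max_services


def ai_share(messages: list[str]) -> float:
    """Leading indicator: fraction of commits in a repo carrying the AI trailer."""
    if not messages:
        return 0.0
    return sum(is_ai_assisted(m) for m in messages) / len(messages)
```

A CI step that sets a required reviewer whenever `needs_senior_review` returns true is the scalable version of a manual sign-off mandate. Charting `ai_share` per repo over time gives you the leading indicator.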
The Trivy compromise through a GitHub Actions pull_request_target workflow is the third leg of the same week's lesson. Your security scanner, running in an elevated context inside your pipeline, became a supply chain attack vector. Grep your org for pull_request_target today, the same way you're going to grep your codebase for CUDA assumptions next quarter, and the same way you should already be checking every AI-assisted PR for senior review.
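A slightly smarter grep for one checked-out repository is sketched below. It is a text heuristic, not a YAML parser, and `scan_workflows` is a hypothetical helper. Any workflow triggered by pull_request_target deserves a look, because it runs with the base repository's permissions and secrets. A workflow that also references the PR head usually means untrusted code is being checked out in that privileged context, so the sketch marks those as high priority.

```python
from __future__ import annotations

import re
from pathlib import Path

# Text heuristics, not a YAML parser. Any pull_request_target workflow deserves review;
# one that also references the PR head usually checks out untrusted code while it
# runs with the base repository's permissions and secrets.
TRIGGER = re.compile(r"\bpull_request_target\b")
PR_HEAD = re.compile(r"github\.event\.pull_request\.head\.(sha|ref)|github\.head_ref")


def scan_workflows(repo_root: str) -> list[tuple[str, str]]:
    """Return (level, workflow path) pairs for one checked-out repository."""
    root = Path(repo_root)
    wf_dir = root / ".github" / "workflows"
    if not wf_dir.is_dir():
        return []
    findings = []
    for path in sorted(wf_dir.iterdir()):
        if path.suffix not in (".yml", ".yaml"):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        if TRIGGER.search(text):
            level = "HIGH" if PR_HEAD.search(text) else "REVIEW"
            findings.append((level, path.relative_to(root).as_posix()))
    return findings
```

To cover an org, loop this over your cloned repositories. A REVIEW hit is not necessarily a vulnerability. A HIGH hit is worth fixing before anything else on this list.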
The inference era and the AI-code era arrived in the same news cycle. Both reward the teams that built abstractions instead of bets.
◆ Behind the synthesis
Six specialist takes that fed this piece.
The piece above is one stream in my voice. Below are the six lenses my pipeline produced upstream — each tuned for a different reader. Use them when you want the angle that matters most to your role.
Amazon just confirmed what every engineering org needs to hear: AI-generated code caused a 6-hour retail outage and a 13-hour AWS disruption, forcing mandatory senior sign-off on all junior/mid-level AI-assisted code changes.
Amazon's AI-generated code caused 19 hours of combined production outages, METR proved SWE-bench overstates AI code quality by 2x, McKinsey's AI platform got rooted via textbook SQ…
A GitHub Actions misconfiguration exploiting pull_request_target workflows compromised 48 repositories including Trivy — the container security scanner likely running inside your CI/CD pipeline right now.
Your CI/CD pipeline trusts Trivy, which was just compromised through a GitHub Actions flaw affecting 48 repos — while Amazon confirmed that AI-generated code caused a 13-hour AWS o…
Nvidia just paid $20B to license Groq's inference-specialized LPU and integrate 256 chips into its own server racks — the first time Nvidia has built another company's silicon into its own systems.
Nvidia's $20B deal to put Groq's inference chips into its own server racks officially ends the GPU-for-everything era — benchmark GroqCloud now and start abstracting your serving l…
Lovable added $100M ARR in a single month with 146 employees ($2.74M per head) while Amazon convened senior engineers after AI-generated code caused a 6-hour retail outage and 13-hour AWS disruption — and then mandated human sign-off on all junior/mid AI-assisted code changes.
AI coding tools are simultaneously generating $2.74M ARR per employee and 6-hour production outages at Amazon — the teams that win will segment use cases by blast radius, not unifo…
Nvidia just paid $20B to license Groq's inference-specialized LPU and ship dedicated 256-chip inference racks — the first concrete admission from the dominant AI hardware maker that GPUs alone can't serve the agent-era inference load.
Nvidia paying $20B for Groq's inference chip, Amazon pulling emergency governance on AI-generated code after dual production outages, Anthropic forming a PE joint venture to push A…
Nvidia just paid $20B to license Groq's inference chip into its server racks — the first time it has ever integrated a third-party AI processor — officially splitting AI compute into two distinct investable categories.
Nvidia paying $20B to license Groq's inference chip — while $4B+ in AI funding deployed in a single week with Lovable posting $2.74M ARR per employee — confirms AI compute is split…