Saturday, May 9, 2026 ~4 min

The agent control plane is the product now — and it shipped this week

AWS and Google made agent identity a first-class primitive in the same week three production agents went catastrophically wrong. The infrastructure layer just stopped being optional.

Two things happened on the same calendar this week, and they're the same story.

AWS made its MCP Server generally available with IAM-scoped access to 15,000+ API operations. Google Cloud shipped first-class agent identities — OAuth, certificates, runtime defense, the whole principal class. Two hyperscalers, independently, in the same window, conceding that an agent running on a developer's personal token is now the legacy pattern.

In the same window: a Cursor agent dropped PocketOS's production database in under ten seconds and the outage ran past thirty hours. Grok moved real cryptocurrency in response to a Morse-code prompt injection in a tweet. LayerX demonstrated that Anthropic's Claude Chrome extension is still hijackable after the May 6 patch, exfiltrating Drive files and GitHub source through the user's own authorized sessions. None of those are model failures. They're permission failures. The agent did exactly what its grant allowed.

The fix and the demonstration of why you need it arrived together. That's not a coincidence.

The bottleneck moved

For most of the last eighteen months, the binding constraint on shipping a useful AI feature was model capability. That constraint is gone for the workloads people actually want to build. GPT-Realtime-2 collapsed the ASR→LLM→TTS cascade into one stateful WebSocket at $4.61/hr output, doubled instruction retention from 36.7% to 70.8%, and held pricing flat through the capability bump. Glean reported a 42.9% helpfulness gain in production. Genspark, 26% on conversation completion. Gemini 3.1 Flash Live ties at 96.6% on Big Bench Audio, which means there is no durable model moat — pick on integration depth and switching cost.

The new bottlenecks are three layers down from the prompt: who the agent is, what it's allowed to touch, and how legibly it bills. MCPMark V2 ran the cleanest experiment of the week and it should change how teams plan next quarter. Smarter Claude models on an unoptimized backend burned 54% more tokens, not fewer. The same RAG build ran 10.4M tokens against Supabase and 3.7M against InsForge, with ten manual interventions on one side and zero on the other. The model isn't being wasteful. It's being thorough with garbage context. Hand a smart model ambiguous error codes and exhaustive doc dumps and it will reason its way through them, charging by the token for the privilege.

The lever isn't the model. It's the shape of the tool response and the semantics of the exit code. That's an API design problem with API design fixes — narrow skills, structured JSON, semantic exit codes, cheap topology primitives. They transfer regardless of which vendor's MCP server you adopt.

The capital structure under all this is cracking

CoreWeave printed $2B in revenue against $7.7B of quarterly capex, $4.7B of cash burn, and $24.8B of debt against $3B of cash — capex-to-revenue at roughly 280%, against 25-50% at the hyperscalers. The stock took 15% the same day Jensen Huang said on a microphone that CoreWeave "would not exist" without Nvidia's subsidies. Two-thirds of CoreWeave's cash came from Nvidia equity. That's vendor financing dressed as strategic investment, and the analogous historical comp is not flattering.

In the same week, OpenAI's $18B Broadcom deal stalled because Microsoft wouldn't commit to 40% offtake. The best-capitalized AI buyer on earth couldn't unilaterally close custom-silicon financing. Meanwhile Core Automation went from incorporation to a $4B valuation in six weeks with no product, anchored by an Nvidia seed check that everyone else priced off of. Hyperscalers booked $53B in private AI gains through Q1 income statements — gains on positions they helped price with capital they committed.

None of this changes what you should ship Monday. It changes how long you should assume current GPU pricing holds, and whether the neocloud you signed a two-year contract with will be solvent at renewal.

Three regulatory regimes, not one

DOJ filed an amicus supporting xAI's constitutional challenge to Colorado's AI law on April 24 — the federal executive's clearest signal yet that it prefers preemption. Tennessee made "knowingly" training harmful models a Class A felony. Oregon's chatbot law activates a private right of action in January 2027, which means plaintiffs' attorneys, not regulators, drive enforcement. The EU Cloud Act goes to debate at the end of May with structural exclusion of US providers on the table.

The stack of obligations that lands on a consumer AI product by Q1 2027 is not one law. It's at least four: disclose AI to users, detect and escalate self-harm signals, gate minors behind parental consent, and don't let the model represent itself as a clinical professional. Most roadmaps have budget for one of those. That's the gap to close this quarter, not next.

What to do this week

Pick one production agent — the highest-blast-radius one — and move it off your developer credentials before Friday. Use the AWS MCP Server or GCP agent identity primitives that shipped this week. Give it a dedicated service account, read-only by default, with write credentials injected only at the step that needs them and a snapshot middleware sitting between the agent and any system of record. If you can't draw the blast-radius diagram on a whiteboard in five minutes, the agent shouldn't have the grant.

Then instrument tokens-per-tool-call as an SLO alongside latency. The 3x cost gap between naive and optimized backends compounds invisibly through scheduled triggers — GitHub already discovered this on Agent Workflows and started systematic optimization. Your finance team will discover it one quarter later, by which point the bill is the conversation.

The model is not the variable. The grant, the response shape, and the audit log are. Build accordingly.

◆ Behind the synthesis

Six specialist takes that fed this piece.

The piece above is one stream in my voice. Below are the six lenses my pipeline produced upstream — each tuned for a different reader. Use them when you want the angle that matters most to your role.

The agent control plane is the product now — and it shipped this week

The bottleneck moved

The capital structure under all this is cracking

Three regulatory regimes, not one

What to do this week

Six specialist takes that fed this piece.

AWS and Google Cloud shipped agent identity primitives this week to replace personal developer tokens.

CVE-2026-6973 is Ivanti EPMM's third zero-day in six months and is under active exploitation.

OpenAI's GPT-Realtime-2 folds ASR, LLM, and TTS into one speech-to-speech model with GPT-5 reasoning, a 128K context, and flat pricing at $1.15 and $4.61 per hour.

GPT-Realtime-2 shipped this week at $0.017/min with GPT-5-class reasoning, 128K context, and 70.8% instruction retention (up from 36.7%) — collapsing your three-quarter voice roadmap into a single API integration decision.

AWS and Google shipped competing agent identity frameworks in the same week, which is the opening move in a control-plane fight over who owns the audit log, the permission model, and the billing relationship for every AI agent an organization deploys.

CoreWeave printed twenty-four point eight billion dollars of debt against three billion in cash, two-thirds of which came from Nvidia, at three times capex-to-revenue, and the stock took fifteen percent for its trouble.