How should I prioritize the cost audit across different AI workflows?

Split workflows along two axes: load-bearing versus exploratory, and harness-replaceable versus not. Load-bearing workflows on replaceable harnesses are renegotiation candidates with Anthropic inside the 30-day window; load-bearing workflows that can't move should pilot Codex on the 2-month free offer this week. Exploratory usage should either follow whichever vendor is currently subsidizing or stop paying metered rates entirely.

Why did ServiceNow burn its full-year Anthropic budget by May, and how do I avoid the same outcome?

ServiceNow's CDIO couldn't identify which users or workloads drove the spend because Anthropic doesn't ship the per-user, per-workload telemetry enterprise buyers need. The fix is instrumentation, not budget — ship per-customer, per-feature inference cost telemetry before your next AI feature launch so you can see consumption before the invoice arrives, not after.

Is the OpenAI Codex 2-month free offer worth piloting given switching costs?

Yes, for at least one load-bearing workflow within 14 days. The displacement offer has a shot clock that expires when Anthropic stabilizes pricing, and OpenAI is specifically targeting the moment of developer frustration after losing the business adoption lead (32.3% vs Anthropic's 34.4% per Ramp). A pilot gives you negotiating leverage with Anthropic even if you don't migrate.

How much human QC capacity should I plan for AI-generated content pipelines?

Build a 20% human QC buffer into sprint capacity. Duolingo validated at scale that roughly one in five AI outputs is unusable and requires human intervention, and they reversed their blanket AI mandate as a result. Treat this as a structural COGS line rather than a quality problem that better prompting will eliminate.

Edition 2026-05-28 · read as Product

AnthropicEndsHarnessDiscount,OpenAIPounceswithCodex

Sources: 36
Words: 1,475
Read: 7min

Topics Agentic AI LLM Inference AI Capital

◆ The signal

Anthropic eliminates the 70-90% implicit discount third-party harness users have been living inside, effective June 15 — separate credit pools, overage at API rates. ServiceNow publicly admitted burning its full-year Anthropic budget by May 2026. Your per-developer AI cost assumption is wrong by roughly an order of magnitude, you have 30 days to model the impact, and OpenAI is offering 2 months free Codex specifically to exploit the resulting frustration. Run the cost audit before the weekend, not after.

◆ INTELLIGENCE MAP

01
Enterprise AI Cost Model Breaks June 15
act now
Anthropic's June 15 pricing restructure ends subsidized third-party tool access. ServiceNow burned its full-year budget by May. Duolingo's 20% AI slop rate means human QC is a permanent line item. The era of cheap, predictable AI inference costs ended — 30 days to remodel.
30
days until pricing change
8
sources
- Implicit discount ending
- ServiceNow budget burn
- AI content slop rate
- GPU demand ratio
1. Before June 1520
2. After June 15200
3. OpenAI Codex offer0
02
MCP Becomes the Enterprise Agent Standard
monitor
SAP (€100M fund), ServiceNow (Action Fabric), and Notion (Developer Platform) all shipped MCP-based agent architectures the same week. Procurement managers are asking 'can our agents call this directly?' Your headless API surface is now a retention risk measured against the next renewal cycle.
€100M
SAP partner fund
6
sources
- Agentic token volume
- Data-ready enterprises
- Agent memory precision
- MCP scope time
1. Agentic workloads59
2. Traditional AI calls41
03
Multi-Model Routing Is Now Default Architecture
monitor
Anthropic hit 34.4% business adoption vs OpenAI's 32.3% (Ramp data) — but the real signal is that engineers route per-task without procurement noticing. Vercel's 200K+ teams show multi-model routing as the production norm. Switching cost is approximately one week, not one quarter.
34.4%
Anthropic business share
9
sources
- Anthropic share
- OpenAI share
- Anthropic revenue growth
- Switching time
1. Anthropic34.4
2. OpenAI32.3
04
The PM Role Unbundles Into Build vs. Coordinate
background
Elena Verna shipped Lovable's enterprise pricing page to production alone — work that previously required PM + designer + engineer + a week of calendar. She spends 90% of time building, almost no meetings. Lovable has zero PMs. The coordination tax is the part AI eliminates first.
90%
time spent building
3
sources
- Verna build time
- Lovable PMs
- Traditional ship time
- HI-C ship time
1. HI-C model (build)90
2. Traditional PM (coordinate)70
05
AI Provider Trust Degrades Alongside Market Position
monitor
Anthropic planned for 10x growth, got 80x. The result: Claude Code silently nerfed, paying Pro subscribers lost access mid-cycle as a 'growth experiment,' corporate accounts banned without notice. The vendor with the most market momentum also has the least predictable platform behavior.
80x
growth vs 10x plan
5
sources
- Growth vs plan
- Colossus lease
- Valuation offered
- OpenAI capex
1. Planned growth10
2. Actual growth80

◆ DEEP DIVES

The June 15 Pricing Cliff — Your AI Unit Economics Have 30 Days

What Changed This Week

A developer opens Zed on June 15 and runs the same prompt she ran on June 14. The output is identical. The bill is not. Anthropic has announced that starting June 15, Claude usage through third-party tools (Conductor, Zed, Openclaw, T3 Code, Cline) gets a separate credit pool equal to the plan's dollar amount, with overage billed at full API rates. First-party Claude app and Claude Code usage is untouched. The 50% rate limit bump for two months is a grace period, not a permanent concession.

What this actually closes is the 70-90% implicit discount that many development teams quietly built into their unit economics. A developer who was effectively paying $20/month for Claude through a third-party harness now faces $200/month-equivalent metered usage for the same workload.

The era of subsidized AI inference through integrations is ending. If a product's AI capabilities route through third-party tools, model the cost impact this week, not after the June 15 switch.

ServiceNow Is the Public Version of Your Problem

ServiceNow's CDIO Kellie Romack confirmed her team burned through the entire full-year Anthropic budget before May 2026. She cannot say which users drove it or which workloads consumed it, because Anthropic does not ship the telemetry enterprise buyers expect. PagerDuty and National Life Group describe the same problem. National Life Group's Nimesh Mehta calls Anthropic 'great for consumer usage but not great for companies.'

Here is what the data actually shows. AI costs are unpredictable because usage scales with customer success. A feature pitched as saving two hours gets run eleven times a day instead of the three the pricing model assumed. Retention improves. Margin goes sideways, then down.

The Competitive Window OpenAI Just Opened

Sam Altman offered 2 months free Codex to enterprise customers who switch within 30 days, timed to Anthropic's moment of developer frustration. Criticism from Theo, Jeremy Howard, Matt Pocock, and Omar Sanseviero landed within hours of the Anthropic announcement. OpenAI lost the business adoption lead for the first time (32.3% vs 34.4% per Ramp) and is now buying it back with displacement pricing.

The Duolingo Calibration Point

Duolingo publicly acknowledged their blanket AI mandate failed. 20% of AI-generated content is unusable and requires human QC, and they reversed the policy. The planning implication is that human-in-the-loop capacity has to assume one in five AI outputs needs intervention. That is a permanent COGS line, not a quality issue that gets prompted away.

The Decision Framework

	Load-bearing workflow	Exploratory usage
Harness replaceable	Renegotiate with Anthropic inside 30-day window	Move to whichever vendor is subsidizing
Not replaceable	Pilot Codex on free offer this week	Stop paying metered rates immediately

Action items

Audit all Claude usage via third-party harnesses and model projected cost impact of June 15 change by end of next week
Pilot OpenAI Codex on 2-month free tier for one load-bearing workflow within 14 days
Ship per-customer, per-feature inference cost telemetry before your next AI feature launch
Build 20% human QC buffer into sprint capacity for any AI content pipeline

Sources:Your AI cost model breaks June 15 · A finance lead at ServiceNow opened the Anthropic invoice · A product manager opened three vendor pricing pages this week · Duolingo's 20% AI slop rate is your quality bar · A engineer on a small team pushed a deploy on Tuesday

02
Headless Enterprise Arrives — MCP Convergence Means Ship an Agent API or Lose the Renewal
Three Platform Bets Landed the Same Week
SAP shipped a Knowledge Graph for agent context and stood up a €100M partner fund for Autonomous Enterprise. ServiceNow launched Action Fabric, which decouples workflow logic from the UI and exposes it through MCP servers so third-party AI agents can execute it. Notion released a Developer Platform with a markdown API, external data sync, agent tool building, and code execution, hosting Claude and Codex as 'teammates.'
Vendors do not stand up hundred-million-euro funds for features. They stand them up for platform bets they intend to defend for years. Three of the largest enterprise vendors landing in the same place in the same week is a commitment signal, not a coincidence.
A procurement manager at a Fortune 500 asked three vendors the same question this week: 'Can our agents call this directly, or do my people have to click through your UI?' Two didn't have an answer. The third moved to the next stage.
The Data Readiness Gap Creates a GTM Problem
Only 15% of organizations have the data foundation for agentic AI. They are spending millions anyway. Nearly half cite data quality and lineage as the primary blocker. The failure pattern is what teams tell themselves will not happen to them: buy, fail to activate, churn, blame vendor. Fivetran is positioning on the 'foundation' narrative. The counter-play is helping customers succeed across data maturity tiers instead of shipping capabilities into an unprepared market.
Microsoft's published agent memory architecture is the first real benchmark a PM can spec against: 400-500 memories, 97.2% retention precision, with consolidation, forgetting, and delayed maturation. Put those numbers in the PRD. The 'forgetting' mechanism is the counterintuitive one. Infinite context sounds like a feature and kills relevance on a predictable timeline.
59% of Token Volume Is Already Agentic
Vercel's AI Gateway data across 200,000+ production teams confirms agentic workloads carry the majority of token volume. Anthropic captures 61% of spend, with Opus doing the reasoning. Google captures 38% of volume, with Flash doing fast and cheap. Most large teams route across multiple providers by workload type. An architecture that still assumes request-response as the primary pattern is built for the minority consumption model.
What This Means for Your Product
The product decision is not 'build an agent.' It is whether the workflows the product owns can be invoked by an agent that is not yours, without a human clicking through the UI, by Q4. If the answer is no, the agent living in the buyer's stack will route around the product to reach the system of record directly. The product becomes a reporting surface on top of someone else's execution layer.
The work is smaller than the deck will suggest for most teams. A week of scoping, 2-4 weeks of build, assuming the underlying API is not already a mess. The window before this shows up in RFPs is 2-3 quarters.
Action items
- Audit your product's API surface for agent-consumability: can a third-party AI agent discover, authenticate, and execute core workflows without UI?
- Scope an MCP-compatible headless layer against your existing API by end of quarter
- Add data readiness qualification gate to enterprise onboarding before contract signing
- Spec agent memory features with 400-500 memory ceiling and explicit forgetting mechanism, referencing Microsoft's 97.2% precision benchmark
Sources:59% of AI traffic is now agentic · A customer success lead at a mid-market SaaS company · A head of sales loaded the target account list on Monday · Your AI cost model breaks June 15

The PM Role Splits — Coordination Is the Part AI Eliminates First

The Worked Example

Elena Verna — former head of growth at Amplitude, Miro, Dropbox, and SurveyMonkey — now spends 90% of her time building at Lovable, almost no meetings. She shipped Lovable's enterprise pricing page to production herself. In a traditional org, that work needed a PM, a designer, engineers, and roughly a week of calendar time. She did it in hours.

Lovable has zero product managers. Engineers talk to users, write specs, ship code, and read the feedback themselves. The company is growing fast enough that the absence is not an oversight. It is the operating model. They are hiring Growth PMs in parallel to Verna, not reporting to her. That is not a growth team. It is a roster of autonomous operators.

The PM value proposition decomposes into three pillars: cross-functional coordination, customer and market judgment, and strategic prioritization. Pillar one is what fills a PM's calendar. It is also what AI-enabled flat orgs are eliminating.

The 'Average Intelligence' Reframe

AI does not make a PM world-class at design or engineering. It makes them average-to-good at everything at once. For a PM who already thinks across functions, that is an opening, but only if the saved hours go into shipping rather than coordinating other people who are shipping. The PMs who come through this shift look less like project managers and more like mini-GMs who prototype and iterate in the artifact directly.

Abridge Shows the Wedge-to-Platform Path

Abridge's three-act sequencing in healthcare answers the order-of-operations question: save time (documentation), then save money (prior auth and billing), then save lives (clinical decision support). Each act unlocks a different buyer, contract size, and release cadence. They compressed health system releases from quarterly/semi-annual to monthly, a 4-6x acceleration, by embedding 'clinician scientists' (MDs who are full-stack engineers) inside product teams.

The lesson is not the $5.3B valuation. The lesson is that the valuation followed a wedge a clinician could describe in one sentence: 'I edit two lines and sign instead of typing for forty minutes.' The platform conversation came after the wedge earned distribution.

The Diagnostic for Monday

	Engineers talk to users weekly	Feedback filtered through decks
Prioritization has named owner	Works without PMs (Lovable cell)	Still needs PMs
Prioritization is emergent	Fragile — works until it doesn't	Highest dysfunction risk

The real call is not whether to unbundle the PM role. It is which of the four jobs — user research, prioritization, spec-writing, cross-team coordination — quietly goes missing the day the title goes away. Write that down before the reorg meeting, not after.

Action items

Calculate your personal build-vs-coordinate ratio this week — benchmark against Verna's 90% building target
Ship one small project end-to-end using AI tools (pricing page, landing page, experiment) without engaging cross-functional team
Map your product's three-act expansion ladder: identify current wedge, revenue-expansion play, and mission-critical endgame
Identify 1-2 senior ICs who might be more productive with full autonomy and fewer reports — evaluate HI-C role experiment

Sources:A product manager at a Series B company opened Lovable's careers page · A clinician finishes a patient visit

◆ QUICK HITS

Update: Anthropic capacity — Colossus 1 lease (220K GPUs from xAI) confirmed, with commitments to double Claude Code 5-hour limits and remove peak throttling. Verify delivery within 2-4 weeks before trusting capacity claims.
A engineer on a small team pushed a deploy on Tuesday
AI persona drift degrades significantly within 8 dialogue rounds (Li et al., COLM 2024) — if shipping multi-turn AI features, add drift detection to acceptance criteria using 'canary phrase' monitoring in system prompts.
AI persona drift quantified at 8 rounds
Claude Code shipped /goal autonomous mode with evaluator-judge architecture: separate Haiku model reads conversation transcripts and judges completion against 4,000-char condition spec. Reference pattern for any autonomous AI workflow.
A staff engineer kicked off Anthropic's autonomous coding mode
Google changed AI Overviews handling of 'best' queries — listicles self-ranking your product as #1 may now surface competitors instead. Run immediate SEO audit on 'best [category]' content.
Duolingo's 20% AI slop rate is your quality bar
Apple exploring AI agent governance for App Store (possible WWDC June 2026) — agents that spawn sub-apps bypass review pipeline. Architect agent capabilities within declared boundaries now.
Apple's agent App Store changes your distribution strategy
Google's Universal Commerce Protocol embeds Affirm + Klarna BNPL directly into Gemini/Search shopping — if you touch checkout or commerce, evaluate integration feasibility this quarter.
Google's Universal Commerce Protocol is your next integration decision
AI endpoints get indexed by Shodan within 3 hours and attract 175 hijacking attempts/week — add per-endpoint spend caps and auto-key-rotation as P0 before any LLM endpoint ships to production.
A backend engineer shipped a new inference endpoint on a Tuesday afternoon
CRM seats declining but spend rising 83%: Jason Lemkin cut Salesforce from 10+ humans to 2 humans + 1 API seat, now pays $22K (up from $12K). System of record becomes system consumed at API layer.
A sales operations lead opened her CRM three times on Tuesday
Microsoft hedging against OpenAI dependency — actively seeking startup deals as alternatives. If single-vendor on OpenAI, make leaving cheap via abstraction layer.
The Download from MIT Technology Review

◆ Bottom line

The take.

Your AI cost assumptions break in 30 days: Anthropic's June 15 pricing change eliminates the 70-90% implicit developer discount, ServiceNow already burned its full-year budget by May, and the GPU market is structurally scarce (4:1 demand ratio, OpenAI locking $20B in Cerebras capacity). Simultaneously, enterprise buyers are asking one question — 'can our agents call this without a UI?' — and SAP just put €100M behind the answer being yes. The PM who ships cost telemetry and an MCP-compatible API surface this quarter keeps the customer. The one who ships another dashboard feature loses them to whoever the agent picks instead.

Frequently asked

What exactly changes for Claude usage through third-party tools on June 15?: Claude usage through harnesses like Zed, Cline, Conductor, Openclaw, and T3 Code moves to a separate credit pool equal to the plan's dollar amount, with overage billed at full API rates. First-party Claude app and Claude Code usage is unaffected. The 50% rate limit bump is a two-month grace period, not a permanent concession, and effectively removes the 70-90% implicit discount many teams had baked into unit economics.
How should I prioritize the cost audit across different AI workflows?: Split workflows along two axes: load-bearing versus exploratory, and harness-replaceable versus not. Load-bearing workflows on replaceable harnesses are renegotiation candidates with Anthropic inside the 30-day window; load-bearing workflows that can't move should pilot Codex on the 2-month free offer this week. Exploratory usage should either follow whichever vendor is currently subsidizing or stop paying metered rates entirely.
Why did ServiceNow burn its full-year Anthropic budget by May, and how do I avoid the same outcome?: ServiceNow's CDIO couldn't identify which users or workloads drove the spend because Anthropic doesn't ship the per-user, per-workload telemetry enterprise buyers need. The fix is instrumentation, not budget — ship per-customer, per-feature inference cost telemetry before your next AI feature launch so you can see consumption before the invoice arrives, not after.
Is the OpenAI Codex 2-month free offer worth piloting given switching costs?: Yes, for at least one load-bearing workflow within 14 days. The displacement offer has a shot clock that expires when Anthropic stabilizes pricing, and OpenAI is specifically targeting the moment of developer frustration after losing the business adoption lead (32.3% vs Anthropic's 34.4% per Ramp). A pilot gives you negotiating leverage with Anthropic even if you don't migrate.
How much human QC capacity should I plan for AI-generated content pipelines?: Build a 20% human QC buffer into sprint capacity. Duolingo validated at scale that roughly one in five AI outputs is unusable and requires human intervention, and they reversed their blanket AI mandate as a result. Treat this as a structural COGS line rather than a quality problem that better prompting will eliminate.

◆ Same day, different angle

Read this day as…

◆ Recent in product