How fast should we expect Chinese inference pricing to pressure US AI budgets?

The frontier-pricing assumption underneath most AI budgets has a 6-12 month shelf life. DeepSeek V4 Pro is already pricing at roughly 11x below Claude Opus on input and 28x on output, with Z.ai and MiniMax disclosing 50-70% gross margins at those prices. Vendor contracts signed this quarter without a cost-collapse scenario will be renegotiated under duress.

What gross margin should an AI-native P&L actually assume?

Stress-test against a 17-25% gross margin floor, not the 70-80% historically assumed for SaaS. At 1M users and $120 ARPU under current inference pricing, the math lands at 17% gross and 11% net, because reasoning models consume 10-100x more tokens and offset per-token price drops. Board decks still modeling 70%+ margins describe a business that does not exist at this cost structure.

Which promotional tactics still work when an AI agent is the buyer?

Only product ratings produced reliable positive lift across a study of 16,000 AI shopping rounds. Seven of eight traditional CRO tactics — scarcity badges, anchored discounts, countdown timers, bundle psychology — failed or produced negative lift, and GPT-5 actively penalized aggressive promotional cues. Schema markup separately showed zero measurable citation lift across 1,885 pages.

What does OpenAI deprecating finetuning mean for AI differentiation strategy?

It collapses the middle path, forcing a binary choice between RLFT on open weights or pure prompt engineering on hosted APIs. Tier 1 (Cursor, Cognition) compounds proprietary data into weights and requires ML research engineers and self-hosted infrastructure; Tier 2 competes on system design, retrieval, and integration speed. Straddling produces neither the compounding moat of the first nor the velocity of the second.

How should pricing architecture change as agents replace human interaction?

Move to a stacked model: seat-based pricing for humans plus consumption-based pricing for agent activity. Box and Salesforce are converging on this because enterprise interaction is shifting from 90% human / 10% agent toward the inverse within a few years, and token allocation is growing from roughly 1% to 10% of enterprise spend. Seat-only pricing leaves the multiplicative agent demand uncaptured.

Edition 2026-05-14 · read as Leader

ChineseLabsUndercutUSInferencePricingby10-28x

Sources: 33
Words: 1,609
Read: 8min

Topics LLM Inference Agentic AI AI Capital

◆ The signal

Chinese labs are pricing inference 10-28x below the US frontier and still running 50-70% gross margins, which is what 4-7x compute efficiency looks like when export controls force it. The frontier-pricing assumption underneath most AI budgets has a 6-12 month shelf life. A vendor contract signed this quarter without a cost-collapse scenario in the model is one that gets renegotiated under duress instead of by choice.

Key facts

DeepSeek V4 Pro prices inference at $0.43 per million input tokens versus roughly $4.73 for Claude Opus, an 11x input and 28x output gap, while Z.ai's GLM-5 reports 50% gross margins at $1 per million tokens.
Chinese AI labs are extracting 4-7x more performance per compute unit than US scaling curves predict, and China's monthly token volume now runs 2.25x US and Western volumes combined (9 quadrillion vs 4 quadrillion).
A study of 16,000 AI shopping rounds across four models found 7 of 8 traditional promotional tactics produced negative lift with agent buyers, with only product ratings yielding reliable positive signal.
OpenAI deprecated its finetuning APIs, eliminating the middle path between consuming hosted models and running open weights, while Cognition raised at a $25B valuation on an RLFT-on-open-weights strategy.
Google retired ChromeOS after 15 years and folded it into Android with Gemini Intelligence as the primary interaction layer, with Acer, ASUS, Dell, HP, and Lenovo shipping Googlebooks in fall 2026.

◆ INTELLIGENCE MAP

01
AI Cost Structure Inverts: Chinese Efficiency + Margin Collapse
act now
Chinese labs extract 4-7x more capability per compute unit while pricing 10-28x below US competitors at 50-70% margins. Simultaneously, AI-native businesses structurally cap at ~17% gross margin vs SaaS's 70%. 4B-parameter models now match frontier performance. The premium-compute thesis is being outflanked.
4-7x
Chinese efficiency advantage
4
sources
- Chinese pricing gap
- Chinese gross margins
- AI-native gross margin
- China token volume
- US token volume
1. DeepSeek V4 Pro0.43
2. Claude Opus 4.64.73
3. Z.ai GLM-51
02
Agent-First Platforms Replace the Interface Layer
monitor
Google killed ChromeOS, merged with Android, and made Gemini the primary interaction layer with 5 OEM partners shipping fall 2026. Amazon replaced its search bar with an AI agent that buys from competitor sites. Study of 16,000 AI shopping rounds: 7 of 8 traditional conversion tactics fail. The interface is no longer the product.
5
OEM partners signed
6
sources
- OEMs signed
- Promo tactics that fail
- Ship timeline
- Only tactic that works
1. Product ratings85
2. Scarcity badges12
3. Anchored pricing15
4. Bundle upsells8
5. Countdown timers5
03
OpenAI Finetuning Kill Forces a Three-Year Strategy Bet
act now
OpenAI deprecated finetuning APIs, eliminating the middle path between 'consume frontier as utility' and 'run your own weights.' The market bifurcates: top 1% (Cursor, Cognition) invest in RLFT on open models as their moat. Everyone else competes on prompt engineering — a discipline every competitor has equal access to. Straddling is the worst outcome.
1%
who should run own weights
3
sources
- Cognition valuation
- GB200 latency gain
- PD disaggregation gain
- Anthropic run rate
1. Tier 1: Own RLFT1
2. Tier 2: Prompt eng.99
04
Enterprise AI Adoption Bottleneck — Metrics Are Broken
monitor
Employees at Amazon, Meta, and Microsoft are gaming internal AI adoption metrics. Google, OpenAI, and Anthropic all launched enterprise deployment arms in the same week — an industry admission that implementation, not capability, is the binding constraint. PE firms (Vista $750M, Blackstone, KKR) becoming primary distribution channel.
$750M
Google-Vista AI fund
5
sources
- Frontier labs deploying
- Google deploy engineers
- PE partners engaged
- Token spend share
1. GoogleForward-deployed engineering team (hundreds)
2. OpenAIDeployCo with consulting partners
3. AnthropicPE joint venture
4. Vista/KKR/BXStructured AI distribution funds
05
Macro Regime Shift: Inflation Crosses Wages, Fed Paralyzed
background
April CPI hit 3.8% against 3.6% wage growth — first negative real-wage crossover since 2023. New Fed chair Warsh inherits a market pricing hikes while his instinct is cuts. 10-year at 4.46%. Cheap money is not arriving on any schedule that helps 2026 planning. Platform-layer players consolidating while application-layer gets squeezed.
3.8%
April CPI
2
sources
- April CPI
- Wage growth
- 10-year yield
- Energy YoY
- Microsoft YTD
1. Feb CPI2.4
2. Mar CPI3.3
3. Apr CPI3.8
4. Apr Wages3.6

◆ DEEP DIVES

01
The Compute-Scale Thesis Has a Price Tag — And Chinese Labs Just Published It
The Numbers That Rewrite the Budget
Data from fourteen Chinese AI labs undercuts the premise behind roughly a trillion dollars of planned US AI infrastructure spending. The labs are extracting 4-7x more performance per compute unit than US scaling curves predict. The deployable compute gap widened from 3x in 2023 to 8x in 2026, and yet the capability gap is six to eight months. Those three numbers do not sit comfortably together unless the scaling curves are wrong.
The constraint created the capability. Export controls forced efficiency innovation that is now a permanent Chinese advantage. Removing the controls does not remove the capability.
The Pricing Floor US Labs Cannot Match
DeepSeek V4 Pro prices at $0.43 per million input tokens against roughly $4.73 for Claude Opus, an 11x gap on input and 28x on output. Z.ai ships GLM-5 at a dollar per million tokens and reports 50% gross margin. MiniMax claims 70% enterprise margins at comparable prices. A reasonable skeptic would call these loss leaders. The margin disclosures say otherwise, and the cost to serve is genuinely lower.
China's token volume now runs 2.25x US and Western volumes combined. Nine quadrillion tokens per month against four. That is an inference-stage flywheel, and it compounds the efficiency advantage rather than sitting beside it.
Why This Isn't Just a China Story
The same economics are showing up domestically from a different direction. Research has now produced 4-billion-parameter recursive language models matching Claude Sonnet 4.6 through architecture rather than scale. Cactus Needle distilled Gemini into 26M parameters running at 6,000 tokens per second on consumer hardware. Perceptron entered video analysis at 80-90% below incumbent pricing.
Two other angles arrive at the same destination. Academic research is producing small-model parity without scale, and AI-native businesses are landing at 17% gross on the P&L side. Add the Chinese efficiency story and three independent pressures agree on one claim: the premium attached to frontier access has a shorter shelf life than the contracts being signed against it.
What This Means for the AI P&L
Run the math at 1 million users and $120 ARPU under current inference pricing. The result is 17% gross margin, 11% net. Reasoning models consume 10 to 100x more tokens than their predecessors, which offsets the per-token price drops entirely. The five-year LTV projection built on 80% gross margins describes a business that does not exist at this cost structure.
The Strategic Fork
Jensen Huang joining Trump's China delegation is the tell that export controls are being renegotiated. If restrictions ease, Chinese labs pair frontier silicon with efficiency work they have already done. The compute-constrained-China premise has a shorter half-life than most 2026 plans assume.
The decision this quarter is whether the AI spend assumes a durable efficiency premium on frontier hardware, or assumes that premium compresses as Chinese techniques diffuse. The two assumptions produce very different twelve-month P&Ls. Most capex plans quietly assume the first. The second is now the more defensible forecast, and the gap between defensible and quietly assumed is where next year's write-down lives.
Action items
- Commission scenario analysis: model your AI cost structure and competitive position if Chinese models reach parity within 12 months at 10-28x pricing advantage
- Evaluate Chinese open-source models (Qwen3, DeepSeek R1, Kimi K2.5) for non-differentiated workloads and cost-sensitive inference
- Rebuild your AI-native P&L with 17-25% gross margin floor — stress-test pricing and packaging against this range
- Architect inference routing for multi-model deployment with dynamic cost optimization — accept the engineering tax now to own optionality
Sources:Exponential View · TLDR Founders · TLDR AI · Bloomberg Technology · TLDR Crypto
02
The Interface Layer Is Being Eaten — Your Product May Become a Tool Agents Call
Three Platforms Made the Same Move This Week

Google retired ChromeOS after fifteen years and folded it into Android, making Gemini Intelligence the primary interaction layer. The wording matters, because the framing is OS-as-model rather than assistant bolted on. Five OEM partners (Acer, ASUS, Dell, HP, Lenovo) ship Googlebooks in fall 2026. Amazon replaced its search bar with an AI agent and added "Buy for Me" functionality that purchases from competitor websites. Meta is positioning privacy-preserving AI as the interface for WhatsApp messaging. The channel underneath every consumer business is being rewritten on roughly the same timeline.

The interface is becoming the model. The channel underneath is being rewritten on the same timeline. A product can be reduced to a tool that Gemini calls on the user's behalf.

What Dies When Agents Are the User

A study of 16,000 AI shopping rounds across four models found that seven of eight traditional promotional mechanisms either failed or produced negative lift when the buyer was an agent. Scarcity badges, anchored discounts, countdown timers, bundle psychology, the entire CRO toolkit built for human cognitive biases, stopped working. Only product ratings produced reliable positive signal, and GPT-5 actively penalized aggressive promotional cues. That is the polite way of saying the model has been trained to distrust the exact playbook the last decade of e-commerce was optimized around.

Schema markup, separately, showed zero measurable citation lift across 1,885 pages. The machinery built to make products legible to search engines is not the machinery that makes them preferable to agents.

Where Defensibility Reassembles

The a16z thesis has clarified. AI makes recreating 80% of any system of record trivially cheap, and the remaining twenty percent, which is the undocumented SOPs and exception handling and compliance edge cases nobody wrote down, is where the moat actually lives. Value migrates in two directions at once.
- Downward: into data models, permissions, workflow logic, agent authorization.
- Upward: into networks, proprietary data generation, real-world execution.
Companies stuck in the middle, meaning pure software with no network effects and no regulatory moat, get compressed from both sides. Box CEO Levie frames the flip as enterprise software interaction moving from 90% human / 10% agent to 10% human / 90% agent within three years, with token and compute allocation growing from roughly one percent to ten percent of enterprise spend. A reasonable skeptic would point out that CEO timelines of this kind run late by a factor of two. The reasonable skeptic is probably correct. The direction of travel is not in dispute.

The Pricing Architecture That Survives

Levie's stacking model, seat pricing for humans plus consumption-based pricing for agent activity, is the architecture the market is converging on. An enterprise with 1,000 human seats generating ten times that activity from agents needs pricing that captures the ten times, not the one. Salesforce publicly conceded UI-layer stickiness is a depreciating asset with its April 2026 headless product announcement. When the largest SaaS company stakes its future on the data layer rather than the interface, the question every technology executive owes the board is where their own defensibility actually lives.

The timeline is not symmetric. Interface-heavy businesses have two quarters before the problem shows up in conversion. System-of-record businesses have four. The teams that rebuild data and ratings infrastructure now will compound the advantage over the next eight quarters. The teams that wait will spend those quarters explaining why conversion held everywhere except the channel that was actually growing.
Action items
- Map your product surface area against Google's agent-first model — identify where Gemini Intelligence intermediates your user relationship by end of Q3
- Audit your product portfolio: identify which products retain value if 70%+ of interactions come through agents, and which would be commoditized
- Prototype the stacking pricing model — seat + consumption architecture — for your top 3 products by Q4
- Strip aggressive promotional tactics from commerce pages and redirect budget to ratings infrastructure and structured product data within 90 days
Sources:AI Breakfast · Techpresso · a16z · Casey Newton · TLDR Marketing · TLDR

OpenAI Kills Finetuning — The AI Differentiation Strategy Just Bifurcated

The Middle Path Is Gone

OpenAI deprecated its finetuning APIs this week. For two years, teams building on the frontier had a comfortable third option between "consume the model as-is" and "run your own weights." That option is gone. Combined with Sora's cancellation and the Symphony agent tease, the through-line is clear: OpenAI is becoming a product company that deploys agents on the customer's behalf, not a platform that rents building blocks.

The bifurcation was already there. Hosted finetuning was the diplomatic fiction that let leadership avoid picking. That fiction is no longer available.

Two Tiers, Two Businesses

Dimension	Tier 1: RLFT on Open Weights	Tier 2: Prompt Engineering
Who	Top 1% (Cursor, Cognition)	Everyone else (99%)
Moat	Proprietary data baked into weights	System design, data pipelines
Cost	High (own infrastructure)	Lower (API consumption)
Defensibility	Compounding (model improves with use)	Shared (competitors have equal access)
Hiring profile	ML research engineers	Full-stack + prompt engineers

The talent pools barely overlap. The infrastructure commitments run on different timelines. The worst outcome is straddling — doing 'a little RLFT' produces neither the compounding advantage of deep customization nor the speed of a pure prompting stack.

The Evidence for Tier 1

Cursor and Cognition are doubling down on open-model RLFT and treating it as their durable differentiation. Cognition just raised at $25B. The market is paying up for applied intelligence over foundational research — the pricing power is moving toward the application layer. GB200 benchmarks show 40-47% latency improvements with up to 7x per-GPU throughput via PD disaggregation, making self-hosted large MoE serving economically viable for the first time.

The Evidence for Tier 2

Context windows got longer. Retrieval ate most finetuning use cases. Prompt engineering became a real discipline. For 99% of companies, the honest answer is that their "finetuning moat" was tone, format, and domain vocabulary that a decent RAG pipeline handles anyway. The question is whether that's acceptable when every competitor has the same access to the same frontier.

The Org Design Implications

GitLab restructured into 60 agent-embedded teams and publicly stated software will be "built by machines, directed by people." Anders Hejlsberg (TypeScript creator) predicts developers become project managers directing AI agents. The tier you pick determines your org model: Tier 1 optimizes for specification quality and model training expertise. Tier 2 optimizes for system architecture and integration speed.

This is not a procurement exercise. It is a three-year commitment to a competitive posture. This quarter's decision is next quarter's recruiting plan, infrastructure budget, and pricing floor.

Action items

Determine which tier your organization should operate in and present the recommendation to your board within 60 days
If Tier 1: initiate open-model RLFT capability buildout — hire 2-3 ML research engineers and secure GPU infrastructure commitments before GB200 capacity is absorbed
If Tier 2: invest in systematic prompt engineering infrastructure, long-context approaches, and outcome measurement — your differentiation is system design and data, not model customization
Audit engineering org for developer-as-PM transition readiness — identify roles where specification quality matters more than implementation speed

Sources:AINews · TLDR Founders · TLDR IT · The Pragmatic Engineer · TLDR AI

◆ QUICK HITS

Update: Shai-Hulud supply chain worm now wipes systems when operators attempt token revocation — creating a hostage dynamic that inverts standard incident response playbooks
Risky.Biz
Update: Microsoft -16% YTD (worst big tech performer) as TCI exits — market actively discounting conglomerate structures against AI-focused purity plays
Martin Peers
Employees at Amazon, Meta, and Microsoft are gaming internal AI adoption metrics — running unnecessary tasks to hit leaderboards, meaning your own AI KPIs are likely measuring performance theater, not value
TLDR IT
April CPI hit 3.8% against 3.6% wage growth — first negative real-wage crossover since 2023, with new Fed chair Warsh inheriting a market pricing hikes against his dovish instincts
Morning Brew
shadcn/ui has become the de facto standard for AI-generated interfaces across Figma Make, Cursor, and Claude — your design system is being routed around one PR at a time
TLDR Design
Coinbase x402 processed 178.7M agentic payments ($42.4M) through Amazon Bedrock — stablecoin rails settling in fractions of a cent vs. card networks' 3-4 cents, with no viable alternative at scale
TLDR Crypto
21% of top ML conference peer reviews are now fully AI-generated — detection works against bulk slop but skilled writers evade trivially, making provenance infrastructure the only durable answer
The Algorithmic Bridge
Jensen Huang boarded Air Force One for Trump's China trip — Nvidia's presence signals export control regime is being actively renegotiated, not defended
Bloomberg Technology

◆ Bottom line

The take.

The AI cost structure just inverted from three directions simultaneously: Chinese labs deliver comparable capability at 4-7x the compute efficiency and 10-28x lower pricing, small models match frontier performance at 4 billion parameters, and AI-native businesses structurally cap at 17% gross margins instead of SaaS's 70%. Any AI strategy, budget, or vendor contract built on frontier-pricing assumptions has a 6-12 month shelf life — and the companies gaming their own AI adoption metrics internally are proving that most organizations haven't even honestly measured what they're getting for the spend they've already committed.

Frequently asked

How fast should we expect Chinese inference pricing to pressure US AI budgets?: The frontier-pricing assumption underneath most AI budgets has a 6-12 month shelf life. DeepSeek V4 Pro is already pricing at roughly 11x below Claude Opus on input and 28x on output, with Z.ai and MiniMax disclosing 50-70% gross margins at those prices. Vendor contracts signed this quarter without a cost-collapse scenario will be renegotiated under duress.
What gross margin should an AI-native P&L actually assume?: Stress-test against a 17-25% gross margin floor, not the 70-80% historically assumed for SaaS. At 1M users and $120 ARPU under current inference pricing, the math lands at 17% gross and 11% net, because reasoning models consume 10-100x more tokens and offset per-token price drops. Board decks still modeling 70%+ margins describe a business that does not exist at this cost structure.
Which promotional tactics still work when an AI agent is the buyer?: Only product ratings produced reliable positive lift across a study of 16,000 AI shopping rounds. Seven of eight traditional CRO tactics — scarcity badges, anchored discounts, countdown timers, bundle psychology — failed or produced negative lift, and GPT-5 actively penalized aggressive promotional cues. Schema markup separately showed zero measurable citation lift across 1,885 pages.
What does OpenAI deprecating finetuning mean for AI differentiation strategy?: It collapses the middle path, forcing a binary choice between RLFT on open weights or pure prompt engineering on hosted APIs. Tier 1 (Cursor, Cognition) compounds proprietary data into weights and requires ML research engineers and self-hosted infrastructure; Tier 2 competes on system design, retrieval, and integration speed. Straddling produces neither the compounding moat of the first nor the velocity of the second.
How should pricing architecture change as agents replace human interaction?: Move to a stacked model: seat-based pricing for humans plus consumption-based pricing for agent activity. Box and Salesforce are converging on this because enterprise interaction is shifting from 90% human / 10% agent toward the inverse within a few years, and token allocation is growing from roughly 1% to 10% of enterprise spend. Seat-only pricing leaves the multiplicative agent demand uncaptured.

◆ Same day, different angle

Read this day as…

◆ Recent in leader

ChineseLabsUndercutUSInferencePricingby10-28x

◆ INTELLIGENCE MAP

◆ DEEP DIVES

The Numbers That Rewrite the Budget

The Pricing Floor US Labs Cannot Match

Why This Isn't Just a China Story

What This Means for the AI P&L

The Strategic Fork

Three Platforms Made the Same Move This Week

What Dies When Agents Are the User

Where Defensibility Reassembles

The Pricing Architecture That Survives