Product daily

Edition 2026-05-11 · read as Product

GLM-5.1 Beats GPT-5.4 as SAP Closes Agent API Access

Sources: 11 · Words: 1,562 · Read: 8 min

Topics: AI Capital · LLM Inference · Agentic AI

◆ The signal

A product manager shipping on top of a frontier model this week watched GLM-5.1, a 744B-parameter MIT-licensed release, edge GPT-5.4 on SWE-Bench Pro by 58.4 to 57.7, and watched SAP close its APIs to third-party AI agents the same week. The model got cheaper to swap. The data got harder to reach. The honest question for the next sprint is which column a given product sits in: does it own a proprietary data pipe, or does it own the approval step that turns model output into a decision someone will sign. Owning neither is a two-quarter problem.

◆ INTELLIGENCE MAP

  1. 01

    Platform Agent Lockout Reshapes AI Distribution

    act now

    SAP blocked all third-party AI agents except its own Joule and Nvidia's NemoClaw, then spent €1B acquiring Prior Labs to build its own AI. Simultaneously, Anthropic's 10 finance agents caused FactSet to drop 8% in a day. Sierra proved standalone agents work at $150M ARR. The data layer is closing while agent startups succeed by owning the outcome.

    $150M Sierra ARR · 3 sources
    • SAP AI lab invest
    • Sierra valuation
    • Fortune 50 penetration
    • FactSet drop
    1. Sierra ARR: 150
    2. SAP AI Lab: 1100
    3. FactSet Loss: 8
  2. 02

    Open-Weight Model Beats Frontier — Free

    act now

    GLM-5.1 scores 58.4 on SWE-Bench Pro (MIT license, $0) vs. GPT-5.4 at 57.7. xAI Grok 4.3 ships at $1.25/M input tokens with 1M context. Meanwhile 45% of practitioners say OpenAI lost default status. The cost floor for production AI just reset — models are free or near-free while incumbents charge 5-10x more for equivalent quality.

    $1.25 Grok 4.3 per M tokens · 4 sources
    • GLM-5.1 SWE-Bench
    • GPT-5.4 SWE-Bench
    • Grok 4.3 input price
    • Lost default status
    1. GLM-5.1 (MIT, $0): 58.4
    2. GPT-5.4: 57.7
    3. Claude Opus 4.6: 57.3
  3. 03

    AI Monetization Warning Signs Multiply

    monitor

    Figma reports 75% of paying users consume AI credits weekly, but analysts flag unpredictable billing as churn risk. Snap's $400M Perplexity partnership collapsed in under 6 months. TCI liquidated $8B in Microsoft on the thesis AI displaces Office. Short sellers are now targeting 'clean' incumbents for tech disruption. The market is stress-testing every AI revenue model simultaneously.

    $8B TCI MSFT liquidation · 4 sources
    • Figma AI credit usage
    • Snap-Perplexity deal
    • TCI Microsoft sell
    • Industry AI ROI gap
    1. AI Infra Spend '26: 700
    2. AI Revenue '25: 40
  4. 04

    Supply Chain Fragility Hits Three Vectors Simultaneously

    background

    pgBackRest (critical PostgreSQL backup tool) died because its sole maintainer's company was acquired — no succession plan. A CNCF project (Antrea) was compromised via its Trivy security scanner. NVIDIA GPUs have a confirmed Rowhammer vulnerability bypassing IOMMU. Meta's 267TB piracy lawsuit creates model provenance liability for anyone shipping Llama-family models.

    267TB Meta piracy allegation · 1 source
    • pgBackRest status
    • Meta training data
    • NVIDIA vuln teams
    • CVE active exploit
    1. CVE-2026-31431: Active exploit, patch now
    2. pgBackRest: Maintainer lost, no fork
    3. Antrea CI/CD: CNCF project compromised
    4. Meta lawsuit: 267TB provenance risk

◆ DEEP DIVES

  1. 01

    Platform Data Wars: SAP Closes the Gate While Anthropic Storms the Castle

    Two Moves, One Pattern: The Data Layer Is the New Moat

    A product manager at a mid-market ERP vendor spent Tuesday morning reading two news items and closed the tab with a worse roadmap than she woke up with. SAP blocked all third-party AI agents from its APIs, reserving access for its own Joule and Nvidia's NemoClaw, and in the same week committed €1B to acquire Prior Labs for a proprietary lab focused on structured enterprise data. The same week, Anthropic shipped 10 purpose-built finance agents covering pitchbooks, credit memos, KYC, and month-end close, wired through Microsoft 365 and Moody's data. FactSet dropped 8% in a session.

    Read together, the two moves describe one pattern. SAP is pulling the fence inward. Anthropic is routing around the fence by partnering with alternative data (Moody's) and reaching the workflow through Microsoft's pipes. The thing being pitched is "AI agents." The thing actually being done is a renegotiation of who charges rent on enterprise data access.

    The question on a PM's desk is no longer 'how good is our AI' — it is 'who owns the pipes our AI needs, and have they started charging rent yet.'

    Sierra Proves the Standalone Agent Category Is Real

    Sierra crossed $150M ARR serving more than 40% of the Fortune 50, and raised $950M at a $15B+ valuation. Those are production deployments at companies whose procurement takes two quarters to say yes. Bret Taylor separated two things most teams conflate: an "AI assistant for X" is not the same product as "AI that owns the outcome of X." Enterprise buyers paid for the second as a standalone product rather than wait for the incumbent's marketplace, and that framing commands roughly 10x the monetization.

    Short Sellers Confirm the Displacement Clock

    Viceroy Research, the firm that shorted Wirecard, is now publicly screening for "high margin businesses with clean balance sheets that happen to be in the crosshairs of a new technology roadmap." That is professional capital running the displacement thesis as a systematic filter. Competitive narrative has to answer the AI-moat question before the short report lands, not after.

    The 2x2 for Your Integration Roadmap

                              | Platform API Dependency               | Thin Integration / Own Data
    Platform admin buyer      | ⚠️ Danger cell: SAP just redrew this  | Moderate risk
    Line-of-business buyer    | Vulnerable: migrate this quarter      | ✅ Safe cell: build here

    Any integration roadmap that assumes a major platform's API will stay open should be rewritten this sprint, with longer procurement cycles and rev-share priced in. The work this week is naming the one integration on the roadmap to kill before engineering spends another six weeks on an API that may be gone by the time the feature ships.
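
    The 2x2 above can be applied as a lookup during the integration audit. The following is a minimal sketch, not a prescribed tool: the cell labels come from the grid, while the field names and the two example integrations are hypothetical.

```python
from dataclasses import dataclass

# Cell labels follow the 2x2 above; the keys are hypothetical field values.
CELLS = {
    ("platform_api", "platform_admin"): "danger",
    ("platform_api", "line_of_business"): "vulnerable: migrate this quarter",
    ("own_data", "platform_admin"): "moderate risk",
    ("own_data", "line_of_business"): "safe: build here",
}


@dataclass(frozen=True)
class Integration:
    name: str
    dependency: str  # "platform_api" or "own_data"
    buyer: str       # "platform_admin" or "line_of_business"


def classify(integration: Integration) -> str:
    """Return the 2x2 cell for one roadmap integration."""
    return CELLS[(integration.dependency, integration.buyer)]


# Hypothetical roadmap entries, for illustration only.
roadmap = [
    Integration("erp-invoice-sync", "platform_api", "platform_admin"),
    Integration("approval-workflow", "own_data", "line_of_business"),
]

for item in roadmap:
    print(f"{item.name}: {classify(item)}")
```

    Anything that lands in the danger cell is a candidate for the one integration to kill this week.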

    Action items

    • Audit all integrations that depend on SAP, Salesforce, ServiceNow, or Workday APIs and classify each as 'endorsed partner' vs 'at risk of lockout' by end of this sprint
    • Evaluate whether your product owns the data, the workflow, or the approval step — document which for your top 5 revenue-generating features by next planning cycle
    • Model a 'Sierra-style' outcome ownership positioning for your core use case — what measurable business result could you guarantee end-to-end?

    Sources: A procurement lead at a Fortune 500 opened her SAP admin console this week · An analyst at a mid-sized asset manager opened FactSet this morning · A designer opened Figma on Tuesday morning

  2. 02

    The Open-Weight Parity Moment: Your Vendor Contract Reprices This Quarter

    The Numbers That Changed This Week

    A platform lead pulled up her inference contract this week and noticed the renewal is in February. Zhipu AI released GLM-5.1, a 744B-parameter MoE model (40B active per token) under an MIT license. It scores 58.4 on SWE-Bench Pro, above GPT-5.4 at 57.7 and Claude Opus 4.6 at 57.3. It ships with 200K context and 8-hour autonomous execution. This is not "almost as good." It is 0.7 points better than the closed frontier on a coding benchmark, and the license costs nothing.

    The same week, xAI priced Grok 4.3 at $1.25 per million input tokens ($2.50 output) with 1M context, always-on reasoning, native multimodal, and live web search. That is roughly 5x cheaper than GPT-5.4 and Claude Opus 4.7 for equivalent-class capabilities.

    Any unit economics model that assumed the inference cost of six months ago is now either leaving margin on the table or exposed to a competitor who will pass the savings through.
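
    To see what the repricing does to a cost model, the arithmetic can be sketched as below. The Grok 4.3 prices are the ones quoted above. The incumbent prices are an assumption derived from the "roughly 5x" figure, not a published rate card, and the monthly token volumes are hypothetical.

```python
def monthly_cost(input_mtok: float, output_mtok: float,
                 in_price: float, out_price: float) -> float:
    """Monthly spend in dollars, given volumes in millions of tokens."""
    return input_mtok * in_price + output_mtok * out_price


# Prices in dollars per million tokens.
GROK_IN, GROK_OUT = 1.25, 2.50            # quoted above
INCUMBENT_IN, INCUMBENT_OUT = 6.25, 12.50  # assumption: 5x the Grok prices

# Hypothetical workload: 2,000M input tokens and 500M output tokens a month.
INPUT_MTOK, OUTPUT_MTOK = 2000, 500

grok = monthly_cost(INPUT_MTOK, OUTPUT_MTOK, GROK_IN, GROK_OUT)
incumbent = monthly_cost(INPUT_MTOK, OUTPUT_MTOK, INCUMBENT_IN, INCUMBENT_OUT)

print(f"Grok 4.3:  ${grok:,.0f}")
print(f"Incumbent: ${incumbent:,.0f}")
print(f"Gap:       ${incumbent - grok:,.0f} per month")
```

    Swapping in real contract prices and real token volumes turns this into the number to bring to the renewal conversation.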

    The Sentiment Shift Has Numbers

    A poll of 201 AI practitioners found 45% believe OpenAI has lost its default-lab position. Only 16% see it as a two-horse race with Anthropic. Another 20% think open-weight models close the gap first. That is sentiment, not usage, but practitioner sentiment tends to lead procurement behavior by 6-12 months. OpenAI meanwhile is redirecting attention to government distribution: $1 ChatGPT for federal agencies, a $200M Pentagon contract, lobbying spend up 577% to $1.76M. A lab winning on model quality does not need to price ChatGPT at a dollar.

    Separate the Two Decisions

    Teams keep bundling these into one story. They are two different decisions with two different owners:

    1. The open-weights decision: whether to self-host GLM-5.1 for coding and structured workloads. This is an infra investment with a capex payback calculation, and it belongs to the platform team.
    2. The routing decision: whether to send cheap or tolerant traffic to Grok 4.3 while keeping the incumbent on quality-critical paths. This is a sprint-level change, not a migration, and it belongs to whoever owns the gateway.
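
    The routing decision can be sketched as a small table at the gateway, so that moving a workload is a config change rather than a feature change. This is a minimal sketch under stated assumptions: the workload names are hypothetical, and the provider strings are placeholders rather than real API model identifiers.

```python
# Tier per workload; anything not listed falls back to the incumbent.
ROUTES = {
    "summarize-ticket": "cheap",     # hypothetical tolerant workload
    "contract-review": "incumbent",  # hypothetical quality-critical workload
}
DEFAULT_TIER = "incumbent"

# Placeholder provider labels, not real model identifiers.
PROVIDERS = {
    "cheap": "grok-4.3",
    "incumbent": "current-provider",
}


def provider_for(workload: str) -> str:
    """Pick a provider for a workload, defaulting to the quality-critical path."""
    tier = ROUTES.get(workload, DEFAULT_TIER)
    return PROVIDERS[tier]


for name in ("summarize-ticket", "contract-review", "new-unreviewed-feature"):
    print(f"{name} -> {provider_for(name)}")
```

    Defaulting unknown workloads to the incumbent keeps new features on the quality-critical path until someone deliberately marks them as tolerant.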

    Google DeepMind's Gemma 4 Multi-Token Prediction drafters add a third lever: 3x inference speedup without quality loss via speculative decoding, available across vLLM, MLX, and Transformers. For a voice agent or a code-completion surface where the 2-3 second round trip was the real product blocker, sub-second becomes achievable.
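
    Whether a latency-gated feature actually crosses the sub-second line depends on how much of the round trip is inference. The sketch below re-scores the 2-3 second range with the quoted 3x speedup. The fixed overhead for network and queueing is a hypothetical value to replace with a measured one.

```python
def latency_after_speedup(round_trip_s: float, fixed_overhead_s: float,
                          speedup: float) -> float:
    """Only the inference share of the round trip shrinks; overhead stays."""
    inference_s = round_trip_s - fixed_overhead_s
    return fixed_overhead_s + inference_s / speedup


SPEEDUP = 3.0           # figure quoted above for the MTP drafters
FIXED_OVERHEAD_S = 0.2  # assumption: network and queueing time per request

for round_trip in (2.0, 3.0):
    after = latency_after_speedup(round_trip, FIXED_OVERHEAD_S, SPEEDUP)
    verdict = "sub-second" if after < 1.0 else "still over a second"
    print(f"{round_trip:.1f}s -> {after:.2f}s ({verdict})")
```

    Under that assumed overhead, a 2-second path lands near 0.8 seconds while a 3-second path stays slightly above one second, so the re-scoring is worth doing per feature rather than across the backlog at once.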

    The Decision Framework

                                   | Tolerates Model Swap            | Needs Regression Testing
    Contract reprices in 90 days   | Self-host estimate this sprint  | Use Grok pricing as negotiation anchor
    Volume growth is the driver    | Route cheap traffic now         | Benchmark on your actual prompts first

    The honest recommendation: staff one cell this sprint and let the other wait a sprint. Hedging both reads prudent in the deck and costs a quarter on the roadmap. Use Grok 4.3 at $1.25/M as the contract renewal anchor regardless of which cell gets the headcount.

    Action items

    • Run a cost-performance comparison of GLM-5.1 (self-hosted, $0) and Grok 4.3 ($1.25/M) against your current provider for your top 3 inference-heavy features — complete by Friday
    • Use xAI Grok 4.3 pricing ($1.25/M input) as the anchor in your next OpenAI/Anthropic renewal or volume commitment discussion — brief your procurement team this week
    • Build or validate a model abstraction layer that allows per-workload routing across 2-3 providers without feature-level code changes — scope by end of sprint
    • Re-score latency-gated features in your backlog assuming 3x inference speedup (Gemma 4 MTP) — bring updated estimates to next planning meeting

    Sources: A product lead opened her cost model on Tuesday morning · A head of platform opened her vendor review doc this week · A product manager opened the agent evaluation dashboard on a Tuesday morning · $700B AI spend vs $40B revenue

  3. 03

    AI Partnership & Monetization Models Under Stress: Three Case Studies

    Snap-Perplexity: $400M Deal Dead in 6 Months

    A BD lead at Snap closed a $400M partnership with Perplexity and six months later was writing the post-mortem. The deal "amicably ended" in Q1 2026 after the two companies could not agree on "a path to broader rollout," and Snap's 2026 guidance now assumes zero revenue contribution. Two well-capitalized companies with aligned strategic interest still could not agree on what the product looks like at 10x scale. This reads as a BD failure. It is a product alignment failure that surfaced at the BD table.

    An AI partnership term sheet should carry explicit answers before signing. Start with UX at 10x scale and who owns product decisions when usage diverges from projections. The kill criteria come last, which is the section most decks skip.

    Figma: 75% AI Credit Consumption Is a Countdown, Not a Milestone

    A designer at a Figma customer burns through her weekly AI credits by Wednesday and keeps working without them for the rest of the week. 75% of paying customers consume AI credits weekly, and usage is increasing over time. Figma picked a hybrid model: bundled credits per seat plus optional overage. The bill for that generosity is -$0.26 EPS. The question Thursday's earnings call should answer is what the cohort that hits a credit ceiling actually does. If they upgrade, the model works. If they use the feature less, 75% consumption is measuring frustration, not value.

    Adam Mansfield at UpperEdge notes enterprise buyers are "anxious about unpredictable AI bills." The monetization 2x2 has predictable vs. variable cost to customer on one axis and predictable vs. variable margin to vendor on the other. Figma picked predictable for the customer and variable for itself. Thursday's print ($316M expected, +38.5% YoY) will show whether that cell is sustainable.
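
    The credit-ceiling question above reduces to labeling each user who hit the limit by what they did next. A minimal sketch follows, assuming a hypothetical record shape and a 7-day window on either side of the limit. The sample cohort is invented for illustration and is not Figma data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PostLimitUser:
    user_id: str
    upgraded: bool          # bought more credits or a higher tier
    active_after: bool      # any product activity in the 7 days after the limit
    ai_actions_before: int  # AI actions in the 7 days before the limit
    ai_actions_after: int   # AI actions in the 7 days after the limit


def outcome(user: PostLimitUser) -> str:
    """Label what a user did after hitting the AI credit ceiling."""
    if user.upgraded:
        return "upgrade"
    if not user.active_after:
        return "churn"
    if user.ai_actions_after < user.ai_actions_before:
        return "reduced"
    return "steady"


# Invented sample cohort, for illustration only.
cohort = [
    PostLimitUser("u1", True, True, 40, 55),
    PostLimitUser("u2", False, True, 40, 5),
    PostLimitUser("u3", False, False, 30, 0),
    PostLimitUser("u4", False, True, 20, 20),
]

summary = Counter(outcome(u) for u in cohort)
print(dict(summary))
```

    A cohort dominated by "upgrade" supports the pricing model, while one dominated by "reduced" or "churn" suggests the consumption number is measuring frustration.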

    The $700B Subsidy Won't Last

    Hyperscalers are spending $700B on AI infrastructure in 2026 against roughly $40B in total industry AI revenue from 2025. That is a 17.5x investment-to-revenue gap. On that arithmetic, much of today's API pricing sits below true economic cost. Either AI revenue expands 17x or the spending contracts. Both outcomes break current pricing equilibria. The PMs planning for this are stress-testing features at 2-3x current API costs and locking in rates where the contracts allow.
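
    The gap and the stress test are both simple arithmetic. The sketch below reproduces the 17.5x figure from the numbers above, then checks hypothetical features at 2x and 3x API cost. The feature names and per-user figures are invented for illustration.

```python
from typing import Optional

INFRA_SPEND_B = 700  # 2026 hyperscaler AI infrastructure spend, $B (from above)
AI_REVENUE_B = 40    # 2025 industry AI revenue, $B (from above)

gap = INFRA_SPEND_B / AI_REVENUE_B
print(f"Investment-to-revenue gap: {gap}x")


def first_breaking_multiplier(revenue: float, api_cost: float,
                              multipliers=(2, 3)) -> Optional[int]:
    """Smallest cost multiplier at which API cost exceeds revenue, else None."""
    for m in multipliers:
        if api_cost * m > revenue:
            return m
    return None


# Hypothetical features: (monthly revenue per user, monthly API cost per user).
features = {
    "smart-summary": (4.00, 1.00),
    "auto-draft": (3.00, 1.40),
}

for name, (revenue, api_cost) in features.items():
    breaks_at = first_breaking_multiplier(revenue, api_cost)
    status = f"breaks at {breaks_at}x" if breaks_at else "survives 3x"
    print(f"{name}: {status}")
```

    A feature that breaks at 2x is a bet on current pricing holding, and is the first candidate for a locked-in rate or a cheaper route.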

    The Pattern Across All Three

    Each case surfaces the same structural risk: AI monetization built on assumptions that won't survive contact with scale. Snap already ran the experiment and now guides to zero revenue from a $400M deal. Figma's bet is that credits create value rather than frustration, and the 75% number reads either way depending on what the credit-ceiling cohort does next quarter. The broader industry assumes today's inference pricing holds, which is the assumption with the shortest half-life. The diagnostic to bring to the next product review: name the kill criterion that would end the bet and the cohort whose behavior would trigger it. A monetization plan that cannot answer both is still a deck.

    Action items

    • Structure any pending AI partnership deals with explicit product rollout milestones, shared KPIs, scale-path UX definition, and kill criteria — add these clauses before signing, not after
    • Pull the cohort of users who hit any AI credit/usage limit in the last 30 days and analyze their 7-day post-limit behavior — do they upgrade, reduce usage, or churn?
    • Stress-test your AI feature unit economics at 2-3x current API/compute costs — identify which features break and which survive

    Sources: A designer opened Figma on Tuesday morning · A procurement lead at a Fortune 500 opened her SAP admin console this week · $700B AI spend vs $40B revenue

◆ QUICK HITS

  • Update: Stanford AI Index quantifies agent failure rate at 33% on structured tasks, and Meta's ProgramBench shows 0% full-completion across 200 real-world coding tasks. Design recovery UX as a first-class surface, not a fallback.

    A product manager opened the agent evaluation dashboard on a Tuesday morning

  • CopilotKit raised $27M for its open-source AG-UI protocol embedding AI agents inside enterprise apps — already serving Cisco, Docusign, and Deutsche Telekom. Evaluate as build-vs-adopt decision for your embedded agent features.

    A procurement lead at a Fortune 500 opened her SAP admin console this week

  • SubQ emerged from stealth claiming 12M-token native context window with ~1,000x attention compute reduction — if validated, chunking-heavy RAG architectures become technical debt. Keep modular.

    A procurement lead at a Fortune 500 opened her SAP admin console this week

  • Anthropic Claude Skills uses progressive disclosure (tiny descriptions in memory, full instructions load on match) — each Skill takes 15-30 minutes to build but creates a non-portable Claude-specific asset. Cap investment at rebuild-in-one-sprint threshold.

    A product lead opened her cost model on Tuesday morning

  • OpenAI Codex Chrome extension reads DOM states and operates inside authenticated browser sessions (Salesforce, Gmail, LinkedIn) — schedule a threat-modeling session for what happens when agents operate inside your product via the UI, not your API.

    A product lead opened her cost model on Tuesday morning

  • pgBackRest (critical PostgreSQL backup tool) is dead — sole maintainer's company acquired, no succession plan. Audit your dependency tree for bus-factor-1 projects, especially those with recent M&A at the maintainer's employer.

    A security engineer opened the same dependency graph three times this morning

  • NVIDIA GPUs confirmed vulnerable to Rowhammer in GDDR memory bypassing IOMMU — three independent research teams verified, one achieving complete system control. Flag for infra/security team.

    A security engineer opened the same dependency graph three times this morning

  • Thematic packaging outperforms raw product: Tuttle's UFOD ETF repackages standard defense holdings under a narrative and captures 10x engagement. Identify 2-3 existing features that could be repackaged for specific audience segments.

    Thematic packaging is outperforming raw product

◆ Bottom line

The take.

An MIT-licensed open model now beats GPT-5.4 on coding benchmarks while costing zero, xAI undercuts incumbents by 5x at $1.25/M tokens, and SAP just proved platforms will lock third-party agents out of their data without warning — all in the same week. The model layer commoditized, the data layer closed, and institutional money ($8B TCI liquidation, Viceroy shorts) is actively betting against every SaaS company that owns a workflow but not the data underneath it. The work this week: use the new price floor as your contract renewal anchor, audit which of your integrations depend on APIs that could close, and identify whether your moat is the model (worthless), the workflow (compressing), or the data plus approval step (durable).

— Promit, reading as Product ·

Frequently asked

How should I decide between self-hosting GLM-5.1 and routing traffic to Grok 4.3?
Treat them as two different decisions with two different owners. Self-hosting GLM-5.1 is a capex-style infrastructure investment owned by the platform team and justified by workloads that tolerate a model swap. Routing cheap or latency-tolerant traffic to Grok 4.3 at $1.25/M input is a sprint-level gateway change. Staff one cell this sprint and let the other wait — hedging both costs a quarter.
What does SAP closing its APIs to third-party AI agents actually mean for my roadmap?
It means any integration assuming a major platform's API stays open needs to be rewritten this sprint, with longer procurement cycles and rev-share priced in. SAP reserved access for Joule and Nvidia's NemoClaw and bought Prior Labs for €1B the same week. Salesforce, ServiceNow, and Workday are likely to follow within 2–3 quarters, so audit dependencies now and classify each as endorsed-partner or at-risk-of-lockout.
If my product owns neither a proprietary data pipe nor the approval step, what do I do?
Pick one and start building toward it this quarter, because owning neither is a two-quarter problem. Owning the approval step — the moment model output becomes a decision someone signs — is usually faster to reach than building a proprietary data pipe. Sierra's $150M ARR shows buyers pay roughly 10x more for outcome ownership than for feature-level AI assistants, so reframe the pitch from 'AI-powered X' to 'we own the result of X.'
How do I know if my AI credit consumption metric is healthy or a warning sign?
Look at what users do after they hit the ceiling, not at the consumption rate itself. Pull the cohort that hit any AI credit limit in the last 30 days and track 7-day behavior: upgrade means the pricing model is harvesting willingness-to-pay, reduced usage or churn means high consumption is measuring frustration. Figma's 75% weekly consumption reads either way until that cohort analysis exists.
How should I price-test AI features given the $700B infrastructure spend versus $40B revenue gap?
Stress-test unit economics at 2–3x current API and compute costs and flag which features break. The 17.5x investment-to-revenue gap means today's inference is subsidized, and correction will arrive as price increases, rate limits, or both. Features that only pencil at current prices are bets on the subsidy continuing, not durable products — lock in rates where contracts allow and use Grok 4.3 pricing as the negotiation anchor.
