Edition 2026-05-17 · read as Data Science
Anthropic Ends Flat-Rate Claude Plans, Breaking Cost Models
Sources: 36 · Words: 1,969 · Read: 10 min
Topics: Agentic AI · LLM Inference · AI Regulation
◆ The signal
Anthropic killed the flat-rate subscription model this week — Claude plans now convert to dollar-matched API credits, evaporating the 70-90% effective discount power users were getting on Agent SDK, GitHub Actions, and third-party harness calls. The same week, Vercel's production data confirmed 59% of all inference tokens are now agentic multi-turn traces. Your cost model is wrong on both the unit price and the workload shape simultaneously. Re-price every Claude-backed pipeline before the June 15 third-party tool credit cliff, or your Q3 budget is a fiction.
◆ INTELLIGENCE MAP
01 Anthropic Metering Cliff: Subscription Subsidy Dies June 15
act now
Claude subscriptions now cap programmatic usage at dollar-equivalent API credits. ServiceNow burned its full-year Claude budget by May. Opus 4.7 tripled image costs. OpenAI launched a 2-month free Codex promo the same day, a counter-offensive against the exact developers Anthropic just alienated.
- Effective discount lost: 70-90%
- Opus image cost jump: 3x
- Anthropic B2B share: 34.4%
- OpenAI B2B share: 32.3%
- Third-party tool credit deadline: June 15
02 59% Agentic Traffic: Eval Harnesses Score the Minority
act now
Vercel's 200K-team production index shows 59% of tokens are multi-turn agentic traces, up from <20% six months ago. Anthropic captures 61% of spend (Opus), Google captures 38% of volume (Flash). Cost models built on 3:1 I/O ratios are off by 5x: agentic traces run 15:1 input-heavy with heavy cache divergence across vendors.
- Agentic token share: 59%
- Anthropic spend share: 61%
- Google volume share: 38%
- Agentic token share six months ago: <20%
- Agentic I/O ratio: 15:1
03 Lakehouse & MLOps CVE Cluster: Iceberg, Polaris, Argo CD
monitor
Apache Iceberg (CVSS 9.9) lets attackers redirect table metadata writes to attacker-controlled S3, poisoning training data silently. Polaris (CVSS 9.9) broadens credentials cross-tenant. Argo CD (CVSS 9.6) exposes plaintext K8s Secrets. PraisonAI was weaponized within 4 hours of CVE disclosure. The entire data-to-model pipeline is now a named attack surface.
- Iceberg metadata redirect: CVSS 9.9
- Polaris credential leak: CVSS 9.9
- Argo CD secret extract: CVSS 9.6
- n8n SQLi + OAuth theft: CVSS 9.8
- Kestra SQLi: CVSS 9.8
- PraisonAI time-to-exploit: 4 hours
- AI endpoint scan time: within 3 hours
04 Training Efficiency: 2-360x Cost Reduction Claims Landed
monitor
Three research drops shift training economics: Nous TST delivers 2-3x wall-clock at matched FLOPs (validated 270M→10B MoE), Datology hit +11.7 VLM benchmark points at 17x less compute via pure curation, and NVIDIA Star Elastic claims 360x cheaper model-family production from one post-training run. The GPU demand-to-supply ratio remains 4:1 at Nebius.
- TST wall-clock gain: 2-3x
- Datology compute saving: 17x
- Star Elastic family cost: 360x cheaper
- GPU demand:supply: 4:1
- Nebius YoY growth: 684%
05 Autonomous Cyber Capability Crosses AISI Threshold
background
Anthropic's Mythos is the first model to clear both UK AISI simulated attack ranges, with full network takeover in controlled environments. Mozilla's custom harness surfaced 271 Firefox bugs with the same model that found just 1 CVE in curl with a generic scanner. The 271-to-1 delta proves the harness, not the model, determines offensive yield.
- Mozilla bugs found (custom harness): 271
- curl bugs found (generic scan): 1
- MDASH Windows fixes
- Palo Alto products scanned
- AISI ranges cleared: 2
◆ DEEP DIVES
01 Anthropic's Metering Cliff: Every Claude Workload Needs Re-Pricing Before June 15
What Changed
Anthropic converted Claude subscriptions from flat-rate developer tools into dollar-matched API credit accounts. Every programmatic call now meters at list price: Agent SDK, claude -p, GitHub Actions, batch evals running through third-party harnesses. The 70-90% discount that power users were extracting via alternative harnesses is gone. Starting June 15, Claude usage through Zed, Conductor, OpenCode, and T3 Code gets a separate credit bucket with no rollover and overflow billed at API rates.
Why This Happened Now
Anthropic hired a CFO and is targeting an October IPO. Margin-per-token is now a board-level metric, and the alt-harness subsidy was the first line item to cut. At Code with Claude, Dario Amodei said they planned for 10x growth and hit 80x. That is an 8x miss against the capacity plan. The stopgap: leasing xAI's entire Colossus 1 cluster (220,000+ GPUs) to keep existing customers served.
The Cross-Source Pattern
Nine independent sources reported adjacent signals that resolve into one narrative:
| Signal | Source | Implication |
| --- | --- | --- |
| ServiceNow burned full-year Claude budget by May | Enterprise reporting | Token economics at scale are wildly unpredictable without telemetry |
| No native per-user/per-tool usage attribution | CIO interviews | You build the observability or you discover the overrun in finance |
| Opus 4.7 tripled image-processing cost | Ramp economist | Per-modality pricing is shifting without notice |
| OpenAI launched 2-month free Codex enterprise switch | Altman announcement | OpenAI pricing a counter-offensive at exact devs Anthropic alienated |
| Ramp: Anthropic 34.4% vs OpenAI 32.3% | Spend telemetry | Enterprise procurement is genuinely multi-vendor now |
The contradiction worth surfacing: Anthropic is winning enterprise share while raising prices. Reliability is degrading at the same time, which the price move does not explain. The pattern only fits if switching costs are high enough that customers absorb short-term pain. Which is also why vendor abstraction is worth doing now, while there is leverage to negotiate.
The vendor that just rented a competitor's datacenter to keep your API live does not have the capacity margin to absorb your cost complaints. Price accordingly.
What Your Stack Needs This Sprint
The June 15 deadline is 30 days out. The reconciliation work that pays back, in rough order of payoff:
- Reconciliation against the new credit cap is the first job. Agent SDK runs, GitHub Actions, batch evals: anything sitting on a Max plan is now metered at list. Projecting monthly burn against the cap surfaces which jobs exhaust credits before month-end.
- A gateway with per-tenant tagging (LiteLLM, Portkey, or in-house) is the minimum bar. No Claude call should leave infra without tenant_id, feature_id, and a prompt-family hash. Anthropic has explicitly offloaded observability to the customer.
- The OpenAI Codex 2-month evaluation is an asymmetric-payoff free option, worth running on the existing harness with matched prompts and tool schemas. The thing pass rates don't tell you is how agents solve. That is the comparison that matters.
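The tagging and burn-projection steps can be sketched in a few lines. This is a minimal illustration, not any gateway's actual API: `UsageRecord`, `prompt_family_hash`, and `project_burn` are hypothetical names, the per-call `cost_usd` is assumed to be computed upstream at list price, and the projection is a plain linear extrapolation of spend to date.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


def prompt_family_hash(template: str) -> str:
    """Hash the prompt template (not the filled-in prompt) so calls group by family."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical record shape; field names follow the tags named in the text.
    tenant_id: str
    feature_id: str
    prompt_family: str
    cost_usd: float  # list-price cost of one call, assumed computed by the gateway


def project_burn(records, days_elapsed, days_in_month, monthly_cap_usd):
    """Linearly project month-end spend and the day the credit cap runs out."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    per_feature = defaultdict(float)
    for r in records:
        per_feature[(r.tenant_id, r.feature_id)] += r.cost_usd
    total = sum(per_feature.values())
    daily = total / days_elapsed
    return {
        "projected_usd": daily * days_in_month,
        # Day of month on which the cap is exhausted at the current daily rate.
        "exhaust_day": monthly_cap_usd / daily if daily > 0 else None,
        "per_feature": dict(per_feature),
    }
```

Sorting `per_feature` by spend shows which jobs drive an `exhaust_day` that falls before month-end. A linear projection understates bursty agentic jobs, so treat the result as a floor rather than a forecast.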
Action items
- Audit all Claude-backed workloads (Agent SDK, GitHub Actions, batch evals) and project monthly token burn against new credit caps by end of this sprint
- Deploy LLM gateway with per-user/per-feature/per-tenant tagging in front of all Claude traffic within 2 weeks
- Activate OpenAI's 2-month Codex enterprise switch promo and run head-to-head evaluation against Claude on your top 5 production task classes
- Re-price any multimodal pipeline using Opus 4.7 against GPT-4V and Gemini on your actual image workload
Sources: Claude just metered your agent SDK calls · Anthropic passes OpenAI in B2B · Claude Code latency on long-context requests drifted upward · Anthropic ships no per-user usage telemetry · Vercel published a number worth sitting with · Agentic traffic crossed fifty-nine percent
02 59% Agentic Traffic: Your Eval Harness and Cost Model Both Measure the Wrong Thing
The Production Reality Has Shifted Under the Measurement
Vercel's AI Gateway, covering 200,000 teams over 7 months, puts agentic workloads at 59% of all token volume. Six months ago that number was under 20%. The composition of production inference has moved faster than most teams' eval harnesses and cost models have tracked.
The spend-volume split is the structural finding: Anthropic captures 61% of spend via Opus on planning and reasoning nodes, while Google captures 38% of volume via Flash on high-throughput utility calls. That is a tiered-routing signature emerging in production without most teams designing for it explicitly.
Why Existing Harnesses Are Measuring the Minority
Most eval harnesses still score single-turn responses against reference answers. That design was correct in 2023. It scores the 41% minority of production traffic today. The 59% that is agentic — multi-turn, tool-calling, bursty, with heavy cache reuse on some providers and none on others — is unmeasured.
| Dimension | Single-turn eval (legacy) | Trajectory eval (needed) |
| --- | --- | --- |
| Metric | Accuracy on final answer | Task completion + cost path + tool-call precision |
| Cost model input | 3:1 I/O ratio | 15:1 input-heavy with variable cache-hit rates |
| Failure detection | Wrong final answer | 40K-token planning loop that eventually gives up |
| Provider comparison | $/1M tokens | $/successful-task at matched quality |
A forecast built on last year's 3:1 ratio is off by roughly 5x on spend, and the error is not symmetric across vendors. One customer reported burning monthly budget in 9 days. The cause was not a bad prompt. The workload mix shifted under the model.
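A small cost function makes the ratio shift concrete. It is a sketch under stated assumptions: `cost_per_task` is a hypothetical helper, prices are per million tokens and supplied by the caller, and cached input is billed at a caller-supplied fraction of the input price. At fixed output length, moving from 3:1 to 15:1 multiplies input tokens by five. How much of that reaches the bill depends on the price ratio and the cache-hit rate, which is why both are parameters here.

```python
def cost_per_task(output_tokens, io_ratio, cache_hit_rate,
                  price_in, price_out, cached_discount):
    """Dollar cost of one task.

    io_ratio        input tokens per output token (e.g. 3 or 15)
    cache_hit_rate  fraction of input tokens served from cache, 0.0 to 1.0
    price_in/out    USD per million tokens (caller-supplied, not real prices)
    cached_discount fraction of price_in charged for cached input tokens
    """
    input_tokens = output_tokens * io_ratio
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    cost_in = (fresh + cached * cached_discount) * price_in / 1_000_000
    cost_out = output_tokens * price_out / 1_000_000
    return cost_in + cost_out
```

Running it once with the legacy ratio and once with the measured ratio and cache-hit rate for each provider gives the per-vendor gap that a single blended constant hides.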
Adjacent Signals That Confirm the Pattern
Abridge runs 80M+ clinical conversations/year through a constellation of models with cheap-fast triage in front and expensive reasoning behind it. That is the same tiered pattern the Vercel data implies. SAP's Autonomous Enterprise and ServiceNow's Action Fabric both converged on Knowledge Graph + MCP-exposed workflows for agent execution. CyberGym showed multi-agent decomposition (scan → debate → exploit) outperforming monolithic models. The single-model, single-vendor, single-turn assumption fails on every axis at once.
Glean's benchmark — off-the-shelf MCP uses 30% more tokens than a tuned knowledge graph — points at a specific failure mode. MCP tool listings inflate context windows, and naive tool outputs return verbose blobs where a reranked snippet would do. The benchmark is vendor-published, methodology undisclosed. The thing this number doesn't tell you is how it behaves on your tool surface. Run your own measurement.
If 59% of your tokens are agentic but 100% of your evals are single-turn, you're flying instruments-out on the majority of production traffic.
The Rebuild
Two pieces need rewriting this quarter, and neither is glamorous:
- Eval harness: Add trajectory-level metrics — task success rate, tool-call precision/recall, steps-to-completion, cost-per-successful-task, recovery-from-error rate. These correlate with production behavior. Single-turn scores do not.
- Cost model: Treat input-to-output ratio and cache-hit rate as inputs, not constants. Segment by workload type (agentic vs interactive) and provider. The Vercel data shows customers already route this way. The forecasting should match.
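The trajectory metrics in the eval-harness item can be sketched as below. The `Trajectory` shape and function names are hypothetical, and tool-call precision and recall are computed on sets of tool names, which is a simplification: it ignores call order, repeated calls, and argument correctness.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    # Hypothetical record of one agent run; adapt fields to your trace format.
    succeeded: bool
    cost_usd: float
    steps: int
    tool_calls_made: list = field(default_factory=list)      # tools the agent called
    tool_calls_expected: list = field(default_factory=list)  # tools a reference run needs


def tool_call_precision_recall(made, expected):
    """Set-based precision and recall over tool names."""
    made_set, expected_set = set(made), set(expected)
    hits = len(made_set & expected_set)
    precision = hits / len(made_set) if made_set else 0.0
    recall = hits / len(expected_set) if expected_set else 0.0
    return precision, recall


def summarize(trajectories):
    """Aggregate trajectory-level metrics across a batch of runs."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    successes = [t for t in trajectories if t.succeeded]
    total_cost = sum(t.cost_usd for t in trajectories)
    return {
        "task_success_rate": len(successes) / len(trajectories),
        # Failed runs still cost money, so their spend is charged to the successes.
        "cost_per_successful_task": (
            total_cost / len(successes) if successes else float("inf")
        ),
        "mean_steps_to_completion": (
            sum(t.steps for t in successes) / len(successes) if successes else None
        ),
    }
```

Dividing total spend, failures included, by the number of successes is a deliberate choice: a long planning loop that gives up raises cost-per-successful-task even though it never produces a wrong final answer.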
Action items
- Add trajectory-level metrics (tool-call F1, steps-to-completion, cost-per-successful-task) to eval harness alongside existing single-turn benchmarks within 2 weeks
- Run a 1-hour spike comparing your MCP/tool-calling overhead vs. a retrieval-first baseline on 100 production traces this week
- Segment inference cost reporting by workload type (agentic vs single-shot) and update forecasting model to use measured I/O ratio and cache-hit rate per provider
- Prototype decompose-debate-verify pipeline on one auto-verifiable workload (code gen, SQL, extraction) and compare against single-agent baseline at matched token cost
Sources: Agentic traffic crossed fifty-nine percent · Vercel published a number worth sitting with · The CyberGym result · MCP plus knowledge graphs · Abridge runs model routing across 100M conversations
03 Lakehouse Trust Boundary Collapsed: Iceberg CVSS 9.9 Lets Attackers Poison Your Training Data
This Week's CVEs Cluster in the Data Layer, Not the Model
This week's CVE disclosures hit the data and MLOps stack with unusual concentration. The previously-reported LiteLLM KEV (covered May 12) was the gateway. The new disclosures extend the attack surface to the storage layer underneath it:
- Apache Iceberg (CVE-2026-42812, CVSS 9.9): attackers with table-write permission can redirect metadata pointers to attacker-controlled S3 prefixes, after which the next query reads poisoned Parquet and the next training run ingests corrupted features without an obvious signal.
- Apache Polaris (CVE-2026-42809/10/11, CVSS 9.9): credential-broadening bugs enable cross-tenant access. Chained with the Iceberg bug, there is a plausible path from a compromised analyst notebook to cross-tenant data theft.
- Argo CD 3.2.x/3.3.x (CVE-2026-42880, CVSS 9.6): low-privilege users can extract plaintext Kubernetes Secrets from any reachable namespace, including model-registry tokens, HF PATs, and cloud credentials.
Why This Is Worse Than a Standard Patch Tuesday
These are not obscure memory-corruption bugs. They are authorization failures and unsafe input handling in Python, Go, and Java tools that shipped fast. The pattern mirrors what happened to web frameworks a decade ago: ML infrastructure grew quickly and is now receiving security attention for the first time.
| Component | CVSS | Blast Radius in Your Stack | Patch Priority |
| --- | --- | --- | --- |
| Apache Iceberg | 9.9 | Poisoned tables, corrupted training data, silent feature drift | Immediate |
| Apache Polaris | 9.9 | S3/GCS creds, cross-tenant access | Immediate |
| Argo CD 3.2/3.3 | 9.6 | All K8s Secrets in reachable namespaces | Immediate + rotate |
| n8n workflow | 9.8 | Workflow DB, OAuth sessions | This week |
| Kestra ≤1.3.3 | 9.8 | Pipeline metadata, schedules | This week |
The Iceberg metadata redirect is the one most ML teams should worry about. Default lakehouse observability watches row-level changes, not pointer mutations, so an attacker who redirects table metadata to a poisoned S3 prefix gets the next SELECT * to read fabricated data, the next spark.read.table() to ingest it into features, and the next model to train on it. Nothing alerts because the schema didn't change, only the data underneath it.
The CVE that matters for training data isn't the loudest one; it's the one that changes what your query returns without changing what your schema reports.
Honeypot Data Quantifies the Window
Separately, honeypot research showed exposed AI inference endpoints (Ollama, LangServe, MCP servers) get indexed by Shodan within 3 hours and absorbed 113,000+ requests in a month, with 23% specifically probing AI paths. PraisonAI was weaponized within 4 hours of CVE disclosure. That is the observed patching window, not a theoretical one.
Concrete Monday Morning Actions
- Iceberg/Polaris: catalog audits and explicit storage credential scoping are the cheap mitigations, with write-path allowlisting for table metadata locations as the next step. A 30-day diff on metadata pointers tells you whether anything has already moved.
- Argo CD: patch to ≥3.2.12 / ≥3.3.10 and rotate every K8s Secret in reachable namespaces. Rotation costs more than the patch and is still worth doing, because the patch does not unleak anything that already left.
- Orchestrators: n8n and Kestra service accounts should be scoped to minimum-needed per workflow. These tools typically run with broad cloud credentials because they orchestrate everything, which is exactly the failure mode the Argo CD bug exploits.
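The allowlist check on metadata locations can be sketched as follows. It assumes you have already exported a mapping from table name to current metadata location from the catalog, and that export step is not shown. The function names and example paths are hypothetical. Because Iceberg writes a new metadata file on every commit, the check compares storage prefixes rather than exact file paths.

```python
def under_prefix(location: str, prefix: str) -> bool:
    """True if location sits under prefix, matching on a path boundary."""
    # A trailing slash stops "s3://lake/warehouse" matching "s3://lake/warehouse-evil/".
    prefix = prefix if prefix.endswith("/") else prefix + "/"
    return location.startswith(prefix)


def find_redirected_tables(current_pointers, allowed_prefixes):
    """Return (table, location) pairs whose metadata lives outside the allowlist.

    current_pointers  dict: table name -> current metadata location
    allowed_prefixes  dict: table name -> list of permitted storage prefixes
    """
    findings = []
    for table, location in sorted(current_pointers.items()):
        prefixes = allowed_prefixes.get(table, [])
        # A table with no allowlist entry is reported rather than silently passed.
        if not any(under_prefix(location, p) for p in prefixes):
            findings.append((table, location))
    return findings
```

Running the same check against a pointer snapshot from 30 days ago and one from today shows whether anything moved in between. Wiring the findings into alerting is left out of the sketch.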
Action items
- Audit Iceberg/Polaris catalog configurations for metadata write-path allowlisting and enforce explicit storage credential scoping by end of week
- Patch Argo CD to ≥3.2.12/≥3.3.10 and rotate every K8s Secret in namespaces it can read — including HF tokens, model-registry creds, and cloud SA keys
- Run a dependency scan across n8n, Kestra, Spring Cloud Config, and Redis in your ML orchestration stack and pin to patched versions
- Add metadata-pointer integrity monitoring to lakehouse observability (alert on storage-location changes for production tables) this quarter
Sources: LiteLLM landed in the KEV catalog · An Ollama endpoint exposed to the public internet · PraisonAI auth bypass exploited in 4 hours · SANS AtRisk
04 Training Efficiency Drops: Where Your Next Quarter's Compute Savings Actually Live
Three Claims, Three Different Bets
The marginal dollar in model improvement has moved away from raw compute toward recipe engineering and data curation. Three research results this week point that direction, each from a different angle.
| Work | Claim | Scale Validated | Inference Impact | Replication Risk |
| --- | --- | --- | --- | --- |
| Nous Research TST | 2-3x wall-clock at matched FLOPs | 270M → 10B-A1B MoE | None; no architecture change at inference | Medium; single-source, but clean |
| NVIDIA Star Elastic | 360x cheaper to derive model-size family; 7x better than SOTA compression | Not specified | Produces family of sizes from one post-training run | High; headline number from lab |
| Datology VLM curation | +11.7 pts on 20 benchmarks; 17x less compute | 2B and 4B params | Lower response FLOPs; real serving win | Medium; benchmark-selection risk |
Which One to Spike First
TST is the immediate candidate. It's a pretraining recipe change with no inference-side cost. If it replicates, it's a 2-3x on wall-clock without touching the serving stack. The token-superposition mechanism is validated from 270M to 10B, which covers the range most teams fine-tune in. A 1B continued-pretraining run against a matched-FLOPs baseline would settle it in under a week of compute.
Datology's result is the clearest evidence this year that the marginal dollar in VLM training has moved from compute to curation. Beating InternVL3.5-2B by about 10 points at 17x less compute, purely through data selection, means a well-curated 2B can compete with a poorly-curated 10B. For teams sitting on proprietary multimodal data, this reframes the ROI of data engineering against scaling runs.
Star Elastic's 360x deserves the most skepticism. Lab-reported numbers of this magnitude routinely shrink 10x under independent evaluation. Even at 36x, producing a family of model sizes from a single post-training run would restructure how teams build size tiers for deployment, eliminating separate fine-tuning runs per target device class.
The Compute Context
These drops arrive against a 4:1 GPU demand-to-supply ratio at Nebius, which posted 684% YoY revenue growth and guided to $3-3.4B in 2026. Cisco corroborates: AI networking orders jumped from $5B to $9B. The capacity crunch is real, and replicated efficiency gains are the only knob teams control while GPU allocations are gated.
Meanwhile, only 15% of organizations have the data foundation for agentic AI at scale (Fivetran survey, n=undisclosed). Data quality and lineage are cited by roughly 50% as the #1 blocker. The thing this doesn't tell you is how many of the agent projects funded this quarter are actually data-platform projects with an agent on top. Best guess from the field: about half. Worth naming before the budget locks.
The marginal dollar in model improvement has moved from compute to recipe and curation. A well-curated dataset at 17x less compute beats a poorly-curated one at full spend.
Action items
- Spike Token Superposition Training on a 1B continued-pretraining run against a matched-FLOPs baseline within the next 2 weeks
- Lock H2 2026 GPU reservations across 2+ providers (Nebius, CoreWeave, hyperscaler) before end of quarter
- Score every agentic AI project on the roadmap against Fivetran readiness dimensions (quality, lineage, governance) before committing Q3 budget
- Run ANALYZE/compute-stats coverage audit across Iceberg/Delta tables and add stats maintenance to table-level SLAs
Sources: DuckDB shipped a client-server mode · Claude just metered your agent SDK calls · The 4:1 ratio is the headline number · The Information AM
◆ QUICK HITS
Update: LiteLLM KEV (covered May 12) — Iceberg, Polaris, Argo CD, n8n, and Kestra all shipped CVSS 9.0+ vulnerabilities in the same cycle, expanding the attack surface from LLM gateways to the entire data-to-model pipeline
SANS AtRisk
DuckDB shipped Quack HTTP client-server protocol — Spark/Glue jobs under ~100GB that run on two-node clusters are now credibly replaceable with ECS Fargate + DuckDB + Terraform at 50%+ cost reduction
DuckDB shipped a client-server mode this week
Kafka Share Groups break partition==max-consumers ceiling with ~linear 8x scaling at 32 instances on I/O-bound workloads — stops over-partitioning as a scaling hack for LLM enrichment consumers
DuckDB shipped a client-server mode this week
Duolingo publicly pegs AI-generated content 'slop' rate at ~20% requiring human QC — rare production quality number to calibrate eval thresholds against
Duolingo's twenty percent AI slop rate
AI agents bypass legacy bot detection at 81% success rate — any user-facing surface feeding experiments or online learning is already ingesting undetectable agent traffic
MCP plus knowledge graphs
LLM-as-a-Verifier beats LLM-as-a-Judge on tie-rate and decision accuracy by decomposing criteria into repeated binary verifications with token-level scoring — a one-day harness rewrite
An Ollama endpoint exposed to the public internet
TML-Interaction-Small reports 0.40s full-duplex turn-taking latency vs 1.18s for GPT-Realtime-2.0 — a 3x gap on the metric that determines perceived naturalness in voice agents
TML is reporting 0.40 seconds of full-duplex latency
Cerebras IPO closed +70% at $311; OpenAI's $20B commitment is the first dollar-weighted signal that wafer-scale inference is production-viable — benchmark CS-3 API against H100 for your inference stack
Cerebras IPO validates non-Nvidia silicon
Persona drift in LLM agents measurable within 8 conversational turns (Li et al., COLM 2024) — embed a verbal tic canary and log the turn index where it drops as a free drift signal
AI personas drift within eight turns
PyTorch 2.12 shipped MX quantization export — the specific hook that lets you deploy low-precision models to inference runtimes without bespoke conversion tooling
The CyberGym result
◆ Bottom line
The take.
Anthropic killed the flat-rate Claude subscription this week (now metered API credits), Vercel confirmed 59% of production tokens are agentic multi-turn traces your eval harness doesn't score, and Apache Iceberg shipped a CVSS 9.9 that lets attackers silently redirect your training data to a poisoned S3 bucket. Your cost model, your eval harness, and your lakehouse trust boundary all broke simultaneously — fix the cost telemetry before June 15, rebuild the eval around trajectories before the next model swap, and patch Iceberg before someone poisons a feature table you won't notice until the model degrades.
Frequently asked
- How do I figure out which Claude workloads break under the new credit metering?
- Audit every programmatic Claude caller (Agent SDK, claude -p, GitHub Actions, batch evals, and third-party harnesses like Zed, Conductor, OpenCode, and T3 Code) and project monthly token burn at list price against the new credit cap. Anything that previously rode a Max plan was getting a 70-90% effective discount that disappears June 15, and there is no native per-user attribution, so you need a gateway with tenant/feature/prompt-family tagging to see the overrun before finance does.
- Why are single-turn evals and 3:1 I/O cost models suddenly wrong?
- Because 59% of production tokens are now agentic multi-turn traces with tool calls, bursty context, and provider-specific cache-hit behavior, per Vercel's data across 200,000 teams. Single-turn accuracy scores measure the 41% minority, and a 3:1 input-output assumption underestimates agentic spend by roughly 5x. Add trajectory metrics (tool-call F1, steps-to-completion, cost-per-successful-task) and segment cost forecasts by workload type and provider.
- Which of this week's CVEs actually threatens training data integrity?
- The Apache Iceberg metadata redirect (CVE-2026-42812, CVSS 9.9) is the one to prioritize for ML pipelines. An attacker with table-write permission can repoint metadata to an attacker-controlled S3 prefix, so subsequent queries and training runs ingest poisoned Parquet without any schema change to trip default lakehouse monitoring. Pair it with the Polaris credential bugs and Argo CD Secret extraction (rotate everything reachable, don't just patch) for the full blast radius.
- Of the new training-efficiency results, which is worth a spike this sprint?
- Nous Research's Token Superposition Training is the cleanest first bet: 2-3x wall-clock at matched FLOPs, validated from 270M to 10B-A1B MoE, with no inference-side architecture change. A 1B continued-pretraining run against a matched-FLOPs baseline settles replication in under a week. Datology's VLM curation result (+11.7 pts at 17x less compute) is the better long-term signal that data engineering now beats scaling for multimodal, but it requires reworking your curation pipeline.
- Should I take OpenAI's 2-month free Codex enterprise switch?
- Yes, as an asymmetric-payoff free option — run it on your existing harness against matched prompts and tool schemas on your top 5 production task classes. The pass rates matter less than how each agent solves: tool-call precision, recovery from errors, and cost-per-successful-task on agentic traces. Vendor leverage peaks while both labs are fighting for spend, and Anthropic's recent reliability drift plus the metering change make a credible second source worth pricing now.