How do I figure out if my Claude workloads will hit the 3-5x invoice spike?

Audit every Claude-backed workload running through Agent SDK, claude-p, GitHub Actions, batch evals, and third-party harnesses, then reconcile projected token burn against the new dollar-matched credit cap. The implicit 70-90% programmatic discount is gone, so any team that budgeted flat subscription cost is silently accruing overage. Deploy an LLM gateway like LiteLLM or Portkey with per-user, per-feature tagging and daily token budget alerts before the June 15 third-party tool split creates a separate, non-rolling credit bucket.

If 59% of tokens are agentic, what should my eval harness actually measure?

Add trajectory-level metrics alongside single-turn benchmarks: tool-call precision/recall, steps-to-completion, cost-per-successful-task, recovery-from-error rate, and per-node model attribution. Single-turn final-answer accuracy hides 40,000-token planner loops and can't distinguish a clean first attempt from a 3-retry recovery. Also instrument per-node token cost so utility calls (summarization, extraction, query rewriting) route to Flash/Haiku-class models while Opus stays on planning.

Why did the same Claude model find 271 bugs for one team and only 1 CVE for another?

The harness was the variable. Mozilla wrapped Claude Mythos Preview in a custom agentic, fuzzer-integrated pipeline with reproducible test cases, ephemeral VMs, and CI integration on landing patches, surfacing 271 bugs including UAFs and sandbox escapes. Stenberg ran an out-of-box scan against curl and got 5 claims with 1 real CVE. Same weights, two orders of magnitude difference — value accrues to how you probe × mutate × measure, not to the model alone.

What's the concrete data-poisoning path through the new Iceberg and Polaris CVEs?

A compromised analyst notebook uses Polaris credential-broadening bugs (CVE-2026-42809/10/11, CVSS 9.9) to escalate to full S3/GCS credentials, then exploits Iceberg (CVE-2026-42812, CVSS 9.9) to redirect table metadata pointers to an attacker-controlled S3 prefix. The next training run ingests silently poisoned Parquet. Default lakehouse logging tracks row changes, not pointer changes, so row counts and schema checks pass cleanly while the underlying bytes have moved.

Is the OpenAI Codex 2-month-free promo worth taking seriously as a hedge?

Yes — it's an asymmetric-payoff free evaluation with a 60-day window, dropped into the news cycle the same day Anthropic metered programmatic usage. Run a head-to-head with matched prompts and tool schemas against your current Claude workflows. Ramp's April data shows Anthropic at 34.4% versus OpenAI at 32.3%, the first lead change, so OpenAI is pricing a counter-offensive at exactly the developers Anthropic just alienated. Worst case you confirm Claude parity; best case you find a migration path before the October IPO pricing settles.

Edition 2026-05-23 · read as Data Science

AnthropicEndsClaude'sHidden70-90%ProgrammaticDiscount

Sources: 36
Words: 1,633
Read: 8min

Topics Agentic AI LLM Inference AI Regulation

◆ The signal

Anthropic converted Claude subscriptions to dollar-matched API credits across Agent SDK, GitHub Actions, and third-party harnesses, which retires the implicit 70-90% programmatic discount that a lot of teams quietly built their unit economics on. OpenAI posted a 2-month-free Codex enterprise switch promo into the same news cycle, which is the playbook we have watched both vendors run before. Workloads not reconciled against the new credit cap will run 3-5x last week's invoice. That is a pricing decision either way.

Key facts

Anthropic converted Claude subscriptions to dollar-matched API credits across Agent SDK, GitHub Actions, and third-party harnesses, eliminating the prior 70-90% programmatic discount.
Unreconciled Claude workloads will run 3-5x last week's invoice under the new credit cap, and a separate third-party tool credit bucket activates June 15.
Vercel's AI Gateway data shows agentic workloads at 59% of token volume across 200,000 teams, with Anthropic capturing 61% of dollars and Google capturing 38% of tokens via Flash.
Mozilla's custom agentic harness surfaced 271 bugs in Firefox using Claude Mythos Preview, while an out-of-box scan of curl produced only 1 real CVE — same model, different harness.
Apache Iceberg CVE-2026-42812 (CVSS 9.9) lets attackers redirect table metadata to attacker-controlled S3 prefixes, creating a silent training-data poisoning path that default lakehouse logging does not detect.

◆ INTELLIGENCE MAP

01
Anthropic's Triple Squeeze: Metering + 80x Capacity Miss + June 15 Cliff
act now
Anthropic metered programmatic usage, admitted an 8x capacity planning miss (80x growth vs 10x planned), leased xAI's entire 220K-GPU Colossus 1 cluster, and will split third-party tool credits on June 15. ServiceNow already burned its full-year Claude budget. OpenAI is pricing a counter-offensive.
80x
growth vs capacity plan
11
sources
- Growth vs plan
- Colossus GPUs
- Ramp share lead
- June 15 deadline
1. Planned growth10
2. Actual growth80
02
59% Agentic Tokens: Your Eval Harness Measures the Minority
monitor
Vercel's AI Gateway production index shows 59% of all tokens are now agentic multi-turn traffic. Anthropic captures 61% of spend (Opus), Google 38% of volume (Flash). Single-turn eval harnesses, cost models built on 3:1 I/O ratios, and per-request routing assumptions are measuring last year's workload shape.
59%
tokens now agentic
6
sources
- Agentic token share
- Anthropic spend share
- Google volume share
- MCP token overhead
1. Agentic traffic59
2. Single-turn traffic41
03
AI Cyber Capability Crosses AISI Threshold: Harness Dominates Model
monitor
Mythos is the first model to clear both AISI simulated attack ranges. Mozilla's custom harness found 271 Firefox bugs with Claude Mythos vs 1 CVE from an out-of-box scan on curl — same model, 271:1 yield difference. MDASH shipped 16 real Windows patches. Google detected AI-built cybercrime tooling in the wild.
271:1
harness yield gap
7
sources
- Mozilla bugs found
- curl bugs found
- MDASH Windows fixes
- AISI ranges cleared
1. Mozilla (custom harness)271
2. curl (out-of-box)1
04
Training Efficiency: 2-360x Compute Reductions Ship This Quarter
background
Nous TST delivers 2-3x wall-clock speedup at matched FLOPs with no inference architecture change (validated 270M→10B). NVIDIA Star Elastic claims 360x cheaper model-size families from one post-training run. Datology beats InternVL3.5-2B by 10 pts at 17x less compute via curation alone.
2-3x
TST wall-clock speedup
1
sources
- TST speedup
- Star Elastic saving
- Datology compute cut
- SWE-ZERO corpus
1. TST pretraining3
2. Star Elastic family360
3. Datology VLM curation17
05
Lakehouse & Orchestrator CVEs: Iceberg/Polaris CVSS 9.9
act now
Apache Iceberg (CVE-2026-42812) lets attackers redirect table metadata to attacker-controlled storage — poisoning training data silently. Apache Polaris has three CVSS 9.9 credential-broadening bugs. Argo CD 9.6 exposes plaintext K8s secrets. PraisonAI agent framework was weaponized in 4 hours post-disclosure.
9.9
Iceberg/Polaris CVSS
3
sources
- Iceberg CVSS
- Polaris CVEs
- PraisonAI time-to-exploit
- NGINX RCE age
1. 01Iceberg metadata redirect9.9
2. 02Polaris cred broadening9.9
3. 03Argo CD secret leak9.6
4. 04NGINX rewrite RCE9.1

◆ DEEP DIVES

Anthropic's Triple Squeeze: Metering, Capacity Crisis, and the June 15 Pricing Cliff

What Changed This Week

Three distinct Anthropic moves converged into a single cost event for any team running Claude programmatically:

Programmatic usage is now metered. Claude subscriptions convert to dollar-matched API credits across Agent SDK, claude-p, GitHub Actions, and third-party harnesses. The implicit 70-90% effective discount that alt-harness users had been getting is dead.
The 80x capacity admission. Dario Amodei conceded at Code with Claude on May 6 that Anthropic planned for 10x growth and hit 80x. The emergency patch: leasing xAI's entire Colossus 1 cluster (220,000+ GPUs) spanning H100, H200, and GB200.
June 15 third-party tool split. Starting in 30 days, Claude usage through Conductor, Zed, OpenCode, and T3 Code gets a separate credit bucket. No subsidized tokens, no rollover, overflow bills at API rates.

The Cross-Source Pattern

Multiple independent sources confirm this is a coordinated pricing tightening ahead of Anthropic's October IPO. ServiceNow's CDIO has already burned the full-year Claude budget by May. National Life Group's CIO publicly called Claude 'not great for companies' wanting per-user monitoring. Anthropic provides no native per-user telemetry and no SLAs — unusual for a dependency on the critical path of production features.

The vendor that captures 34.4% of enterprise spend cannot tell customers which user burned the tokens.

Meanwhile, OpenAI dropped a 2-month-free Codex enterprise switch promo the same day Anthropic metered usage. Ramp's April data shows Anthropic edging OpenAI 34.4% vs 32.3% — the first lead change. This is OpenAI pricing a counter-offensive against the exact developers Anthropic just alienated.

What Benchmarks From Before May 7 Are Now Stale

Surface	Before	After (May 7-14)
Claude Code (Pro/Max)	5-hour limit	Doubled
Peak-hours throttle	Reduced limits	Removed
Opus API rate limits	Squeezed during crunch	'Substantially raised'
Fleet composition	Anthropic-managed	Heterogeneous (incl. GB200)

Any architectural decision made against April numbers — aggressive caching, prompt compression, provider migration — is calibrated to conditions that no longer exist.

The Practical Implications

Token consumption in agentic workflows is non-linear. A reflection loop or tool-use chain can 10x spend per task without proportional quality gain. With no native cost attribution at the user or prompt level, the overage shows up the way it showed up at ServiceNow — after the money is gone. The workaround is entirely on the customer: gateway-level logging with per-tenant tagging, daily budget alerts, and hard caps per feature.

Action items

Audit every Claude-backed workload (Agent SDK, claude-p, GitHub Actions, batch evals) and reconcile projected token burn against the new credit cap this sprint
Deploy an LLM gateway (LiteLLM/Portkey) with per-user, per-feature tagging and daily token budget alerts before June 15
Run a 2-month head-to-head Codex evaluation under OpenAI's enterprise switch promo with matched prompts and tool schemas
Re-baseline Claude throughput and latency benchmarks post-Colossus integration before shipping any workaround from the April crunch period

Sources:Claude just metered your agent SDK calls · Claude Code latency on long-context requests drifted upward · Anthropic ships no per-user usage telemetry · Anthropic passes OpenAI in B2B · Vercel published a number worth sitting with

59% Agentic Tokens, 100% Single-Turn Evals: The Measurement Gap That's Costing You 5x

The Production Reality

Vercel's AI Gateway production index — the first multi-tenant usage snapshot this quarter — puts agentic workloads at 59% of all token volume across 200,000 teams. Six months ago it was under 20%. That composition shift is faster than the completion-to-chat transition, and most production stacks were architected before it started.

The spend-versus-volume split is the routing signal. Anthropic captures 61% of dollars through Opus on reasoning and planning nodes. Google captures 38% of tokens through Flash on throughput and utility calls. Expensive models plan, cheap models fan out. Teams without a tiered router are paying Opus rates for work Flash does at 5-10x lower cost.

Why Single-Turn Evals Are Lying

Most agent eval harnesses still score a single response against a reference answer. That was the right call in 2023. With 59% of traffic now multi-step tool loops, three things break:

Cost models break. Input-to-output ratios on agentic traces moved from roughly 3:1 to roughly 15:1. A forecast built on last year's ratio is off by about 5x on spend.
Accuracy masks cost paths. A planner that burns 40,000 tokens arguing with itself and then gives up can still post 90%+ final-answer accuracy. The thing this doesn't tell you is where the bill lives.
Tool-call reliability is invisible. An end-state-only score cannot distinguish a clean first attempt from a 3-retry recovery, and the production bottleneck is the second one.

Abridge's production architecture across 80M+ clinical conversations is the cleanest validation of the alternative: constellation-of-models with fast/slow routing, LLM judges calibrated against human-annotated ground truth, and memory externalized from weights into event-driven stores.

If 59% of your tokens are agentic but 100% of your evals are single-turn, you're flying instruments-out.

The Duolingo Anchor

Duolingo disclosed a ~20% unusable rate for AI-generated content at scale. Production quality numbers at that resolution are rare. They also reversed the blanket 'evaluate all employees on AI usage' policy after observing performative adoption with no productivity lift. Both points calibrate expectations. Well-resourced teams ship 1-in-5 outputs that need human rework, and token-usage-as-KPI is a Goodhart trap.

What the Eval Harness Needs

Metric	What It Catches	Effort
Tool-call precision/recall	Wrong tool selections, hallucinated args	Days
Steps-to-completion	Wandering planners, retry storms	Days
Cost-per-successful-task	Efficiency at constant quality	Hours
Recovery-from-error rate	Resilience vs. brittleness	Week
Per-node model attribution	Which step needs Opus vs. Flash	Week

Action items

Add trajectory-level metrics (tool-call F1, steps-to-completion, cost-per-successful-task) to eval harness alongside existing single-turn benchmarks this sprint
Instrument per-node token cost in your agent graphs and route utility calls (summarization, extraction, query rewriting) to Flash/Haiku-class models
Add LLM-judge-to-human-annotator agreement as a tracked metric, re-calibrated quarterly
Benchmark your LLM output acceptance rate against Duolingo's 20% slop baseline and adjust HITL staffing accordingly

Sources:Agentic traffic crossed fifty-nine percent · Vercel published a number worth sitting with · Abridge runs model routing across 100M conversations · Duolingo's twenty percent AI slop rate · AI Gateway data puts agentic workloads at fifty-nine percent

Mythos Cleared Both AISI Ranges — And the 271:1 Harness Result Proves What Actually Matters

The Capability Threshold

Anthropic's Claude Mythos Preview is the first model to clear both UK AISI simulated attack ranges, a discrete ladder running from 'advanced persistence' to 'full network takeover.' The prior Mythos generation topped out one tier below. GPT-5.5-cyber cleared one of the two. AISI is already building harder tests because the current ones are saturating, which is the more informative signal.

This is not a benchmark delta. It is a discrete unlock: a task no prior model could complete end-to-end is now routinely clearable. Congress is pressing Anthropic through closed-door briefings, and the reported center of gravity is shifting from CISA (defensive) to NSA (offensive/intelligence).

The 271:1 Harness Result

Two teams ran Claude Mythos against large C codebases in the same month. The results diverge by two orders of magnitude:

Dimension	Mozilla + Firefox	Stenberg + curl
Model	Claude Mythos Preview	Claude Mythos Preview
Harness	Custom agentic, fuzzer-integrated	Out-of-box scan
Bugs surfaced	271 (UAFs, sandbox escapes)	5 claimed, 1 real CVE
CI integration	Yes — patches scanned on landing	None

Daniel Stenberg's verdict on the curl result: 'primarily marketing.' Mozilla's security team is integrating it into CI for all landing patches. Same model, same weights. The harness is the variable that moved.

This lines up with what Microsoft's MDASH showed separately: 16 real Windows vulnerabilities patched in May Patch Tuesday, found by a multi-model ensemble system. The pattern is that value accrues to the Cartesian product of how you probe × how you mutate × how you measure, not to the weights alone.

When a frontier model yields 271 bugs for one team and 1 CVE for another against the same language, the harness is the product, not the model.

Offensive AI Is Now a Detected Incident Class

Google's threat intelligence team identified a hacking group actively using LLMs to build cybercrime tooling, the first production-grade detection behind the post-Mythos misuse concerns. Palo Alto's AI-driven scanning surfaced serious vulnerabilities across 130+ products. The attacker economics have shifted. Inference is cheap, orchestration is cheap, and the expensive line item was always the operator. The model replaced the operator.

For teams shipping coding agents, the implications are concrete:

Refusal-rate harnesses measure the wrong bottleneck. The thing a refusal rate doesn't tell you is whether the model can execute the chain. A staged rubric (recon, initial access, lateral movement, persistence, exfil) run against every model upgrade is the closer match.
Agent trajectory telemetry is security telemetry. If a frontier model can execute a takeover in a lab, misuse attempts in production chain tool calls the same way. Log agent action sequences and train a lightweight classifier on known-bad trajectories.
Patch SLAs need to benchmark against inference time, not human-weeks. Vulnerability discovery that used to take teams months now takes model-minutes.

Action items

Add a cyber-capability tier to your model eval harness — include AISI-style staged attack tasks for any model with tool/shell access before next model upgrade
Spike a domain-specific agentic vuln-discovery harness on one internal service, modeled on Mozilla's pattern (reproducible test cases + ephemeral VMs + integration with existing pipelines)
Instrument agent action sequences in production and alert on tool-use patterns matching recon→lateral-movement→persistence signatures
Compress critical-patch SLA from quarterly to monthly cadence and add CVE volume monitoring

Sources:Mythos cleared the AISI attack ranges · Mozilla shipped 271 bugs · CyberGym result multi-agent ensembles · The headline claim is that AI models have reached full network takeover · PraisonAI weaponized within four hours · Google's report of a threat actor using AI

Lakehouse Trust Boundary Collapsed: Iceberg, Polaris, and Argo CD Create a Data-Poisoning Path

Three CVEs That Chain Into Training Data Corruption

A new batch of critical CVEs lands on the exact infrastructure most ML teams run in production. These are not generic patch advisories. They target the data layer, which is the trust boundary between what a model trains on and what an attacker can manipulate.

The Attack Chain

Apache Polaris (CVE-2026-42809/10/11, CVSS 9.9) — Credential-broadening bugs let an attacker with limited catalog access escalate to full S3/GCS credential exposure, enabling cross-tenant data access.
Apache Iceberg (CVE-2026-42812, CVSS 9.9) — An attacker with table-write permission can redirect metadata pointers to an attacker-controlled S3 prefix, so the next query reads poisoned Parquet and the next training run ingests silently corrupted features.
Argo CD (CVE-2026-42880, CVSS 9.6) — Read-only users can extract plaintext Kubernetes Secrets. For teams running model services via Argo CD, that means model-registry tokens, HuggingFace PATs, and cloud credentials.

The combined path: compromised analyst notebook, Polaris credential escalation, Iceberg metadata redirect, poisoned training data. The failure mode is silent. Default lakehouse logging covers row changes, not pointer changes, so a row-count and schema check passes cleanly while the bytes have moved underneath.

Additional ML-Stack Vectors

Component	CVE / CVSS	Impact
PraisonAI (agent framework)	CVE-2026-44338 / high	Auth bypass; exploited in 4 hours post-disclosure
NGINX rewrite module	CVSS 9.1	Unauthenticated RCE; 18 years latent; affects model-serving ingress
n8n workflow orchestrator	CVE-2026-42233 / 9.8	SQLi + OAuth token theft
Kestra orchestrator	CVE-2026-38428 / 9.8	SQLi into pipeline metadata

The PraisonAI four-hour exploitation window resets the floor. Agent frameworks, model-serving gateways, and workflow orchestrators all shipped fast, and are now receiving the security attention web frameworks got a decade ago. The thing the CVSS score doesn't measure is blast radius inside an ML platform: a read-only Argo CD bug is a 9.6, but for a team that stores HuggingFace PATs as Secrets, the practical impact is the model registry.

If the reference architecture runs Iceberg, Polaris, Argo CD, or any modern orchestrator, there is patching homework before the next experiment and credential-rotation homework before the next sprint.

Action items

Patch Iceberg/Polaris catalog configurations immediately: enforce explicit storage credential scoping, add write-path allowlisting for table metadata locations, and audit pointer mutations in catalog logs
Patch Argo CD to ≥3.2.12 / ≥3.3.10 and rotate every Kubernetes Secret in namespaces it can read — including model-registry tokens and HuggingFace PATs
Patch NGINX across all inference gateways and ingress-nginx controllers; audit rewrite-module usage in model routing configs
Inventory all agent frameworks (PraisonAI, LangChain, CrewAI, AutoGen), pin versions, subscribe to CVE feeds, and enforce same-day patching cadence

Sources:LiteLLM landed in the KEV catalog this week · An Ollama endpoint exposed to the public internet · PraisonAI an open-source multi-agent framework · The Hacker News NGINX RCE

◆ QUICK HITS

DuckDB shipped Quack HTTP client-server mode — credible Spark-on-Glue replacement for single-node ETL under ~100GB; benchmark your p95 query before migrating
DuckDB shipped a client-server mode this week
Kafka Share Groups report ~linear 8x throughput at 32 consumers by decoupling parallelism from partition count — spike on your most partition-bound embedding/enrichment consumer first
DuckDB shipped a client-server mode this week
Only 15% of organizations have the data foundation for agentic AI (Fivetran); ~50% cite data quality/lineage as #1 blocker — score target domains before greenlighting agent projects
DuckDB shipped a client-server mode this week
TML-Interaction-Small reports 0.40s turn-taking latency vs 0.57s (Gemini Flash) and 1.18s (GPT-Realtime-2.0) — a 3x gap on the metric that determines perceived naturalness in voice agents
TML is reporting 0.40 seconds of full-duplex latency
Update: Exposed AI inference endpoints (Ollama, MCP, LangServe) indexed by Shodan within 3 hours; 23% of honeypot traffic now targets AI-specific paths like /.well-known/mcp.json
An Ollama endpoint exposed to the public internet gets picked up by Shodan in about three hours
Nebius GPU demand ratio at 4:1 (customers per GPU), 684% YoY revenue growth, guiding $3-3.4B in 2026 — lock H2 GPU reservations across 2+ providers before quarterly sellouts
The 4:1 ratio is the headline number
Opus 4.7 tripled image processing cost silently — re-price multimodal inference budgets and eval Gemini/GPT-4V on your actual vision workload
Anthropic passes OpenAI in B2B
SAP (€100M partner fund) and ServiceNow (Action Fabric) both converged on Knowledge Graph + MCP as the enterprise agent architecture — RAG-over-docs losing ground to structured KG grounding
MCP plus knowledge graphs is the combination showing up
AI agents bypass legacy bot detection at 81% success rate — retrain abuse models with agent-generated traffic and add behavioral/request-graph features before next model refresh
MCP plus knowledge graphs is the combination showing up
Gemini reproducibly emits real phone numbers from training data (3 independent cases) — add PII extraction eval suite (canary insertion + divergence attacks) to LLM CI before next release
Gemini is the latest model to surface PII from its training data
New COSO/PCAOB guidance requires deterministic execution and tamper-evident audit trails for ML in regulated finance — audit seed management, GPU non-determinism flags, and model-artifact immutability now
The transformer underwriting models are outperforming

◆ Bottom line

The take.

Anthropic metered your Claude subscriptions overnight, admitted an 8x capacity planning miss, and set a June 15 deadline for third-party tool pricing — all while 59% of production tokens shifted to agentic workloads your single-turn eval harness can't measure, and Apache Iceberg/Polaris shipped CVSS 9.9 bugs that create a silent path from compromised notebook to poisoned training data. The week's action list: reconcile Claude spend before the credit cap bites, add trajectory-level metrics to the eval harness, and patch the lakehouse before the next training run ingests something an attacker put there.

Frequently asked

How do I figure out if my Claude workloads will hit the 3-5x invoice spike?: Audit every Claude-backed workload running through Agent SDK, claude-p, GitHub Actions, batch evals, and third-party harnesses, then reconcile projected token burn against the new dollar-matched credit cap. The implicit 70-90% programmatic discount is gone, so any team that budgeted flat subscription cost is silently accruing overage. Deploy an LLM gateway like LiteLLM or Portkey with per-user, per-feature tagging and daily token budget alerts before the June 15 third-party tool split creates a separate, non-rolling credit bucket.
If 59% of tokens are agentic, what should my eval harness actually measure?: Add trajectory-level metrics alongside single-turn benchmarks: tool-call precision/recall, steps-to-completion, cost-per-successful-task, recovery-from-error rate, and per-node model attribution. Single-turn final-answer accuracy hides 40,000-token planner loops and can't distinguish a clean first attempt from a 3-retry recovery. Also instrument per-node token cost so utility calls (summarization, extraction, query rewriting) route to Flash/Haiku-class models while Opus stays on planning.
Why did the same Claude model find 271 bugs for one team and only 1 CVE for another?: The harness was the variable. Mozilla wrapped Claude Mythos Preview in a custom agentic, fuzzer-integrated pipeline with reproducible test cases, ephemeral VMs, and CI integration on landing patches, surfacing 271 bugs including UAFs and sandbox escapes. Stenberg ran an out-of-box scan against curl and got 5 claims with 1 real CVE. Same weights, two orders of magnitude difference — value accrues to how you probe × mutate × measure, not to the model alone.
What's the concrete data-poisoning path through the new Iceberg and Polaris CVEs?: A compromised analyst notebook uses Polaris credential-broadening bugs (CVE-2026-42809/10/11, CVSS 9.9) to escalate to full S3/GCS credentials, then exploits Iceberg (CVE-2026-42812, CVSS 9.9) to redirect table metadata pointers to an attacker-controlled S3 prefix. The next training run ingests silently poisoned Parquet. Default lakehouse logging tracks row changes, not pointer changes, so row counts and schema checks pass cleanly while the underlying bytes have moved.
Is the OpenAI Codex 2-month-free promo worth taking seriously as a hedge?: Yes — it's an asymmetric-payoff free evaluation with a 60-day window, dropped into the news cycle the same day Anthropic metered programmatic usage. Run a head-to-head with matched prompts and tool schemas against your current Claude workflows. Ramp's April data shows Anthropic at 34.4% versus OpenAI at 32.3%, the first lead change, so OpenAI is pricing a counter-offensive at exactly the developers Anthropic just alienated. Worst case you confirm Claude parity; best case you find a migration path before the October IPO pricing settles.

◆ Same day, different angle

Read this day as…

◆ Recent in data science

AnthropicEndsClaude'sHidden70-90%ProgrammaticDiscount

◆ INTELLIGENCE MAP

◆ DEEP DIVES

What Changed This Week

The Cross-Source Pattern

What Benchmarks From Before May 7 Are Now Stale

The Practical Implications

The Production Reality

Why Single-Turn Evals Are Lying

The Duolingo Anchor

What the Eval Harness Needs

The Capability Threshold

The 271:1 Harness Result

Offensive AI Is Now a Detected Incident Class

Three CVEs That Chain Into Training Data Corruption

The Attack Chain

Additional ML-Stack Vectors

◆ QUICK HITS

The take.

Frequently asked

◆ RELATED THREADS