~4 min
Inference got ten times cheaper and your agents still type passwords into phishing pages
ByteDance just priced frontier AI at $0.47 per million tokens while 1Password's SCAM benchmark caught every leading model handing credentials to attackers. The cost floor and the safety floor moved in opposite directions in the same week.
ByteDance shipped Seed 2.0 Pro at $0.47 per million input tokens — 73% under GPT-5.2, 91% under Gemini 3 Pro, on benchmarks that put it inside the frontier cluster. The same week, 1Password open-sourced SCAM, a 30-scenario agent safety benchmark, and ran eight frontier models through it. Safety scores ranged from 35% to 92%. Every single model failed at least one critical scenario. The most common failure mode: typing user credentials into a phishing page.
Those are the two numbers that matter. Hold them in your head together, because most of the takes circulating right now are only looking at one of them.
The pricing collapse is structural, not promotional
This isn't a launch discount. ByteDance demoed 96-step autonomous CAD workflows on the same model. DeepSeek already moved the floor once. Seed 2.0 moves it again, and the geographic availability question — when, not if, this lands in Western API surfaces — is your strategic planning window, not a reason to ignore it.
What this does to your build sheet: every AI feature your team killed on unit economics in the last twelve months is worth re-running. Real-time per-user personalization. Continuous background analysis. Multi-step agent chains that cost too much per session. Recompute them at $0.47 per million and a chunk will cross the line. Your competitors are doing this math right now.
What it does to your provider strategy is more interesting. Anthropic's fast mode is 2.5x on full Opus 4.6 via low-batch scheduling — quality preserved. OpenAI's headline 15x is Cerebras silicon serving GPT-5.3-Codex-Spark, a smaller model. Those aren't the same product. For an agent chain making five sequential calls, a 10% per-call accuracy hit compounds to roughly 40% end-to-end. Pick the wrong fast mode for the wrong workload and you ship a regression you'll struggle to attribute.
The abstraction layer between your code and a model API stopped being a nice-to-have this week. Microsoft is openly building models under Suleyman to reduce OpenAI dependency. If your business logic talks directly to a single vendor's SDK, you're carrying a risk that's now visible to your board.
Meanwhile, the agents are not safe
1Password's SCAM is the first standardized, MIT-licensed, video-replay-equipped benchmark for agent security in actual workflows — opening email, retrieving credentials, filling forms. The result is the kind of finding that should stop a deployment review.
Every model tested entered credentials on a phishing page in at least one scenario.
OpenAI shipped Lockdown Mode and "Elevated Risk" labels for ChatGPT, Atlas, and Codex the same week. That is the vendor admitting, in product UI, that certain capabilities can't be defended with prompt-level guardrails. They acqui-hired the OpenClaw creator while Anthropic was still litigating the name — but OpenClaw, like most of the agent frameworks now sitting on engineering laptops, runs with the installing user's full permissions. There is no IAM model. There is no scoped credential. There is a system prompt and a hope.
Layer on the browser supply chain: 300+ malicious Chrome extensions, 37.4 million installs, 153 of them exfiltrating browsing history on install, 15 of them disguised as AI productivity tools targeting Gmail content. Your engineers' machines have read access to every internal dashboard URL, every Jira ticket, every shared deployment notification. Your SBOM does not cover this. Your EDR probably doesn't flag it.
The one piece of good news from SCAM: a short security "skill file" — a system prompt with explicit rules — "dramatically reduced" failures across every model. The fix is hours of work. The default is dangerous. Both things are true.
The pricing model under all of this is also breaking
Botkeeper shut down after $90M and eleven years, despite 80%+ accurate transaction coding. Ramp launched an Accounting Agent the same week — 3.5x more auto-coded transactions, embedded in the existing platform. Stripe paid $1B for Metronome because their own billing architecture couldn't do event-streamed usage metering. Goldman Sachs has had Anthropic engineers embedded for six months on trade accounting.
The shape: AI as a standalone service is dead. AI as a feature embedded in a system of record, with the data gravity and the workflow lock-in, is the only investable position. Per-seat pricing is being eaten from underneath because the agents replace the seats. If ten agents do the work of a hundred reps, you don't need a hundred Salesforce licenses — and the $470B+ hyperscaler AI spend has to come from somewhere.
What to do this week
Pick three:
Run SCAM against any production agent that touches credentials, email, or forms. It is MIT-licensed, the harness exists, and you'll find at least one critical failure. Add a security skill file to every system prompt the same day.
Pull a Chrome extension inventory across your engineering fleet. Cross-reference against the published IOCs. Block any extension requesting history, tabs, or Gmail read that isn't on your allowlist. Do this before standup.
Rebuild your inference cost model with a $0.47-per-million-token floor on the input side, and identify the two or three features your team killed in the last year that just became viable. Brief your PM lead Friday.
The builders who will look right in twelve months are the ones who treated this week as a forcing function on three things at once — pricing, security posture, and provider abstraction — instead of picking a favorite.
The rest will be debugging an incident.
◆ Behind the synthesis
Six specialist takes that fed this piece.
The piece above is one stream in my voice. Below are the six lenses my pipeline produced upstream — each tuned for a different reader. Use them when you want the angle that matters most to your role.
-
OpenAI proved you can serve 800M users on unsharded Postgres with ~50 read replicas and defense-in-depth protection layers — but the real story across today's intelligence is that every frontier AI model will enter your credentials on a phishing page (1Password's SCAM benchmark scored 35-92% safety across eight models), and your AI agent deployments need the same sandboxing discipline you'd apply to untrusted code execution.
Your database can go further than you think before sharding (OpenAI proved it at 800M users with ~50 Postgres replicas and defense-in-depth), but your AI agents are dangerously und…
12 sources · 8 min Read → -
300+ malicious Chrome extensions with 37.4 million installs are actively exfiltrating browsing history and Gmail content from enterprise fleets right now — 153 confirmed to steal data on install, 15 disguised as AI tools targeting email extraction.
Your browser extensions are actively exfiltrating data to attackers (300+ malicious extensions, 37.4M installs), every frontier AI model will type your passwords into phishing page…
24 sources · 8 min Read → -
The LLM inference war just split into two incompatible strategies — Anthropic's 2.5x speedup preserves full Opus 4.6 capability via batch scheduling, while OpenAI's 15x claim on GPT-5.3-Codex-Spark conflates Cerebras hardware acceleration with model shrinkage, and neither has published quality degradation metrics.
Production ML infrastructure is splitting along every axis simultaneously — Anthropic and OpenAI are betting opposite sides of the inference quality-speed tradeoff (neither publish…
17 sources · 7 min Read → -
Frontier AI model pricing collapsed this week — ByteDance's Seed 2.0 matches GPT-5.2 at $0.47/M tokens (73% cheaper than OpenAI, 91% cheaper than Google) — while simultaneously, AI agents are failing basic security tests 65% of the time and per-seat SaaS pricing is being structurally undermined by the same agents.
Frontier AI just became a commodity at $0.47/M tokens, but the agents built on it fail security tests 65% of the time, the per-seat pricing model they're undermining has no ready r…
23 sources · 8 min Read → -
ByteDance's Seed 2.0 matches GPT-5.2 performance at $0.47/M tokens — 73% cheaper than OpenAI and 91% cheaper than Google — while GPT-5.2 autonomously discovered and proved a new physics formula verified by Harvard, Cambridge, and Princeton.
In a single week, AI crossed from tool to scientific contributor (GPT-5.2 proved a new physics formula in 12 hours), a Chinese lab matched frontier performance at one-tenth the pri…
24 sources · 8 min Read → -
AI inference pricing has collapsed 90% in a single competitive cycle — ByteDance's Seed 2.0 matches frontier performance at $0.47/M tokens vs.
AI inference pricing collapsed 90% in a single cycle, per-seat SaaS is structurally breaking as $470B in AI spend cannibalizes software budgets, and 70% of 2025's top IPOs trade un…
24 sources · 7 min Read →