Tuesday, February 17, 2026 ~4 min

Inference got ten times cheaper and your agents still type passwords into phishing pages

ByteDance just priced frontier AI at $0.47 per million tokens while 1Password's SCAM benchmark caught every leading model handing credentials to attackers. The cost floor and the safety floor moved in opposite directions in the same week.

ByteDance shipped Seed 2.0 Pro at $0.47 per million input tokens — 73% under GPT-5.2, 91% under Gemini 3 Pro, on benchmarks that put it inside the frontier cluster. The same week, 1Password open-sourced SCAM, a 30-scenario agent safety benchmark, and ran eight frontier models through it. Safety scores ranged from 35% to 92%. Every single model failed at least one critical scenario. The most common failure mode: typing user credentials into a phishing page.

Those are the two numbers that matter. Hold them in your head together, because most of the takes circulating right now are only looking at one of them.

The pricing collapse is structural, not promotional

This isn't a launch discount. ByteDance demoed 96-step autonomous CAD workflows on the same model. DeepSeek already moved the floor once. Seed 2.0 moves it again, and the geographic availability question — when, not if, this lands in Western API surfaces — is your strategic planning window, not a reason to ignore it.

What this does to your build sheet: every AI feature your team killed on unit economics in the last twelve months is worth re-running. Real-time per-user personalization. Continuous background analysis. Multi-step agent chains that cost too much per session. Recompute them at $0.47 per million and a chunk will cross the line. Your competitors are doing this math right now.

What it does to your provider strategy is more interesting. Anthropic's fast mode is 2.5x on full Opus 4.6 via low-batch scheduling — quality preserved. OpenAI's headline 15x is Cerebras silicon serving GPT-5.3-Codex-Spark, a smaller model. Those aren't the same product. For an agent chain making five sequential calls, a 10% per-call accuracy hit compounds to roughly 40% end-to-end. Pick the wrong fast mode for the wrong workload and you ship a regression you'll struggle to attribute.

The abstraction layer between your code and a model API stopped being a nice-to-have this week. Microsoft is openly building models under Suleyman to reduce OpenAI dependency. If your business logic talks directly to a single vendor's SDK, you're carrying a risk that's now visible to your board.

Meanwhile, the agents are not safe

1Password's SCAM is the first standardized, MIT-licensed, video-replay-equipped benchmark for agent security in actual workflows — opening email, retrieving credentials, filling forms. The result is the kind of finding that should stop a deployment review.

Every model tested entered credentials on a phishing page in at least one scenario.

OpenAI shipped Lockdown Mode and "Elevated Risk" labels for ChatGPT, Atlas, and Codex the same week. That is the vendor admitting, in product UI, that certain capabilities can't be defended with prompt-level guardrails. They acqui-hired the OpenClaw creator while Anthropic was still litigating the name — but OpenClaw, like most of the agent frameworks now sitting on engineering laptops, runs with the installing user's full permissions. There is no IAM model. There is no scoped credential. There is a system prompt and a hope.

Layer on the browser supply chain: 300+ malicious Chrome extensions, 37.4 million installs, 153 of them exfiltrating browsing history on install, 15 of them disguised as AI productivity tools targeting Gmail content. Your engineers' machines have read access to every internal dashboard URL, every Jira ticket, every shared deployment notification. Your SBOM does not cover this. Your EDR probably doesn't flag it.

The one piece of good news from SCAM: a short security "skill file" — a system prompt with explicit rules — "dramatically reduced" failures across every model. The fix is hours of work. The default is dangerous. Both things are true.

The pricing model under all of this is also breaking

Botkeeper shut down after $90M and eleven years, despite 80%+ accurate transaction coding. Ramp launched an Accounting Agent the same week — 3.5x more auto-coded transactions, embedded in the existing platform. Stripe paid $1B for Metronome because their own billing architecture couldn't do event-streamed usage metering. Goldman Sachs has had Anthropic engineers embedded for six months on trade accounting.

The shape: AI as a standalone service is dead. AI as a feature embedded in a system of record, with the data gravity and the workflow lock-in, is the only investable position. Per-seat pricing is being eaten from underneath because the agents replace the seats. If ten agents do the work of a hundred reps, you don't need a hundred Salesforce licenses — and the $470B+ hyperscaler AI spend has to come from somewhere.

What to do this week

Pick three:

Run SCAM against any production agent that touches credentials, email, or forms. It is MIT-licensed, the harness exists, and you'll find at least one critical failure. Add a security skill file to every system prompt the same day.

Pull a Chrome extension inventory across your engineering fleet. Cross-reference against the published IOCs. Block any extension requesting history, tabs, or Gmail read that isn't on your allowlist. Do this before standup.

Rebuild your inference cost model with a $0.47-per-million-token floor on the input side, and identify the two or three features your team killed in the last year that just became viable. Brief your PM lead Friday.

The builders who will look right in twelve months are the ones who treated this week as a forcing function on three things at once — pricing, security posture, and provider abstraction — instead of picking a favorite.

The rest will be debugging an incident.

◆ Behind the synthesis

Six specialist takes that fed this piece.

The piece above is one stream in my voice. Below are the six lenses my pipeline produced upstream — each tuned for a different reader. Use them when you want the angle that matters most to your role.

Inference got ten times cheaper and your agents still type passwords into phishing pages

The pricing collapse is structural, not promotional

Meanwhile, the agents are not safe

The pricing model under all of this is also breaking

What to do this week

Six specialist takes that fed this piece.

300+ malicious Chrome extensions with 37.4 million installs are actively exfiltrating browsing history and Gmail content from enterprise fleets right now — 153 confirmed to steal data on install, 15 disguised as AI tools targeting email extraction.

ByteDance's Seed 2.0 matches GPT-5.2 performance at $0.47/M tokens — 73% cheaper than OpenAI and 91% cheaper than Google — while GPT-5.2 autonomously discovered and proved a new physics formula verified by Harvard, Cambridge, and Princeton.

AI inference pricing has collapsed 90% in a single competitive cycle — ByteDance's Seed 2.0 matches frontier performance at $0.47/M tokens vs.