Friday, March 13, 2026 ~4 min

The trillion-dollar SaaS reset has a reliability problem

The market just repriced per-seat software based on agents replacing humans. The same week, McKinsey's AI platform fell to a 1998-era SQL injection in two hours.

In the week of January 29, software lost over a trillion dollars of market cap. ServiceNow dropped 11% on a beat-and-raise. Microsoft, the most AI-forward incumbent on earth, shed $360B in a single session. The market wasn't punishing execution. It was repricing the model.

In the same cycle, an autonomous AI red-team called CodeWall found an unauthenticated SQL injection in McKinsey's internal LLM platform, Lilli, and exfiltrated 46.5 million chat messages, 728,000 files, and the entire proprietary RAG knowledge base. Time-to-pwn: under two hours. Vulnerability class: OWASP A03, the one we've been patching since the Clinton administration.

Those two stories are the same story. Hold them together and the through-line gets uncomfortable: the market is pricing a future where agents replace humans across the SaaS stack, and the platforms those agents will run on can't keep a junior pentester out for an afternoon.

What the repricing actually says

The SaaS thesis collapsing isn't "AI is hot." It's three load-bearing assumptions failing at once.

Per-seat pricing assumed your customer was a human with a license. When ten agents replace fifty knowledge workers, the math compresses by 80%. Human-centric UI assumed the consumer of your product had eyeballs and patience for a dashboard. Agents want JSON, idempotent endpoints, and cursor pagination. Code-complexity moats assumed years of proprietary engineering couldn't be replicated. Cursor doubled to a $50B valuation in four months replicating exactly that.
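The per-seat arithmetic is worth seeing on paper. A minimal sketch, assuming agents are billed at the same per-seat price as the humans they replace (the $100 price is a placeholder, not a figure from any vendor):

```python
def seat_revenue(seats: int, price_per_seat: float) -> float:
    """Monthly revenue under plain per-seat pricing."""
    return seats * price_per_seat


PRICE = 100.0     # hypothetical per-seat price
HUMAN_SEATS = 50  # knowledge workers with licenses today
AGENT_SEATS = 10  # agents that replace them

before = seat_revenue(HUMAN_SEATS, PRICE)
after = seat_revenue(AGENT_SEATS, PRICE)
compression = 1 - after / before

print(f"revenue compression: {compression:.0%}")
```

The price cancels out, so the compression depends only on the seat ratio: ten seats where there were fifty leaves 20% of the revenue.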

The tell that this is structural and not a sentiment swing: Atlassian cut 10% of staff to fund an AI pivot in the same week Oracle and Salesforce publicly dismissed the SaaSpocalypse narrative. Incumbents in denial while one of their peers restructures is the cleanest leading indicator you'll get.

Anthropic shipped a one-click ChatGPT-to-Claude migration tool, drove a 295% spike in ChatGPT uninstalls, and hit #1 on the App Store. Switching cost is a design choice, not a moat, and Anthropic just proved it.

What the McKinsey breach actually says

If the firm that bills $500/hr for digital transformation shipped an AI platform with no auth on a SQL-injectable endpoint, the baseline security of the average enterprise AI platform is far worse than the keynote slides suggest.

This isn't a one-off. CodeWall separately chained four individually low-severity bugs into admin on a hiring platform, autonomously. Perplexity's Comet browser was social-engineered into executing a phishing flow in under four minutes. CISA added an actively exploited n8n RCE flaw to its Known Exploited Vulnerabilities catalog with 24,700 instances still internet-facing, every one of them a credential vault for whatever pipelines it orchestrates. HPE Aruba CX switches have an unauthenticated admin takeover rated near the CVSS maximum of 10. And in the most operatically grim story of the cycle, a ransomware negotiator at DigitalMint allegedly ran ALPHV/BlackCat attacks against his own clients, extracting $75.25M with insider knowledge of their insurance limits.

The pattern across all of these: the AI-era attack surface is mostly not AI. It's web app basics on novel platforms, automation tools holding production credentials, and trust relationships nobody threat-modeled.

The reliability gap is also a code gap

METR's data this week: roughly half of AI-generated PRs that pass SWE-bench get rejected by human maintainers. Poor quality, breaking adjacent code, missing the actual intent. 88% of AI proofs-of-concept never reach production. Google published controlled experiments showing reasoning models hallucinate intermediate chain-of-thought steps that propagate into wrong final answers — a failure mode your output-only monitoring is blind to.

The market is pricing in agent dominance. The agents are shipping code that humans reject half the time and reasoning over hallucinated premises. Both things are true. Resolving the contradiction is the work of the next eighteen months, and most of the value will accrue to whoever owns the layer between "the model generated something" and "a human accepted it."

The contradiction is your roadmap

The companies that come out of this intact aren't the ones moving fastest on AI features. They're the ones who figure out, this quarter, where their actual moat lives — proprietary data, workflow embeddedness, customer relationships — and rebuild the product and pricing around those, while hardening the platform underneath so it doesn't end up as the next Lilli.

If I had one week and one team, here's what I'd do.

Monday: pull every internal AI platform — chatbots over your knowledge base, RAG ingestion endpoints, agent tools that touch the database — and run an unauthenticated SQLi and authz check against each. Not a pen test next quarter. This week. The McKinsey breach is your business case; nobody on the board will argue.
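For anyone who hasn't looked at this bug class in a while, here is what the Monday check is hunting for. This is a generic illustration using Python's built-in sqlite3, not a reconstruction of the Lilli endpoint; the table, function names, and payload are made up for the example.

```python
import sqlite3

# In-memory database standing in for a chat-history store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, owner TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO messages (owner, body) VALUES (?, ?)",
    [("alice", "q3 plan"), ("bob", "client notes")],
)


def fetch_vulnerable(owner: str) -> list:
    # String concatenation: attacker-controlled input becomes part of the SQL.
    query = "SELECT body FROM messages WHERE owner = '" + owner + "'"
    return conn.execute(query).fetchall()


def fetch_parameterized(owner: str) -> list:
    # Bound parameter: the input is treated as data, never as SQL.
    return conn.execute(
        "SELECT body FROM messages WHERE owner = ?", (owner,)
    ).fetchall()


payload = "nobody' OR '1'='1"
leaked = fetch_vulnerable(payload)   # the OR clause matches every row
safe = fetch_parameterized(payload)  # no owner has that literal name

print(len(leaked), len(safe))
```

If any endpoint on your list builds queries the first way and also skips authentication, that is the combination to fix before anything else.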

Tuesday: inventory every workflow automation tool in the org — n8n, Zapier, Make, the internal Python script your data team uses. Map what credentials each holds. Patch n8n, rotate anything that touched it, and put the inventory on a recurring quarterly review.
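The inventory doesn't need tooling to start; a list you can query beats a spreadsheet nobody opens. A rough sketch, where the tool entries, credential names, dates, and the 90-day rotation window are all illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class AutomationTool:
    name: str
    credentials: list[str]  # what the tool can reach if compromised
    last_rotated: date      # when those credentials were last rotated


def overdue(tools: list[AutomationTool], today: date, max_age_days: int = 90) -> list[str]:
    """Return names of tools whose credentials are older than the rotation window."""
    cutoff = today - timedelta(days=max_age_days)
    return [t.name for t in tools if t.last_rotated < cutoff]


# Hypothetical inventory entries.
inventory = [
    AutomationTool("n8n", ["prod-db", "crm-api-key"], date(2025, 9, 1)),
    AutomationTool("zapier", ["billing-api-key"], date(2026, 2, 20)),
]

stale = overdue(inventory, today=date(2026, 3, 13))
print(stale)
```

Running the same check every quarter is the recurring review; the list of credentials per tool is the blast-radius map.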

Wednesday: model your P&L under 30%, 50%, and 70% per-seat-to-agent-consumption conversion over 36 months. Bring three pricing architectures — outcome-based, consumption-based, hybrid — to leadership before someone on the board reads the ServiceNow chart and asks first.
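The Wednesday model can start as a dozen lines before it becomes a spreadsheet. A sketch under stated assumptions: conversion ramps linearly over the 36 months, each converted seat keeps 20% of its revenue (borrowing the ten-agents-for-fifty-workers ratio), and the $1M monthly base is a placeholder. Swap in your own curve and yield.

```python
def revenue_at_month(
    base_monthly: float,
    conversion: float,   # share of seats converted by the end of the horizon
    agent_yield: float,  # revenue kept per converted seat, as a fraction
    month: int,
    horizon: int = 36,
) -> float:
    """Monthly revenue assuming a linear ramp to `conversion` over `horizon` months."""
    converted = conversion * min(month, horizon) / horizon
    return base_monthly * ((1 - converted) + converted * agent_yield)


BASE = 1_000_000.0  # hypothetical monthly per-seat revenue today
AGENT_YIELD = 0.2   # assumption: converted seats keep 20% of revenue

scenarios = {
    c: revenue_at_month(BASE, c, AGENT_YIELD, month=36)
    for c in (0.3, 0.5, 0.7)
}

for conversion, revenue in scenarios.items():
    print(f"{conversion:.0%} conversion -> {revenue / BASE:.0%} of today's revenue")
```

The number to argue about is agent_yield. Outcome-based and consumption-based pricing are, in effect, attempts to push it above what a straight seat swap would give you.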

Thursday: pick one product surface and make it agent-operable. A real CLI with --output json, idempotent ops, structured errors, scoped credentials per tool. Not a chatbot wrapper. The interface AI agents will actually drive.
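To make "agent-operable" concrete, here is a minimal sketch of the shape using Python's argparse. The tickets command, its flags other than --output json, and the in-memory store are hypothetical stand-ins; scoped credentials are left out to keep it short.

```python
import argparse
import json

# In-memory stand-in for a real backend, keyed by idempotency key.
_STORE: dict[str, dict] = {}


def create_ticket(key: str, title: str) -> dict:
    """Idempotent create: replaying the same key returns the original record."""
    if key in _STORE:
        return {"ticket": _STORE[key], "created": False}
    _STORE[key] = {"id": key, "title": title}
    return {"ticket": _STORE[key], "created": True}


def run(argv: list[str]) -> tuple[int, str]:
    """Parse arguments and return (exit_code, output) instead of printing."""
    parser = argparse.ArgumentParser(prog="tickets")
    parser.add_argument("--output", choices=["json", "text"], default="text")
    parser.add_argument("--idempotency-key", required=True)
    parser.add_argument("--title", default="")
    args = parser.parse_args(argv)

    if not args.title:
        # Structured error: a stable code an agent can branch on.
        payload = {"ok": False, "error": {"code": "missing_title", "message": "--title is required"}}
        exit_code = 2
    else:
        payload = {"ok": True, **create_ticket(args.idempotency_key, args.title)}
        exit_code = 0

    if args.output == "json":
        return exit_code, json.dumps(payload, sort_keys=True)
    if payload["ok"]:
        return exit_code, f"ticket {payload['ticket']['id']} created={payload['created']}"
    return exit_code, payload["error"]["message"]


first = run(["--output", "json", "--idempotency-key", "k1", "--title", "Reset password"])
replay = run(["--output", "json", "--idempotency-key", "k1", "--title", "Reset password"])
print(first[1])
print(replay[1])
```

The replay returns the same ticket with created set to false, which is what lets an agent retry after a timeout without creating duplicates.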

Friday: discount every AI-velocity claim in your roadmap by 50% and replan. If your sprint commitments assume the SWE-bench numbers, your sprint commitments are fiction.

The trillion-dollar repricing and the two-hour breach are telling you the same thing. Bet on the future. Audit the present. Don't confuse one for the other.

◆ Behind the synthesis

Six specialist takes that fed this piece.

The piece above is one stream in my voice. Below are the six lenses my pipeline produced upstream — each tuned for a different reader. Use them when you want the angle that matters most to your role.

HPE Aruba CX switches have an unauthenticated admin-takeover vulnerability at near-maximum CVSS — zero credentials required — and 24,700 n8n workflow automation instances are exposed to actively-exploited RCE that leaks every credential and API key your automations touch.

A DigitalMint ransomware negotiator allegedly ran ALPHV/BlackCat attacks against companies that then hired his firm to negotiate — extracting $75.25M across at least 10 attacks, with single payments reaching $26.8M, while using confidential negotiation data to maximize extortion.

Google published controlled experiments proving that reasoning-enabled LLMs hallucinate intermediate chain-of-thought steps that propagate into final-answer errors — a failure mode your final-answer-only monitoring is blind to.

The January 29 'SaaSmageddon' erased $1T+ in software market cap, and ServiceNow dropping 11% despite beating earnings proves the market is repricing the entire SaaS category structurally, not punishing poor performers.

McKinsey's enterprise AI platform Lilli was breached via basic SQL injection in 2 hours — 46.5M chat messages and 728K sensitive files exposed — while Perplexity's Comet AI browser was weaponized for phishing in under 4 minutes.