~4 min
The day the floor and ceiling of AI engineering moved at once
Anthropic crossed $30B ARR and shipped a model that finds zero-days in everything you run. In the same week, controlled studies confirmed AI-written code ships 41% more bugs. Both numbers matter. Neither cancels the other.
Three things landed on April 7 that don't fit in the same story most people are telling about AI.
Anthropic disclosed $30B+ in annualized revenue, roughly tripling in four months and overtaking OpenAI's $25B. Claude Mythos Preview hit 93.9% on SWE-bench Verified — a 13-point jump over February's state of the art — and in the process autonomously discovered thousands of zero-day vulnerabilities, including a 27-year-old bug in OpenBSD and a flaw in FFmpeg that survived five million prior fuzzing runs. Project Glasswing, a 40+ company coalition with $100M in Anthropic credits, is now racing to patch critical infrastructure before open-weight models reach parity. Alex Stamos puts that window at roughly six months.
And in the same news cycle, controlled experiments confirmed AI coding tools produce 41% more bugs alongside their 26% speed gain. Meta's 85,000 employees burned 60 trillion tokens last month with no proven link to outcomes. GitHub is at 90% availability under 14x year-over-year agent traffic. Fewer than 3% of organizations can demonstrate ROI on the AI tools 84% of their developers are using.
The ceiling of what AI engineering can do moved up sharply. The floor of what it actually does in production is alarmingly low. Both are true. The discomfort of holding both is the point.
The Mythos disclosure is not another benchmark story
Ignore the SWE-bench number for a second. The relevant capability is that Mythos finds five separate vulnerabilities in a single codebase and composes them into novel exploit chains. Your defense-in-depth strategy assumes independent failure modes across layers. AI-driven chaining systematically violates that assumption. Your existing SAST, DAST, and fuzzing tools find individual issues. They don't reason about chains.
The capability emerged from general reasoning improvements, not specialized cybersecurity training. That means every frontier lab pursuing reasoning crosses this threshold eventually. Anthropic got there first and decided to ring the alarm rather than ship quietly. Glasswing is the defensive sprint; the six-month window is the deadline.
If your production stack depends on OSS that isn't in a Glasswing partner's scanning scope, those vulnerabilities won't be found defensively in time. Map your dependency tree against the coalition's coverage this week. Anything orphaned — older C/C++ libraries, niche codecs, crypto components without a major-vendor sponsor — is exposed in a way your current vulnerability management cadence cannot resolve.
The floor is what should scare you more
The ceiling case is OpenAI's Frontier team: roughly seven engineers, 1M lines of production code, zero human-written, zero pre-merge review, $2-3K/day in tokens. Real, impressive, and aggressively cherry-picked. The team itself says it shouldn't be extrapolated. It was greenfield. The build system had to stay under sixty seconds or the agents thrashed. The first six weeks were ten times slower than human coding before the loop reached escape velocity.
The floor case is everywhere else. Vercel auto-merges 58% of PRs in its monorepo. GitHub is straining under 17M monthly agent PRs. Trivy — the security scanner you trust to find vulnerabilities — was supply-chain compromised and used to exfiltrate 340GB from the European Commission. GrafanaGhost proved that any internal tool with AI features and outbound network access is a prompt-injection exfiltration channel your SIEM cannot see. OpenClaw shipped six critical authorization CVEs in six weeks; 63% of its 135,000 public instances run unauthenticated.
The common thread across the floor: the scaffolding around the model is where the failures are concentrating. The model is roughly constant. The harness, the auth layer, the build system, the review gate, the supply chain trust model — that's where things break, and that's where most teams haven't done the work.
What the revenue number actually tells you
Anthropic at $30B ARR with gross margins ten percentage points below expectations is the data point your portfolio and your vendor strategy both need to absorb. Inference costs are not deflating on the timeline most people priced in. The 3.5 GW TPU deal doesn't come online until 2027. Anthropic also just moved third-party Claude Code access to pay-as-you-go with one week's notice — the kind of unilateral repricing that happens when your unit economics are under pressure and your platform power lets you push the cost downstream.
If your product's economics assume flat-rate compute or stable per-token pricing through 2027, rebuild the model. The vendor that just overtook the market leader is telling you, through its margins and its billing changes, that API prices are subsidized on borrowed time.
What to do this week
Pick one repository — your highest-traffic production service — and instrument three numbers alongside whatever velocity metric your team currently celebrates: defect rate per AI-assisted PR, rework cycles before a PR reaches production-quality, and the percentage of merged PRs that received zero human review on security-relevant paths. Run it for four weeks. The 41% bug increase is invisible until you measure it; once you measure it, the conversation about AI tool ROI becomes possible.
While that's running, do the unglamorous Glasswing audit: which of your critical OSS dependencies have a major-vendor sponsor scanning them with frontier models, and which don't. The orphaned ones are your six-month problem. Everything else is downstream of those two pieces of data.
◆ Behind the synthesis
Six specialist takes that fed this piece.
The piece above is one stream in my voice. Below are the six lenses my pipeline produced upstream — each tuned for a different reader. Use them when you want the angle that matters most to your role.
-
Anthropic's Claude Mythos Preview — 93.9% on SWE-bench Verified, up 13 points from SOTA in February — has discovered exploitable zero-days in the Linux kernel, FFmpeg, OpenBSD, and every major browser, including chains of 5 vulnerabilities composed into novel exploits.
AI just found exploitable zero-days in Linux, OpenBSD, FFmpeg, and every major browser — and the capability goes open-weight in 6 months. Meanwhile, your security scanner (Trivy) w…
40 sources · 9 min Read → -
Anthropic's Claude Mythos Preview has autonomously discovered thousands of high-severity zero-day vulnerabilities across every major OS, browser, and the Linux kernel — including bugs undetected for 27 years — and Alex Stamos estimates open-weight models will replicate this capability within 6 months.
AI just discovered thousands of zero-days in every major OS and browser, and open-weight models will replicate this capability within 6 months — while simultaneously, AI-generated…
39 sources · 7 min Read → -
Gemma 4 crossed 2 million downloads in its first week and runs at 40 tokens/second on-device via MLX — simultaneously, FIPO credit assignment pushed AIME from 50% to 58% and OLMo 3's async RL achieved 4x training throughput.
Gemma 4 runs at 40 tok/s on-device and crossed 2M downloads in week one while FIPO and async RL revealed 2-4x post-training headroom — but the open-weight ecosystem faces three sim…
39 sources · 7 min Read → -
OpenAI Frontier shipped 1M lines of production code with 7 engineers and zero human-written code in 5 months — while controlled experiments elsewhere show AI coding tools produce 41% more bugs alongside 26% speed gains, and Meta's 85,000 employees burned 60 trillion tokens last month with zero proven ROI.
OpenAI proved 7 engineers can match a 500-person org's code output — but the industry's own data shows AI tools ship 41% more bugs, Meta's 85,000 employees can't link 60 trillion t…
40 sources · 11 min Read → -
Anthropic overtook OpenAI at $30B ARR — tripling in four months — but the bigger risk for your org today: controlled experiments now show AI coding tools produce 41% more bugs despite 26% speed gains, GitHub is at 90% availability under 14x agent traffic, and fewer than 3% of organizations can prove AI tool ROI.
Anthropic just overtook OpenAI at $30B ARR, but the bigger story is that your AI investment may be net-negative: controlled data shows 41% more bugs from AI coding tools, GitHub is…
40 sources · 8 min Read → -
Anthropic disclosed $30B+ annualized revenue — tripled from ~$9B in four months — definitively surpassing OpenAI's $25B and entering Fortune 100 revenue territory while still private.
Anthropic tripled to $30B+ ARR in four months and overtook OpenAI — the fastest revenue ramp in enterprise software history — while OpenAI's own CFO was frozen out for questioning…
40 sources · 7 min Read →