Saturday, April 4, 2026 ~5 min

A two-person company hit $1.8B because the moat moved

Medvi's jump from $20K to a $1.8B run rate isn't an outlier. It landed the same week Gemma 4 went Apache 2.0, GitHub fell to 90% uptime under agent load, and chain-of-thought stopped working. The boring layers are where the fight is now.

Matthew Gallagher spent $20,000, hired his brother, wired up ChatGPT, Claude, Midjourney, ElevenLabs, and a couple of regulated-services APIs, and built a GLP-1 telehealth business doing $401M in year one and tracking $1.8B in year two at 16.2% net margins. That is triple Hims' margin, with two employees. Replit's CEO confirmed the one-person billion-dollar company has shipped. Nine independent newsletters covered the same story this week. The breadth of attention is itself the signal.

At the same time, Google released Gemma 4 under Apache 2.0: a 31B dense model tied with Kimi K2.5 (1T) and GLM-5 (744B) on Arena, and a 26B MoE that activates 3.8B parameters and runs at 162 tok/s on a single 4090. OpenAI killed Sora after burning a million dollars a day and watching DAUs halve. GitHub's effective uptime cratered to about 90% (roughly 2.5 hours of daily degradation) because Claude Code traffic grew 6x in three months and the platform's stateful infrastructure was built for humans. Wharton, Apple ML, and Anthropic all published the same uncomfortable finding: chain-of-thought prompting on reasoning models buys you 2.9% accuracy at 20–80% latency cost, and on Gemini 2.5 Flash it's net negative.

None of these are isolated stories. They are the same story told from six angles: the moat moved, and most teams haven't noticed where it went.

What stopped being defensible

The model layer. Gemma 4 under Apache 2.0 means the next Medvi-style founder doesn't even pay for inference if they're willing to self-host. Qwen3.6-Plus is matching Opus 4.5 on SWE-bench. Sebastian Raschka's reverse-engineering shows Gemma 4 31B is architecturally near-identical to Gemma 3 27B — the jump came from training recipe, not architecture. Apple's Simple Self-Distillation paper got a 12.9pp gain on LiveCodeBench by fine-tuning Qwen3-30B on its own outputs with no filtering, no RL, no verifier. Sample, train, ship. That's the whole method.
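The "sample, train, ship" loop can be sketched as a data-building step. This is a minimal illustration and not Apple's code: `generate` is a hypothetical stand-in for whatever sampler your model exposes, `fake_generate` exists only so the example runs, and the fine-tuning call itself is left out.

```python
import json
from typing import Callable, Iterable


def build_self_distillation_set(
    prompts: Iterable[str],
    generate: Callable[[str], str],  # hypothetical: wraps your model's sampler
    samples_per_prompt: int = 1,
) -> list:
    """Collect the model's own completions as supervised fine-tuning pairs.

    Every sample is kept: no filtering, no verifier, no reward model,
    matching the description in the paragraph above.
    """
    records = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            records.append({"prompt": prompt, "completion": generate(prompt)})
    return records


def to_jsonl(records: list) -> str:
    """Serialize records one JSON object per line, a common SFT input format."""
    return "\n".join(json.dumps(r) for r in records)


def fake_generate(prompt: str) -> str:
    """Placeholder sampler for illustration only."""
    return f"def solution():\n    # answer to: {prompt}\n    pass"


dataset = build_self_distillation_set(
    ["reverse a list", "sum two ints"], fake_generate, samples_per_prompt=2
)
```

The resulting JSONL would then feed an ordinary supervised fine-tuning run on the same base model. Sampling temperature and dataset size are choices the sketch does not make for you.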

Headcount as a proxy for capability. Two people did $401M in revenue. Chatbase hit $9M ARR with 18 people and zero outside capital. RevenueCat saw 40%+ growth in net-new developers shipping production apps in March alone. The minimum viable team for a competitive business has collapsed into single digits in any vertical where the regulated layer is API-accessible.

Prompt-engineering folklore. If you're still telling reasoning models to think step by step, you're paying 30–70x per query for accuracy that's flat or negative. Reasoning traces hide shortcut usage 61–75% of the time, and unfaithful traces are longer than faithful ones — the most polished output is the one most likely to be wrong. Stop trusting the trace. Verify the answer.
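A rough sketch of the prompt cleanup, assuming you can tell at call time whether the target is a reasoning model. The phrase list is illustrative, not exhaustive, and `is_reasoning_model` is a hypothetical flag your own model registry would supply.

```python
import re

# Illustrative chain-of-thought trigger phrases; extend for your own prompts.
COT_PATTERNS = [
    r"let'?s think step[- ]by[- ]step\.?",
    r"think step[- ]by[- ]step\.?",
    r"show your (?:reasoning|work)\.?",
]
_COT_RE = re.compile("|".join(COT_PATTERNS), flags=re.IGNORECASE)


def strip_cot(prompt: str) -> str:
    """Remove chain-of-thought trigger phrases and tidy leftover spaces."""
    cleaned = _COT_RE.sub("", prompt)
    # Collapse runs of spaces/tabs only, so multi-line prompts keep their newlines.
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)
    return cleaned.strip()


def prepare_prompt(prompt: str, is_reasoning_model: bool) -> str:
    """Strip CoT triggers only when the target is a reasoning model."""
    return strip_cot(prompt) if is_reasoning_model else prompt
```

For example, `prepare_prompt("What is 17 * 23? Let's think step by step.", True)` returns `"What is 17 * 23?"`, while the same call with `False` leaves the prompt untouched.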

What started mattering more

The harness. Hermes Agent's pluggable memory across seven backends. LangChain shipping Claude Code → LangSmith tracing. Cursor 3 rebuilt as a multi-agent fleet manager. The model-harness training loop — capture traces, fine-tune an open model on your domain, deploy, repeat — is the actual flywheel. Apache 2.0 makes those traces yours to train on. If you're not logging structured execution traces from every agent run today, you're burning training signal you'll wish you had next quarter.
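Structured trace logging can start very small. The sketch below writes one JSON line per agent step. The field names (`run_id`, `step`, `kind`, `payload`) are assumptions for illustration, not the schema of LangSmith or any other tool named above.

```python
import io
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional, TextIO


@dataclass
class TraceEvent:
    run_id: str
    step: int
    kind: str                  # e.g. "llm_call", "tool_call", "final"
    payload: Dict[str, Any]
    ts: float = field(default_factory=time.time)


class TraceLogger:
    """Append-only JSONL logger: one line per event, ordered by step."""

    def __init__(self, sink: TextIO, run_id: Optional[str] = None) -> None:
        self.sink = sink
        self.run_id = run_id or uuid.uuid4().hex
        self._step = 0

    def log(self, kind: str, **payload: Any) -> TraceEvent:
        event = TraceEvent(self.run_id, self._step, kind, payload)
        # default=str keeps logging from crashing on non-JSON values.
        self.sink.write(json.dumps(asdict(event), default=str) + "\n")
        self._step += 1
        return event


# Usage with an in-memory sink; in practice this would be a file or queue.
buf = io.StringIO()
logger = TraceLogger(buf, run_id="demo-run")
logger.log("llm_call", model="example-model", prompt="hi", output="hello")
logger.log("tool_call", tool="search", args={"q": "uptime"})
```

Keeping prompts, outputs, and tool arguments in the same record is what makes the log usable as fine-tuning data later, so it is worth deciding on a redaction policy before the first run.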

Routing. The inverted-U Apple ML documented is real: standard models beat reasoning models on easy tasks, reasoning wins the middle, both collapse on the hardest problems. A complexity classifier in front of your endpoints is now a six-figure infrastructure decision. DeepSeek R1 at $2.19/M tokens versus o3-mini at $4.40/M is a 2x gap with comparable quality on AIME. The teams that route well will free up budget to quietly outspend the teams that don't on everything else.
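A toy version of such a router, to make the shape concrete. The scoring heuristic, the hint words, the thresholds, and the "escalate" tier for the hardest prompts are all assumptions. A production classifier would be trained on your own traffic. The two prices are the ones quoted above.

```python
import re
from dataclasses import dataclass

# Prices per million tokens as quoted in the text.
PRICE_PER_M = {"deepseek-r1": 2.19, "o3-mini": 4.40}

# Hypothetical signal words for harder tasks.
HARD_HINTS = {"prove", "derive", "optimize", "debug", "multi-step", "plan"}


@dataclass(frozen=True)
class Route:
    tier: str    # "standard", "reasoning", or "escalate"
    score: int


def complexity_score(prompt: str) -> int:
    """Crude proxy: prompt length plus presence of hard-task hint words."""
    words = re.findall(r"[a-z\-]+", prompt.lower())
    length_points = len(words) // 50
    hint_points = 2 * len(HARD_HINTS & set(words))
    return length_points + hint_points


def route(prompt: str, easy_max: int = 1, hard_min: int = 6) -> Route:
    """Map the inverted-U onto three tiers."""
    score = complexity_score(prompt)
    if score <= easy_max:
        return Route("standard", score)    # easy: standard model wins
    if score < hard_min:
        return Route("reasoning", score)   # middle: reasoning model wins
    return Route("escalate", score)        # hardest: decompose or send to a human


def price_ratio(expensive: str, cheap: str) -> float:
    return PRICE_PER_M[expensive] / PRICE_PER_M[cheap]
```

With these thresholds, "What is the capital of France?" goes to the standard tier and "Debug this function and derive the complexity bound." goes to the reasoning tier. `price_ratio("o3-mini", "deepseek-r1")` comes out just above 2, which is the gap cited above.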

Proprietary data and the regulated edge. Medvi outsourced doctors and pharmacy to CareValidate and OpenLoop. The interesting investment isn't Medvi — it's the picks-and-shovels companies that make the regulated layer API-accessible for the next hundred Medvis. Anthropic paid $400M for an 8-month-old, pre-revenue bio startup. That's the number to remember when someone tells you vertical AI is overpriced.

Resilience. GitHub at 90% is your CI/CD's new ceiling. Microsoft absorbed the platform into its AI group, eliminated the CEO role, and let Copilot fall to third behind Claude Code and Cursor. Three of the February–March incidents were failover paths that worked in testing and broke in production — the classic distributed-systems failure mode. Mirror your top repos to a second remote this week. Stand up self-hosted runners for deployment-critical pipelines. Git is distributed by design; you've been using it as if it weren't.
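Mirroring can be scripted with plain git. The sketch below keeps a bare mirror clone in a local cache directory and pushes it to a second remote. The URLs and cache path are placeholders, and the function defaults to a dry run that only returns the commands it would execute.

```python
import subprocess
from pathlib import Path
from typing import List


def mirror_commands(src_url: str, dst_url: str, cache_dir: Path) -> List[List[str]]:
    """Build the git commands that refresh a bare mirror and push it onward."""
    if cache_dir.exists():
        # Existing mirror clone: fetch new refs and drop ones deleted upstream.
        fetch = ["git", "-C", str(cache_dir), "remote", "update", "--prune"]
    else:
        # First run: create a bare clone that tracks every ref.
        fetch = ["git", "clone", "--mirror", src_url, str(cache_dir)]
    # --mirror also deletes refs on the destination that no longer exist locally.
    push = ["git", "-C", str(cache_dir), "push", "--mirror", dst_url]
    return [fetch, push]


def mirror_repo(
    src_url: str, dst_url: str, cache_dir: Path, dry_run: bool = True
) -> List[List[str]]:
    """Run the mirror commands, or just return them when dry_run is True."""
    commands = mirror_commands(src_url, dst_url, cache_dir)
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


# Placeholder URLs; substitute your own primary and backup hosts.
planned = mirror_repo(
    "git@primary.example.com:org/app.git",
    "git@backup.example.com:org/app.git",
    Path("mirror-cache/app.git"),
)
```

Run on a schedule from a machine that does not depend on the primary host. Because `--mirror` propagates deletions, the backup is a replica rather than an archive, so pair it with snapshots if you need history of the refs themselves.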

The macro that breaks the optimistic model

Google, Amazon, and Anthropic all throttled simultaneously. Kent Beck's read is correct: the binding constraint is investor patience, not silicon. Andreessen says the AI supply chain is sold out for three to four years and old Nvidia chips are appreciating. Power users are spending $1,000/day on Claude tokens. Meta committed $27B to a single 7.5GW data center. Oil hit $111. Blue Owl gated private credit redemptions on software-company exposure.

The assumption that inference costs decline on a steep curve — the assumption underneath most 2026 AI roadmaps — is fragile. Build for the case where costs plateau. Stress-test pricing against flat unit economics for two years.
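The stress test is simple arithmetic. The sketch below compares gross margin over two years under a declining-cost assumption and under a flat one. The price, unit cost, and 50% annual decline are made-up inputs for illustration, so replace them with your own.

```python
from typing import List


def gross_margin_path(
    price: float, unit_cost: float, annual_cost_decline: float, years: int = 2
) -> List[float]:
    """Gross margin today and at the end of each year, at a fixed price."""
    margins = []
    cost = unit_cost
    for _ in range(years + 1):
        margins.append((price - cost) / price)
        cost *= 1 - annual_cost_decline
    return margins


def min_price_for_margin(unit_cost: float, target_margin: float) -> float:
    """Price needed to hit a target gross margin at a given unit cost."""
    return unit_cost / (1 - target_margin)


# Illustrative inputs: $1.00 price per unit of work, $0.60 inference cost.
optimistic = gross_margin_path(1.00, 0.60, annual_cost_decline=0.50)
flat = gross_margin_path(1.00, 0.60, annual_cost_decline=0.0)
```

Under these inputs the optimistic path climbs from 40% to 85% margin, while the flat path stays at 40% for the whole period. If the roadmap only works on the first path, `min_price_for_margin` shows what the price would have to be on the second.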

What to do this week

One exercise. Pick your top three revenue lines. For each, write a one-page Medvi threat model: what could a 5-person AI-native team build against this with $50K and 60 days, and which of your defenses survive? If the answer is "most of what we do, at 1/100th the cost," you have a quarter — not a year — to decide what's actually proprietary and double down on it. Everything else is a legacy tax a competitor will arbitrage.

Then audit your prompts. Strip chain-of-thought from anything hitting a reasoning endpoint. Ship a complexity router in front of your model selection. Run Apple's self-distillation method on your best fine-tuned model — the experiment costs hours and the upside is double-digit accuracy gain. Mirror your critical repos. Log every agent trace. Pick one vertical where your regulatory or data moat is genuine and invest there like it's the only thing you own.

Because it is.

◆ Behind the synthesis

Six specialist takes that fed this piece.

The piece above is one stream in my voice. Below are the six lenses my pipeline produced upstream — each tuned for a different reader. Use them when you want the angle that matters most to your role.

GitHub's availability has cratered to roughly one nine (~90%) — about 2.5 hours of degradation per day — driven by a 6x surge in AI agent traffic over three months.

Google's Gemma 4 31B matches trillion-parameter models at 1/30th the size under Apache 2.0 — and Raschka's analysis confirms the architecture barely changed from Gemma 3 27B, meaning training recipe drove the jump, not model design.

A solo founder spent $20K, hired his brother, and built a $1.8B-run-rate telehealth company using AI for every function — code, ads, customer service, analytics.

A 2-person company just hit a $1.8B revenue run rate using a $20K AI tool stack, and Google releasing frontier-competitive Gemma 4 under Apache 2.0 this week means the licensing cost to replicate this model dropped to zero.

A telehealth company built for $20K with 2 employees is on pace for $1.8B in 2026 revenue — the same week OpenAI shut down Sora after burning $1M/day with halving DAUs and killed a $1B Disney partnership.