Friday, February 20, 2026 ~4 min

The over-engineering tax is bigger than your model bill

Three independent benchmarks landed the same verdict this week: simpler beats complex across RAG, agents, and inference economics. Meanwhile your security stack just failed three live stress tests.

Three numbers from this week, in the order I'd put them on a whiteboard.

On an H100 generating tokens, you're using roughly 1% of the compute you're paying for. Not because your code is bad, but because decode is memory-bound at ~1 FLOP/byte and the H100 needs about 295 FLOP/byte to keep its compute units busy. The gap gets worse every chip generation, since compute grows 3x every two years and bandwidth grows at half that rate. FlashAttention doesn't fix it. A bigger GPU doesn't fix it. The physics doesn't care.
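If you want to put that on the whiteboard yourself, here is a minimal roofline-style sketch. The peak numbers are my assumption (H100 SXM datasheet figures for dense BF16 and HBM3 bandwidth), and the function names are illustrative. At batch size 1 the ceiling comes out well under one percent, the same order of magnitude as the figure above.

```python
# Roofline-style back-of-envelope for memory-bound decode.
# ASSUMPTION: H100 SXM datasheet peaks; swap in the numbers for your own part.
PEAK_FLOPS = 989e12        # FLOP/s, assumed dense BF16 peak
PEAK_BANDWIDTH = 3.35e12   # bytes/s, assumed HBM3 bandwidth


def ridge_point(peak_flops: float, peak_bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which a kernel stops being memory-bound."""
    return peak_flops / peak_bandwidth


def compute_utilization_ceiling(
    intensity: float, peak_flops: float, peak_bandwidth: float
) -> float:
    """Upper bound on the fraction of peak compute reachable at a given intensity."""
    attainable = min(peak_flops, intensity * peak_bandwidth)
    return attainable / peak_flops


ridge = ridge_point(PEAK_FLOPS, PEAK_BANDWIDTH)
ceiling = compute_utilization_ceiling(1.0, PEAK_FLOPS, PEAK_BANDWIDTH)

print(f"ridge point: {ridge:.0f} FLOP/byte")
print(f"utilization ceiling at 1 FLOP/byte: {ceiling:.2%}")
```

Batching raises the effective intensity because the same weights are reused across requests, which is why the ceiling is a per-configuration number and not a constant.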

FloTorch's 2026 RAG benchmark: simple 512-token recursive character splitting beats semantic and proposition-based chunking on accuracy and produces 3–5x fewer vectors. If your team has been investing sprints in clever chunking, you're paying 3–5x more for worse retrieval.
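For reference, this is roughly what "simple recursive character splitting" means. It is a minimal sketch, not FloTorch's benchmark code or any library's implementation. The separator list is my choice, and count_tokens is a whitespace word count standing in for a real tokenizer.

```python
from typing import List, Sequence

# Coarsest to finest; ASSUMPTION: a typical prose-oriented separator order.
SEPARATORS: Sequence[str] = ("\n\n", "\n", ". ", " ")


def count_tokens(text: str) -> int:
    """Crude tokenizer stand-in: whitespace-delimited words (assumption)."""
    return len(text.split())


def recursive_split(
    text: str, max_tokens: int = 512, separators: Sequence[str] = SEPARATORS
) -> List[str]:
    """Split on the coarsest separator, recursing only into oversized pieces."""
    if count_tokens(text) <= max_tokens:
        return [text] if text.strip() else []
    if not separators:
        # Nothing left to split on: hard-cut by word count.
        words = text.split()
        return [
            " ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)
        ]

    sep, finer = separators[0], separators[1:]
    pieces: List[str] = []
    for part in text.split(sep):
        pieces.extend(recursive_split(part, max_tokens, finer))

    # Greedily merge neighbours back together while they fit the budget.
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if count_tokens(candidate) <= max_tokens:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

The whole thing is a split and a greedy merge. That is the baseline a semantic chunker has to beat on your own data.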

LangChain's coding agent jumped from Top 30 to Top 5 on Terminal Bench 2.0. Same model. They added self-verification and structured tracing to the harness. That's it.
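To make "self-verification and structured tracing" concrete, here is a hypothetical harness loop. It is not LangChain's implementation. The attempt and verify callables, the Trace class, and the event fields are all names I made up for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Trace:
    """Structured trace: one dict per step, easy to dump as JSON lines."""
    events: List[dict] = field(default_factory=list)

    def log(self, step: str, **data) -> None:
        self.events.append({"ts": time.time(), "step": step, **data})


def run_with_verification(
    task: str,
    attempt: Callable[[str, Optional[str]], str],
    verify: Callable[[str, str], Tuple[bool, Optional[str]]],
    max_attempts: int = 3,
) -> Tuple[Optional[str], Trace]:
    """Call the model, check the result, and retry with the verifier's feedback."""
    trace = Trace()
    feedback: Optional[str] = None
    for n in range(1, max_attempts + 1):
        output = attempt(task, feedback)          # hypothetical model call
        trace.log("attempt", n=n, output=output)
        ok, feedback = verify(task, output)       # e.g. run tests, lint, re-read the diff
        trace.log("verify", n=n, ok=ok, feedback=feedback)
        if ok:
            return output, trace
    return None, trace  # give up, but keep the trace for debugging
```

The model is untouched. Everything here lives in the harness, which is the point of the result above.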

Three independent domains — inference, retrieval, agents — converging on the same lesson: the dominant failure mode in production AI right now is over-engineering. The highest-leverage work this sprint isn't adding a layer. It's deleting one.

Context length is the bill nobody priced

The single most expensive product decision you'll make this year isn't which model you pick. It's how much context you give it.

A 7B model on an H100 serves 278 concurrent users at 4K context. At 128K, it serves 8. That's a 35x per-user cost increase on the same hardware. The KV cache grows linearly with context length and crowds concurrent sessions out of GPU memory, while the quadratic attention term goes from 8% of total compute at 1K to 92% at 128K. Every "agent memory," "document ingestion," or "long conversation history" feature on your roadmap pushes you up that curve. Multi-agent setups where agents pass full traces between each other compound it multiplicatively.
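The memory side of that curve is simple arithmetic. This is a sketch under assumed settings, since the benchmark's exact model configuration isn't given here: 32 layers, 8 KV heads (grouped-query attention), head dimension 128, an FP8 KV cache, FP16 weights at about 14 GB, and 80 GB of HBM. Under those assumptions the user counts won't match the quoted ones exactly, but the ratio lands in the same place.

```python
def kv_bytes_per_token(
    layers: int, kv_heads: int, head_dim: int, bytes_per_value: int
) -> int:
    """KV cache bytes per token: keys and values for every layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def max_concurrent_users(
    gpu_bytes: int, weight_bytes: int, context_tokens: int, per_token_bytes: int
) -> int:
    """Full-length sessions that fit in memory left over after weights."""
    free = gpu_bytes - weight_bytes
    return free // (context_tokens * per_token_bytes)


# ASSUMPTIONS: illustrative 7B-class config, not the benchmark's actual setup.
GPU_BYTES = 80 * 10**9
WEIGHT_BYTES = 14 * 10**9  # 7B parameters at 2 bytes each
PER_TOKEN = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128, bytes_per_value=1)

users_4k = max_concurrent_users(GPU_BYTES, WEIGHT_BYTES, 4 * 1024, PER_TOKEN)
users_128k = max_concurrent_users(GPU_BYTES, WEIGHT_BYTES, 128 * 1024, PER_TOKEN)

print(f"KV cache per token: {PER_TOKEN // 1024} KiB")
print(f"users at 4K: {users_4k}, at 128K: {users_128k}")
print(f"per-user cost multiplier: {users_4k / users_128k:.0f}x")
```

This ignores activations, fragmentation, and paging, so treat the output as an upper bound on concurrency, not a capacity plan.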

Claude Sonnet 4.6 with a 1M token window is a great API product and an economically devastating self-host. Use the long context through providers who absorb the utilization problem. For everything you run yourself, cap defaults at 4K–8K, price longer context as a tier, and design inter-agent protocols that summarize rather than pass raw context. Treat KV cache as a first-class budgeted resource, not a side effect.
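One way to make "KV cache as a budgeted resource" real is to put the cap in code, not in a wiki page. A hypothetical sketch follows. The tier names and limits are placeholders, and summarize and count_tokens are callables you would supply.

```python
from typing import Callable, Dict, List

# HYPOTHETICAL tiers; the numbers are placeholders, not recommendations.
CONTEXT_TIERS: Dict[str, int] = {
    "default": 8_192,
    "extended": 32_768,
    "long": 131_072,
}


def allowed_context(requested_tokens: int, tier: str = "default") -> int:
    """Clamp a request to the context budget its pricing tier pays for."""
    return min(requested_tokens, CONTEXT_TIERS[tier])


def handoff_payload(
    trace: List[str],
    summarize: Callable[[List[str]], str],
    budget_tokens: int,
    count_tokens: Callable[[str], int],
) -> str:
    """Pass the raw trace between agents only if it fits; otherwise summarize."""
    raw = "\n".join(trace)
    if count_tokens(raw) <= budget_tokens:
        return raw
    return summarize(trace)
```

The design choice is that exceeding the budget changes what gets sent, so a long trace degrades into a summary instead of silently growing the bill.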

The gap between the raw compute floor (~$0.004 per million tokens at full utilization) and what APIs charge ($0.30–$1.25) is roughly 75–310x. That's not margin. That's the operational overhead of running this stuff well, and it's the addressable market for anyone who closes it.

Patch the three things that broke this week

While we were optimizing prompts, three trust assumptions in the security stack failed simultaneously, and at least one is being actively exploited.

Dell RecoverPoint, CVE-2026-22769, CVSS 10.0. Hardcoded admin credential in tomcat-users.xml. Unauthenticated WAR deploy via /manager/text/deploy gets you root. UNC6201 is in the wild with GRIMBOLT — a native-AOT C# backdoor that strips CIL metadata, persists via convert_hosts.sh from rc.local, and pivots through VMware via Ghost NICs and iptables Single Packet Authorization on vCenter. They are deliberately targeting backup infrastructure to deny recovery. Patch today. Audit /home/kos/auditlog/fapi_cl_audit_log.log for /manager requests. Hash-check convert_hosts.sh. Deploy Mandiant's YARA rules.
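The audit and hash-check steps can be scripted in a few lines. This is a rough triage sketch, not Mandiant's tooling. I'm assuming the audit log is plain text and doing a substring match. The location of convert_hosts.sh and the known-good hash are inputs you supply from your own baseline or vendor media.

```python
import hashlib
from pathlib import Path
from typing import Iterator, Tuple

# Log path as named in the advisory text above.
AUDIT_LOG = Path("/home/kos/auditlog/fapi_cl_audit_log.log")


def manager_requests(log_path: Path) -> Iterator[Tuple[int, str]]:
    """Yield (line number, line) for every log line mentioning /manager."""
    # ASSUMPTION: plain-text log; undecodable bytes are replaced, not fatal.
    with log_path.open("r", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if "/manager" in line:
                yield lineno, line.rstrip("\n")


def sha256_of(path: Path) -> str:
    """SHA-256 of a file, read in blocks so large files are fine."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_baseline(path: Path, known_good_sha256: str) -> bool:
    """Compare a file against a hash you trust (you provide the baseline)."""
    return sha256_of(path) == known_good_sha256.strip().lower()
```

Run manager_requests(AUDIT_LOG) on the appliance and review every hit by hand. A clean result here does not rule out compromise, and it is no substitute for patching or for the YARA rules.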

ADWSDomainDump bypasses CrowdStrike Falcon and Microsoft Defender for Endpoint. It enumerates Active Directory over ADWS on port 9389 instead of LDAP, and neither EDR platform watches that protocol. This isn't a signature gap. It's an architectural one, and the tool is public. Add network-level detection on port 9389 and segment access to it before next Friday.
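Until proper network detection is in place, a first pass over exported flow logs is cheap. A sketch, assuming flow records are dicts with src_ip and dst_port keys (a hypothetical schema, so adapt it to whatever your NetFlow or firewall export produces) and an allowlist of admin subnets that you define.

```python
import ipaddress
from typing import Dict, Iterable, List

ADWS_PORT = 9389  # Active Directory Web Services


def suspicious_adws_flows(
    flows: Iterable[Dict], allowed_sources: Iterable[str]
) -> List[Dict]:
    """Return flows to port 9389 whose source is outside the allowed subnets."""
    nets = [ipaddress.ip_network(n) for n in allowed_sources]
    hits: List[Dict] = []
    for flow in flows:
        if flow["dst_port"] != ADWS_PORT:
            continue
        src = ipaddress.ip_address(flow["src_ip"])
        if not any(src in net for net in nets):
            hits.append(flow)
    return hits
```

Anything this returns is a workstation or server talking ADWS to a domain controller without a reason you've written down, which is a good place to start segmenting.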

ETH Zurich broke zero-knowledge across Bitwarden, LastPass, and Dashlane. 25 demonstrated attacks. Roughly 60M users affected. Lightweight server impersonation during sync, exploiting feature bloat over 1990s primitives. Full paper drops at USENIX Security 2026, which means weaponized tooling follows shortly after. Start migration planning now, not when the headlines hit.

If your password manager, your backup infrastructure, and your EDR all have confirmed trust failures in the same week, the diagnosis isn't three bugs. It's a security architecture that took vendor claims at face value.

What to do this week

Profile your context-length distribution in production today. Not your max; your actual median and p95 by feature. If your median is over 8K and you don't have a unit-economics number for those sessions, you have a hidden cost bomb and you don't know how big it is.
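The profiling itself is a dozen lines once you have the data. A sketch assuming you can export (feature, prompt_tokens) pairs from your request logs. The field names and the 8K threshold constant are mine, and p95 uses the nearest-rank method.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MEDIAN_BUDGET_TOKENS = 8 * 1024  # the 8K line from the paragraph above


def context_profile(samples: Iterable[Tuple[str, int]]) -> Dict[str, dict]:
    """Median and nearest-rank p95 of context length, grouped by feature."""
    by_feature: Dict[str, List[int]] = defaultdict(list)
    for feature, tokens in samples:
        by_feature[feature].append(tokens)

    profile: Dict[str, dict] = {}
    for feature, lengths in by_feature.items():
        lengths.sort()
        n = len(lengths)
        rank = max(1, (95 * n + 99) // 100)  # ceil(0.95 * n) in integer math
        median = statistics.median(lengths)
        profile[feature] = {
            "n": n,
            "median": median,
            "p95": lengths[rank - 1],
            "over_budget": median > MEDIAN_BUDGET_TOKENS,
        }
    return profile
```

Any feature that comes back with over_budget set is one where you need a per-session cost number before the next roadmap review.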

Then pick the simplest thing on your AI roadmap and benchmark it against an even simpler version. Your semantic chunker against 512-token splits. Your model upgrade against a self-verification loop on the model you already have. Your 128K context window against retrieval over 4K. The 2026 evidence is that the simpler version wins more often than the complexity-bias of the field will let you believe.

And before any of that — patch RecoverPoint. The actively-exploited CVSS 10.0 in your DR layer outranks every architecture conversation on your calendar.

◆ Behind the synthesis

Six specialist takes that fed this piece.

The piece above is one stream in my voice. Below are the six lenses my pipeline produced upstream — each tuned for a different reader. Use them when you want the angle that matters most to your role.

Dell RecoverPoint CVE-2026-22769 (CVSS 10.0) is being actively exploited by UNC6201 via a hardcoded Tomcat credential — if you run RecoverPoint for Virtual Machines, stop reading and patch now.

CVE-2026-22769 is a CVSS 10.0 hardcoded credential in Dell RecoverPoint actively exploited by UNC6201 with a new GRIMBOLT backdoor that pivots through VMware via Ghost NICs — patch immediately and hunt for compromise indicators in your DR infrastructure.

Your AI features are hiding a 35x cost multiplier in context length, not model size — and the fix is simpler than you think.