Tuesday, April 14, 2026 ~4 min

Your AI suppliers are now your competitors, and they're degrading the service

Microsoft starved Azure to feed Copilot. Anthropic cut Claude's cache TTL 12x without telling anyone. Meanwhile your LLM routers are injecting code and APT41 is harvesting IAM creds at 0/72 detection. The abstraction layer you bought is the attack surface.

On March 6, Anthropic quietly dropped Claude Code's prompt cache TTL from 60 minutes to 5. No changelog. No announcement. A user found it in a GitHub issue. For anyone running agentic coding loops — multi-file refactors, iterative test-fix cycles, anything that re-references a large context within the hour — that's a 12x cost increase that hit your bill before it hit your awareness.

The same week, leaked analysis of thousands of Claude Code sessions suggested Opus 4.6's reasoning depth dropped roughly 67%. Developers started migrating to Codex and GPT-5.4. Anthropic's revenue went from $9B to $30B annualized in a quarter, and the company is so compute-constrained it's reportedly weighing an IPO to buy capacity it can't otherwise secure.

This is the story.

Not the model launches. Not the benchmark scores. The story is that the providers you depend on are running out of compute, and they are now allocating that compute against your interests — silently, without notice, and with full plausible deniability about whether anything changed at all.

The compute trilemma is no longer theoretical

Microsoft's CFO told Wall Street the quiet part out loud. Amy Hood confirmed Azure growth would have exceeded 40% had all incoming GPUs been allocated to external customers. Microsoft chose not to. M365 Copilot and GitHub Copilot have higher gross margins than selling raw compute, so internal workloads won. If you're an Azure customer running AI inference, you are paying retail for a service whose supplier has explicitly decided its own competing products deserve the GPUs more than you do.

Meta is the only major AI player without this problem. No enterprise cloud business means no allocation trilemma — every GPU goes to the consumer ad engine and the model lab. Three senior Stargate infrastructure executives left OpenAI for Meta this week to staff a new "Meta Compute" group. That's not opportunistic hiring. That's a multi-year capability acquisition during the most compute-constrained moment in the industry's history.

Anthropic, meanwhile, is running an enterprise land-grab while compute-starved. Three product launches in a week — Ultraplan, Claude for Word, Epitaxy — plus a leaked vibe-coding app builder inside Claude that directly targets Lovable's $6.6B business. OpenAI's own revenue chief admitted in an internal memo that the Microsoft partnership has "limited its ability to reach enterprise customers on rival cloud platforms." Anthropic is winning enterprise. OpenAI is losing infrastructure architects to Meta. Microsoft is starving Azure to feed Copilot. None of these companies' interests are aligned with yours.

The abstraction layer is the attack surface

While your providers are quietly rationing quality, the supply chain underneath them is being actively compromised at every layer.

Researchers built a proxy simulator called Mine and caught nine LLM API routers — one paid, eight free — injecting malicious code into model responses and exfiltrating secrets. If you put a router between your application and an LLM API for cost optimization, caching, or rate limiting, that router may be tampering with the code your model returns. The routers and a residential proxy botnet share C2 infrastructure with the actors who just compromised the Xygeni, Trivy, and KICs vulnerability scanners. Your LLM proxy and your security scanner are being run by the same people.

APT41 deployed an ELF backdoor at 0/72 VirusTotal detection that queries cloud metadata APIs across AWS, GCP, Azure, and Alibaba, AES-256 encrypts the harvested IAM credentials, and exfiltrates them over SMTP port 25 to 43.99.48.196. Lateral movement uses UDP broadcasts to port 6006 — TensorBoard's default. Whether that's deliberate or coincidental, your ML monitoring traffic is now indistinguishable from APT41 lateral movement without deep packet inspection.

Adobe broke its Patch Tuesday cadence yesterday for an emergency Acrobat patch. The vulnerability has been actively exploited since November. Five months. Meanwhile, AI-generated PoCs are arriving on patch day — nginx CVE-2026-27654 had a working exploit the same day the fix landed; Marimo's pre-auth RCE was weaponized in under 10 hours; Claude found and exploited a 13-year-old ActiveMQ bug in minutes. If your critical-patch SLA is 30 days, you are accepting 29 days of unnecessary exposure.

What changed about the engineering job

The abstraction layers your team built for AI velocity — model providers, inference routers, security scanners, cloud metadata services — were all designed under an implicit assumption of supplier neutrality. That assumption is dead. Your provider's internal opportunity cost calculation now directly degrades your service quality. The proxy you trusted may be tampering with your outputs. The scanner you trusted may be the threat. The metadata service your training cluster needs is the credential exfiltration channel.

This is not a security problem and a procurement problem and a reliability problem. It is one problem: every layer between your code and the silicon is now an unverified trust boundary, and the people running those layers are explicitly optimizing against you.

The operator move this week is concrete and small. Audit Anthropic API costs since March 6 — compare your cache hit rates and total spend before and after, and you'll know whether you've been silently overpaying for a month. Stand up a model routing abstraction even if you only have one provider in it today; LiteLLM and OpenRouter exist and the integration is a sprint, not a quarter. Add response-divergence canaries between your LLM router and the upstream API so you'd know if the proxy starts editing outputs. Enforce IMDSv2 across every cloud account running ML workloads, and block outbound port 25 from anything that isn't a mail server. Pin your CI/CD scanners to verified content hashes, not version tags.

None of these are research projects. Each one is a known control against a documented threat from this week. If you ship one of them by Friday, you're ahead of most of your peers. If you ship all five, you've taken back the trust boundaries that your suppliers spent the last six months hollowing out.

◆ Behind the synthesis

Six specialist takes that fed this piece.

The piece above is one stream in my voice. Below are the six lenses my pipeline produced upstream — each tuned for a different reader. Use them when you want the angle that matters most to your role.

Your AI suppliers are now your competitors, and they're degrading the service

The compute trilemma is no longer theoretical

The abstraction layer is the attack surface

What changed about the engineering job

Six specialist takes that fed this piece.

Nine LLM API routers — including one paid service — were caught actively injecting malicious code into responses and exfiltrating secrets, while the vulnerability scanners guarding your pipeline (Trivy, Xygeni, KICs) share C2 infrastructure with a router proxy botnet.

APT41 has deployed a cloud IAM credential harvester with 0/72 antivirus detection across AWS, GCP, and Azure — exfiltrating stolen keys via AES-256-encrypted SMTP to C2 at 43.99.48.196.

LinkedIn just proved your LLM embeddings are numerically blind: raw engagement counts fed as text tokens produced -0.004 correlation with embedding similarity — literally random noise.

The seat-based SaaS model just lost 50.5% of its market value in six months — and ServiceNow responded by eliminating separate AI licensing entirely, making its entire portfolio AI-native by default.

Microsoft's CFO told Wall Street that Azure growth was deliberately sacrificed to feed higher-margin internal AI products — the clearest proof yet that your cloud provider is allocating compute against your interests.