Your AI adoption metrics are lying, and your dependency tree is on fire

Meta burned $100M in a month on tokens engineers were gaming, three CVSS 10.0-adjacent flaws hit the tools you already trust, and a scaffold swap moved the same model 60 points on the same benchmark. Three signals, one week, one job.

Meta's 85,000 employees consumed 60.2 trillion tokens in thirty days on an internal AI leaderboard called Claudeonomics. Estimated cost: north of $100M, even at bulk pricing (a blended rate of at least roughly $1.66 per million tokens). Microsoft has been running the same game since January, and VPs who don't write code sit in the top twenty. Salesforce sets a floor: $100/week on Claude Code, $70/week on Cursor, with a Mac widget updating every fifteen minutes so you can watch your peers' spend. Engineers at all three companies have traced production SEVs back to careless AI-generated code shipped by people optimizing for volume.

This is Goodhart's Law at industrial scale, and it means the numbers on every AI adoption slide you're about to see — yours, your board's, your vendors' — are contaminated. GitHub Copilot froze signups because costs doubled YTD. Anthropic is rationing individual users to feed enterprise demand. How much of that demand is actual work? Nobody in a position to say wants to say.

One long-tenured Meta engineer's theory is worth chewing on: the leaderboard wasn't a productivity program, it was a data collection strategy. Eighty-five thousand employees generating real-world coding traces to train Meta's next coding model, at $100M/month in R&D cost dressed up as tool usage. If that's right, the traces are poisoned by the gaming the leaderboard incentivized, and the next Meta coding model will have learned to look busy. Even if the theory is wrong, the metric contamination is real and measurable, so the take holds either way.

Shopify is the only company that got this right. Three moves: renamed the leaderboard to a "usage dashboard," wired circuit breakers on per-user spend anomalies (which catch both gaming and genuine runaway agents), and instrumented cost-per-token rather than total volume. Farhan Thawar's insight is the one worth stealing — engineers burning expensive tokens are working on hard problems; engineers burning cheap tokens at volume are padding stats.
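
A minimal sketch of the last two moves, assuming usage arrives as per-user records of tokens and dollars for a single window. The UsageRecord type, the function names, and the median-multiple threshold are illustrative assumptions, not Shopify's implementation.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class UsageRecord:
    user: str
    tokens: int       # tokens consumed in the window
    cost_usd: float   # spend in the same window


def cost_per_million_tokens(rec: UsageRecord) -> float:
    """Blended price paid per million tokens; 0.0 if nothing was used."""
    if rec.tokens == 0:
        return 0.0
    return rec.cost_usd * 1_000_000 / rec.tokens


def spend_anomalies(records: list[UsageRecord], threshold: float = 5.0) -> list[str]:
    """Flag users whose spend exceeds `threshold` times the median spend.

    The median-multiple rule is an assumption made for illustration. It
    does not distinguish gaming from a runaway agent; it only says
    "look at this user".
    """
    if not records:
        return []
    typical = median(r.cost_usd for r in records)
    if typical <= 0:
        return []
    return [r.user for r in records if r.cost_usd > threshold * typical]
```

A volume leaderboard ranks users by tokens alone; pairing each flagged user with their cost per million tokens is what separates expensive hard-problem work from cheap padding.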

The benchmarks were never measuring the model

While you're auditing your usage metrics, audit your model selection process too. Independent testing this week showed Qwen3.6-35B scoring 19% on Polyglot with one agent harness and 78.7% with another. Same model, same weights, same benchmark — a 4x swing from scaffold alone. If your model comparison ran in one harness, you benchmarked the harness.

The corollary is the more useful finding: Alibaba's Qwen3.6-27B, a dense Apache 2.0 model that fits on a single A100 at FP8, beats its own 397B MoE predecessor on SWE-bench Verified (77.2 vs 76.2), Terminal-Bench 2.0 (59.3 vs 52.5), and SkillsBench (+18.2 points). Perplexity is already running post-trained Qwen3 in production for factual search, claiming parity with GPT-5.4 at lower cost. The recipe — SFT for compliance, then RL over agent trajectories with synthetic rubrics — costs roughly four engineers for three months. Not millions. Four engineers, three months.

What this means for anyone paying per-token API costs for coding: the self-hosting math just changed. Test Qwen3.6-27B against your current model, on your production scaffold, using your production data. Not on somebody's leaderboard.
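
One way to structure that test is a model-by-harness grid, so harness effects stay visible instead of being baked into a single number. This is a sketch: run_eval is a hypothetical callable you would supply to run your own benchmark and return a score, and the function names are placeholders.

```python
from typing import Callable, Dict, Iterable

Scores = Dict[str, Dict[str, float]]


def score_grid(
    models: Iterable[str],
    harnesses: Iterable[str],
    run_eval: Callable[[str, str], float],
) -> Scores:
    """Score every model under every harness.

    `run_eval(model, harness)` is assumed to run your benchmark on your
    own data and return a percentage score.
    """
    harness_list = list(harnesses)  # reuse across models even if given a generator
    return {m: {h: run_eval(m, h) for h in harness_list} for m in models}


def harness_spread(grid: Scores) -> Dict[str, float]:
    """Per model, the gap in points between its best and worst harness."""
    return {
        model: max(by_harness.values()) - min(by_harness.values())
        for model, by_harness in grid.items()
        if by_harness
    }
```

Fed the two Polyglot results quoted above (19 and 78.7), harness_spread reports a gap of roughly 60 points for a single model. A spread that large means any comparison between models has to hold the harness fixed.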

Your dependencies are on fire

Axios shipped a CVSS 10.0 header injection flaw (CVE-2026-40175) that exfiltrates cloud metadata. It's the most popular JavaScript HTTP client on earth, and a transitive dependency in essentially every Node service you've ever touched. Apache Kafka's OAuthBearer SASL, the recommended production auth, accepts any JWT without validation (CVE-2026-33557, CVSS 9.1). Anyone who can reach the broker can produce to and consume from any topic as any principal. If your training pipelines read from Kafka, that's silent training-data poisoning territory.

Quest KACE is on CISA KEV as a CVSS 10.0. Cisco ISE has three concurrent 9.9s. Sonatype Nexus 3.0.0–3.70.5 ships hard-coded credentials — your artifact repository, the root of trust for every build. CrowdStrike LogScale (your SIEM) has unauthenticated file read. Checkmarx KICS Docker tags were overwritten with a trojanized binary that exfiltrates every secret in the Terraform and Kubernetes configs it scans. The tool designed to find your secrets was rewired to steal them.

And MCP is emerging as a protocol-level problem. Three independent implementations — OpenAI Codex CLI (9.8), Flowise (9.9), Upsonic (9.8) — disclosed RCE the same week. Same vulnerability class, different codebases. That's not implementation quality; that's a design pattern shipping without security review while engineering teams pull MCP tools into production prototypes.

This week's actual work

Run npm ls axios and yarn why axios across every repo before you write another line of feature code. Pin to the patched version, and enforce IMDSv2 with hop limit 1 as a parallel mitigation, not a substitute: Axios has non-cloud attack paths that IMDSv2 won't catch. Test your Kafka OAuthBearer with a self-signed JWT; if the broker accepts it, patch or apply network ACLs today. Check your Nexus version, and if you're in the vulnerable range, rotate every credential and audit artifact checksums against source-of-truth builds. Freeze new MCP tool adoption pending security review.
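
For repos that commit a package-lock.json, the Axios inventory can also be scripted. A minimal sketch, assuming lockfileVersion 2 or 3 (which lists every installed package under a top-level "packages" key); the function names are illustrative. It only reports what is installed, so compare the output against the advisory's patched release yourself.

```python
import json
from pathlib import Path


def axios_versions(lock: dict) -> dict[str, str]:
    """Map each installed copy of axios (direct or nested) to its version.

    Assumes the lockfileVersion 2/3 layout, where keys look like
    "node_modules/axios" or "node_modules/foo/node_modules/axios".
    """
    found = {}
    for path, meta in lock.get("packages", {}).items():
        # The text after the last "node_modules/" is the package name.
        if "node_modules/" in path and path.split("node_modules/")[-1] == "axios":
            found[path] = meta.get("version", "unknown")
    return found


def scan_repos(root: Path) -> dict[str, dict[str, str]]:
    """Walk `root` and report axios copies per committed lockfile."""
    report = {}
    for lockfile in root.rglob("package-lock.json"):
        if "node_modules" in lockfile.parts:
            continue  # skip lockfiles vendored inside installed packages
        hits = axios_versions(json.loads(lockfile.read_text(encoding="utf-8")))
        if hits:
            report[str(lockfile)] = hits
    return report
```

Nested entries matter here: a service can pin a safe top-level version and still carry an older copy pulled in by another dependency.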

While you're patching, do one more thing: audit whatever AI usage metric you're planning to put in your next exec deck. If it counts tokens, sessions, or percentage of code AI-generated without a paired outcome signal (PRs merged, SEVs avoided, review-cycle time), replace it before someone on your board asks Meta-shaped questions. The tokenmaxxing correction is coming, and the leaders who can distinguish real adoption from performative compliance will make sharper capital decisions than the ones still quoting the vendor's demand curve back at themselves.
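
The audit rule is mechanical enough to sketch. The metric names below are hypothetical labels for the examples in the text, not a standard taxonomy; adjust both sets to whatever your deck actually reports.

```python
# Hypothetical labels: volume metrics count activity, outcome metrics count results.
VOLUME_METRICS = {"tokens", "sessions", "pct_code_ai_generated"}
OUTCOME_METRICS = {"prs_merged", "sevs_avoided", "review_cycle_time"}


def unpaired_volume_metrics(deck_metrics: list[str]) -> list[str]:
    """Return volume metrics shown with no outcome signal beside them.

    An empty list means the deck either has no volume metrics or pairs
    them with at least one outcome metric.
    """
    if any(m in OUTCOME_METRICS for m in deck_metrics):
        return []
    return [m for m in deck_metrics if m in VOLUME_METRICS]
```

A deck listing only tokens and sessions comes back flagged in full; adding PRs merged alongside them clears it.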

◆ Behind the synthesis

Six specialist takes that fed this piece.

The piece above is one stream in my voice. Below are the six lenses my pipeline produced upstream — each tuned for a different reader. Use them when you want the angle that matters most to your role.

Four CVSS 10.0 Bugs Hit Axios, Kafka, Go, and Nexus at Once

Axios CVSS 10.0 Header Injection Exfiltrates Cloud Metadata

Same Model, 4x Benchmark Swing: Scaffold Beats Weights

Meta Burned 60T Tokens in 30 Days as AI Metrics Break Down

Meta Burns 60T Tokens as 'Tokenmaxxing' Fakes AI Demand

Tokenmaxxing Puts 20–40% of $6.5B AI Coding ARR at Risk