Open-Source AI Tops SWE-Bench as Cyber Selloff Hits Margins
Topics: Agentic AI · LLM Inference · AI Capital
Open-source AI just claimed the #1 position on SWE-Bench Pro under an MIT license, the same week UBS confirmed that over 50% of enterprise customer conversations now mention 'containing' non-AI software spend and the selloff breached cybersecurity stocks for the first time (Palo Alto -6.7%, CrowdStrike -4%). The base model layer is commoditizing while application-layer budgets are being cut. If your portfolio is caught between these two forces, whether charging proprietary API margins or selling seats to enterprises now capping non-AI spend, the compression window just shortened to 2-3 quarters.
◆ INTELLIGENCE MAP
01 Open-Source AI Claims Benchmark Crown — Proprietary Moats Compress
Act now: GLM-5.1 (MIT license) scored 58.4 on SWE-Bench Pro, dethroning GPT-5.4 and Claude Opus 4.6. Google's Gemma 4 under Apache 2.0 runs on mobile devices. The base model layer is now a commodity; value capture migrates to orchestration, edge deployment, and proprietary data layers.
- GLM-5.1 SWE-Bench Pro: 58.4
- License: MIT
- Autonomous runtime: up to 8 hours
- Tool calls/session: 1,700
- 01 GLM-5.1 (MIT): 58.4
- 02 GPT-5.4 (Proprietary): 56
- 03 Claude Opus 4.6 (Prop.): 55
- 04 Gemma 4 31B (Apache): 52
02 Enterprise SaaS Selloff Breaches Cybersecurity Safe Haven
Act now: UBS confirms >50% of enterprise buyer conversations now mention 'containing' non-AI software spend. ServiceNow and Snowflake dropped ~8%, but the key break is cybersecurity: Palo Alto -6.7%, CrowdStrike -4%. Short sellers are building positions in vulnerability management (VM) pure-plays (QLYS, RPD, TENB). Figma at $7.9B is 60% below Adobe's 2022 bid.
- Palo Alto Networks: -6.7%
- ServiceNow: about -8%
- Snowflake: about -8%
- Figma vs Adobe bid: -60%
- Asana YTD: -60%
03 Agent Revenue vs. Agent Reality — Usage Data Creates a Contradiction
Monitor: Large-scale ChatGPT research shows decision support and writing dominate actual usage; autonomous execution barely registers. Yet Perplexity's agent pivot drove a 50% MoM revenue jump to $450M ARR. Meanwhile, LaunchDarkly data shows AI code ships faster but reliability hasn't improved. The market may be overpricing pure autonomy while underpricing copilot/middleware plays.
- Perplexity ARR: $450M
- MoM growth: 50%
- Perplexity MAU: 100M
- Autonomous usage: barely registers
04 Diffusion LLMs Could Unlock 100x GPU Efficiency
Monitor: Autoregressive LLM inference uses ~1% of A100 compute capacity, because token-by-token decoding is limited by memory bandwidth rather than arithmetic. Diffusion LLMs generate tokens in parallel, shifting inference to a compute-bound regime, where GPUs actually excel. Three models (LLaDA, Dream 7B, BD3-LM) are approaching quality parity, and Dream 7B is already in production. If this scales, it compresses inference costs and strands current serving infrastructure investments.
- Current GPU util.: ~1%
- Diffusion potential: 100x
- Dream 7B status: in production
- Models approaching parity: 3
- Autoregressive: 1
- Diffusion LLM: 100
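The ~1% utilization figure can be sanity-checked with a back-of-envelope roofline estimate. The sketch below is an illustration, not the methodology behind the source's number. It assumes published A100 80GB peak figures (312 TFLOPS dense FP16, roughly 2.0 TB/s memory bandwidth), FP16 weights, and batch-1 decoding in which each weight is read once per token and used for about two FLOPs. The function names and the `tokens_per_pass` parameter are hypothetical.

```python
# Back-of-envelope roofline check. Hardware numbers are assumptions,
# not figures taken from the briefing.
A100_PEAK_FLOPS = 312e12        # assumed: dense FP16 tensor-core peak, FLOP/s
A100_MEM_BANDWIDTH = 2.039e12   # assumed: A100 80GB HBM bandwidth, bytes/s


def compute_utilization(flops_per_byte: float,
                        peak_flops: float = A100_PEAK_FLOPS,
                        bandwidth: float = A100_MEM_BANDWIDTH) -> float:
    """Fraction of peak compute reachable at a given arithmetic intensity."""
    attainable = min(peak_flops, flops_per_byte * bandwidth)
    return attainable / peak_flops


# Batch-1 autoregressive decode: ~2 FLOPs per weight, 2 bytes per FP16 weight.
AUTOREGRESSIVE_INTENSITY = 2 / 2  # 1 FLOP per byte read


def parallel_intensity(tokens_per_pass: int) -> float:
    """Intensity when one weight read serves `tokens_per_pass` tokens."""
    return AUTOREGRESSIVE_INTENSITY * tokens_per_pass


if __name__ == "__main__":
    for n in (1, 16, 128, 256):
        util = compute_utilization(parallel_intensity(n))
        print(f"{n:>4} tokens/pass -> {util:.1%} of peak compute")
```

Under these assumptions, batch-1 decoding lands below 1% of peak, in line with the ~1% claim, and the ceiling rises roughly linearly with tokens processed per weight read until the workload becomes compute-bound. Real gains would also depend on batch size, KV-cache traffic, and the number of denoising steps a diffusion model needs.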
05 Gen Z Capital Reallocation — Structural Fintech TAM Shift
Background: Investment participation among 26-year-olds jumped 5x, from 8% to 40%, in a decade as homeownership declined. A third of Gen Z is allocating to prediction markets and sports betting. Crypto ownership grew 8.5x to 17% of US investors. Finfluencers drive 55% of new investors but rank as the least trusted source. The housing-to-markets capital shift is permanent and structural.
- Participation 2015: 8%
- Participation 2025: 40%
- Prediction mkt adopt: about one-third of Gen Z
- Crypto ownership: 17%
- 2015: 8%
- 2025: 40%
◆ DEEP DIVES
01 Open-Source Just Crossed the Moat — The Proprietary AI Premium Is Evaporating
<h3>The Benchmark Crossover Is Here</h3><p>For the first time, an open-source model under a <strong>fully permissive MIT license</strong> holds the #1 position on SWE-Bench Pro — the industry's gold-standard coding evaluation. Z.AI's GLM-5.1, a 754-billion parameter Mixture-of-Experts model, scored <strong>58.4</strong>, dethroning both OpenAI's GPT-5.4 and Anthropic's Claude Opus 4.6. This isn't a narrow benchmark quirk — it's a direct challenge to the revenue models of every company charging premium API margins for proprietary model access.</p><p>Simultaneously, Google released <strong>Gemma 4</strong> under Apache 2.0, built on the same technology powering Gemini 3. The E2B and E4B variants run multimodal AI inference on <strong>mobile devices and Raspberry Pis</strong>. Two of the world's largest AI players just made frontier-class capabilities free.</p><blockquote>The competitive axis in AI has shifted from model intelligence to deployment geometry. Anthropic bets on restricted security distribution, Meta on ambient consumer embedding, and Z.AI on open-source developer capture — none of them are competing on 'smartest model' anymore.</blockquote><h4>What Makes GLM-5.1 Different</h4><p>Z.AI optimized for <strong>endurance over speed</strong>. GLM-5.1 operates autonomously for up to 8 hours, executing <strong>1,700 tool calls</strong> without strategy drift. In demonstrations, it autonomously built an entire Linux-style desktop environment — writing code, compiling, running in Docker, diagnosing bottlenecks, and <strong>rewriting its own architecture</strong> to fix problems. This is a qualitative shift from 'AI coding assistant' to 'AI software engineer that works overnight.'</p><h4>The Investment Implications Are Immediate</h4><p>Four sources converge on the same conclusion: the base model layer is becoming a commodity. The proprietary moat window is compressing to <strong>2-3 quarters</strong> in coding-adjacent capabilities. 
Portfolio companies whose competitive advantage rests on API margin arbitrage — wrapping GPT/Claude and charging a markup — face margin compression as MIT-licensed alternatives reach parity.</p><p>However, this commoditization creates investable whitespace. Value capture is migrating to three layers:</p><ol><li><strong>Agentic orchestration and infrastructure</strong> — observability, guardrails, and lifecycle management for long-horizon autonomous agents (pre-consensus, equivalent to cloud monitoring in 2012)</li><li><strong>Edge deployment stack</strong> — Gemma 4 running on phones means on-device AI is viable now; edge MLOps and privacy-preserving local inference move from niche to mainstream</li><li><strong>MCP-native developer tools</strong> — Model Context Protocol is emerging as the integration standard across Cursor, Codex, VS Code, and Windsurf; protocol-level distribution advantage is forming</li></ol><p><em>The model is the new database. The value is in the application layer built on top — and specifically in the orchestration middleware that makes agentic workflows reliable and secure.</em></p>
Action items
- Audit all portfolio companies whose moat relies on proprietary model API margin and present findings at next IC meeting
- Build a pipeline of 5-10 agentic infrastructure startups (orchestration, observability, guardrails) for Q3 deployment
- Reassess any RAG-centric portfolio companies for architectural risk
Sources: Three frontier labs, three divergent moats · Open-source models just dethroned GPT-5.4 and Claude Opus · Anthropic is waging a six-front war · ChatGPT usage data just undermined the autonomous agent thesis
02 The SaaS Selloff Just Breached Cybersecurity — And the UBS Data Says It's Structural
<h3>The Budget Containment Has Become Procurement Policy</h3><p>UBS Securities published data confirming what channel checks had been whispering: <strong>over 50% of enterprise customer conversations</strong> now explicitly mention 'containing' non-AI software spend — a trend building since December 2025. This isn't sentiment; it's procurement policy. And last Friday, for the first time, the selloff breached what had been the market's safe haven.</p><table><thead><tr><th>Category</th><th>Company</th><th>Friday Drop</th><th>Key Signal</th></tr></thead><tbody><tr><td><strong>Previously insulated</strong></td><td>Palo Alto Networks</td><td>-6.7%</td><td>Security safe haven premium evaporating</td></tr><tr><td></td><td>CrowdStrike</td><td>-4.0%</td><td>Endpoint security moat questioned</td></tr><tr><td><strong>Core enterprise</strong></td><td>ServiceNow</td><td>-8.0%</td><td>Budget containment hits seat expansion</td></tr><tr><td></td><td>Snowflake</td><td>-8.0%</td><td>AI-native data platforms emerging</td></tr><tr><td><strong>Most AI-vulnerable</strong></td><td>Figma</td><td>-50% YTD</td><td>$7.9B EV vs. $20B Adobe bid (2022)</td></tr><tr><td></td><td>Asana</td><td>-60% YTD</td><td>Value trap or takeover target</td></tr></tbody></table><h4>The Two-Front War on Vulnerability Management</h4><p>A separate but compounding signal: short sellers are now <strong>actively building positions</strong> in vulnerability management pure-plays Qualys (QLYS), Rapid7 (RPD), and Tenable (TENB). These companies face a pincer — from above, platform vendors like CrowdStrike and Palo Alto absorb VM into broader suites; from below, AI models commoditize vulnerability detection to near-zero marginal cost. One trader called the trade in <strong>February 2026</strong>, naming RPD and TENB as structurally impaired by AI progress. 
The market hasn't fully absorbed this repricing.</p><h4>The Thesis Shift Is Subtle But Seismic</h4><p>The market previously treated cybersecurity as an <em>AI beneficiary</em> — more AI means more attack surface, more spending. Now it's pricing a different scenario: <strong>AI platforms internalizing security capabilities themselves</strong>, making standalone vendors redundant rather than essential. This creates a barbell:</p><ul><li><strong>AI-native security startups</strong> (building from scratch with AI) become high-conviction targets — the category formation is analogous to cloud security post-AWS</li><li><strong>Legacy VM vendors</strong> with no AI-native roadmap face structural impairment regardless of near-term earnings</li></ul><h4>The Distressed Opportunity</h4><p>Figma at <strong>$7.9B enterprise value</strong> — a 60% discount to Adobe's attempted $20B acquisition — is the headline, but the entire collaboration/productivity category is in a valuation trough. The critical diligence question: is AI displacement of design tools real (category shrinks permanently) or is the market overshooting (you're buying a durable workflow at a discount)? <em>Figma's continued heavy R&D spending suggests management believes the latter.</em></p>
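The 60% discount follows directly from the two valuations quoted above. A minimal sketch of the arithmetic, using a hypothetical helper name:

```python
def discount_to_reference(current: float, reference: float) -> float:
    """Return how far `current` sits below `reference`, as a fraction."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return (reference - current) / reference


# Figures quoted in the text: $7.9B enterprise value vs. Adobe's $20B bid.
figma_discount = discount_to_reference(current=7.9, reference=20.0)
print(f"Figma discount to Adobe bid: {figma_discount:.1%}")
```

The same helper applies to any price-versus-reference comparison. Note the asymmetry: closing a roughly 60% discount from $7.9B back to $20B would take a gain of about 153%.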
Action items
- This week, stress-test growth assumptions across all SaaS portfolio holdings against a 10-15% reduction in non-AI enterprise software budgets
- Evaluate Figma ($7.9B) and Asana as potential distressed acquisition targets in the Q2 diligence cycle
- Short-list or avoid VM pure-plays (QLYS, RPD, TENB) — reassess any active pipeline deals in standalone vulnerability management
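One way to run the budget stress test from the action items is sketched below. Everything except the 10-15% reduction range is a hypothetical assumption for illustration: the holding's ARR, its baseline growth, the share of revenue funded from non-AI budgets, and the simplification that a budget cut removes exposed revenue one-for-one.

```python
from dataclasses import dataclass


@dataclass
class Holding:
    name: str
    arr_musd: float             # current ARR in $M (hypothetical input)
    baseline_growth: float      # planned ARR growth, e.g. 0.25 for 25%
    non_ai_budget_share: float  # share of ARR funded from non-AI budgets


def stressed_growth(h: Holding, budget_cut: float) -> float:
    """ARR growth after cutting the non-AI-funded share of ARR by `budget_cut`.

    Simplifying assumption: a budget cut removes exposed revenue one-for-one.
    """
    planned_arr = h.arr_musd * (1 + h.baseline_growth)
    lost = planned_arr * h.non_ai_budget_share * budget_cut
    return (planned_arr - lost) / h.arr_musd - 1


if __name__ == "__main__":
    # Illustrative numbers only; not drawn from any company in this briefing.
    example = Holding("ExampleCo", arr_musd=100.0, baseline_growth=0.25,
                      non_ai_budget_share=0.8)
    for cut in (0.10, 0.15):  # the 10-15% range from the action item
        print(f"{cut:.0%} budget cut -> {stressed_growth(example, cut):.1%} growth")
```

A fuller model would separate new-logo bookings from renewals and expansion, since budget containment is likely to hit seat expansion before it hits renewals.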
Sources: AI budget cannibalization just broke the cybersecurity firewall · Anthropic just declared war on your cybersecurity portfolio · Anthropic's Mythos forces critical infrastructure repricing
03 The Agent Paradox: $450M ARR vs. the Data That Says Autonomy Isn't What Users Want
<h3>The Contradiction That Defines This Cycle</h3><p>Two data points arrived this week that shouldn't both be true — but they are, and the tension between them is the most important thesis signal in AI right now.</p><p><strong>Data Point 1:</strong> Large-scale analysis of millions of ChatGPT conversations reveals that <strong>decision support, writing, and information seeking</strong> account for the overwhelming majority of real-world usage. Coding is a <strong>surprisingly small share</strong>. Autonomous task execution? Barely registers. Non-work usage is growing faster than work usage.</p><p><strong>Data Point 2:</strong> Perplexity's pivot from AI search to AI agents drove a <strong>50% single-month revenue jump to $450M ARR</strong> with 100M monthly active users — the fastest validation of agent-based monetization at scale we've seen.</p><blockquote>The market is overpricing autonomous AI agents and underpricing decision-support copilots — and the first large-scale usage data just proved it. But Perplexity's $450M ARR proves agent-based business models can work. The resolution: agents that augment decisions win; agents that promise full autonomy are building for a use case that doesn't exist at scale.</blockquote><h4>Where the Reliability Data Compounds the Picture</h4><p>LaunchDarkly survey data adds a third dimension: AI-generated code is shipping <strong>faster than ever, but production reliability has not improved proportionally</strong>. The velocity-reliability gap is widening. This matters because autonomous agents operating for 8 hours (like GLM-5.1) amplify both the speed <em>and</em> the reliability risk. 
The market needs new infrastructure layers — runtime control, AI-code observability, deployment safety nets — before autonomous agents become enterprise-ready.</p><h4>The Enterprise Adoption Blockers Are the Investment Opportunity</h4><p>Five specific enterprise blockers have been identified that map directly to fundable categories:</p><ol><li><strong>Integration</strong> — agents need reliable, secure connections to enterprise APIs (Jentic's exact positioning, led by a serial founder with two exits)</li><li><strong>Security</strong> — fine-grained permissions, audit logs, sandboxing for agent actions</li><li><strong>Reliability</strong> — agents that work in demos fail in production; simulation sandboxes are emerging as requirements</li><li><strong>Compliance</strong> — regulatory frameworks haven't caught up; OpenAI's Stargate UK pause shows copyright uncertainty is already killing projects</li><li><strong>Maintainability</strong> — self-improving agents raise governance questions no existing tooling can answer</li></ol><p>Each blocker represents a <strong>$1B+ category opportunity</strong> if enterprise agent deployment scales at the pace Visa's 106M-dispute deployment suggests. The category is pre-consensus, which means valuations are still reasonable. This is the picks-and-shovels layer for the agentic era — and it's where the copilot thesis and the autonomy thesis converge.</p><h4>What This Means for Portfolio Construction</h4><p>Companies positioning AI as <strong>'augmentation'</strong> (making experts better) sustain premium pricing. Companies positioning as <strong>'replacement'</strong> (eliminating grunt work) enter a race to prove ROI through headcount reduction — a value prop that compresses margins. If you're evaluating AI-ops startups, <em>the language they use in their pitch deck tells you which pricing trajectory they're on.</em></p>
Action items
- Audit portfolio exposure to 'autonomous agent' thesis — stress-test each company's value prop against the ChatGPT usage data showing decision-support dominance
- Deep-dive Jentic and 2-3 comparable agent-infrastructure startups for potential investment or watchlist placement
- Add augmentation-vs-replacement positioning language to standard AI-ops diligence framework
Sources: ChatGPT usage data just undermined the autonomous agent thesis · Perplexity's $450M ARR at 50% MoM growth · AI's velocity-reliability gap is opening a new infrastructure investment cycle
◆ QUICK HITS
Update: Anthropic Claude Code source leak exposed 512,000 lines including a hidden background agent (KAIROS) — 50,000 copies distributed before containment. Compound risk with pay-as-you-go pricing change. Any portfolio company with Claude Code dependency needs a board-level contingency conversation.
Perplexity's $450M ARR at 50% MoM growth
Anthropic paid $400M+ (all-stock) for Coefficient Bio — an 8-month-old, sub-10-person ex-Genentech stealth biotech startup. New acqui-hire benchmark: ~$40M+ per head for domain experts unlocking frontier lab vertical expansion. Retention risk for your AI-healthcare portfolio companies just spiked.
Three frontier labs, three divergent moats
OpenAI paused Stargate UK data center citing highest electricity costs globally and copyright policy uncertainty — leading indicator of capital reallocation away from UK AI infrastructure toward Nordic, Middle East, and US corridors.
Perplexity's $450M ARR at 50% MoM growth
D-Wave Quantum ($5.27B market cap) faces insider whistleblower allegations of misleading metrics and fabricated AI narratives — catalytic webcast April 15 via Coherence.Report. Stress-test any quantum computing portfolio exposure before then.
Anthropic just declared war on your cybersecurity portfolio
xAI spending pushed SpaceX to a nearly $5 billion loss, revealing dangerous financial contagion across Musk's corporate portfolio — any space startup raising on SpaceX comps should be tested against actual unit economics, not narrative.
AI budget cannibalization just broke the cybersecurity firewall
Constellation Software, with a 29.9% 20-year CAGR built on 500+ vertical market software (VMS) acquisitions, is the single biggest permanent-capital competitor to PE software roll-ups. Map your VMS deal pipeline against CSU's six operating groups to avoid bidding blind.
Constellation Software's 29.9% CAGR is repricing vertical SaaS M&A
Brookfield Corporation trades at $42 vs. estimated $68 intrinsic value — a historically wide 38% NAV discount that may signal broader LP sentiment deterioration toward alternative asset managers mid-fundraise.
Constellation Software's 29.9% CAGR is repricing vertical SaaS M&A
GLP-1 pharmacogenomic variation identified — genetic testing before prescription could become standard of care for 1B+ obesity patients. Companion diagnostics TAM forming at intersection of pharmacogenomics and metabolic medicine.
Anthropic's Mythos forces critical infrastructure repricing
Linux Kernel mandated AI code provenance tracking (Assisted-by tags, human-only sign-off) — creates greenfield compliance tooling market as the standard propagates across OSS projects within 12-18 months.
ChatGPT usage data just undermined the autonomous agent thesis
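A provenance rule of the kind described in the Linux kernel item above can be checked mechanically, which is what makes it a tooling opportunity. The sketch below is a hypothetical illustration, not the kernel's actual tooling. It assumes provenance is recorded as `Assisted-by:` and `Signed-off-by:` trailer lines in the final paragraph of a commit message, and that the policy requires a sign-off whenever an `Assisted-by:` tag is present. The agent name, model name, and commit text are invented for the example.

```python
import re

# Matches "Key: value" lines such as "Signed-off-by: Name <email>".
TRAILER = re.compile(r"^([A-Za-z][A-Za-z-]*):\s*(.+)$")


def parse_trailers(message: str) -> dict[str, list[str]]:
    """Collect 'Key: value' lines from the final paragraph of a commit message."""
    last_paragraph = message.strip().split("\n\n")[-1]
    trailers: dict[str, list[str]] = {}
    for line in last_paragraph.splitlines():
        match = TRAILER.match(line.strip())
        if match:
            trailers.setdefault(match.group(1).lower(), []).append(match.group(2))
    return trailers


def provenance_ok(message: str) -> bool:
    """Hypothetical policy: AI-assisted commits still need a sign-off line."""
    trailers = parse_trailers(message)
    if "assisted-by" not in trailers:
        return True  # nothing to check under this simplified policy
    return bool(trailers.get("signed-off-by"))


# Invented commit message for illustration.
commit = """net: fix refcount leak in example driver

Release the reference on the error path.

Assisted-by: ExampleAgent:example-model-v1
Signed-off-by: Jane Developer <jane@example.org>
"""

if __name__ == "__main__":
    print(provenance_ok(commit))
```

A check like this only confirms that the tags are present. It cannot verify that the signer is human, so the human-only sign-off requirement still depends on process controls around the tooling.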
BOTTOM LINE
Open-source AI just claimed the frontier benchmark crown under an MIT license while UBS confirmed that over half of enterprise customer conversations now mention containing non-AI software spend. The model layer is commoditizing while application-layer budgets are being cut, narrowing value capture to three specific layers: agentic infrastructure middleware, edge deployment, and AI-native security. If your portfolio sits between these pincers, whether charging proprietary API margins or selling seats to enterprises now containing non-AI spend, the repricing has already started and you have 2-3 quarters before it becomes consensus.
Frequently asked
- Which portfolio positions are most exposed to the open-source model crossover?
- Companies whose moat rests on reselling proprietary model API access at a markup are most exposed. With Z.AI's GLM-5.1 taking #1 on SWE-Bench Pro under an MIT license and Gemma 4 shipping under Apache 2.0, coding-adjacent API margins face compression within 2-3 quarters. Multi-model contingency plans and migration toward orchestration, observability, or edge deployment value layers are the defensible responses.
- Is the Figma valuation at $7.9B EV a genuine distressed opportunity or a value trap?
- It's a high-conviction diligence target, not an automatic buy. The 60% discount to Adobe's 2022 $20B bid is real, and continued heavy R&D suggests management believes the workflow is durable. The decisive question is whether AI is permanently shrinking the design-tool category or the market is overshooting — resolve that before the takeover window closes in Q2.
- Why did cybersecurity stocks sell off if AI is supposed to expand the attack surface?
- The market is repricing cyber from AI beneficiary to AI-displaced. Palo Alto (-6.7%) and CrowdStrike (-4%) dropped because investors now expect AI platforms to internalize security capabilities, making standalone vendors redundant. Add UBS data showing that over 50% of enterprise customer conversations mention containing non-AI software spend, and even the safe-haven premium is evaporating.
- How do you reconcile Perplexity's $450M ARR with data showing users don't want autonomous agents?
- Agents that augment decisions monetize; agents that promise full autonomy are building for demand that doesn't exist at scale. Perplexity's 50% MoM jump came from agent-assisted search and research workflows — augmentation framed as agency. ChatGPT usage data confirms decision support, writing, and information seeking dominate, while autonomous task execution barely registers.
- Which agentic infrastructure categories are still pre-consensus enough to deploy into this quarter?
- Agent-to-API integration middleware, runtime observability and guardrails, simulation sandboxes for reliability testing, and MCP-native developer tooling remain pre-consensus with reasonable valuations. Each maps to a specific enterprise deployment blocker — integration, security, reliability, compliance, maintainability — and each has $1B+ category potential if enterprise agent adoption tracks the pace signaled by Visa's 106M-dispute deployment.