Ivanti EPMM Backdoors Persist After Patching, Unit 42 Warns
Ivanti EPMM backdoors survive patching — if you run Ivanti for MDM, your standard 'apply patch, close ticket' playbook leaves you compromised. Unit 42 confirmed persistent backdoors that remain functional post-patch, meaning you need forensic investigation and likely a full infrastructure rebuild from known-good images. This is a fundamentally different failure mode from the Cisco SD-WAN story you already know, and it demands a different response.
◆ INTELLIGENCE MAP
01 Ivanti EPMM: Patching Is Not Remediation
act now: Ivanti EPMM zero-day exploitation includes persistent backdoors that survive patching — organizations running Ivanti MDM must assume compromise and plan infrastructure rebuilds, not just patch cycles.
02 AI Agent Infrastructure: Security, Observability, and Cost Attribution Gaps
monitor: Across five sources, the consistent signal is that agent deployment bottlenecks have shifted from model capability to infrastructure — security sandboxing, per-invocation cost tracking, context management, and governance primitives are the engineering gaps that determine whether agents survive production.
03 Ransomware Pivots to Silent Persistence via Identity Systems
monitor: Ransomware operators are abandoning loud encryption for long-term parasitic access via OAuth grants, service accounts, and API keys — combined with 82% malware-free attacks and 29-minute breakout times, detection must shift from signatures to identity-layer analytics.
04 LLM Evaluation Crisis: Benchmarks Failing, Structured Reasoning Emerging
monitor: SWE-bench retirement push, formal benchmark saturation, and ARQ's 4-point improvement over Chain-of-Thought all point to the same conclusion: build internal eval suites from your actual codebase and evaluate structured reasoning patterns for multi-turn agent systems.
05 Open-Weight Model Ecosystem Shift and RL-Aware Training
background: The Western open-weight frontier has a genuine gap post-Llama 4 disappointment, with Chinese labs filling the void; Reflection AI's RL-aware pretraining concept is architecturally interesting but has zero shipped artifacts — evaluate DeepSeek/Qwen as base models now, track Reflection later.
◆ DEEP DIVES
01 Ivanti EPMM: When Patching Leaves You Owned — A New Failure Mode for Management Plane Infrastructure
<h3>The Core Problem: Backdoors That Survive Remediation</h3><p>Palo Alto's <strong>Unit 42</strong> confirmed that attackers exploiting Ivanti EPMM (Endpoint Manager Mobile) zero-days are deploying <strong>persistent backdoors that remain functional after patching</strong>. This is not the standard 'patch and move on' scenario. Your incident response playbook of 'apply patch → verify patch → close ticket' leaves you compromised with this one.</p><blockquote>If you can't <code>terraform destroy && terraform apply</code> your MDM infrastructure, you're going to have a very bad week when the next zero-day hits.</blockquote><h3>Why This Is Architecturally Different</h3><p>MDM servers occupy a <strong>uniquely privileged position</strong> in your infrastructure — they push configurations, certificates, and policies to every managed device in your fleet. A compromised MDM server isn't 'one server got popped.' It's an attacker with the ability to push malicious profiles to every phone, tablet, and laptop you manage. The persistence mechanism survives patching because it's not exploiting the vulnerability itself — it's using the initial access to install <strong>independent backdoor infrastructure</strong> that lives outside the patched code path.</p><h3>The Broader Pattern: Mutable Management Planes Are a Liability</h3><p>This joins a pattern visible across multiple intelligence streams today. MDM servers, SD-WAN controllers, CI/CD systems — anything that has <strong>privileged access to your fleet</strong> and runs as a mutable, long-lived server is a high-value target where traditional patching is insufficient. The architectural response is to make these systems <strong>immutable and rebuildable from infrastructure-as-code</strong>. If your MDM infrastructure can't be torn down and rebuilt from a known-good state in hours, you have an implicit assumption that it will never be deeply compromised — and that assumption is now empirically false.</p><h4>Immediate Response If You Run Ivanti EPMM</h4><ol><li>Assume compromise — don't wait for IOC confirmation</li><li>Engage forensic analysis of MDM servers, focusing on persistence mechanisms outside the patched vulnerability's code path</li><li>Hunt for unauthorized profiles, certificates, or configuration changes pushed to managed devices</li><li>Plan a full rebuild from known-good images, not an in-place remediation</li></ol>
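<p><em>A minimal hunting sketch for step 3 above.</em> It assumes you can snapshot deployed profiles to JSON before and after the exploitation window; the record shape (<code>profile_id</code>, <code>name</code>, <code>sha256</code>, <code>pushed_at</code>) is hypothetical, so map it onto whatever your MDM's reporting export actually produces. An empty result is absence of evidence, not a clean bill of health.</p>
<pre><code>#!/usr/bin/env python3
"""Diff live MDM profiles against a known-good snapshot (illustrative sketch).

Assumes profile exports are JSON lists of records shaped like
{"profile_id": ..., "name": ..., "sha256": ..., "pushed_at": ...};
the format is an assumption, not any vendor's actual export schema.
"""
import json
import sys

def load_profiles(path):
    with open(path) as f:
        return {p["sha256"]: p for p in json.load(f)}

def main(known_good_path, live_export_path):
    known_good = load_profiles(known_good_path)  # snapshot predating the exploitation window
    live = load_profiles(live_export_path)       # what the MDM is serving right now

    unknown = [p for digest, p in live.items() if digest not in known_good]
    if not unknown:
        print("No profiles outside the known-good set (absence of evidence only).")
        return
    for p in sorted(unknown, key=lambda p: p["pushed_at"]):
        print(f"UNKNOWN PROFILE {p['profile_id']} ({p['name']}) pushed {p['pushed_at']}")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
</code></pre>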
Action items
- If running Ivanti EPMM: initiate forensic investigation of MDM servers for persistent backdoors today — do not treat patching as sufficient remediation
- Inventory all management-plane infrastructure (MDM, CI/CD controllers, config management) and document rebuild-from-scratch procedures by end of quarter
- Implement infrastructure-as-code for MDM and other management plane systems, targeting full teardown-and-rebuild capability within 4 hours
Sources: Anthropic's Claude Code Security rollout is an industry wakeup call
02 Ransomware Goes Silent: Identity-Layer Persistence Is the New Threat Model
<h3>The Shift: From Encryption Events to Parasitic Residency</h3><p>Three independent intelligence streams converge on the same conclusion: <strong>ransomware operators are abandoning loud encryption</strong> for quiet, long-term data exfiltration. The CrowdStrike 2026 Global Threat Report data — <strong>82% malware-free detections</strong> and <strong>29-minute average breakout time</strong> (down from 98 minutes in 2021) — isn't just a speed improvement. It reflects a fundamental change in attacker methodology. They're logging in with <strong>stolen credentials</strong>, using built-in OS tools (PowerShell, WMI, RDP, SSH), and establishing persistence through identity systems rather than dropping binaries.</p><p>The emergence of <strong>Steaelite RAT</strong> — a SaaS offering that bundles data exfiltration and ransomware management into a single platform — commoditizes this approach. The barrier to running a 'parasitic' campaign is dropping fast.</p><h3>Your Detection Architecture Is Optimized for the Wrong Threat</h3><p>If your detection pipeline relies on <strong>signature-based tools</strong> (AV, YARA rules, behavioral EDR focused on process trees), you're covering less than 20% of current attacks. The 27-second fastest breakout time means automated exploitation is real, and your incident response playbook that assumes human analyst triage within an hour is <strong>fundamentally broken</strong>.</p><blockquote>Attackers are using your own admin tools against you. Your EDR is optimized for a world where attackers drop binaries — but the modern attacker logs in with stolen creds and uses PowerShell.</blockquote><h3>What Identity-Layer Detection Actually Requires</h3><p>The detection shift isn't incremental — it's architectural. You need to correlate <strong>authentication events across your IdP, cloud providers, and internal services in near-real-time</strong>. Specifically:</p><ul><li><strong>OAuth app grants</strong> — audit all grants created in the last 90 days; flag any that don't map to known workflows</li><li><strong>Service account creation</strong> — every service account needs a documented owner and purpose</li><li><strong>Scheduled task/cron registrations</strong> — these are persistence mechanisms, treat them as security telemetry</li><li><strong>API key rotation</strong> — enforce rotation schedules; stale keys are attacker persistence</li><li><strong>Impossible travel and anomalous access patterns</strong> — if your SIEM ingests auth logs on a 15-minute batch cycle, you're past the breakout window before you see the first alert</li></ul><h3>The SaaS C2 Dimension</h3><p>The GRIDTIDE campaign (covered previously) demonstrated that C2 traffic hiding inside <strong>legitimate SaaS API calls</strong> can evade detection for years. Combined with the identity-layer persistence shift, the picture is clear: attackers establish persistence via identity systems and communicate via your trusted SaaS traffic. Your firewall sees traffic to <code>sheets.googleapis.com</code> and waves it through. The only detection path is <strong>behavioral analysis at the application layer</strong> — monitoring API audit logs for anomalous patterns, correlating service account behavior with expected baselines.</p>
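<p><em>A minimal sketch of the 90-day OAuth grant audit above</em>, assuming your IdP can export audit events as JSONL. The record shape (<code>event</code>, <code>app_id</code>, <code>scopes</code>, <code>granted_at</code>), the allowlist, and the scope names are all hypothetical placeholders to adapt to your environment:</p>
<pre><code>#!/usr/bin/env python3
"""Flag OAuth app grants from the last 90 days with no documented owner.

Sketch only: the JSONL event shape is an assumption -- map it onto your
IdP's actual audit export. Timestamps are assumed to carry a UTC offset.
"""
import json
from datetime import datetime, timedelta, timezone

APPROVED_APPS = {"app-ci-runner", "app-backup-sync"}      # hypothetical allowlist with documented owners
HIGH_RISK_SCOPES = {"mail.read", "files.readwrite.all"}   # example scope names
WINDOW = timedelta(days=90)

def audit(log_path):
    cutoff = datetime.now(timezone.utc) - WINDOW
    with open(log_path) as f:
        for line in f:
            rec = json.loads(line)
            if rec.get("event") != "oauth_grant_created":
                continue
            granted = datetime.fromisoformat(rec["granted_at"])
            if granted < cutoff or rec["app_id"] in APPROVED_APPS:
                continue
            risky = HIGH_RISK_SCOPES.intersection(rec.get("scopes", []))
            flag = " HIGH-RISK" if risky else ""
            print(f"UNREVIEWED GRANT app={rec['app_id']} "
                  f"granted={granted:%Y-%m-%d} scopes={sorted(rec['scopes'])}{flag}")

if __name__ == "__main__":
    audit("idp_audit.jsonl")
</code></pre>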
Action items
- Audit your MTTD against the 29-minute breakout benchmark this sprint — if your auth log ingestion has >5-minute latency, prioritize real-time streaming
- Conduct a 90-day OAuth grant audit across all identity providers — flag and investigate any grants without documented business justification
- Implement behavioral baselines for service account API usage patterns, with alerting on volume, frequency, and endpoint anomalies
Sources: Ransomware groups switch to stealthy attacks and long-term access · Unsupervised Learning NO. 518 · SANS NewsBites Vol. 28 Num. 15
03 AI Agent Infrastructure: The Three Gaps Between Demo and Production
<h3>The Bottleneck Has Shifted</h3><p>Five independent sources this cycle converge on the same message: <strong>the hard problem with AI agents is no longer building them — it's deploying, securing, and measuring them in production</strong>. CB Insights identifies three specific infrastructure gaps: <strong>performance visibility, context management, and cost attribution</strong>. Notion is shipping 'governed AI agents' with governance as a first-class product feature. And multiple security-focused sources flag that agents introduce attack surfaces that traditional AppSec tooling doesn't cover.</p><h3>Gap 1: Security — Agents Are Untrusted Processes</h3><p>The evidence is mounting that <strong>AI agents must be treated as untrusted processes</strong>, not extensions of the developer who launched them. Multiple sources confirm that prompt injection can cause agents to exfiltrate SSH keys, leak OAuth tokens (OpenClaw exposed 21,000 instances in two weeks, covered previously), and execute attacker-controlled tool calls. The architectural response:</p><ul><li><strong>Sandboxed execution</strong> — agents run in containers with no access to credential stores, SSH keys, or production secrets</li><li><strong>Action-level audit logging</strong> — every tool call, every parameter, every side effect, traceable to the triggering invocation</li><li><strong>Runtime policy enforcement</strong> — 'this agent can read from these tables but never write' enforced at the execution layer, not the prompt layer</li><li><strong>Human-in-the-loop gates</strong> for high-risk actions (database writes, external API calls, file system modifications)</li></ul><h3>Gap 2: Observability — Can You Trace a Single Invocation?</h3><p>The diagnostic question: <strong>can you trace a single agent invocation from trigger through every tool call, LLM request, and side effect to final outcome, with cost attribution at each step?</strong> If not, you're in the 'we shipped microservices but don't have distributed tracing yet' phase. Perplexity's 19-model agentic orchestration system — which they <strong>canceled their own press demo</strong> for because they found flaws — illustrates the debugging challenge. When Agent A spawns Agent B which calls Model 7, how do you diagnose failures? The patterns from <strong>event sourcing and saga orchestration</strong> apply here, but the abstractions are different: token budgets instead of memory limits, summarization instead of compression, retrieval instead of cache lookup.</p><h3>Gap 3: Cost Attribution — Your CFO Will Ask</h3><p>If you can't show concrete numbers on what your agents cost to run and what value they produce, you're <strong>one budget cycle away from defunding</strong>. Start with per-invocation cost tracking (LLM API costs + compute + external API costs), task completion rates, error/fallback rates, and estimated human-equivalent time. Instrument this from day one on new agent projects — retrofitting is painful because you need to reconstruct cost models for behaviors you didn't log.</p><h3>Notion's Signal: Governance as Product Feature</h3><p>Notion framing agents as 'governed AI teammates' is a market signal worth internalizing. <strong>Governance primitives — permission scoping, audit trails, human approval gates — are becoming table stakes</strong>, not differentiators. If you're building agent orchestration, design these into the execution model itself, not as a logging wrapper bolted on later. Think RBAC for agent capabilities, not just user permissions.</p>
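<p><em>A sketch tying the three gaps together</em>: a tool-call wrapper that enforces a default-deny runtime policy, writes an action-level audit record for every call, and attributes an estimated cost to each invocation. The policy table, flat cost rates, and tool registry are assumptions; in production you'd meter actual token and API spend rather than use flat rates.</p>
<pre><code>import json
import time
import uuid

POLICY = {"read_table": {"allowed": True}, "write_table": {"allowed": False}}  # hypothetical rules
COST_PER_CALL_USD = {"read_table": 0.0004}  # hypothetical flat rate; meter real spend in production

class PolicyViolation(Exception):
    pass

class AuditedToolRunner:
    def __init__(self, tools, audit_path="agent_audit.jsonl"):
        self.tools = tools            # name -> callable
        self.audit_path = audit_path

    def call(self, invocation_id, tool_name, **kwargs):
        rule = POLICY.get(tool_name, {"allowed": False})  # default-deny for unknown tools
        record = {
            "invocation_id": invocation_id,  # ties every side effect back to one trigger
            "tool": tool_name,
            "args": kwargs,
            "ts": time.time(),
            "est_cost_usd": COST_PER_CALL_USD.get(tool_name, 0.0),
        }
        if not rule["allowed"]:
            record["outcome"] = "denied"
            self._write(record)
            raise PolicyViolation(f"{tool_name} denied by runtime policy")
        result = self.tools[tool_name](**kwargs)
        record["outcome"] = "ok"
        self._write(record)
        return result

    def _write(self, record):
        with open(self.audit_path, "a") as f:
            f.write(json.dumps(record) + "\n")

# Usage: one invocation_id per trigger lets you aggregate cost per run.
runner = AuditedToolRunner({"read_table": lambda table: f"rows from {table}"})
runner.call(str(uuid.uuid4()), "read_table", table="orders")
</code></pre>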
Action items
- Define and document your agent security model this sprint: enumerate what actions each agent can take, what data it can access, and implement least-privilege boundaries
- Instrument per-invocation cost tracking on all production agent workflows within 2 weeks — include LLM API costs, compute, and external API costs
- Add observability and kill switches to any agent-based workflows before expanding scope — trace every tool call and side effect
Sources: ai agent predictions · Unsupervised Learning NO. 518 · Red Lines · This Week on TITV: Otter CEO on AI Transcription Power, Notion AI Lead on Custom Agent Launch, and The Politics of Data Centers · Ransomware groups switch to stealthy attacks and long-term access
04 LLM Evaluation Is Breaking: Build Your Own Benchmarks Now
<h3>The Benchmark Legitimacy Crisis</h3><p>Three signals from different corners of the AI ecosystem point to the same conclusion: <strong>public benchmarks are no longer reliable for production model selection</strong>. OpenAI is pushing to retire <strong>SWE-bench</strong>, the primary coding benchmark the industry uses to compare models. Formal benchmarks like MMLU, HumanEval, and GPQA are saturating. And AI enthusiasts are turning to personally devised tests — otters, Minesweeper, Will Smith eating spaghetti — because formal evals don't capture real capability differences. As Terence Tao noted: <em>'AI tools are like taking a helicopter to drop you off. You miss all the benefits of the journey itself.'</em></p><h3>ARQ: A Structured Alternative to Chain-of-Thought</h3><p>The most technically interesting evaluation-adjacent signal is <strong>Attentive Reasoning Queries (ARQ)</strong> from the Parlant framework (18k GitHub stars). ARQ replaces free-form Chain-of-Thought with <strong>domain-specific questions encoded as JSON schema</strong>, injected at three pipeline stages: guideline selection, tool calling, and message generation. Results: <strong>90.2% instruction adherence vs. CoT's 86.1%</strong> on multi-turn conversations.</p><p><em>Important caveat:</em> this was evaluated on only <strong>87 test scenarios</strong> — far too small for production confidence. But the pattern itself — <strong>structured reasoning injection at multiple pipeline stages</strong> — is worth stealing even if you never touch Parlant. If you're fighting prompt drift in production agents, prototype this approach.</p><h3>What To Do Instead of Trusting Leaderboards</h3><p>The practical response is straightforward but requires investment:</p><ol><li>Take <strong>50 real PRs, 50 real bugs, 50 real code review comments</strong> from your actual codebase</li><li>Build a <strong>reproducible eval harness</strong> that tests candidate models against these real-world scenarios</li><li>Measure what matters for your use case: instruction adherence, hallucination rate, false positive rate on security findings, cost per task</li><li>Re-run the eval every time you consider switching models or providers</li></ol><blockquote>The leaderboard winner may not be your winner. A week of building internal benchmarks pays for itself every time you need to evaluate a new model.</blockquote><p>Cotool Research's benchmarking of LLMs on defensive security tasks found <strong>'large reliability gaps across models on multi-step investigations'</strong> with <strong>'failure modes not appearing in generic benchmarks.'</strong> The most expensive model isn't necessarily the best for your specific workload — another reason domain-specific evaluation is non-negotiable.</p>
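<p><em>A minimal sketch of the ARQ-style pattern</em>: structured reasoning queries the model must answer as JSON before it replies. This illustrates the idea, not Parlant's actual API; the question set and the <code>call_model</code> stub are placeholders for your own domain and LLM client.</p>
<pre><code>import json

REASONING_QUERIES = {  # hypothetical domain questions for a support agent
    "active_guideline": "Which single guideline applies to the last user message?",
    "tool_needed": "Does answering require a tool call? Answer yes or no.",
    "forbidden_content": "Would the reply reveal internal data? Answer yes or no.",
}

def build_prompt(user_message):
    questions = "\n".join(f'- "{k}": {q}' for k, q in REASONING_QUERIES.items())
    return (
        "Before replying, answer these questions as a JSON object with exactly "
        f"these keys:\n{questions}\n\nUser message: {user_message}"
    )

def call_model(prompt):
    # Placeholder: substitute your actual LLM client here.
    return ('{"active_guideline": "refund_policy", '
            '"tool_needed": "no", "forbidden_content": "no"}')

def reason(user_message):
    raw = call_model(build_prompt(user_message))
    answers = json.loads(raw)
    missing = set(REASONING_QUERIES) - set(answers)
    if missing:
        raise ValueError(f"model skipped reasoning queries: {missing}")
    return answers  # feed into guideline selection, tool calling, and generation

print(reason("I want my money back"))
</code></pre>
<p>The same query set can be re-injected at each of the three pipeline stages Parlant names; the point is that a fixed schema forces the model to commit to discrete answers you can validate, instead of free-form reasoning that drifts.</p>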
Action items
- Build an internal model evaluation harness using 50+ real examples from your codebase by end of quarter — include PRs, bugs, and code review scenarios
- Prototype ARQ-style structured reasoning injection on one multi-turn agent system to measure instruction drift improvement
- Add DeepEval or equivalent to your CI/CD pipeline for automated hallucination detection on LLM outputs
Sources: A Foundational Guide to Evaluation of LLM Apps (Part B) · Red Lines · Weekend: AI, Land of Make-Believe · Unsupervised Learning NO. 518
◆ QUICK HITS
Update: Anthropic federal ban — DoD applied a 'supply chain risk' designation with a 6-month phase-out for the existing $200M DoD contract; OpenAI signed a classified Pentagon deal on AWS within hours
Trump Orders the Federal Government to Stop Doing Business with Anthropic
Meta signed a 6-gigawatt AMD compute deal, signaling serious Nvidia diversification — benchmark your inference workloads on AMD MI300X if running self-hosted models at scale
This Week on TITV: Otter CEO on AI Transcription Power, Notion AI Lead on Custom Agent Launch, and The Politics of Data Centers
Perplexity open-sourced competitive embedding models — evaluate against your current paid embedding provider for RAG pipeline cost reduction
Red Lines
GroundX, an open-source K8s-hosted document parser, claims to outperform GPT-4o on invoice parsing with phi3:mini — investigate if you need on-prem document processing for RAG, but run your own benchmarks (theirs used only 3 invoices)
A Foundational Guide to Evaluation of LLM Apps (Part B)
Reflection AI raised $2B+ for Western open-weight frontier models with RL-aware pretraining, but has zero shipped artifacts — do NOT plan dependencies, but track the credit assignment scaling research
"We Are the Only Ones Who Would Build It"
Firefox 148's setHTML API provides meaningful XSS defense improvement — adopt in web applications that handle user-generated HTML content
SANS NewsBites Vol. 28 Num. 15
Update: SolarWinds Serv-U has four critical flaws — upgrade to v15.5.4+ or evaluate whether you still need legacy file transfer software
SANS NewsBites Vol. 28 Num. 15
◆ BOTTOM LINE
Ivanti EPMM backdoors survive patching — if you run it, assume compromise and plan a rebuild, not just a patch cycle. Ransomware has gone silent, pivoting from encryption to long-term identity-layer persistence via OAuth grants and service accounts, while 82% of attacks use zero malware and breakout times hit 29 minutes. Your AI agents need security sandboxing, cost attribution, and kill switches before you scale them — Perplexity canceled their own agent demo because they couldn't get it stable. And public LLM benchmarks are breaking down: build eval suites from your own codebase or accept that your model selection is based on marketing, not evidence.
Frequently asked
- Why isn't patching Ivanti EPMM enough to remediate the compromise?
- Because the persistence mechanism lives outside the vulnerable code path. Unit 42 found that attackers used the initial zero-day access to install independent backdoor infrastructure, so applying the vendor patch closes the entry door but leaves the implant running. Remediation requires forensic investigation and a rebuild from known-good images, not an in-place update.
- How is this different from the Cisco SD-WAN compromise pattern?
- The Cisco SD-WAN story was a conventional patch-and-move-on scenario, where applying the fix eliminated attacker access. The Ivanti EPMM case is a different failure mode: the backdoor is decoupled from the CVE, so patching the vulnerability does not remove the attacker. It demands incident response rather than vulnerability management.
- What immediate steps should an engineer take if they run Ivanti EPMM?
- Assume compromise and start forensic investigation today rather than waiting for IOC confirmation. Examine MDM servers for persistence mechanisms outside the patched code path, hunt for unauthorized profiles, certificates, or configuration pushes to managed devices, and plan a full rebuild from known-good images instead of in-place remediation.
- Why are MDM servers such high-value targets?
- MDM servers have privileged push access to every managed phone, tablet, and laptop in the fleet. A compromise is not a single-host incident — it gives attackers the ability to deploy malicious profiles, certificates, and policies across the entire device estate, making the management plane itself a fleet-wide control point.
- What architectural change reduces exposure to this class of failure?
- Make management-plane infrastructure immutable and rebuildable from infrastructure-as-code. If MDM, SD-WAN controllers, and CI/CD systems can be torn down and reprovisioned from known-good state within hours, persistence mechanisms that survive patching lose their foothold. Target a documented 4-hour teardown-and-rebuild capability for these systems.