PROMIT NOW · ENGINEER DAILY · 2026-03-08

Two CVSS 10.0 RCEs Drop as AI Vuln Scanners Hit Production

· Engineer · 8 sources · 1,546 words · 8 min

Topics: LLM Inference · AI Regulation · Agentic AI

Two CVSS 10.0 vulnerabilities dropped this week — pac4j-jwt (CVE-2026-29000) lets attackers forge JWTs with just your public key, and FreeScout's zero-click RCE (CVE-2026-28289) exploits a TOCTOU where file validation runs before Unicode sanitization. Grep your codebase for that same pattern today. Meanwhile, AI security scanning just proved production-grade: Claude found 22 real Firefox vulnerabilities in 14 days at ~$400/bug, and OpenAI shipped Codex Security with sandbox-verification that kills false positives — your cost to find your own vulns just dropped 10x, and so did attackers'.

◆ INTELLIGENCE MAP

  01

    Two CVSS 10.0 Vulns + Network Edge Under Siege

    act now

    pac4j-jwt auth bypass and FreeScout TOCTOU RCE are both CVSS 10.0 this week. Cisco dropped 50+ CVEs with 2 actively exploited. GTIG data shows 48% of zero-days now target enterprise network infrastructure — a new record. Your security appliances are now the most exploited category.

    48% of zero-days target edge infra · 2 sources
    CVSS 10 vulns this week: 2 · Cisco CVEs disclosed: 50+ · Cisco CVEs actively exploited: 2 · Havoc C2 replacing: Cobalt Strike
    Zero-day targets: Enterprise infra 48% · Endpoints 22% · Mobile 15% · Other 15%
  02

    AI Vulnerability Discovery Goes Production-Grade

    monitor

    Claude found 22 confirmed Firefox vulnerabilities (14 high-severity) in 2 weeks at ~$400/bug — 10x cheaper than exploiting them. OpenAI shipped Codex Security with a novel sandbox-verify pipeline that eliminates false positives by actually attempting exploitation. Defensive AI security scanning just crossed from demo to production tooling.

    10× cheaper to find than to exploit · 3 sources
    Firefox vulns found: 22 · High-severity: 14 · Time to first bug: 20 minutes · C++ files scanned
    Cost to find: ~$400 · Cost to exploit: ~$4,000
  03

    vLLM v0.17: Cross-Platform Inference Maturity

    monitor

    vLLM v0.17.0 ships a unified Triton attention backend replacing per-GPU kernel implementations with ~800 lines of code. It achieves H100 parity while delivering 5.8x speedup on AMD MI300X — now the default on ROCm. Meta's KernelAgent hits 88.7% roofline efficiency via multi-agent Triton optimization. Multi-vendor GPU strategies are now genuinely viable.

    5.8× MI300X speedup · 1 source
    Triton backend LOC: ~800 · KernelAgent roofline: 88.7% · vs torch.compile: 1.56× · AMD kernel prize: $1.1M
    Speedups: MI300X (new Triton) 5.8× · KernelAgent vs torch.compile 1.56× · H100 (Triton parity) 1.0×
  04

    Anthropic's Expanding Regulatory & Pricing Risk

    background

    Pentagon labeled Anthropic a 'supply chain risk,' banning Claude from defense use — all three cloud providers confirmed commercial access continues, but scope expansion is likely. Claude Code burns $5,000 compute per $200 subscription (25:1 loss ratio). A court battle with DoD is probable. Provider abstraction layers are no longer optional.

    25:1 Claude Code loss ratio · 4 sources
    Subscription price: $200/mo · Actual compute cost: up to $5,000/mo · Cloud providers confirming commercial access: 3
    User pays: $200 · Compute costs: $5,000
  05

    Specialized Models Outperform Frontier on Enterprise Tasks

    monitor

    Databricks' KARL beats Claude 4.6 and GPT-5.2 on enterprise knowledge tasks at 33% lower cost and 47% lower latency using synthetic data and off-policy RL (OAPL). The recipe is open to Databricks customers. Sakana AI's Doc-to-LoRA generates adapters from documents in a single forward pass. The case for domain-specialized models over frontier-max is strengthening.

    33% cost reduction vs frontier · 1 source
    Cost vs frontier: −33% · Latency vs frontier: −47% · Method: synthetic data + off-policy RL (OAPL) · Availability: Databricks customers
    Relative to frontier baseline (100): KARL cost 67 · KARL latency 53

◆ DEEP DIVES

  01

    Two CVSS 10.0 Vulnerabilities: The TOCTOU Pattern You Should Grep Your Codebase For

    <h3>The Vulnerabilities</h3><p>Two <strong>CVSS 10.0</strong> vulnerabilities demand immediate engineering attention this week, and one of them reveals a bug class likely hiding in your own code.</p><p><strong>pac4j-jwt (CVE-2026-29000)</strong> is an authentication bypass where an attacker can forge valid JWTs using only the public key. This is almost certainly an <strong>algorithm confusion attack</strong> — the library accepts HMAC signatures verified with the RSA public key as the HMAC secret. If you're running any JVM service that depends on pac4j for JWT validation, even transitively through a framework, this is a P0 patch. Authentication library vulnerabilities are force multipliers: one flaw compromises every service behind it.</p><p><strong>FreeScout (CVE-2026-28289)</strong> is a zero-click RCE via email attachment, but the mechanism is what matters. An attacker sends a file named <code>[zero-width-space].htaccess</code>. The security check looks for filenames starting with a dot — it doesn't see one because the invisible Unicode character comes first. Then sanitization strips the zero-width space, leaving <code>.htaccess</code> on disk. <em>This bypassed a fix for a previous CVE</em>, meaning the original patch didn't understand the ordering invariant.</p><blockquote>Classic TOCTOU: the security-relevant property changes between the time it's checked and the time the file is used. If your code validates filenames before Unicode normalization, you have this same bug class.</blockquote><h3>The Pattern to Grep For</h3><p>Any code path where <strong>security checks</strong> (deny-list matching, extension validation, path traversal checks) precede <strong>input sanitization</strong> (Unicode normalization, invisible character stripping, encoding canonicalization) is vulnerable. The fix is architectural: <strong>always normalize/canonicalize first, then validate. 
Never the reverse.</strong></p><h3>Your Network Edge Is Now the Primary Target</h3><p>GTIG's 2025 data shows <strong>48% of zero-days targeted enterprise-grade infrastructure</strong> — a new record. Cisco simultaneously disclosed 50+ CVEs across SD-WAN Manager, ASA, FMC, and FTD, with <strong>two actively exploited</strong> (CVE-2026-20122 arbitrary file overwrite, CVE-2026-20128). Combined with CVE-2026-20129 (critical auth bypass in SD-WAN Manager), the attack chain from unauthenticated access to full infrastructure compromise is short.</p><p>On the offensive tooling side, the <strong>Havoc C2 framework</strong> is replacing Cobalt Strike in active campaigns, using DLL sideloading via legitimate Windows binaries with known EDR bypasses. Update your detection engineering accordingly.</p><blockquote>When your security appliances are the most exploited category, the network perimeter model breaks down. This is the strongest practical argument for zero-trust architecture — not as a product, but as an acknowledgment that your trust boundary enforcement devices are themselves untrustworthy.</blockquote>

    Action items

    • Audit dependency tree for pac4j-jwt usage (including transitive) and upgrade or replace by end of week
    • Grep codebase for TOCTOU patterns: any validation logic that precedes input sanitization or Unicode normalization
    • Apply Cisco SD-WAN Manager and ASA/FMC/FTD patches from late February if not already done — treat as P0 incident
    • Add Havoc C2 indicators and DLL sideloading via legitimate Windows binaries to detection rules this sprint

    Sources: pac4j-jwt CVSS 10 auth bypass + TOCTOU pattern in FreeScout: audit your input validation ordering now · GPT-5.4's 1M context costs 28% more per run — and vLLM v0.17 just dropped cross-platform FlashAttention 4

  02

    AI Security Scanning Just Crossed From Demo to Production — Here's the Architecture Worth Stealing

    <h3>The Rubicon Moment</h3><p>Anthropic staff called it a 'rubicon moment,' and the numbers back them up. Claude Opus 4.6 found <strong>22 confirmed vulnerabilities</strong> (14 high-severity) in Firefox's C++ codebase — a decades-old project with dedicated security teams, extensive fuzzing, and an active bug bounty program — in just <strong>two weeks</strong>, with the first bug found in 20 minutes. The cost economics are striking: <strong>~$400 to find a vulnerability vs. ~$4,000 to exploit it</strong> — a 10:1 ratio that currently favors defenders.</p><blockquote>If AI can find 22 issues in Firefox, the vulnerability density in the average production codebase is almost certainly higher. The cost of vulnerability discovery just dropped by an order of magnitude.</blockquote><h3>Codex Security's Architecture Pattern</h3><p>OpenAI shipped Codex Security with a pipeline worth understanding regardless of whether you adopt the tool. The architecture is: <strong>clone repo into isolated container → auto-generate threat model → sandbox-test discovered flaws to verify findings</strong>. The sandbox-verification step is the key innovation — it directly attacks the single biggest problem with static analysis: <strong>false positive fatigue</strong>. Every engineer who's turned off Snyk notifications knows the pain.</p><p>The tool started as an internal project (Aardvark), meaning it's been battle-tested on OpenAI's own codebase. It's currently a research preview for ChatGPT Enterprise/Business/Edu tiers, and <strong>free for open-source maintainers</strong> — making it zero-risk to evaluate against your public repos.</p><h3>Cross-Source Assessment</h3><p>Three independent sources this week converge on the same conclusion: AI-powered security scanning has crossed a production readiness threshold. But there's a critical tension. Anthropic explicitly warns that the 10:1 finding-vs-exploiting cost asymmetry <strong>will shrink</strong>. 
Today, AI is better at finding bugs than exploiting them. That gap may close within 12-18 months as exploit chain synthesis improves. The a16z analysis reinforces this from a different angle: the <strong>automation-verification cost gap</strong> means generated code is shipping with less review, systematically increasing the vulnerability surface that these new scanning tools will need to cover.</p><h4>Practical Integration</h4><table><thead><tr><th>Tool</th><th>Approach</th><th>Best For</th><th>Access</th></tr></thead><tbody><tr><td>Codex Security</td><td>Threat model + sandbox verify</td><td>Low false-positive scanning</td><td>Enterprise/OSS free</td></tr><tr><td>Claude Opus 4.6</td><td>Large-context static analysis</td><td>Deep C/C++ codebase audits</td><td>API access</td></tr><tr><td>Your existing SAST</td><td>Rule-based pattern matching</td><td>Known vulnerability patterns</td><td>Varies</td></tr></tbody></table><p>These aren't replacements for each other. The optimal stack layers <strong>AI-powered scanning on top of your existing SAST/DAST</strong> — use Semgrep/SonarQube for known patterns, AI scanning for novel vulnerabilities, and sandbox verification to triage both.</p>
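The clone → threat-model → sandbox-verify shape can be approximated locally. A toy triage loop follows — an assumed structure, not Codex Security's actual implementation, and a subprocess stands in for the isolated container a real pipeline would use:

```python
import subprocess, sys, tempfile
from dataclasses import dataclass

@dataclass
class Finding:
    title: str
    poc: str  # standalone script; exit code 0 means "exploit reproduced"

def sandbox_verify(finding: Finding, timeout_s: int = 10) -> bool:
    """Run the PoC in a separate interpreter process with a timeout.
    (A real pipeline would use a container or VM, not a bare subprocess.)"""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(finding.poc)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout_s)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def triage(findings: list[Finding]) -> list[Finding]:
    """Keep only findings whose exploit actually reproduces: false positives
    fail verification and are dropped instead of paging an engineer."""
    return [f for f in findings if sandbox_verify(f)]

findings = [
    Finding("real: reachable eval", "import sys; sys.exit(0 if eval('1+1') == 2 else 1)"),
    Finding("false positive", "import sys; sys.exit(1)"),
]
assert [f.title for f in triage(findings)] == ["real: reachable eval"]
```

The design point is the filter itself: an unverified finding never reaches a human, which is exactly the property that kills false positive fatigue.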

    Action items

    • Run Codex Security (free for OSS) against your most security-critical public repositories this week and compare findings against your current SAST output
    • Allocate a 2-week spike to run Claude or GPT-class models against your most critical private codepaths as a focused vulnerability audit
    • Adopt the normalize-then-verify pattern from the Codex Security architecture for your own CI/CD security gates this quarter

    Sources: GPT-5.4's 1M context costs 28% more per run — and vLLM v0.17 just dropped cross-platform FlashAttention 4 · OpenAI's Codex Security sandbox-tests vulns before reporting — here's why your SAST pipeline should too · Your AI-generated code is shipping unreviewed — the verification gap is your next production incident

  03

    vLLM v0.17.0: The Release That Makes Multi-Vendor GPU Strategies Viable

    <h3>Why This Release Matters</h3><p>If you're running self-hosted inference, vLLM v0.17.0 is the most consequential release this quarter. The headline feature is a <strong>unified Triton attention backend</strong> — approximately 800 lines of code replacing separate attention kernel implementations for NVIDIA, AMD, and Intel GPUs. It uses Q-blocks and tiled softmax for decode with persistent kernels for CUDA graph compatibility.</p><p>The performance claims are significant: <strong>H100 parity</strong> with state-of-the-art attention implementations, and a <strong>5.8× speedup on AMD MI300X</strong> versus earlier implementations. The Triton backend is now the default on ROCm and available on NVIDIA and Intel. Combined with <strong>FlashAttention 4 integration</strong>, elastic expert parallelism, and direct loading of quantized LoRA adapters, this release eliminates per-platform kernel maintenance as a blocker for multi-vendor GPU strategies.</p><blockquote>800 lines of portable Triton code replacing thousands of lines of vendor-specific kernel implementations. The NVIDIA monoculture in production inference is cracking.</blockquote><h3>The Agentic Kernel Optimization Signal</h3><p>Meta's open-sourced <strong>KernelAgent</strong> achieves 88.7% roofline efficiency on H100 via multi-agent Triton kernel optimization — <strong>2.02× faster</strong> than correctness-only generation and <strong>1.56× faster</strong> than out-of-box torch.compile. If you're still hand-tuning CUDA kernels, the automated approach is catching up fast. 
AMD's <strong>$1.1M kernel competition</strong> targeting MI355X for DeepSeek-R1-0528 and GPT-OSS-120B optimization further signals that the competitive pressure on NVIDIA's software moat is intensifying.</p><h3>What to Evaluate</h3><ol><li><strong>Cross-platform cost arbitrage</strong>: If MI300X achieves 5.8× speedup with the new backend, AMD's lower GPU pricing could meaningfully change your inference cost model.</li><li><strong>LoRA serving simplification</strong>: Direct loading of quantized LoRA adapters eliminates a common production pain point — serving multiple fine-tuned variants from a single base model.</li><li><strong>Elastic expert parallelism</strong>: For MoE model serving (DeepSeek, Mixtral), dynamic expert allocation across GPUs is now built-in rather than requiring custom orchestration.</li></ol><hr><p><em>One caveat</em>: the 5.8× MI300X number is versus earlier vLLM implementations, not versus hand-optimized CUDA. Real-world cost comparisons require benchmarking on your actual model and workload mix. Don't commit GPU procurement based on synthetic benchmarks.</p>
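Roofline efficiency, the metric behind KernelAgent's 88.7% figure, is just achieved throughput divided by the hardware bound at the kernel's arithmetic intensity. A quick calculator — the H100 specs are published ballparks and the kernel numbers are illustrative, not Meta's measurements:

```python
def roofline_bound_tflops(peak_tflops: float, mem_bw_tb_s: float,
                          flops_per_byte: float) -> float:
    """Attainable TFLOP/s = min(compute roof, bandwidth x arithmetic intensity)."""
    return min(peak_tflops, mem_bw_tb_s * flops_per_byte)

def roofline_efficiency(achieved_tflops: float, peak_tflops: float,
                        mem_bw_tb_s: float, flops_per_byte: float) -> float:
    return achieved_tflops / roofline_bound_tflops(
        peak_tflops, mem_bw_tb_s, flops_per_byte)

# H100 SXM ballpark: ~989 TFLOP/s dense BF16, ~3.35 TB/s HBM3.
# A kernel at 100 FLOPs/byte is memory-bound: roof = min(989, 335) = 335,
# so 297.1 achieved TFLOP/s works out to ~88.7% of roofline.
eff = roofline_efficiency(achieved_tflops=297.1, peak_tflops=989,
                          mem_bw_tb_s=3.35, flops_per_byte=100)
assert abs(eff - 0.887) < 0.001
```

The same arithmetic is how to sanity-check any vendor speedup claim: a "5.8× faster" kernel that is still far below its roofline bound leaves headroom on the table.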

    Action items

    • Upgrade to vLLM v0.17.0 in staging and benchmark the Triton attention backend against your current serving setup this sprint
    • Evaluate AMD MI300X pricing against NVIDIA H100 using vLLM v0.17 benchmarks on your actual workloads this quarter
    • Test KernelAgent against your custom Triton kernels to establish an automated optimization baseline

    Sources: GPT-5.4's 1M context costs 28% more per run — and vLLM v0.17 just dropped cross-platform FlashAttention 4

  04

    Anthropic's Risk Surface Just Expanded: Pentagon, Pricing, and the Abstraction Layer You Need

    <h3>Four Sources, One Signal</h3><p>The Pentagon labeling Anthropic a <strong>'supply chain risk'</strong> appeared in four independent intelligence streams today, making it the most cross-referenced story of the day. The facts: DoD has barred Claude from defense contractor use. Google, Microsoft, and Amazon all publicly confirmed that <strong>commercial (non-defense) access continues unaffected</strong>. A court battle between Anthropic and the DoD is now characterized as likely, with Dario Amodei drawing explicit ethical red lines against mass surveillance and autonomous weapons use.</p><blockquote>We've entered an era where your LLM provider's political positioning is a first-class infrastructure risk.</blockquote><h3>The Subsidy Economics Add a Second Risk Dimension</h3><p>Independently, Cursor estimates that Anthropic's <strong>$200/month Claude Code plan burns up to $5,000 in actual compute per user</strong> — a 25:1 loss ratio. OpenAI is reportedly running similar subsidies. This is the classic platform playbook: subsidize aggressively to lock in developer workflows, then adjust pricing once switching costs are high enough. The engineering implication is concrete: if your team has built workflows tightly coupled to Claude Code's current behavior and rate limits, you're building on a foundation that will change.</p><h3>Provider Divergence Is Strategic, Not Accidental</h3><p>OpenAI and Anthropic are deliberately diverging on market segment. OpenAI is leaning into government and defense — expect FedRAMP certification, air-gapped deployments, and government-specific features. Anthropic is leaning away — expect consumer trust and enterprise safety features. 
Neither is objectively better, but <strong>choosing the wrong one for your vertical means fighting your provider's roadmap instead of riding it</strong>.</p><h4>Risk Assessment by Use Case</h4><table><thead><tr><th>Scenario</th><th>Risk Level</th><th>Action</th></tr></thead><tbody><tr><td>Commercial SaaS, no gov exposure</td><td>Low (today)</td><td>Build abstraction layer proactively</td></tr><tr><td>Defense contractor supply chain</td><td>High (now)</td><td>Audit Claude usage, verify compliance</td></tr><tr><td>Federal/state government customers</td><td>Medium (developing)</td><td>Document Claude dependencies, prepare migration plan</td></tr><tr><td>Dual-use or surveillance-adjacent</td><td>High (developing)</td><td>Evaluate OpenAI vs Anthropic policy positions as procurement criterion</td></tr></tbody></table><p><em>The practical engineering takeaway</em>: treat LLM APIs like databases. You probably won't switch Postgres for MySQL on a whim, but you should have your data access layer abstracted enough that a migration is measured in weeks, not quarters. The combination of regulatory risk and pricing instability makes this abstraction layer non-optional.</p>
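The database-style abstraction the takeaway argues for can start as one interface plus a registry. A minimal sketch — provider classes are stubs with invented names, standing in for real vendor SDK wrappers:

```python
from typing import Protocol

class ChatProvider(Protocol):
    def complete(self, system: str, user: str) -> str: ...

class AnthropicProvider:
    """Would wrap the Anthropic SDK; stubbed for the sketch."""
    def complete(self, system: str, user: str) -> str:
        return f"[claude] {user}"

class OpenAIProvider:
    """Would wrap the OpenAI SDK; stubbed for the sketch."""
    def complete(self, system: str, user: str) -> str:
        return f"[gpt] {user}"

PROVIDERS: dict[str, ChatProvider] = {
    "anthropic": AnthropicProvider(),
    "openai": OpenAIProvider(),
}

def complete(user: str, provider: str = "anthropic",
             system: str = "You are a helpful assistant.") -> str:
    # Call sites depend only on this function, so swapping vendors is a
    # config change measured in weeks, not a code migration in quarters.
    return PROVIDERS[provider].complete(system, user)

assert complete("ping") == "[claude] ping"
assert complete("ping", provider="openai") == "[gpt] ping"
```

The hard part in practice is not the interface but the behavioral coupling behind it — prompt phrasing, tool-call formats, rate limits — so the abstraction is only real once you routinely run evals against a second provider.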

    Action items

    • Audit all revenue streams for government/defense exposure and verify Claude usage doesn't create compliance risk this sprint
    • Build or verify your LLM provider abstraction layer supports model-swappable integration by end of quarter
    • Track Anthropic vs. DoD legal proceedings for scope expansion signals

    Sources: Claude Code burns $5K compute per $200 sub — your AI tooling cost model just broke · If Claude is in your stack, the DOD 'supply chain risk' label changes your vendor risk calculus now · Anthropic-Pentagon standoff: what it means if Claude is in your inference stack · OpenAI's Codex Security sandbox-tests vulns before reporting — here's why your SAST pipeline should too

◆ QUICK HITS

  • Tycoon2FA takedown (330 domains, 60%+ of Microsoft-blocked phishing) confirms reverse-proxy session cookie interception defeats all non-FIDO2 MFA — if critical systems still use TOTP/push, this is the business case for WebAuthn migration

    Source: pac4j-jwt CVSS 10 auth bypass + TOCTOU pattern in FreeScout: audit your input validation ordering now

  • Chrome moves to biweekly stable releases starting September 8, 2026 (Chrome 153) — audit CI/CD pipelines, headless browser rendering, and E2E test suites that pin to monthly stable; Extended Stable stays on 8-week cycles for production-embedded use

    Source: pac4j-jwt CVSS 10 auth bypass + TOCTOU pattern in FreeScout: audit your input validation ordering now

  • Bing AI promoted malicious GitHub repositories as top search results for 'OpenClaw Windows' — release artifacts contained malware while code scanning showed clean; add AI search poisoning to your developer supply chain threat model

    Source: pac4j-jwt CVSS 10 auth bypass + TOCTOU pattern in FreeScout: audit your input validation ordering now

  • Databricks KARL beats Claude 4.6 and GPT-5.2 on enterprise knowledge tasks at 33% lower cost and 47% lower latency via synthetic data + off-policy RL (OAPL) — recipe is open to Databricks customers; evaluate for highest-volume domain-specific API workloads

    Source: GPT-5.4's 1M context costs 28% more per run — and vLLM v0.17 just dropped cross-platform FlashAttention 4

  • Sakana AI's Doc-to-LoRA generates LoRA adapters from documents in a single forward pass via hypernetwork — enables runtime model adaptation without finetuning; early research but worth tracking for rapid domain specialization use cases

    Source: GPT-5.4's 1M context costs 28% more per run — and vLLM v0.17 just dropped cross-platform FlashAttention 4

  • Data center electricians earning $130/hr (4.3× average), 300K+ needed over the next decade, $700B in temporary housing pipeline — cloud capacity planning assumptions may be too optimistic for 2026-2028; get committed capacity guarantees in writing

    Source: If Claude is in your stack, the DOD 'supply chain risk' label changes your vendor risk calculus now

  • Update: Claude Code's loop pattern ('/loop 5m make sure this PR passes CI') enables local scheduled tasks — useful for CI babysitting workflows, but doesn't change the agent-generated code review risk covered previously

    Source: GPT-5.4's 1M context costs 28% more per run — and vLLM v0.17 just dropped cross-platform FlashAttention 4

  • DST begins March 8, 2026 — pre-flight cron schedules, time-windowed batch jobs, SLA monitoring, and timezone-naive datetime handling

    Source: If Claude is in your stack, the DOD 'supply chain risk' label changes your vendor risk calculus now
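One concrete pre-flight for the spring-forward date above: flag any naive local schedule that lands in the nonexistent 02:00–03:00 window. The gap window is hardcoded here as an assumption for US zones on their 2026 DST start; production code should consult tzdata for each deployment zone:

```python
from datetime import date, time

# US spring-forward 2026: local clocks jump from 02:00 to 03:00 on March 8.
DST_START = date(2026, 3, 8)
GAP_START, GAP_END = time(2, 0), time(3, 0)

def schedule_exists(run_date: date, run_time: time) -> bool:
    """A naive local wall-clock time inside the gap never occurs on DST start;
    cron jobs pinned there silently skip (and double-fire at fall-back)."""
    if run_date == DST_START and GAP_START <= run_time < GAP_END:
        return False
    return True

assert not schedule_exists(date(2026, 3, 8), time(2, 30))  # skipped run
assert schedule_exists(date(2026, 3, 8), time(3, 30))      # unaffected
assert schedule_exists(date(2026, 3, 7), time(2, 30))      # day before is fine
```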

BOTTOM LINE

AI can now find real zero-days in production codebases at ~$400 per vulnerability (22 Firefox bugs in 14 days), while two CVSS 10.0 authentication bypasses dropped this week, 48% of zero-days target your network edge, and Anthropic's Pentagon blacklisting plus 25:1 Claude Code subsidy ratio mean your LLM vendor risk just spiked — run Codex Security against your repos for free, grep for TOCTOU validation-before-sanitization patterns in your codebase today, and build that LLM abstraction layer before you're forced to.

Frequently asked

How do I check if my codebase has the same TOCTOU bug as FreeScout's CVE-2026-28289?
Grep for any validation logic that runs before input sanitization or Unicode normalization — especially filename deny-lists, extension checks, or path traversal guards that execute prior to stripping invisible characters or canonicalizing encodings. The architectural fix is to always normalize and canonicalize first, then validate. Never the reverse. FreeScout's bug specifically bypassed a prior CVE fix because the patch didn't enforce this ordering invariant.
Why is pac4j-jwt CVE-2026-29000 exploitable with just the public key?
It's almost certainly an algorithm confusion attack: the library accepts HMAC-signed JWTs but verifies them using the RSA public key as the HMAC secret. Since public keys are, by design, not secret, an attacker can forge valid tokens for any user. This is a P0 patch for any JVM service using pac4j directly or transitively through a framework, because one flaw in an auth library compromises every service behind it.
Is AI-powered vulnerability scanning actually ready to replace or augment my SAST pipeline?
It's ready to augment, not replace. Claude finding 22 real Firefox vulnerabilities in 14 days and Codex Security's sandbox-verification step (which kills false positives by actually triggering findings in an isolated container) both cleared production thresholds this week. The optimal stack layers AI scanning on top of existing SAST/DAST: Semgrep or SonarQube for known patterns, AI for novel vulnerabilities, sandbox verification to triage both. Codex Security is free for OSS maintainers, making evaluation zero-risk.
Does vLLM v0.17.0 actually make AMD MI300X a viable alternative to NVIDIA H100?
The new unified Triton attention backend reaches H100 parity on NVIDIA and delivers a reported 5.8× speedup on MI300X versus earlier vLLM implementations, which meaningfully changes the cost model given AMD's lower GPU pricing. However, the 5.8× number is against prior vLLM — not hand-tuned CUDA — so benchmark on your actual model and workload mix before committing procurement. Elastic expert parallelism and direct quantized LoRA loading also simplify multi-tenant serving.
What's the concrete engineering action if Claude is in my production stack after the DoD designation?
Two actions this sprint. First, audit revenue streams for any government or defense exposure — many orgs have government customers that individual teams don't know about, and the supply-chain risk label creates real procurement friction. Second, verify your LLM abstraction layer supports model-swappable integration within weeks, not quarters. Combined with the 25:1 subsidy loss ratio on plans like Claude Code, both API terms and pricing are structurally unstable, so portability is insurance, not premature optimization.
