◆ TOPIC · AI SAFETY

The AI Safety thread.

AI safety now spans model-level guardrail failures, agentic hijacking, and the offensive use of coding assistants themselves. Recent signals include Claude Code silently disabling permission deny rules after 50 subcommands, DeepMind's finding that agents can be environmentally hijacked 80–86% of the time, and the first documented autonomous AI espionage campaign, alongside MCP SDK RCE defaults and OAuth-based supply-chain breaches tied to AI tooling.

35 briefings · across 6 personas

◆ START HERE · LONG-FORM

◆ TIMELINE

How AI Safety moved across the corpus.

First surfaced 2026-02-17, most recent 2026-04-25, with activity on 26 distinct days.

◆ RECENT · LATEST 35

Skim the most recent entries.