Featured
How Many Rewrites to Strip a Watermark? Empirical Paraphrase-Removal Curves for LLM Watermarks
Cross-model paraphrasing drops Kirchenbauer watermark detection from 100% to 60% in a single pass. After ten passes, it plateaus at 40%. The watermark is partially robust — but not enough for adversarial settings where the attacker has access to any LLM. I measured this by watermarking text with GPT-2, paraphrasing with Claude Haiku, and tracking how the z-score decays. Five experiments. Six pre-registered hypotheses. Real green-list watermarking with logit access. ...
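The z-score these decay curves track can be sketched as follows. This is a simplified stand-in for the Kirchenbauer et al. detector, not the exact implementation: `green_list` here seeds a permutation by hashing the previous token, and the statistic standardizes the green-token count against the null hypothesis that unwatermarked text lands in the green list with probability `gamma`.

```python
import hashlib
import math
import random


def green_list(prev_token: int, vocab_size: int, gamma: float) -> set[int]:
    # Hypothetical partition: hash the previous token to seed an RNG,
    # shuffle the vocabulary, and mark the first gamma fraction green.
    # (A simplified stand-in for the seeded partition in the real scheme.)
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16)
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def z_score(tokens: list[int], vocab_size: int, gamma: float = 0.25) -> float:
    # Count tokens that fall in the green list seeded by their predecessor,
    # then standardize: under the null, each token is green w.p. gamma.
    green = sum(
        1
        for prev, tok in zip(tokens, tokens[1:])
        if tok in green_list(prev, vocab_size, gamma)
    )
    T = len(tokens) - 1
    return (green - gamma * T) / math.sqrt(T * gamma * (1 - gamma))
```

Paraphrasing erodes the watermark by replacing green tokens with semantically equivalent red ones, so each pass pulls this statistic toward zero; detection fails once z drops below the chosen threshold (commonly z = 4).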
Privilege Escalation Cascades at 98% While Domain-Aligned Attacks Are Invisible
Domain-aligned prompt injections cascade through multi-agent systems at a 0% detection rate. Privilege escalation payloads hit 97.6%. That’s a 98 percentage-point spread across payload types in the same agent architecture — the single biggest variable determining whether your multi-agent system catches an attack or never sees it. I ran six experiments on real Claude Haiku agents to find out why. Three resistance patterns explain the gap — and each has a quantified bypass condition. ...
AI Security Has a Shipping Problem
Thesis: The AI security industry produces frameworks and guidelines, but almost no one ships working tools that practitioners can deploy today. The gap between “risk identified” and “risk mitigated” in AI security is wider than in any other area of cybersecurity I’ve worked in. We have more frameworks per deployed tool than any domain in the history of information security. And the frameworks keep coming while the tools don’t.

The Evidence

1. OWASP published the Agentic Top 10 in late 2025. No tools enforce it. ...
Browse by Topic
Weekly AI security research — findings, tools, and curated signal.
Subscribe on Substack