Internal Safety Collapse: Turning the LLM or an AI Agent into a sensitive data generator.
-
Updated
Apr 17, 2026 - Python
Internal Safety Collapse: Turning the LLM or an AI Agent into a sensitive data generator.
Introducing XSafeClaw: The Open-Source Agent Safety Platform from Fudan University
An open taxonomy and scoring framework for evaluating AI agent sandboxes: 7 defense layers, 7 threat categories, 3 evaluation dimensions, 27 "sandboxes" scored.
Human-in-the-loop execution for LLM agents
Security scanner for AI agent tool definitions
Claude Code agent-in-container orchestration and automation
Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1 on 5,391 trajectories).
Audit log + guard for AI agents. Passive logging, human-in-the-loop approval for dangerous ops (rm, drop, transfer) via Telegram. Diary, daily digest, timeline UI. Cursor & MCP ready. Cloudflare Workers + Hono + D1.
Deterministic Guardrails for AI Agents. Ark acts as a logic-based firewall, preventing unauthorized actions through a rigorous rule engine. Ensure your AI behaves exactly as intended.
🛡️ Safe AI Agents through Action Classifier
Deterministic execution authorization for AI agents
Runtime network egress control for Python. One function call to restrict which hosts your code can connect to.
Open Threat Classification (OTC) — 10 threat patterns for AI agent skills, MCP servers, and plugins. CC-BY-4.0.
Guardrails for LLMs: detect and block hallucinated tool calls to improve safety and reliability.
Policy engine for AI agents — enforceable rules, risk limits, approval gates, obligation tracking, and violation detection. One .acon file. Rust core + MCP server.
27 free, open-source plugins for Claude Code & Cowork — Google Drive, WhatsApp, YouTube, WordPress, Apollo & more. Built on the SOSA™ security framework.
🛡️ Open-source safety guardrail for AI agent tool calls. <2ms, zero dependencies.
Execution control layer for AI agents — prevents duplicate or incorrect real-world actions under retries, uncertainty, and stale context.
The missing safety layer for AI Agents. Adaptive High-Friction Guardrails (Time-locks, Biometrics) for critical operations to prevent catastrophic errors.
ETHICS.md — A statement of ethical principles for AI agents. Drop it in your repo root.
Add a description, image, and links to the agent-safety topic page so that developers can more easily learn about it.
To associate your repository with the agent-safety topic, visit your repo's landing page and select "manage topics."