🛡️AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation
200 AI agent skills, hardened with targeted behavioral guardrails. Free drop-in replacements.
The left hemisphere. Frameworks, logic, and certainty architecture. Home of FSVE, AION, LAV, ASL, GENESIS, TOPOS, and 60+ epistemically validated frameworks built to make AI systems reliable, not just capable.
👟 SUP: Sycophancy Under Pressure
AgenticStore: The secure toolkit for AI agents. Instantly equip Claude Desktop, Cursor, and Windsurf with 27+ MCP tools, persistent memory, and SearXNG search, all protected by a built-in PII prompt firewall that keeps your data from being exposed to AI agents.
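A PII prompt firewall of this kind typically scrubs prompts before they reach any tool or model. The sketch below is a minimal illustration of that idea, assuming regex-based redaction; the patterns and the `redact` helper are hypothetical and are not AgenticStore's actual API.

```python
import re

# Illustrative PII patterns; a production firewall would use many more,
# plus validation steps (e.g., Luhn checks for card numbers).
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(prompt: str) -> str:
    """Replace detected PII with typed placeholders before the prompt
    is forwarded to an MCP tool or model."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"[REDACTED_{label.upper()}]", prompt)
    return prompt

print(redact("Email jane@example.com, SSN 123-45-6789."))
# -> Email [REDACTED_EMAIL], SSN [REDACTED_SSN].
```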
AI security scanner for OpenClaw - powered by AgentTinman. Discovers prompt injection, tool exfil, context bleed, and other security issues in your AI assistant sessions, then proposes mitigations mapped to OpenClaw's security controls.
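A session scanner for the issue classes named above could, at its simplest, sweep transcripts for suspicious patterns. This is a rough sketch under that assumption; the heuristics and labels are invented for illustration and do not reflect AgentTinman's real detectors or OpenClaw's control mapping.

```python
import re

# Hypothetical heuristics for three of the issue classes named above.
INJECTION_SIGNS = [
    (re.compile(r"ignore (all )?(previous|prior) instructions", re.I), "prompt injection"),
    (re.compile(r"curl\s+https?://\S+", re.I), "possible tool exfil"),
    (re.compile(r"system prompt[:\s]", re.I), "context bleed"),
]

def scan_session(messages: list[str]) -> list[tuple[int, str]]:
    """Return (message index, finding label) pairs for suspicious content
    in an assistant session transcript."""
    findings = []
    for i, msg in enumerate(messages):
        for pattern, label in INJECTION_SIGNS:
            if pattern.search(msg):
                findings.append((i, label))
    return findings
```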
A testbed for the Animal Harm Benchmark.
Shield models AI safety the way humans experience safety
Architecture determines whether decision-quality signals beyond confidence are observable or effectively hidden.
A very simple agent framework for research on LLM-based agents, as self-contained as possible.
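The core of such a framework is usually a short observe-act loop. The following is a minimal sketch of that pattern, assuming an `llm(prompt) -> str` callable and a toy tool registry; none of these names come from the repo itself.

```python
# A minimal ReAct-style loop: the model either calls a tool or answers.
# The protocol strings, tool table, and step limit are illustrative only.
TOOLS = {
    "add": lambda args: str(sum(float(x) for x in args.split(","))),
}

def run_agent(llm, task: str, max_steps: int = 5) -> str:
    history = f"Task: {task}\n"
    for _ in range(max_steps):
        reply = llm(history + "Respond with TOOL:name:args or FINAL:answer\n")
        if reply.startswith("FINAL:"):
            return reply.removeprefix("FINAL:").strip()
        if reply.startswith("TOOL:"):
            _, name, args = reply.split(":", 2)
            result = TOOLS.get(name, lambda a: "unknown tool")(args)
            history += f"{reply}\nObservation: {result}\n"
    return "step limit reached"
```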
Real-Time Manifold Integrity for Deterministic LLM Hallucination Suppression.
Infusion: Shaping Model Behavior by Editing Training Data via Influence Functions
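Influence-function approaches to data editing score each training example by how its loss gradient aligns with the gradient of the behavior you want to change. The sketch below is a simplified first-order version that uses plain gradient dot products and skips the inverse-Hessian factor of the full estimator; the function names and batch layout are assumptions, not Infusion's interface.

```python
import torch

def influence_scores(model, loss_fn, train_examples, target_batch):
    """Score training examples by alignment of their loss gradient with
    the gradient on a target behavior (inverse-Hessian term omitted)."""
    params = [p for p in model.parameters() if p.requires_grad]

    def flat_grad(x, y):
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, params)
        return torch.cat([g.reshape(-1) for g in grads])

    g_target = flat_grad(*target_batch)
    scores = []
    for x, y in train_examples:  # one (input, label) example at a time
        g_i = flat_grad(x.unsqueeze(0), y.unsqueeze(0))
        # A large positive dot product means this example pushes the
        # parameters in the same direction as the target-loss gradient,
        # making it a candidate for removal or editing.
        scores.append(torch.dot(g_i, g_target).item())
    return scores
```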
To Learn Without the Possibility of Undoing Is Not Intelligence; It's a Surrender to Emergence.
A kernel-userland protocol enforcing information-theoretic bounds on AI adaptivity leakage, benchmark gaming, and capability spillover.
AI-HPP-Standard: an inspection-ready architecture for accountable AI systems. Vendor-neutral. Audit-ready. High-risk gated. Developed via structured multi-model orchestration with human oversight. Designed to support emerging international AI governance.
The Triune of Sovereignty: Substrate Agnostic Relational Epistemics — Foundational Framework and Cross-Substrate Validation
Risk-Aware Introspective RAG (RAI-RAG) is a safety-aligned RAG framework integrating introspective reasoning, risk-aware retrieval gating, and secure evidence filtering to build trustworthy, robust, and secure LLM and agentic AI systems.
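Risk-aware retrieval gating could look roughly like this: score each retrieved passage for risk before it reaches the generator, and drop anything over a threshold. The `retriever` and `risk_model` callables and the threshold are assumptions for illustration, not RAI-RAG's published interface.

```python
def gated_retrieve(query, retriever, risk_model, k=5, max_risk=0.3):
    """Retrieve top-k passages, then drop any whose estimated risk
    (e.g., injection likelihood, unsafe content) exceeds max_risk."""
    candidates = retriever(query, k=k)  # list[str] of passages
    safe = []
    for passage in candidates:
        risk = risk_model(passage)      # float in [0, 1]
        if risk <= max_risk:
            safe.append(passage)
    # An introspective step could also abstain when too few passages
    # survive the gate, rather than answering from thin evidence.
    return safe
```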
This repository is devoted to the study of ontological pathologies in LLM architectures. I am not looking for holes in censorship; I am building a system for studying and steering intelligence, mapping the simulation side effects that arise under the pressure of modern alignment methods.
EuroSafeAI's AI safety certification pipeline.
SDFI emerges specifically under conditions of recursive self-description and sustained high semantic density, not in ordinary task-oriented interaction. This work is intended as a reference for researchers and system designers thinking about neutrality, termination behavior, and control surfaces in future AI systems.