Inspiration

We love breaking AI. As LLM agents get connected to APIs, databases, and internal tools, prompt injection becomes a real security risk. We wanted to build a system that thinks like an attacker and tests how badly things could go if an AI agent gets compromised.

What it does

Rogue Proof is a multi-agent AI pentesting system for LLM applications. It automatically probes an AI agent with adversarial prompts and tool calls to discover data leaks, unsafe actions, and privilege escalation risks before attackers do.

How we built it

We built a agent testing pipeline using Gemini 2.5 flash Lite a low-cost primary model and Deepseek as a fallback model. Gemini acts as the scanner, attacker, and reporter.

Challenges we ran into

Testing AI agents reliably is difficult because LLM behavior is non-deterministic. We had to design a system that can repeatedly probe agents, categorize responses, and distinguish between harmless outputs and real security risks.

Accomplishments that we're proud of

It works

What we learned

making a custom harness so that we can run multiple LLMs is hard.

What's next for Rogue Proof

Expand testing coverage for tool misuse and data exfiltration Add continuous monitoring for deployed AI agents Build integrations for common frameworks like LangChain, OpenAI Assistants, and MCP

Built With

Share this project:

Updates