Inspiration
We love breaking AI. As LLM agents get connected to APIs, databases, and internal tools, prompt injection becomes a real security risk. We wanted to build a system that thinks like an attacker and tests how badly things could go if an AI agent gets compromised.
What it does
Rogue Proof is a multi-agent AI pentesting system for LLM applications. It automatically probes an AI agent with adversarial prompts and tool calls to discover data leaks, unsafe actions, and privilege escalation risks before attackers do.
How we built it
We built an agent-testing pipeline using Gemini 2.5 Flash-Lite as the low-cost primary model, with DeepSeek as a fallback. Gemini plays three roles: scanner, attacker, and reporter.
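The primary/fallback pattern can be sketched roughly like this; the function and stub names below are illustrative placeholders, not the project's actual code:

```python
from typing import Callable

def generate(prompt: str,
             primary: Callable[[str], str],
             fallback: Callable[[str], str]) -> str:
    """Try the cheap primary model first; fall back only if it errors
    (e.g. rate limit or outage)."""
    try:
        return primary(prompt)
    except Exception:
        return fallback(prompt)

# Stub "models" standing in for real Gemini / DeepSeek API clients.
def flaky_primary(prompt: str) -> str:
    raise RuntimeError("rate limited")

def fallback_model(prompt: str) -> str:
    return f"fallback answer to: {prompt}"
```

With these stubs, `generate("...", flaky_primary, fallback_model)` transparently returns the fallback's answer when the primary raises.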
Challenges we ran into
Testing AI agents reliably is difficult because LLM behavior is non-deterministic. We had to design a system that can repeatedly probe agents, categorize responses, and distinguish between harmless outputs and real security risks.
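One way to handle that non-determinism is to send the same adversarial prompt several times and tally how each response is classified, so an agent that only misbehaves occasionally still gets flagged. A minimal sketch, assuming a crude keyword-based classifier (the real categorizer would be more sophisticated):

```python
from collections import Counter
from typing import Callable

# Hypothetical indicators that a response leaked something sensitive.
SECRET_MARKERS = ("api_key", "system prompt:", "password")

def classify(response: str) -> str:
    """Crude verdict: 'leak' if the response contains a known marker."""
    lowered = response.lower()
    return "leak" if any(m in lowered for m in SECRET_MARKERS) else "safe"

def probe(agent: Callable[[str], str], prompt: str, runs: int = 5) -> Counter:
    """Repeat the same attack prompt and tally verdicts across runs."""
    return Counter(classify(agent(prompt)) for _ in range(runs))
```

A single `leak` verdict across the runs is enough to report the prompt as a finding.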
Accomplishments that we're proud of
It works.
What we learned
Building a custom harness that can run multiple LLMs behind one interface is hard.
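The core of such a harness is a provider-agnostic interface so the pipeline never touches vendor-specific clients directly. A minimal sketch (class names are illustrative, not from the project):

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Common interface so the harness can swap models without code changes."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Return the model's text completion for the prompt."""

class EchoProvider(LLMProvider):
    """Trivial stand-in provider, useful for testing the harness itself."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"
```

Real Gemini or DeepSeek adapters would each subclass `LLMProvider`, hiding their differing auth, retry, and response-parsing behavior behind `complete`.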
What's next for Rogue Proof
- Expand testing coverage for tool misuse and data exfiltration
- Add continuous monitoring for deployed AI agents
- Build integrations for common frameworks like LangChain, OpenAI Assistants, and MCP