Inspiration

As AI agents become more autonomous, browsing the web, calling APIs, modifying files, and making decisions, small errors or prompt injections can quietly push them into unsafe or malicious behavior over time. Today, most safety tooling focuses on filtering individual prompts or responses, not on how an agent’s behavior drifts across multiple steps. We were inspired by how traditional antivirus software protects operating systems at runtime, continuously monitoring behavior and stopping threats before damage occurs. We wanted to bring the same concept to autonomous AI systems, a lightweight security layer that keeps agents aligned with their intended purpose and prevents dangerous actions before they happen.

What it does

It continuously monitors agent behavior and tool usage, detects signs of malicious or unsafe drift, and enforces safety policies in real time. When risky behavior is detected, IntentGuard can block, modify, or require approval for actions before they reach external systems.

Key capabilities include:

  • Runtime monitoring of agent actions and tool calls
  • Drift detection across multi-step agent behavior
  • Policy-based enforcement to block unsafe actions
  • Clear audit logs for visibility and debugging
  • Easy integration with existing agent frameworks
  • This allows developers and teams to deploy autonomous agents with greater confidence and control.

How we built it

IntentGuard is a zero-trust Gemini wrapper that intercepts AI tool calls and detects intent drift before execution.

  • Node.js backend with audit logging (SQLite), session state, and configurable risk thresholds
  • Gemini-powered intent inference to track agent goals over time
  • Policy engine that scores risk using intent drift, tool sensitivity, context, and history
  • Four outcomes: allow, allow-with-evolution, flag-for-review, or block
  • Lightweight wrapper that integrates without changing agent behavior
  • Live frontend demo with real-time drift scores and logs

Key choice: using LLM reasoning instead of rule-based classifiers because intent drift is semantic.

Challenges we ran into

  1. Comparing intent semantically rather than by keywords
  2. Balancing false positives with real threats
  3. Environment variable loading issues in Node.js ES modules
  4. Gemini API latency affecting real-time decisions
  5. Making the demo feel realistic instead of overly adversarial

Accomplishments that we're proud of

  • Built a production-style backend, not a mock demo
  • Detected goal evolution, not just malicious actions
  • Designed non-binary security decisions
  • Provided clear explanations for every decision
  • Integrated the real Gemini API end to end
  • Delivered clear documentation and a believable demo

What we learned

  • Intent drift is a semantic problem that benefits from LLM reasoning
  • Context (time, user, history) heavily affects intent
  • Too many false positives make security tools unusable
  • Explainability is essential for zero-trust systems
  • Realistic demos are more convincing than flashy ones

What's next for IntentGuard

  • Move to embeddings for more accurate intent comparison
  • Learn from human review decisions over time
  • Integrate with popular agent frameworks
  • Add enterprise features like compliance reporting and RBAC
  • Shift from detection toward proactive prevention

Vision: make IntentGuard a standard security layer for autonomous AI agents.

Built With

Share this project:

Updates