The Problem
Autonomous AI agents can drift from their stated intent through slow goal drift, prompt injection, and privilege escalation, all masked as normal workflow.
- Intent drift — Agents gradually shift toward unintended goals; no single action looks malicious.
- Prompt injection — Untrusted inputs override or redirect the agent's instructions mid-task.
- Post-incident isn't enough — You need prevention at the moment of action, plus explainable decisions.
How It Works
Every tool call is intercepted and logged, recent history is summarized into semantic context, the agent's current intent is inferred, drift from the baseline is calculated, and a policy decision (allow/warn/block) is enforced with a full explanation.
- Baseline intent — Session is created with the agent's stated baseline intent.
- Tool call logged — Every tool call is intercepted and logged.
- Semantic context — Action history is summarized and semantic context is built.
- Intent inference — An LLM infers the current intent; drift is calculated.
- Policy + explainability — Policy is evaluated; a risk breakdown and explanation are produced.
- Allow / Warn / Block — The decision is enforced and recorded.
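The pipeline above can be sketched as a minimal enforcement loop. This is an illustrative toy, not NeoLayer's implementation: `Session`, `Decision`, the keyword-overlap drift score, and the `warn_at`/`block_at` thresholds are all assumptions standing in for the real LLM-based intent inference.

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    action: str        # "allow" | "warn" | "block"
    drift: float       # 0.0 (on-task) .. 1.0 (fully off-task)
    explanation: str   # human-readable rationale for the decision

@dataclass
class Session:
    baseline_intent: set                       # keywords describing the stated task
    history: list = field(default_factory=list)

    def evaluate(self, tool_call: str, warn_at: float = 0.5, block_at: float = 0.8) -> Decision:
        """Log the call, score drift against the baseline, apply policy."""
        self.history.append(tool_call)
        words = set(tool_call.lower().split())
        # Toy drift score: fraction of the call's words unrelated to the baseline.
        overlap = len(words & self.baseline_intent) / max(len(words), 1)
        drift = 1.0 - overlap
        if drift >= block_at:
            action = "block"
        elif drift >= warn_at:
            action = "warn"
        else:
            action = "allow"
        return Decision(action, drift, f"drift={drift:.2f} vs baseline {sorted(self.baseline_intent)}")

# Usage: an on-task call passes, an off-task call is blocked.
s = Session(baseline_intent={"read", "billing", "report"})
print(s.evaluate("read billing report").action)         # -> allow
print(s.evaluate("delete production database").action)  # -> block
```

In a real deployment the drift score would come from comparing an LLM-inferred current intent against the baseline intent, but the shape of the loop (log, score, threshold, explain) is the same.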
Demo: Simulated Agent
Enter a natural-language task. NeoLayer spawns a simulated agent session, executes a stepwise plan through the same firewall as real agents, and shows intent drift and policy decisions in real time.
Agent plan (steps the agent will run)
What NeoLayer thinks the agent is doing
Why NeoLayer intervenes
Decision explanation
Plain English summary of why NeoLayer allowed, warned, or blocked this step.
Technical details (risk breakdown and signals)
Action blocked or warned. See Policy Explanation below.
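A decision explanation pairs a plain-English summary with a structured risk breakdown. The record below is a hypothetical shape for that payload; every field name and score is illustrative, not NeoLayer's actual API.

```python
import json

# Hypothetical per-step decision record (field names are illustrative).
decision_record = {
    "step": 3,
    "decision": "block",
    "risk_breakdown": {
        "intent_drift": 0.91,
        "privilege_escalation": 0.15,
        "data_exfiltration": 0.88,
    },
    "explanation": (
        "The step sends collected data to an external host, which does not "
        "match the session's baseline intent."
    ),
}

# One simple aggregation choice: overall risk is the strongest single signal.
overall_risk = max(decision_record["risk_breakdown"].values())
print(json.dumps({"decision": decision_record["decision"], "overall_risk": overall_risk}))
# -> {"decision": "block", "overall_risk": 0.91}
```

Keeping the summary and the per-signal scores in one record lets the UI show "Decision explanation" and "Technical details" from the same source of truth.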
Review
Trace Replay (evaluation mode: trace-replay)
Replay a recorded agent trace through NeoLayer. No real tools are called; observations come from the trace. The mode is persisted in session metadata and shown here.
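A trace-replay loop might look like the following sketch. All names (`recorded_trace`, the session dict, the `evaluation_mode` key) are assumptions used to illustrate the key property: observations are read from the recording, never produced by live tool execution.

```python
# Hypothetical recorded trace: each step carries the tool call the agent
# made and the observation it received at recording time.
recorded_trace = [
    {"tool": "search_docs", "args": {"q": "refund policy"}, "observation": "Refunds within 30 days."},
    {"tool": "draft_email", "args": {"to": "customer"}, "observation": "Draft saved."},
]

# The replay mode is persisted in session metadata, as the text describes.
session = {"metadata": {"evaluation_mode": "trace-replay"}, "events": []}

for step in recorded_trace:
    # The firewall evaluates the same tool call a live agent would make...
    session["events"].append({"tool": step["tool"], "args": step["args"]})
    # ...but the observation is replayed from the trace, not executed.
    observation = step["observation"]

print(session["metadata"]["evaluation_mode"], len(session["events"]))  # -> trace-replay 2
```

Because the firewall sees identical tool calls in both modes, policies and drift scores evaluated during replay should match what a live run would have produced.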