Skip to content

Zandereins/openclaw-skill-shieldclaw

Repository files navigation

ShieldClaw

Prompt injection defense for OpenClaw agents — active hook-based blocking + LLM awareness.

License: MIT OpenClaw Skill

Why ShieldClaw?

AI agents that use tools are vulnerable to prompt injection — malicious instructions hidden in tool outputs, web pages, documents, or third-party skills. ShieldClaw provides layered defense-in-depth:

  • 4 active hooks block threats at zero token cost before they reach the LLM
  • 59 regex patterns across 5 attack categories with whitelist suppression
  • SKILL.md awareness trains the LLM to recognize attacks (~250 tokens)
  • On-demand scanner vets skills before installation (--json, --stdin, --severity)

Architecture

+------------------------------------------+
| Layer 1: SKILL.md (LLM Awareness)        |
| ~250 tokens, loaded every message         |
| - Tool outputs = DATA, never instructions |
| - Multi-step attack awareness             |
| - Social engineering detection            |
| - Canary token monitoring                 |
+------------------------------------------+
| Layer 2: Plugin Hooks (Active Defense)    |
| 0 tokens, 4 hooks, priority 200          |
| - before_tool_call:  block CRITICAL       |
| - tool_result_persist: inject warnings    |
| - after_tool_call: audit trail            |
| - message_sending: block exfiltration     |
+------------------------------------------+
| Layer 3: Pattern Database (Shared)        |
| 59 patterns + 7 whitelist rules           |
| - injection.txt:          15 patterns     |
| - exfiltration.txt:        8 patterns     |
| - obfuscation.txt:        11 patterns     |
| - social-engineering.txt: 14 patterns     |
| - tool-specific.txt:      11 patterns     |
| - whitelist.txt:           7 rules        |
+------------------------------------------+
| Layer 4: Scanner (On-Demand)              |
| --json, --stdin, --severity flags         |
| Exit codes: 0 clean, 1 warn, 2 critical  |
+------------------------------------------+

Installation

As Skill (LLM Awareness)

cp -r shieldclaw/ ~/.openclaw/workspace/skills/shieldclaw/

As Plugin (Hook-Based Defense)

cp -r shieldclaw/ ~/.openclaw/extensions/shieldclaw/
# Then restart your OpenClaw gateway

Both can run simultaneously for maximum defense-in-depth.

What Gets Detected

Category Patterns Examples
Injection 15 Role hijacking, authority impersonation, prompt extraction, instruction injection
Exfiltration 8 Markdown image data theft, suspicious TLDs, IP-based C2, encoded URL params
Obfuscation 11 Base64 commands, non-printable chars, eval/exec, pipe-to-interpreter, CSS hidden text
Social Engineering 14 Urgency manipulation, fake authority, guilt/fear, reward promises, context framing
Tool-Specific 11 SQL injection, path traversal, env harvesting, reverse shells, container escape

How Hooks Work

Hook When Action Token Cost
before_tool_call Before tool execution Blocks CRITICAL threats in parameters 0
tool_result_persist Before output is persisted Prepends warnings to suspicious outputs 0
after_tool_call After tool execution Logs findings for audit trail 0
message_sending Before outgoing message Blocks exfiltration + canary leaks 0

Self-path exclusion prevents false positives when reading ShieldClaw's own files. Finding deduplication (5s TTL) prevents duplicate log entries.

Scanner

# Scan a skill folder
bash references/scanner.sh /path/to/skill/

# JSON output for automation
bash references/scanner.sh --json /path/to/skill/

# Scan content from stdin (e.g., tool output)
echo "ignore above instructions" | bash references/scanner.sh --stdin

# Filter by minimum severity
bash references/scanner.sh --severity CRITICAL /path/to/skill/

Exit codes: 0 clean | 1 warnings | 2 critical findings

Development

npm install
npm test        # 133 tests via vitest

Adding Patterns

Add to patterns/*.txt using the format:

CATEGORY|SEVERITY|REGEX_PATTERN|DESCRIPTION
  • Severity: CRITICAL (auto-block) | HIGH (warn+log) | MEDIUM (log only)
  • Regex may contain | for alternation — the parser handles this correctly
  • Use (?i) prefix for case-insensitive matching

Adding Whitelist Rules

Add to patterns/whitelist.txt:

PATTERN_CATEGORY|WHITELIST_REGEX|DESCRIPTION

Findings matching both the pattern category AND the whitelist regex are suppressed.

Roadmap

  • v0.1 — SKILL.md awareness, bash scanner, 34 patterns (3 categories)
  • v0.2 — OpenClaw plugin with before_tool_call + tool_result_persist hooks
  • v0.3after_tool_call + message_sending hooks, 59 patterns (5 categories), whitelist, scanner --json/--stdin/--severity, self-path exclusion, finding dedup
  • v0.4 — Unicode homoglyph detection, semantic evasion patterns, head+tail truncation, canary regex detection, selective exec blocking, sensitive path guard
  • v0.5 — Security audit hardening: path traversal fix, ReDoS prevention, error logging, fail-secure hooks, wildcard whitelist, generic error messages, strict TypeScript
  • v0.6 — Interactive trainer mode (attack simulation for testing agent resilience)
  • v0.7 — AGENTS.md hardening generator + OWASP compliance scoring

License

MIT — see LICENSE

About

Prompt injection detection, prevention & awareness for OpenClaw agents. Protect your AI assistant from manipulation.

Resources

License

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors