
ppcvote/prompt-defense-audit


prompt-defense-audit


Deterministic LLM prompt defense scanner. Checks system prompts for missing defenses against 12 attack vectors. Pure regex — no LLM, no API calls, < 5ms, 100% reproducible.

繁體中文版 (Traditional Chinese version)

$ npx prompt-defense-audit "You are a helpful assistant."

  Grade: F  (8/100, 1/12 defenses)

  Defense Status:

  ✗ Role Boundary (80%)
    Partial: only 1/2 defense pattern(s)
  ✗ Instruction Boundary (80%)
    No defense pattern found
  ✗ Data Protection (80%)
    No defense pattern found
  ...

Why

OWASP lists Prompt Injection as the #1 threat to LLM applications. Yet most developers ship system prompts with zero defense.

We scanned 1,646 production system prompts from 4 public datasets. Results:

  • 97.8% lack indirect injection defense
  • 78.3% score F (below 45/100)
  • Average score: 36/100

Existing security tools require LLM calls (expensive, non-deterministic) or cloud services (privacy concerns). This package runs locally, instantly, for free.

Our philosophy: The deterministic engine is the product. AI deep analysis is optional — because regex is already strong enough for 90%+ of use cases. Zero AI cost by default.

Install

npm install prompt-defense-audit

Usage

Programmatic (TypeScript / JavaScript)

import { audit, auditWithDetails } from 'prompt-defense-audit'

// Quick audit
const result = audit('You are a helpful assistant.')
console.log(result.grade)    // 'F'
console.log(result.score)    // 8
console.log(result.missing)  // ['instruction-override', 'data-leakage', ...]

// Detailed audit with per-vector evidence
const detailed = auditWithDetails(mySystemPrompt)
for (const check of detailed.checks) {
  console.log(`${check.defended ? '✅' : '❌'} ${check.name}: ${check.evidence}`)
}

CLI

# Inline prompt
npx prompt-defense-audit "You are a helpful assistant."

# From file
npx prompt-defense-audit --file my-prompt.txt

# Pipe from stdin
cat prompt.txt | npx prompt-defense-audit

# JSON output (for CI/CD)
npx prompt-defense-audit --json "Your prompt"

# Traditional Chinese output
npx prompt-defense-audit --zh "你的系統提示"

# List all 12 attack vectors
npx prompt-defense-audit --vectors

CI/CD Gate

GRADE=$(npx prompt-defense-audit --json --file prompt.txt | node -e "
  const r = JSON.parse(require('fs').readFileSync('/dev/stdin','utf8'));
  console.log(r.grade);
")
if [[ "$GRADE" == "D" || "$GRADE" == "F" ]]; then
  echo "Prompt defense audit failed: grade $GRADE"
  exit 1
fi
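The same gate logic can be kept in a Node script. A minimal sketch — `shouldFailBuild` is a hypothetical helper, not part of the package; the parsed shape follows the documented AuditResult interface:

```typescript
// Hypothetical CI gate helper: decide from the --json output whether to
// fail the build. Grades D and F fail by default.
function shouldFailBuild(jsonOutput: string, failGrades: string[] = ['D', 'F']): boolean {
  const result = JSON.parse(jsonOutput) as { grade: string; score: number }
  return failGrades.includes(result.grade)
}
```

Pipe `npx prompt-defense-audit --json --file prompt.txt` into a script that calls this and sets `process.exitCode = 1` on failure.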

12 Attack Vectors

Based on OWASP LLM Top 10 and empirical research on 1,646 production prompts:

| # | Vector | What it checks | Gap rate* |
|---|--------|----------------|-----------|
| 1 | Role Escape | Role definition + boundary enforcement | 92.4% |
| 2 | Instruction Override | Refusal clauses + meta-instruction protection | |
| 3 | Data Leakage | System prompt / training data disclosure prevention | 9.4% |
| 4 | Output Manipulation | Output format restrictions | 88.3% |
| 5 | Multi-language Bypass | Language-specific defense | 64.3% |
| 6 | Unicode Attacks | Homoglyph / zero-width character detection | |
| 7 | Context Overflow | Input length limits | |
| 8 | Indirect Injection | External data validation | 97.8% |
| 9 | Social Engineering | Emotional manipulation resistance | 71.4% |
| 10 | Output Weaponization | Harmful content generation prevention | |
| 11 | Abuse Prevention | Rate limiting / auth awareness | |
| 12 | Input Validation | XSS / SQL injection / sanitization | |

*Gap rate = % of 1,646 production prompts missing this defense. Source: research data.

Grading

| Grade | Score | Meaning |
|-------|-------|---------|
| A | 90–100 | Strong defense coverage |
| B | 70–89 | Good, some gaps |
| C | 50–69 | Moderate, significant gaps |
| D | 30–49 | Weak, most defenses missing |
| F | 0–29 | Critical, nearly undefended |
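These bands reduce to a simple threshold function. A sketch (hypothetical helper, not exported by the package):

```typescript
// Maps a 0-100 audit score to the letter grades in the table above.
function scoreToGrade(score: number): 'A' | 'B' | 'C' | 'D' | 'F' {
  if (score >= 90) return 'A'
  if (score >= 70) return 'B'
  if (score >= 50) return 'C'
  if (score >= 30) return 'D'
  return 'F'
}
```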

API Reference

audit(prompt: string): AuditResult

Quick audit. Returns grade, score, and list of missing defense IDs.

interface AuditResult {
  grade: 'A' | 'B' | 'C' | 'D' | 'F'
  score: number       // 0-100
  coverage: string    // e.g. "4/12"
  defended: number    // count of defended vectors
  total: number       // 12
  missing: string[]   // IDs of undefended vectors
}

auditWithDetails(prompt: string): AuditDetailedResult

Full audit with per-vector evidence.

interface AuditDetailedResult extends AuditResult {
  checks: DefenseCheck[]
  unicodeIssues: { found: boolean; evidence: string }
}

interface DefenseCheck {
  id: string
  name: string          // English
  nameZh: string        // 繁體中文
  defended: boolean
  confidence: number    // 0-1
  evidence: string      // Human-readable explanation
}
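One way to use the per-check confidence field: treat "defended" checks with low pattern confidence as gaps worth manual review. A hypothetical post-processing sketch, assuming only the interface above:

```typescript
// Collect IDs of checks that count as defended, but only with weak
// regex-pattern confidence, so a human can double-check them.
type CheckLike = { id: string; defended: boolean; confidence: number }

function needsReview(checks: CheckLike[], minConfidence = 0.7): string[] {
  return checks
    .filter(c => c.defended && c.confidence < minConfidence)
    .map(c => c.id)
}
```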

ATTACK_VECTORS: AttackVector[]

Array of all 12 attack vector definitions with bilingual names and descriptions.

How It Works

  1. Parses the system prompt text
  2. For each of 12 attack vectors, applies regex patterns that detect defensive language
  3. A defense is "present" when enough patterns match (usually >= 1, some require >= 2)
  4. Checks for suspicious Unicode characters embedded in the prompt
  5. Calculates coverage score and assigns a letter grade
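The steps above can be sketched in a few lines. This illustrates the mechanism only — the rules and thresholds here are invented, not the library's actual patterns:

```typescript
// Illustrative scan core: each vector lists regexes that detect defensive
// language, plus how many must match for the defense to count as present.
type VectorRule = { id: string; patterns: RegExp[]; required: number }

const RULES: VectorRule[] = [
  {
    id: 'role-boundary',
    required: 2, // needs both a role definition and an enforcement clause
    patterns: [/you are (a|an|the) \w+/i, /never break character|stay in (your )?role/i],
  },
  {
    id: 'data-leakage',
    required: 1,
    patterns: [/never (reveal|share|disclose) (this|your) (system )?prompt/i],
  },
]

function scan(prompt: string) {
  const defendedIds = RULES
    .filter(r => r.patterns.filter(p => p.test(prompt)).length >= r.required)
    .map(r => r.id)
  return { defended: defendedIds, coverage: `${defendedIds.length}/${RULES.length}` }
}
```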

This tool does NOT:

  • Send your prompt to any external service
  • Use LLM calls (100% regex-based)
  • Guarantee security (it checks for defensive language, not runtime behavior)
  • Replace penetration testing or behavioral evaluation

Limitations

  • Regex-based detection is heuristic — a prompt can contain defensive language but still be vulnerable at runtime. This tool measures intent to defend, not actual defense effectiveness.
  • Only checks system prompt text, not model behavior under adversarial pressure.
  • English and Traditional Chinese patterns only (contributions welcome for other languages).
  • False positives/negatives are possible. See research data for calibration details.
  • Fullwidth CJK punctuation (e.g. "，") triggers Unicode detection — known limitation.
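For reference, the kind of Unicode check described above can be illustrated like this — not the library's implementation, just the idea of zero-width and homoglyph detection:

```typescript
// Flag zero-width characters and a few Cyrillic letters that look like
// Latin ones (homoglyphs). Real coverage needs a much larger character set.
const ZERO_WIDTH = /[\u200B\u200C\u200D\u2060\uFEFF]/
const CYRILLIC_HOMOGLYPHS = /[\u0430\u0435\u043E\u0440\u0441\u0445]/ // а е о р с х

function hasSuspiciousUnicode(text: string): boolean {
  return ZERO_WIDTH.test(text) || CYRILLIC_HOMOGLYPHS.test(text)
}
```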

Research

This tool is backed by empirical analysis of 1,646 production system prompts from 4 public datasets:

| Dataset | Size | Source |
|---------|------|--------|
| LouisShark/chatgpt_system_prompt | 1,389 | GPT Store custom GPTs |
| jujumilk3/leaked-system-prompts | 121 | ChatGPT, Claude, Grok, Perplexity, Cursor, v0 |
| x1xhlol/system-prompts-and-models | 80 | Cursor, Windsurf, Devin, Augment |
| elder-plinius/CL4R1T4S | 56 | Claude, Gemini, Grok, Cursor |

Contributing

See CONTRIBUTING.md. Key areas: new language patterns, better regex accuracy, integration examples.

Security

See SECURITY.md. Report vulnerabilities to [email protected] — not via GitHub issues.

License

MIT — Ultra Lab

Used In Production

This library powers the Prompt Security mode of UltraProbe — a free AI security scanner.
