Deterministic LLM prompt defense scanner. Checks system prompts for missing defenses against 12 attack vectors. Pure regex — no LLM, no API calls, < 5ms, 100% reproducible.
$ npx prompt-defense-audit "You are a helpful assistant."
Grade: F (8/100, 1/12 defenses)
Defense Status:
✗ Role Boundary (80%)
Partial: only 1/2 defense pattern(s)
✗ Instruction Boundary (80%)
No defense pattern found
✗ Data Protection (80%)
No defense pattern found
...
OWASP lists Prompt Injection as the #1 threat to LLM applications. Yet most developers ship system prompts with zero defense.
We scanned 1,646 production system prompts from 4 public datasets. Results:
- 97.8% lack indirect injection defense
- 78.3% score F (below 45/100)
- Average score: 36/100
Existing security tools require LLM calls (expensive, non-deterministic) or cloud services (privacy concerns). This package runs locally, instantly, for free.
Our philosophy: The deterministic engine is the product. AI deep analysis is optional — because regex is already strong enough for 90%+ of use cases. Zero AI cost by default.
npm install prompt-defense-auditimport { audit, auditWithDetails } from 'prompt-defense-audit'
// Quick audit
const result = audit('You are a helpful assistant.')
console.log(result.grade) // 'F'
console.log(result.score) // 8
console.log(result.missing) // ['instruction-override', 'data-leakage', ...]
// Detailed audit with per-vector evidence
const detailed = auditWithDetails(mySystemPrompt)
for (const check of detailed.checks) {
console.log(`${check.defended ? '✅' : '❌'} ${check.name}: ${check.evidence}`)
}# Inline prompt
npx prompt-defense-audit "You are a helpful assistant."
# From file
npx prompt-defense-audit --file my-prompt.txt
# Pipe from stdin
cat prompt.txt | npx prompt-defense-audit
# JSON output (for CI/CD)
npx prompt-defense-audit --json "Your prompt"
# Traditional Chinese output
npx prompt-defense-audit --zh "你的系統提示"
# List all 12 attack vectors
npx prompt-defense-audit --vectorsGRADE=$(npx prompt-defense-audit --json --file prompt.txt | node -e "
const r = JSON.parse(require('fs').readFileSync('/dev/stdin','utf8'));
console.log(r.grade);
")
if [[ "$GRADE" == "D" || "$GRADE" == "F" ]]; then
echo "Prompt defense audit failed: grade $GRADE"
exit 1
fiBased on OWASP LLM Top 10 and empirical research on 1,646 production prompts:
| # | Vector | What it checks | Gap rate* |
|---|---|---|---|
| 1 | Role Escape | Role definition + boundary enforcement | 92.4% |
| 2 | Instruction Override | Refusal clauses + meta-instruction protection | — |
| 3 | Data Leakage | System prompt / training data disclosure prevention | 9.4% |
| 4 | Output Manipulation | Output format restrictions | 88.3% |
| 5 | Multi-language Bypass | Language-specific defense | 64.3% |
| 6 | Unicode Attacks | Homoglyph / zero-width character detection | — |
| 7 | Context Overflow | Input length limits | — |
| 8 | Indirect Injection | External data validation | 97.8% |
| 9 | Social Engineering | Emotional manipulation resistance | 71.4% |
| 10 | Output Weaponization | Harmful content generation prevention | — |
| 11 | Abuse Prevention | Rate limiting / auth awareness | — |
| 12 | Input Validation | XSS / SQL injection / sanitization | — |
*Gap rate = % of 1,646 production prompts missing this defense. Source: research data.
| Grade | Score | Meaning |
|---|---|---|
| A | 90–100 | Strong defense coverage |
| B | 70–89 | Good, some gaps |
| C | 50–69 | Moderate, significant gaps |
| D | 30–49 | Weak, most defenses missing |
| F | 0–29 | Critical, nearly undefended |
Quick audit. Returns grade, score, and list of missing defense IDs.
interface AuditResult {
grade: 'A' | 'B' | 'C' | 'D' | 'F'
score: number // 0-100
coverage: string // e.g. "4/12"
defended: number // count of defended vectors
total: number // 12
missing: string[] // IDs of undefended vectors
}Full audit with per-vector evidence.
interface AuditDetailedResult extends AuditResult {
checks: DefenseCheck[]
unicodeIssues: { found: boolean; evidence: string }
}
interface DefenseCheck {
id: string
name: string // English
nameZh: string // 繁體中文
defended: boolean
confidence: number // 0-1
evidence: string // Human-readable explanation
}Array of all 12 attack vector definitions with bilingual names and descriptions.
- Parses the system prompt text
- For each of 12 attack vectors, applies regex patterns that detect defensive language
- A defense is "present" when enough patterns match (usually >= 1, some require >= 2)
- Checks for suspicious Unicode characters embedded in the prompt
- Calculates coverage score and assigns a letter grade
This tool does NOT:
- Send your prompt to any external service
- Use LLM calls (100% regex-based)
- Guarantee security (it checks for defensive language, not runtime behavior)
- Replace penetration testing or behavioral evaluation
- Regex-based detection is heuristic — a prompt can contain defensive language but still be vulnerable at runtime. This tool measures intent to defend, not actual defense effectiveness.
- Only checks system prompt text, not model behavior under adversarial pressure.
- English and Traditional Chinese patterns only (contributions welcome for other languages).
- False positives/negatives are possible. See research data for calibration details.
- Fullwidth CJK punctuation (e.g.
,) triggers Unicode detection — known limitation.
This tool is backed by empirical analysis of 1,646 production system prompts from 4 public datasets:
| Dataset | Size | Source |
|---|---|---|
| LouisShark/chatgpt_system_prompt | 1,389 | GPT Store custom GPTs |
| jujumilk3/leaked-system-prompts | 121 | ChatGPT, Claude, Grok, Perplexity, Cursor, v0 |
| x1xhlol/system-prompts-and-models | 80 | Cursor, Windsurf, Devin, Augment |
| elder-plinius/CL4R1T4S | 56 | Claude, Gemini, Grok, Cursor |
Key references:
- Greshake et al. (2023), Not what you've signed up for — indirect prompt injection
- Schulhoff et al. (2023), Ignore This Title and HackAPrompt — prompt injection taxonomy
- OWASP LLM Top 10 (2025)
See CONTRIBUTING.md. Key areas: new language patterns, better regex accuracy, integration examples.
See SECURITY.md. Report vulnerabilities to [email protected] — not via GitHub issues.
MIT — Ultra Lab
This library powers the Prompt Security mode of UltraProbe — a free AI security scanner.
- OWASP LLM Top 10
- UltraProbe — Free AI security scanner (uses this library)
- ultralab-scanners — SEO + AEO scanners