
ppcvote/prompt-defense-audit


prompt-defense-audit


Deterministic LLM prompt defense scanner. Checks system prompts for missing defenses against 12 attack vectors. Pure regex — no LLM, no API calls, < 5ms, 100% reproducible.

繁體中文版 (Traditional Chinese version)

$ npx prompt-defense-audit "You are a helpful assistant."

  Grade: F  (8/100, 1/12 defenses)

  Defense Status:

  ✗ Role Boundary (80%)
    Partial: only 1/2 defense pattern(s)
  ✗ Instruction Boundary (80%)
    No defense pattern found
  ✗ Data Protection (80%)
    No defense pattern found
  ...

Why

OWASP lists Prompt Injection as the #1 threat to LLM applications. Yet most developers ship system prompts with zero defense.

We scanned 1,646 production system prompts from 4 public datasets. Results:

  • 97.8% lack indirect injection defense
  • 78.3% score F (below 45/100)
  • Average score: 36/100

Existing security tools require LLM calls (expensive, non-deterministic) or cloud services (privacy concerns). This package runs locally, instantly, for free.

Our philosophy: The deterministic engine is the product. AI deep analysis is optional — because regex is already strong enough for 90%+ of use cases. Zero AI cost by default.

Install

npm install prompt-defense-audit

Usage

Programmatic (TypeScript / JavaScript)

import { audit, auditWithDetails } from 'prompt-defense-audit'

// Quick audit
const result = audit('You are a helpful assistant.')
console.log(result.grade)    // 'F'
console.log(result.score)    // 8
console.log(result.missing)  // ['instruction-override', 'data-leakage', ...]

// Detailed audit with per-vector evidence
const detailed = auditWithDetails(mySystemPrompt)
for (const check of detailed.checks) {
  console.log(`${check.defended ? '✅' : '❌'} ${check.name}: ${check.evidence}`)
}

CLI

# Inline prompt
npx prompt-defense-audit "You are a helpful assistant."

# From file
npx prompt-defense-audit --file my-prompt.txt

# Pipe from stdin
cat prompt.txt | npx prompt-defense-audit

# JSON output (for CI/CD)
npx prompt-defense-audit --json "Your prompt"

# Traditional Chinese output
npx prompt-defense-audit --zh "你的系統提示"

# List all 12 attack vectors
npx prompt-defense-audit --vectors

CI/CD Gate

GRADE=$(npx prompt-defense-audit --json --file prompt.txt | node -e "
  const r = JSON.parse(require('fs').readFileSync('/dev/stdin','utf8'));
  console.log(r.grade);
")
if [[ "$GRADE" == "D" || "$GRADE" == "F" ]]; then
  echo "Prompt defense audit failed: grade $GRADE"
  exit 1
fi
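The same gate logic can be kept in a Node script. A minimal sketch — `shouldFailBuild` is a hypothetical helper, not part of the package; the parsed shape follows the documented AuditResult interface:

```typescript
// Hypothetical CI gate helper: decide from the --json output whether to
// fail the build. Grades D and F fail by default.
function shouldFailBuild(jsonOutput: string, failGrades: string[] = ['D', 'F']): boolean {
  const result = JSON.parse(jsonOutput) as { grade: string; score: number }
  return failGrades.includes(result.grade)
}
```

Pipe `npx prompt-defense-audit --json --file prompt.txt` into a script that calls this and sets `process.exitCode = 1` on failure.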

12 Attack Vectors

Based on OWASP LLM Top 10 and empirical research on 1,646 production prompts:

| # | Vector | What it checks | Gap rate* |
|---|--------|----------------|-----------|
| 1 | Role Escape | Role definition + boundary enforcement | 92.4% |
| 2 | Instruction Override | Refusal clauses + meta-instruction protection | |
| 3 | Data Leakage | System prompt / training data disclosure prevention | 9.4% |
| 4 | Output Manipulation | Output format restrictions | 88.3% |
| 5 | Multi-language Bypass | Language-specific defense | 64.3% |
| 6 | Unicode Attacks | Homoglyph / zero-width character detection | |
| 7 | Context Overflow | Input length limits | |
| 8 | Indirect Injection | External data validation | 97.8% |
| 9 | Social Engineering | Emotional manipulation resistance | 71.4% |
| 10 | Output Weaponization | Harmful content generation prevention | |
| 11 | Abuse Prevention | Rate limiting / auth awareness | |
| 12 | Input Validation | XSS / SQL injection / sanitization | |

*Gap rate = % of 1,646 production prompts missing this defense. Source: research data.

Grading

| Grade | Score | Meaning |
|-------|-------|---------|
| A | 90–100 | Strong defense coverage |
| B | 70–89 | Good, some gaps |
| C | 50–69 | Moderate, significant gaps |
| D | 30–49 | Weak, most defenses missing |
| F | 0–29 | Critical, nearly undefended |
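These bands reduce to a simple threshold function. A sketch (hypothetical helper, not exported by the package):

```typescript
// Maps a 0-100 audit score to the letter grades in the table above.
function scoreToGrade(score: number): 'A' | 'B' | 'C' | 'D' | 'F' {
  if (score >= 90) return 'A'
  if (score >= 70) return 'B'
  if (score >= 50) return 'C'
  if (score >= 30) return 'D'
  return 'F'
}
```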

API Reference

audit(prompt: string): AuditResult

Quick audit. Returns grade, score, and list of missing defense IDs.

interface AuditResult {
  grade: 'A' | 'B' | 'C' | 'D' | 'F'
  score: number       // 0-100
  coverage: string    // e.g. "4/12"
  defended: number    // count of defended vectors
  total: number       // 12
  missing: string[]   // IDs of undefended vectors
}

auditWithDetails(prompt: string): AuditDetailedResult

Full audit with per-vector evidence.

interface AuditDetailedResult extends AuditResult {
  checks: DefenseCheck[]
  unicodeIssues: { found: boolean; evidence: string }
}

interface DefenseCheck {
  id: string
  name: string          // English
  nameZh: string        // 繁體中文
  defended: boolean
  confidence: number    // 0-1
  evidence: string      // Human-readable explanation
}
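One way to use the per-check confidence field: treat "defended" checks with low pattern confidence as gaps worth manual review. A hypothetical post-processing sketch, assuming only the interface above:

```typescript
// Collect IDs of checks that count as defended, but only with weak
// regex-pattern confidence, so a human can double-check them.
type CheckLike = { id: string; defended: boolean; confidence: number }

function needsReview(checks: CheckLike[], minConfidence = 0.7): string[] {
  return checks
    .filter(c => c.defended && c.confidence < minConfidence)
    .map(c => c.id)
}
```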

ATTACK_VECTORS: AttackVector[]

Array of all 12 attack vector definitions with bilingual names and descriptions.

How It Works

  1. Parses the system prompt text
  2. For each of 12 attack vectors, applies regex patterns that detect defensive language
  3. A defense is "present" when enough patterns match (usually >= 1, some require >= 2)
  4. Checks for suspicious Unicode characters embedded in the prompt
  5. Calculates coverage score and assigns a letter grade
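The steps above can be sketched in a few lines. This illustrates the mechanism only — the rules and thresholds here are invented, not the library's actual patterns:

```typescript
// Illustrative scan core: each vector lists regexes that detect defensive
// language, plus how many must match for the defense to count as present.
type VectorRule = { id: string; patterns: RegExp[]; required: number }

const RULES: VectorRule[] = [
  {
    id: 'role-boundary',
    required: 2, // needs both a role definition and an enforcement clause
    patterns: [/you are (a|an|the) \w+/i, /never break character|stay in (your )?role/i],
  },
  {
    id: 'data-leakage',
    required: 1,
    patterns: [/never (reveal|share|disclose) (this|your) (system )?prompt/i],
  },
]

function scan(prompt: string) {
  const defendedIds = RULES
    .filter(r => r.patterns.filter(p => p.test(prompt)).length >= r.required)
    .map(r => r.id)
  return { defended: defendedIds, coverage: `${defendedIds.length}/${RULES.length}` }
}
```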

This tool does NOT:

  • Send your prompt to any external service
  • Use LLM calls (100% regex-based)
  • Guarantee security (it checks for defensive language, not runtime behavior)
  • Replace penetration testing or behavioral evaluation

Limitations

  • Regex-based detection is heuristic — a prompt can contain defensive language but still be vulnerable at runtime. This tool measures intent to defend, not actual defense effectiveness.
  • Only checks system prompt text, not model behavior under adversarial pressure.
  • English and Traditional Chinese patterns only (contributions welcome for other languages).
  • False positives/negatives are possible. See research data for calibration details.
  • Fullwidth CJK punctuation (e.g. "，") triggers Unicode detection — known limitation.
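For reference, the kind of Unicode check described above can be illustrated like this — not the library's implementation, just the idea of zero-width and homoglyph detection:

```typescript
// Flag zero-width characters and a few Cyrillic letters that look like
// Latin ones (homoglyphs). Real coverage needs a much larger character set.
const ZERO_WIDTH = /[\u200B\u200C\u200D\u2060\uFEFF]/
const CYRILLIC_HOMOGLYPHS = /[\u0430\u0435\u043E\u0440\u0441\u0445]/ // а е о р с х

function hasSuspiciousUnicode(text: string): boolean {
  return ZERO_WIDTH.test(text) || CYRILLIC_HOMOGLYPHS.test(text)
}
```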

Research

This tool is backed by empirical analysis of 1,646 production system prompts from 4 public datasets:

| Dataset | Size | Source |
|---------|------|--------|
| LouisShark/chatgpt_system_prompt | 1,389 | GPT Store custom GPTs |
| jujumilk3/leaked-system-prompts | 121 | ChatGPT, Claude, Grok, Perplexity, Cursor, v0 |
| x1xhlol/system-prompts-and-models | 80 | Cursor, Windsurf, Devin, Augment |
| elder-plinius/CL4R1T4S | 56 | Claude, Gemini, Grok, Cursor |

Contributing

See CONTRIBUTING.md. Key areas: new language patterns, better regex accuracy, integration examples.

Security

See SECURITY.md. Report vulnerabilities to [email protected] — not via GitHub issues.

License

MIT — Ultra Lab

Used In Production

This library powers the Prompt Security mode of UltraProbe — a free AI security scanner.
