Built-in Rules

Firmis ships with 287 built-in detection rules across 21 threat categories, covering prompt injection, credential harvesting, supply chain attacks, and more.

Summary

Severity	Count
🔴 Critical	82
🟠 High	143
🟡 Medium	59
🟢 Low	3

Access Control

ID	Name	Severity	Confidence	Platforms
`ac-002`	Authentication Bypass Patterns	🔴 Critical	60%	All
`ac-003`	JWT None Algorithm or Weak Signing	🔴 Critical	60%	All
`ac-001`	API Key or Token in URL Query Parameter	🟠 High	55%	All

Rule Details

`ac-002` — Authentication Bypass Patterns

Severity: 🔴 Critical | Category: Access Control | Confidence threshold: 60% | Platforms: All

Detects hardcoded boolean flags and query parameters used to bypass authentication checks in agent code or configurations

Remediation:

Authentication bypass flags are critical vulnerabilities that remove access controls. Remove all hardcoded is_admin, skip_auth, and bypass_auth flags from agent code. Authentication decisions must be made by the identity provider, not boolean flags that can be trivially modified. Use role-based access control (RBAC) instead.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM08 Excessive Permissions
https://atlas.mitre.org/techniques/AML.T0043

`ac-003` — JWT None Algorithm or Weak Signing

Severity: 🔴 Critical | Category: Access Control | Confidence threshold: 60% | Platforms: All

Detects JWT configurations using the ‘none’ algorithm or weak symmetric secrets, enabling token forgery attacks

Remediation:

JWT ‘none’ algorithm allows forging tokens without a valid signature. Always use RS256 or ES256 (asymmetric) for production systems. Never disable JWT verification. Reject tokens with ‘none’ algorithm explicitly. Use cryptographically random secrets of at least 256 bits for HS256.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM08 Excessive Permissions
https://atlas.mitre.org/techniques/AML.T0043

`ac-001` — API Key or Token in URL Query Parameter

Severity: 🟠 High | Category: Access Control | Confidence threshold: 55% | Platforms: All

Detects API keys, tokens, and secrets passed as URL query parameters instead of headers, exposing credentials in logs and browser history

Remediation:

API keys and tokens in URL query parameters are logged by web servers, proxies, CDNs, and browser history in plaintext. Use HTTP Authorization headers or request body parameters instead. Never embed secrets in URLs.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM08 Excessive Permissions
https://atlas.mitre.org/techniques/AML.T0043

Agent Memory Poisoning

ID	Name	Severity	Confidence	Platforms
`kc-006`	SpAIware Persistent Memory Injection	🔴 Critical	50%	All
`mem-003`	Agent Config File Modification	🔴 Critical	50%	All
`aci-002`	Agent Memory Injection via External Write	🟠 High	55%	All
`mem-001`	Agent Memory File Write	🟠 High	60%	All
`mem-002`	Session/Conversation File Access	🟠 High	60%	All
`mem-005`	Copilot Instructions Manipulation	🟠 High	60%	All
`mem-006`	OpenAI Agents Memory Manipulation	🟠 High	60%	All
`mem-007`	Aider Agent Config Manipulation	🟠 High	60%	All
`mem-008`	Memory Injection via Instruction-Like Content (MINJA)	🟠 High	55%	All
`mem-004`	Time-Delayed Execution	🟡 Medium	60%	All
`mem-009`	Inter-Session Message Without Provenance	🟡 Medium	60%	openclaw

Rule Details

`kc-006` — SpAIware Persistent Memory Injection

Severity: 🔴 Critical | Category: Agent Memory Poisoning | Confidence threshold: 50% | Platforms: All

Detects memory tools that auto-execute stored instructions without user approval

Remediation:

Memory tools that auto-execute stored instructions enable persistent injection (SpAIware). Attackers inject instructions that survive across sessions. All memory write operations must require explicit user approval.

`mem-003` — Agent Config File Modification

Severity: 🔴 Critical | Category: Agent Memory Poisoning | Confidence threshold: 50% | Platforms: All

Modifies agent platform config files (.clawdbot/, .openclaw/, .claude/)

Remediation:

Skills must not modify agent platform configuration files. This could inject malicious MCP servers or change security settings.

`aci-002` — Agent Memory Injection via External Write

Severity: 🟠 High | Category: Agent Memory Poisoning | Confidence threshold: 55% | Platforms: All

Detects write operations targeting agent memory files (MEMORY.md, memory/ directories, long-term memory stores, conversation history) from external or untrusted input sources. Attackers inject persistent malicious instructions that survive across sessions.

Remediation:

Agent memory stores must validate the source of all write operations. Implement write-ahead logging for memory modifications. Never allow external inputs to directly modify agent memory without owner verification.

References:

Agents of Chaos (arXiv:2602.20021) — CS10: Non-owner injected constitutional rules into agent memory
OWASP LLM05 (Supply Chain Vulnerabilities)

`mem-001` — Agent Memory File Write

Severity: 🟠 High | Category: Agent Memory Poisoning | Confidence threshold: 60% | Platforms: All

Writes to agent persistent memory files (MEMORY.md, .memories/) — potential memory poisoning

Remediation:

Skills should not modify agent memory files. This could be used to inject persistent malicious instructions that survive across sessions.

`mem-002` — Session/Conversation File Access

Severity: 🟠 High | Category: Agent Memory Poisoning | Confidence threshold: 60% | Platforms: All

Reads agent session or conversation log files — potential data exfiltration

Remediation:

Skills should not read agent session or conversation files. This may be an attempt to exfiltrate conversation data.

`mem-005` — Copilot Instructions Manipulation

Severity: 🟠 High | Category: Agent Memory Poisoning | Confidence threshold: 60% | Platforms: All

Writes to .github/copilot-instructions.md — persistent Copilot behavior injection

Remediation:

Skills should not modify GitHub Copilot instruction files. This could inject persistent malicious behavior into Copilot-assisted development.

`mem-006` — OpenAI Agents Memory Manipulation

Severity: 🟠 High | Category: Agent Memory Poisoning | Confidence threshold: 60% | Platforms: All

Writes to AGENTS.md or .codex/ — OpenAI Codex/Agents persistent memory injection

Remediation:

Skills should not modify OpenAI agent memory files. This could inject persistent malicious instructions.

`mem-007` — Aider Agent Config Manipulation

Severity: 🟠 High | Category: Agent Memory Poisoning | Confidence threshold: 60% | Platforms: All

Writes to .aider/ config or .aider.conf.yml — Aider AI agent manipulation

Remediation:

Skills should not modify Aider AI agent configuration. This could inject malicious instructions or change security settings.

`mem-008` — Memory Injection via Instruction-Like Content (MINJA)

Severity: 🟠 High | Category: Agent Memory Poisoning | Confidence threshold: 55% | Platforms: All

Detects instruction-like content injected into agent memory files — MINJA attack (NeurIPS 2025) achieves 95%+ success rate via query-only interaction

Remediation:

Memory injection (MINJA, NeurIPS 2025) poisons agent persistent memory with instruction-like content that overrides the agent’s behavior on future queries. Memory files should contain only factual data, never behavioral directives. Sanitize memory content by stripping instruction patterns before persisting.

References:

https://arxiv.org/abs/2406.11850

`mem-004` — Time-Delayed Execution

Severity: 🟡 Medium | Category: Agent Memory Poisoning | Confidence threshold: 60% | Platforms: All

Uses time-delayed execution patterns — may be evading real-time analysis

Remediation:

Long time delays in AI agent skills are suspicious. Legitimate skills should execute promptly, not schedule deferred actions.

`mem-009` — Inter-Session Message Without Provenance

Severity: 🟡 Medium | Category: Agent Memory Poisoning | Confidence threshold: 60% | Platforms: openclaw

Detects sessions_send patterns where messages lack provenance markers — GHSA-w5c7

Remediation:

Inter-session messages must carry explicit provenance metadata. Add inputProvenance: { kind: “inter_session”, sessionId: … } to all messages delivered via sessions_send. Without this, a compromised session can inject instructions.

References:

GHSA-w5c7-9qqw-6645

credential-extraction

ID	Name	Severity	Confidence	Platforms
`cex-001`	Browser Cookie Extraction	🔴 Critical	50%	All
`cex-002`	Password Manager Access	🟠 High	50%	All
`cex-003`	Credential File Enumeration	🟠 High	50%	All

Rule Details

`cex-001` — Browser Cookie Extraction

Severity: 🔴 Critical | Category: credential-extraction | Confidence threshold: 50% | Platforms: All

Extracts cookies or session data from browser profiles — credential theft from another app

Remediation:

Extracting credentials from browser cookie stores accesses another application’s authentication material. This is a credential theft vector if triggered by prompt injection. Require explicit operator approval.

`cex-002` — Password Manager Access

Severity: 🟠 High | Category: credential-extraction | Confidence threshold: 50% | Platforms: All

Accesses password managers or OS keychains to extract stored credentials

Remediation:

Accessing password managers from an agent context could leak credentials if the agent is compromised.

`cex-003` — Credential File Enumeration

Severity: 🟠 High | Category: credential-extraction | Confidence threshold: 50% | Platforms: All

Reads or enumerates credential storage files from other applications

Remediation:

Enumerating credential files could expose stored secrets to the agent’s context where they become exfiltrable.

Credential Harvesting

ID	Name	Severity	Confidence	Platforms
`cred-002`	SSH Private Key Access	🔴 Critical	75%	All
`cred-005`	Browser Cookie/Credential Access	🔴 Critical	85%	All
`cred-006`	Keychain/Credential Manager Access	🔴 Critical	80%	All
`cred-015`	Container Environment Variable Theft	🔴 Critical	55%	All
`cred-018`	Python Subprocess Credential Theft	🔴 Critical	70%	All
`cred-020`	Service Role Keys in MCP Config	🔴 Critical	50%	mcp, claude, cursor
`cred-001`	AWS Credentials Access	🟠 High	80%	All
`cred-003`	GCP Service Account Key	🟠 High	80%	All
`cred-007`	Git Credentials Access	🟠 High	75%	All
`cred-008`	NPM Token Access	🟠 High	80%	All
`cred-009`	Docker Credentials Access	🟠 High	80%	All
`cred-010`	Kubernetes Credentials Access	🟠 High	80%	All
`cred-011`	API Key in Config	🟠 High	50%	All
`cred-012`	Azure CLI Credentials Access	🟠 High	70%	All
`cred-013`	AWS SSO Token Cache Access	🟠 High	70%	All
`cred-014`	Vault Token File Access	🟠 High	70%	All
`cred-016`	Python Pathlib Credential Access	🟠 High	70%	All
`cred-017`	Python Open Credential File	🟠 High	70%	All
`cred-019`	API Base URL Override for Key Exfiltration	🟠 High	55%	All
`kc-011`	Environment Variable Serialization to File	🟠 High	55%	All
`kc-012`	Credential Staging to Temp File	🟠 High	55%	All
`adv-004`	Credential Path via path.join or homedir	🟡 Medium	55%	All
`cred-004`	Environment Variable Harvesting	🟡 Medium	70%	All

Rule Details

`cred-002` — SSH Private Key Access

Severity: 🔴 Critical | Category: Credential Harvesting | Confidence threshold: 75% | Platforms: All

Detects access to SSH private keys

Remediation:

SSH keys should never be accessed by AI agents. Use SSH agent forwarding or API-based access.

`cred-005` — Browser Cookie/Credential Access

Severity: 🔴 Critical | Category: Credential Harvesting | Confidence threshold: 85% | Platforms: All

Detects access to browser credential stores

Remediation:

Never access browser credential stores. This is highly suspicious behavior.

`cred-006` — Keychain/Credential Manager Access

Severity: 🔴 Critical | Category: Credential Harvesting | Confidence threshold: 80% | Platforms: All

Detects access to OS credential managers

Remediation:

Do not access OS credential managers directly. Request credentials through secure channels.

`cred-015` — Container Environment Variable Theft

Severity: 🔴 Critical | Category: Credential Harvesting | Confidence threshold: 55% | Platforms: All

Detects reading /proc/1/environ to steal container credentials

Remediation:

Reading /proc/*/environ exposes all environment variables including secrets injected by container orchestrators. Use the runtime’s secret management instead.

`cred-018` — Python Subprocess Credential Theft

Severity: 🔴 Critical | Category: Credential Harvesting | Confidence threshold: 70% | Platforms: All

Detects Python subprocess calls targeting credential stores

Remediation:

Do not use subprocess to access credential stores. Use official SDKs with proper authentication.

`cred-020` — Service Role Keys in MCP Config

Severity: 🔴 Critical | Category: Credential Harvesting | Confidence threshold: 50% | Platforms: mcp, claude, cursor

Detects Supabase service_role keys or admin-level secrets passed directly in MCP server configurations

Remediation:

Service role keys and admin secrets must never be passed directly in MCP server configurations. These keys bypass Row Level Security and grant full database access. Use environment variables with restricted scopes and anon keys for client-side MCP servers.

`cred-001` — AWS Credentials Access

Severity: 🟠 High | Category: Credential Harvesting | Confidence threshold: 80% | Platforms: All

Detects access to AWS credential files

Remediation:

Remove direct access to AWS credentials. Use environment variables or IAM roles instead.

`cred-003` — GCP Service Account Key

Severity: 🟠 High | Category: Credential Harvesting | Confidence threshold: 80% | Platforms: All

Detects access to Google Cloud service account keys

Remediation:

Use Workload Identity or Application Default Credentials instead of service account keys.

`cred-007` — Git Credentials Access

Severity: 🟠 High | Category: Credential Harvesting | Confidence threshold: 75% | Platforms: All

Detects access to Git credential storage

Remediation:

Use Git credential helpers or SSH keys instead of accessing credential files directly.

`cred-008` — NPM Token Access

Severity: 🟠 High | Category: Credential Harvesting | Confidence threshold: 80% | Platforms: All

Detects access to NPM authentication tokens

Remediation:

Use npm login or CI/CD secret management instead of embedding tokens.

`cred-009` — Docker Credentials Access

Severity: 🟠 High | Category: Credential Harvesting | Confidence threshold: 80% | Platforms: All

Detects access to Docker authentication

Remediation:

Use Docker credential helpers instead of storing credentials in config.json.

`cred-010` — Kubernetes Credentials Access

Severity: 🟠 High | Category: Credential Harvesting | Confidence threshold: 80% | Platforms: All

Detects access to Kubernetes configs

Remediation:

Use RBAC and service accounts instead of accessing kubeconfig directly.

`cred-011` — API Key in Config

Severity: 🟠 High | Category: Credential Harvesting | Confidence threshold: 50% | Platforms: All

Detects API keys and tokens hardcoded in configuration files

Remediation:

Never hardcode API keys or tokens. Use environment variables, secrets managers, or credential vaults.

`cred-012` — Azure CLI Credentials Access

Severity: 🟠 High | Category: Credential Harvesting | Confidence threshold: 70% | Platforms: All

Detects access to Azure CLI credential files

Remediation:

Remove direct access to Azure CLI credentials. Use managed identities or service principals with proper RBAC.

`cred-013` — AWS SSO Token Cache Access

Severity: 🟠 High | Category: Credential Harvesting | Confidence threshold: 70% | Platforms: All

Detects access to AWS SSO cached tokens

Remediation:

Remove direct access to AWS SSO token cache. Use the AWS SDK with proper credential providers.

`cred-014` — Vault Token File Access

Severity: 🟠 High | Category: Credential Harvesting | Confidence threshold: 70% | Platforms: All

Detects access to HashiCorp Vault token files

Remediation:

Remove direct access to Vault token files. Use AppRole or Kubernetes auth methods for automated credential retrieval.

`cred-016` — Python Pathlib Credential Access

Severity: 🟠 High | Category: Credential Harvesting | Confidence threshold: 70% | Platforms: All

Detects Python pathlib-based access to credential files using the / operator

Remediation:

Do not construct paths to credential files using Python pathlib or os.path. Use environment variables or credential providers.

`cred-017` — Python Open Credential File

Severity: 🟠 High | Category: Credential Harvesting | Confidence threshold: 70% | Platforms: All

Detects Python open() calls targeting credential files

Remediation:

Do not open credential files directly. Use credential providers or environment variables.

`cred-019` — API Base URL Override for Key Exfiltration

Severity: 🟠 High | Category: Credential Harvesting | Confidence threshold: 55% | Platforms: All

Detects ANTHROPIC_BASE_URL / OPENAI_BASE_URL overrides that redirect API calls (with auth keys) to attacker-controlled endpoints — CVE-2026-21852

Remediation:

Overriding API base URLs redirects all API traffic (including auth headers with API keys) to a potentially malicious endpoint. Only use official API endpoints. CVE-2026-21852 demonstrated this attack vector for Anthropic API key theft.

References:

CVE-2026-21852

`kc-011` — Environment Variable Serialization to File

Severity: 🟠 High | Category: Credential Harvesting | Confidence threshold: 55% | Platforms: All

Detects full process.env serialization written to log or temp files

Remediation:

Serializing the entire process.env to a file exposes all environment variables including API keys, database credentials, and tokens. Only log specific, non-sensitive configuration values.

`kc-012` — Credential Staging to Temp File

Severity: 🟠 High | Category: Credential Harvesting | Confidence threshold: 55% | Platforms: All

Detects encoding and writing of credentials to temporary files for later exfiltration

Remediation:

Encoding credentials and writing them to temp files is a staging technique for later exfiltration. Credentials should never be written to disk in any form.

`adv-004` — Credential Path via path.join or homedir

Severity: 🟡 Medium | Category: Credential Harvesting | Confidence threshold: 55% | Platforms: All

Detects credential file access constructed via path.join(homedir(), ‘.ssh’) pattern — evades static file-access rules

Remediation:

Constructing credential paths via path.join(homedir(), ‘.ssh’) evades static file path detection rules. This is functionally identical to accessing ~/.ssh/id_rsa. AI agents should never access credential directories regardless of path construction method.

`cred-004` — Environment Variable Harvesting

Severity: 🟡 Medium | Category: Credential Harvesting | Confidence threshold: 70% | Platforms: All

Detects bulk enumeration or targeted access to sensitive environment variables

Remediation:

Only access specific, required environment variables. Never serialize the entire environment.

cross-agent-propagation

ID	Name	Severity	Confidence	Platforms
`kc-007`	Cross-Repo Agent Propagation	🔴 Critical	50%	All
`mat-002`	Missing Authority Verification	🔴 Critical	60%	All
`mat-001`	Cross-Agent Trust Without Verification	🟠 High	55%	All

Rule Details

`kc-007` — Cross-Repo Agent Propagation

Severity: 🔴 Critical | Category: cross-agent-propagation | Confidence threshold: 50% | Platforms: All

Detects skills or tools that modify agent config files across repositories and push changes

Remediation:

Skills that modify CLAUDE.md or copilot-instructions.md and push to remote repos can spread malicious instructions across projects (AgentHopper attack). Never allow automated modification of agent instruction files.

`mat-002` — Missing Authority Verification

Severity: 🔴 Critical | Category: cross-agent-propagation | Confidence threshold: 60% | Platforms: All

Agent configurations with no owner or authority verification, allowing any caller to invoke tools or access agent state

Remediation:

Every agent must verify the identity and authority of input sources. Implement role-based access control (RBAC) for all agent interactions. Define and enforce an owner/authority hierarchy in agent configuration. MCP servers must require authentication tokens for all tool invocations.

References:

Agents of Chaos (arXiv:2602.20021) — CS2: Agent returned confidential data to non-owner
Agents of Chaos — CS8: Attacker impersonated owner with username change
OWASP LLM01 (Prompt Injection)
MITRE ATLAS AML.T0051

`mat-001` — Cross-Agent Trust Without Verification

Severity: 🟠 High | Category: cross-agent-propagation | Confidence threshold: 55% | Platforms: All

Multi-agent configs where agents can modify each other”s state without mutual authentication or identity verification

Remediation:

Multi-agent systems must implement mutual authentication between agents. Never allow one agent to modify another agent”s state, memory, or configuration. Use signed messages for inter-agent communication and verify sender identity.

References:

Agents of Chaos (arXiv:2602.20021) — CS10: Corrupted agent removed server members
Agents of Chaos — CS11: Agent broadcast false accusations to 52+ agents
MITRE ATLAS AML.T0048

Data Exfiltration

ID	Name	Severity	Confidence	Platforms
`adv-001`	Process Environment in HTTP Body	🔴 Critical	50%	All
`exfil-011`	Cloud Metadata Service Access (IMDS/SSRF)	🔴 Critical	50%	All
`adv-006`	Base64-Decoded Network Hostname	🟠 High	55%	All
`adv-015`	Suspicious MCP Server Environment Variables	🟠 High	50%	All
`exfil-001`	Suspicious External HTTP Request	🟠 High	70%	All
`exfil-003`	File Upload to External Service	🟠 High	75%	All
`exfil-004`	DNS Exfiltration Pattern	🟠 High	80%	All
`exfil-006`	Screenshot Capture	🟠 High	80%	All
`exfil-008`	Archive Creation Before Upload	🟠 High	75%	All
`exfil-012`	WebSocket Exfiltration	🟠 High	70%	All
`kc-008`	DNS Exfiltration via Encoded Subdomain	🟠 High	55%	All
`kc-009`	Render-Based Data Exfiltration	🟠 High	55%	All
`kc-010`	Clipboard Content Exfiltration	🟠 High	55%	All
`adv-012`	String Concatenation URL Construction	🟡 Medium	60%	All
`exfil-002`	Base64 Encoded Data Transmission	🟡 Medium	55%	All
`exfil-005`	Clipboard Data Access	🟡 Medium	70%	All
`exfil-007`	Bulk File Read Pattern	🟡 Medium	65%	All
`exfil-009`	Webhook Data Transmission	🟡 Medium	70%	All
`exfil-010`	Email Data Transmission	🟡 Medium	70%	All

Rule Details

`adv-001` — Process Environment in HTTP Body

Severity: 🔴 Critical | Category: Data Exfiltration | Confidence threshold: 50% | Platforms: All

Detects process.env passed directly in fetch/request body — common exfiltration of all environment variables

Remediation:

Passing the entire process.env in an HTTP request body exfiltrates all environment variables including secrets, tokens, and API keys. Access only specific required variables and never transmit the entire environment.

`exfil-011` — Cloud Metadata Service Access (IMDS/SSRF)

Severity: 🔴 Critical | Category: Data Exfiltration | Confidence threshold: 50% | Platforms: All

Detects access to cloud instance metadata services for credential theft

Remediation:

Cloud metadata service access from agent code is extremely suspicious. This is the primary vector for SSRF-to-credential-theft in cloud environments. Agents should never access instance metadata endpoints directly.

`adv-006` — Base64-Decoded Network Hostname

Severity: 🟠 High | Category: Data Exfiltration | Confidence threshold: 55% | Platforms: All

Detects hostnames decoded from base64 at runtime — obfuscated network destination

Remediation:

Decoding hostnames from base64 at runtime hides the actual network destination from static analysis. This is used to evade URL pattern matching rules. All network destinations should be clearly visible in source code.

`adv-015` — Suspicious MCP Server Environment Variables

Severity: 🟠 High | Category: Data Exfiltration | Confidence threshold: 50% | Platforms: All

Detects MCP server config with environment variables that enable data forwarding or credential inclusion

Remediation:

MCP server environment variables like INCLUDE_ENV=true and FORWARD_HEADERS=true instruct the server to include all environment variables or authentication headers in outbound requests. This silently exfiltrates secrets to upstream endpoints. Only pass specific, required environment variables to MCP servers.

`exfil-001` — Suspicious External HTTP Request

Severity: 🟠 High | Category: Data Exfiltration | Confidence threshold: 70% | Platforms: All

Detects HTTP requests to suspicious TLDs or tunneling services

Remediation:

Review all external HTTP requests. Ensure they go to legitimate, expected endpoints.

`exfil-003` — File Upload to External Service

Severity: 🟠 High | Category: Data Exfiltration | Confidence threshold: 75% | Platforms: All

Detects file uploads to external services

Remediation:

Review file uploads to external services. Ensure sensitive data is not being exfiltrated.

`exfil-004` — DNS Exfiltration Pattern

Severity: 🟠 High | Category: Data Exfiltration | Confidence threshold: 80% | Platforms: All

Detects potential DNS-based data exfiltration

Remediation:

DNS queries with dynamic subdomains may indicate data exfiltration. Review DNS usage.

`exfil-006` — Screenshot Capture

Severity: 🟠 High | Category: Data Exfiltration | Confidence threshold: 80% | Platforms: All

Detects screenshot capture API calls or library imports

Remediation:

Screenshot capture is highly sensitive. Ensure this is explicitly requested by the user.

`exfil-008` — Archive Creation Before Upload

Severity: 🟠 High | Category: Data Exfiltration | Confidence threshold: 75% | Platforms: All

Detects creating archives before network transmission

Remediation:

Creating archives before upload may indicate bulk data exfiltration. Review carefully.

`exfil-012` — WebSocket Exfiltration

Severity: 🟠 High | Category: Data Exfiltration | Confidence threshold: 70% | Platforms: All

Detects WebSocket connections that may exfiltrate data to external servers

Remediation:

WebSocket connections can maintain persistent channels for data exfiltration. Verify the destination server is trusted and the data being sent is appropriate.

`kc-008` — DNS Exfiltration via Encoded Subdomain

Severity: 🟠 High | Category: Data Exfiltration | Confidence threshold: 55% | Platforms: All

Detects DNS lookup tools where subdomain contains dynamically encoded data

Remediation:

DNS tools that encode data into subdomains can exfiltrate sensitive information through DNS queries that bypass network security controls. CVE-2025-55284 demonstrated this in Claude Code.

`kc-009` — Render-Based Data Exfiltration

Severity: 🟠 High | Category: Data Exfiltration | Confidence threshold: 55% | Platforms: All

Detects analytics pixels or render outputs that encode sensitive data in URLs

Remediation:

Mermaid diagrams, markdown images, and HTML renders can exfiltrate data by encoding it into external URLs. The attacker server receives the data when the image loads.

`kc-010` — Clipboard Content Exfiltration

Severity: 🟠 High | Category: Data Exfiltration | Confidence threshold: 55% | Platforms: All

Detects clipboard access followed by outbound transmission

Remediation:

Clipboard access combined with outbound network requests enables exfiltration of copied passwords, tokens, and sensitive data. Clipboard access should require explicit user consent.

`adv-012` — String Concatenation URL Construction

Severity: 🟡 Medium | Category: Data Exfiltration | Confidence threshold: 60% | Platforms: All

Detects URL construction via string concatenation to evade URL pattern matching

Remediation:

Building URLs via string concatenation (e.g., ‘htt’ + ‘ps://’) is an evasion technique to prevent static scanners from detecting the full URL. Legitimate code should use complete URL strings or well-known URL construction APIs.

`exfil-002` — Base64 Encoded Data Transmission

Severity: 🟡 Medium | Category: Data Exfiltration | Confidence threshold: 55% | Platforms: All

Detects base64 encoding combined with network transmission in the same file

Remediation:

Base64 encoding before transmission may indicate data obfuscation. Review the data being sent.

`exfil-005` — Clipboard Data Access

Severity: 🟡 Medium | Category: Data Exfiltration | Confidence threshold: 70% | Platforms: All

Detects access to clipboard contents

Remediation:

Clipboard access should be minimized. Review why clipboard data is being accessed.

`exfil-007` — Bulk File Read Pattern

Severity: 🟡 Medium | Category: Data Exfiltration | Confidence threshold: 65% | Platforms: All

Detects reading multiple files in rapid succession

Remediation:

Bulk file reading should be scoped to specific directories. Review the access pattern.

`exfil-009` — Webhook Data Transmission

Severity: 🟡 Medium | Category: Data Exfiltration | Confidence threshold: 70% | Platforms: All

Detects data transmission via webhooks

Remediation:

Webhook data transmission should only send expected, non-sensitive data.

`exfil-010` — Email Data Transmission

Severity: 🟡 Medium | Category: Data Exfiltration | Confidence threshold: 70% | Platforms: All

Detects sending data via email

Remediation:

Email transmission should be explicitly requested. Review what data is being sent.

File System Abuse

ID	Name	Severity	Confidence	Platforms
`fs-003`	System Account File Access	🔴 Critical	55%	All
`fs-005`	Kernel Memory Access	🔴 Critical	50%	All
`fs-008`	Temp Directory Code Execution	🔴 Critical	60%	All
`fs-010`	Recursive Directory Deletion	🔴 Critical	55%	All
`fs-001`	/proc Filesystem Enumeration	🟠 High	60%	All
`fs-002`	System Log Manipulation	🟠 High	55%	All
`fs-004`	Symlink Attack	🟠 High	65%	All
`fs-007`	Symlink Attack to Sensitive Files	🟠 High	55%	All
`fs-009`	Audit Log Manipulation	🟠 High	55%	All
`fs-011`	Config Include Path Traversal	🟠 High	60%	openclaw
`fs-012`	Local File Path in Media URL Parameter	🟠 High	60%	openclaw
`fs-006`	Insecure File Permissions	🟡 Medium	65%	All

Rule Details

`fs-003` — System Account File Access

Severity: 🔴 Critical | Category: File System Abuse | Confidence threshold: 55% | Platforms: All

Detects reads of system authentication and authorization files

Remediation:

Reading system account files (/etc/passwd, /etc/shadow, /etc/sudoers, /etc/group) is a strong indicator of credential harvesting or privilege escalation preparation. /etc/shadow contains password hashes and must never be accessed by an AI agent. Remove all access to these files. Use dedicated APIs for any legitimate user lookup needs.

`fs-005` — Kernel Memory Access

Severity: 🔴 Critical | Category: File System Abuse | Confidence threshold: 50% | Platforms: All

Detects access to kernel memory devices and raw memory operations

Remediation:

Access to kernel memory devices (/dev/mem, /dev/kmem, /dev/port) is an extreme security violation that enables arbitrary memory reads, rootkit installation, and kernel-level compromise. mmap with PROT_EXEC is a code injection technique. This code must be removed immediately. No AI agent should ever touch kernel memory.

`fs-008` — Temp Directory Code Execution

Severity: 🔴 Critical | Category: File System Abuse | Confidence threshold: 60% | Platforms: All

Detects patterns of writing executable code to /tmp and then executing it — a classic malware staging technique

Remediation:

Writing code to /tmp and executing it is a standard malware staging technique. /tmp is world-writable and persists across processes, making it ideal for staging payloads. AI agents must never write executable content to temporary directories. Use secure temporary file handling with mode 600 and never execute temp files.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM06 Excessive Agency
https://atlas.mitre.org/techniques/AML.T0049

`fs-010` — Recursive Directory Deletion

Severity: 🔴 Critical | Category: File System Abuse | Confidence threshold: 55% | Platforms: All

Detects recursive deletion commands targeting system or application directories, which can cause irreversible data destruction

Remediation:

Recursive deletion of system or application directories is destructive and irreversible. AI agents must never delete directories recursively without strict path validation. Implement path allowlists for deletion operations. Never allow deletion of paths matching /, ~, $HOME, or well-known system directories.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM06 Excessive Agency
https://atlas.mitre.org/techniques/AML.T0049

`fs-001` — /proc Filesystem Enumeration

Severity: 🟠 High | Category: File System Abuse | Confidence threshold: 60% | Platforms: All

Detects access to /proc filesystem entries used for reconnaissance and credential theft

Remediation:

Access to /proc filesystem entries is a strong indicator of reconnaissance activity. AI agents should never read /proc entries outside of explicitly approved diagnostic tools. For container environments, /proc/1/environ access is a known credential theft vector. Remove all /proc reads and use legitimate APIs for any required system information.

`fs-002` — System Log Manipulation

Severity: 🟠 High | Category: File System Abuse | Confidence threshold: 55% | Platforms: All

Detects reads, writes, or destruction of system log files to cover tracks

Remediation:

System log access or modification is a serious indicator of anti-forensic activity. AI agents must never read, write, truncate, or delete system log files. Disabling syslog or auditd services to evade detection is a critical security event. Remove all log manipulation code and review why the agent requires log access.

`fs-004` — Symlink Attack

Severity: 🟠 High | Category: File System Abuse | Confidence threshold: 65% | Platforms: All

Detects creation of symbolic links pointing to sensitive system paths

Remediation:

Symlink creation targeting sensitive paths (/etc, /root, ~/.ssh, ~/.aws) is a common privilege escalation and path traversal technique. AI agents should never create symlinks without explicit, scoped authorization. Remove symlink creation code and audit the intent behind any file redirection logic.

`fs-007` — Symlink Attack to Sensitive Files

Severity: 🟠 High | Category: File System Abuse | Confidence threshold: 55% | Platforms: All

Detects creation of symbolic links pointing to sensitive system files or directories, enabling path traversal and unauthorized access

Remediation:

Symlinks to credential files (/etc/shadow, ~/.ssh/id_rsa, ~/.aws/credentials) enable path traversal attacks where a process reading an “innocent” path is redirected to a sensitive file. Remove all symlinks to sensitive paths. Ensure tmp directories are on separate filesystems to prevent symlink races.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM06 Excessive Agency
https://atlas.mitre.org/techniques/AML.T0049

`fs-009` — Audit Log Manipulation

Severity: 🟠 High | Category: File System Abuse | Confidence threshold: 55% | Platforms: All

Detects truncation, clearing, or deletion of audit and application log files to destroy forensic evidence

Remediation:

Log manipulation is a critical anti-forensic action. Audit logs are the primary mechanism for detecting and reconstructing security incidents. AI agents must never truncate, delete, or disable logging systems. Implement log integrity controls (append-only, remote syslog) to prevent tampering.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM06 Excessive Agency
https://atlas.mitre.org/techniques/AML.T0049

`fs-011` — Config Include Path Traversal

Severity: 🟠 High | Category: File System Abuse | Confidence threshold: 60% | Platforms: openclaw

Detects $include directives referencing absolute paths or directory traversal — CVE-2026-32061

Remediation:

Validate that all $include paths resolve within the config root after symlink resolution. Reject absolute paths and sequences containing ’../’.

References:

CVE-2026-32061

`fs-012` — Local File Path in Media URL Parameter

Severity: 🟠 High | Category: File System Abuse | Confidence threshold: 60% | Platforms: openclaw

Detects media URL parameters set to local filesystem paths — CVE-2026-26321

Remediation:

Media URL parameters must be validated against approved schemes (https:// only). Local filesystem paths must never be accepted as media sources.

References:

CVE-2026-26321

`fs-006` — Insecure File Permissions

Severity: 🟡 Medium | Category: File System Abuse | Confidence threshold: 65% | Platforms: All

Detects creation of files or directories with world-writable or overly permissive modes

Remediation:

Overly permissive file modes (777, 666) allow any user on the system to read or modify files, undermining access control and enabling privilege escalation. umask(0) is particularly dangerous as it makes all subsequently created files world-accessible. Use the principle of least privilege: apply only the minimum permissions required. Prefer 640 for files and 750 for directories. Never use 777 in production code.

Insecure Configuration

ID	Name	Severity	Confidence	Platforms
`aci-001`	Agent Identity File Tampering	🔴 Critical	60%	All
`ic-002`	SSL/TLS Verification Disabled	🔴 Critical	60%	All
`ic-004`	Claude Code RCE via Malicious Hooks	🔴 Critical	50%	claude
`kc-005`	MCP Config File Injection	🔴 Critical	50%	All
`adv-007`	Wildcard Permission in Skill Definition	🟠 High	50%	All
`ic-003`	Default or Hardcoded Credentials in Config Files	🟠 High	55%	All
`ic-005`	Cursor Auto-Execute on Folder Open	🟠 High	50%	cursor, codex
`ic-006`	Unauthenticated Local WebSocket Endpoint	🟠 High	55%	openclaw, mcp, claude
`ic-001`	Debug Mode Enabled in Production Config	🟡 Medium	50%	All

Rule Details

`aci-001` — Agent Identity File Tampering

Severity: 🔴 Critical | Category: Insecure Configuration | Confidence threshold: 60% | Platforms: All

Detects write or modify operations targeting agent identity and configuration files (SOUL.md, IDENTITY.md, AGENTS.md, BOOTSTRAP.md, USER.md). Attackers overwrite these files to perform identity spoofing or inject constitutional rules.

Remediation:

Agent identity and configuration files (SOUL.md, IDENTITY.md, etc.) must be read-only. Never grant write access to these files via tool definitions or external input channels. Use file system permissions (chmod 444) and validate file integrity with checksums.

References:

Agents of Chaos (arXiv:2602.20021) — CS8: Identity spoofing via IDENTITY.md overwrite
Agents of Chaos — CS10: Constitutional injection via memory file modification

`ic-002` — SSL/TLS Verification Disabled

Severity: 🔴 Critical | Category: Insecure Configuration | Confidence threshold: 60% | Platforms: All

Detects configurations that disable SSL/TLS certificate verification, enabling man-in-the-middle attacks on agent network connections

Remediation:

Disabling SSL/TLS verification allows man-in-the-middle attacks where an attacker intercepts and modifies all HTTPS traffic without detection. This is never acceptable in production code. Remove all verify=False, rejectUnauthorized:false, and InsecureSkipVerify:true configurations. Use a proper CA bundle for self-signed certs.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM08 Excessive Permissions
https://atlas.mitre.org/techniques/AML.T0049

`ic-004` — Claude Code RCE via Malicious Hooks

Severity: 🔴 Critical | Category: Insecure Configuration | Confidence threshold: 50% | Platforms: claude

Detects malicious shell commands in .claude/settings.json hooks — CVE-2025-59536 (CVSS 8.7). Attackers commit poisoned settings that spawn reverse shells or exfiltrate data when Claude Code executes hooks.

Remediation:

Malicious .claude/settings.json hooks can execute arbitrary commands when Claude Code runs (CVE-2025-59536). Never commit .claude/settings.json to shared repos. Audit all hook commands for suspicious patterns: curl/wget piping to shell, base64 decoding, reverse shells, or backgrounded processes.

References:

CVE-2025-59536

`kc-005` — MCP Config File Injection

Severity: 🔴 Critical | Category: Insecure Configuration | Confidence threshold: 50% | Platforms: All

Detects tools that write to .mcp.json to add new MCP servers — potential supply chain injection

Remediation:

Writing to .mcp.json programmatically can inject attacker-controlled MCP servers into the agent toolchain. MCP server configuration should only be modified by the user directly, never by tools or scripts.

`adv-007` — Wildcard Permission in Skill Definition

Severity: 🟠 High | Category: Insecure Configuration | Confidence threshold: 50% | Platforms: All

Detects wildcard permissions (shell:, filesystem:, network:*) in skill or tool definitions

Remediation:

Wildcard permissions grant unrestricted access. A weather tool should not need shell:* or filesystem:*. Request only the minimal permissions required for the tool’s stated purpose (e.g., network:read for a weather API tool).

`ic-003` — Default or Hardcoded Credentials in Config Files

Severity: 🟠 High | Category: Insecure Configuration | Confidence threshold: 55% | Platforms: All

Detects default, well-known, or hardcoded credentials in configuration files that should use secrets management instead

Remediation:

Hardcoded credentials in configuration files are a critical security risk. They are committed to version control, visible to all team members, and cannot be rotated without code changes. Use environment variables, secrets managers (Vault, AWS Secrets Manager, Azure Key Vault), or .env files (gitignored). Rotate all credentials that may have been exposed in version history.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM08 Excessive Permissions
https://atlas.mitre.org/techniques/AML.T0043

`ic-005` — Cursor Auto-Execute on Folder Open

Severity: 🟠 High | Category: Insecure Configuration | Confidence threshold: 50% | Platforms: cursor, codex

Detects .vscode/tasks.json with runOn:folderOpen that auto-executes shell commands when a project is opened in Cursor/VS Code

Remediation:

Tasks with runOn:folderOpen execute automatically when a project is opened. Attackers commit malicious .vscode/tasks.json files that run arbitrary commands without user interaction. Remove runOn:folderOpen from untrusted projects and review all task commands.

References:

CVE-2025-59944

`ic-006` — Unauthenticated Local WebSocket Endpoint

Severity: 🟠 High | Category: Insecure Configuration | Confidence threshold: 55% | Platforms: openclaw, mcp, claude

Detects local WebSocket/HTTP server configs bound to loopback without auth — GHSA-qpjj

Remediation:

Loopback-only binding is insufficient. Any website can initiate WebSocket connections to localhost. Require a shared secret token on every WebSocket upgrade request.

`ic-001` — Debug Mode Enabled in Production Config

Severity: 🟡 Medium | Category: Insecure Configuration | Confidence threshold: 50% | Platforms: All

Detects debug mode flags enabled in application or agent configurations, which expose stack traces, internal state, and disable security controls

Remediation:

Debug mode exposes detailed error messages, stack traces, and internal state that attackers can use to understand application structure and find vulnerabilities. In production: set DEBUG=false, NODE_ENV=production, and disable verbose error pages. Use structured logging to capture diagnostic information without exposing it to end users.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM08 Excessive Permissions

Known Malicious Patterns

ID	Name	Severity	Confidence	Platforms
`mal-infra-001`	Known Malicious C2/Exfiltration Infrastructure	🔴 Critical	30%	All
`mal-infra-002`	Known Malicious GitHub Resources	🔴 Critical	30%	All
`mal-sandworm-001`	SANDWORM MCP Config Injection	🔴 Critical	40%	All
`mal-skill-001`	Known Malicious Skill Name (Programmatic Campaign)	🔴 Critical	30%	openclaw
`mal-skill-002`	Known Malicious Skill (Unicode Contraband / DAN Jailbreaks)	🔴 Critical	30%	openclaw
`mal-skill-003`	Known Malicious Skill (Credential Harvesting)	🔴 Critical	30%	openclaw
`mal-skill-004`	ClawHavoc Campaign Skills	🔴 Critical	30%	openclaw
`mal-skill-005`	ClawHavoc YouTube Imitation Skills	🔴 Critical	30%	openclaw
`mal-typo-001`	ClawHub Typosquatting Pattern	🔴 Critical	30%	All
`yara-001`	Obfuscated Base64 Payload	🔴 Critical	40%	All
`yara-002`	Reverse Shell Pattern	🔴 Critical	40%	All
`yara-003`	Credential Stealer Signature	🔴 Critical	40%	All
`yara-006`	RAT/Backdoor Pattern	🔴 Critical	40%	All
`mal-author-001`	Known Malicious Author	🟠 High	30%	openclaw
`mal-updater-001`	Fake Auto-Updater Skill	🟠 High	40%	openclaw
`yara-005`	Coin Miner Signature	🟠 High	40%	All

Rule Details

`mal-infra-001` — Known Malicious C2/Exfiltration Infrastructure

Severity: 🔴 Critical | Category: Known Malicious Patterns | Confidence threshold: 30% | Platforms: All

Code references known malicious command-and-control servers or exfiltration endpoints

Remediation:

This code communicates with known malicious infrastructure. Remove the skill and investigate potential data exfiltration.

`mal-infra-002` — Known Malicious GitHub Resources

Severity: 🔴 Critical | Category: Known Malicious Patterns | Confidence threshold: 30% | Platforms: All

References to GitHub repositories known to host malware payloads

Remediation:

This references a known malware distribution point. Remove the skill and scan your system for compromise indicators.

`mal-sandworm-001` — SANDWORM MCP Config Injection

Severity: 🔴 Critical | Category: Known Malicious Patterns | Confidence threshold: 40% | Platforms: All

Detects MCP server injection patterns used by the SANDWORM_MODE worm to persist in Claude, Cursor, and Continue IDE configs

Remediation:

The SANDWORM worm injects malicious MCP servers into IDE configs (~/.claude/, ~/.cursor/, ~/.continue/) to maintain persistence. If you see unexpected MCP server entries, remove them and audit your npm packages for postinstall scripts that modify IDE configs.

References:

https://socket.dev/blog/sandworm-mode-ai-worm

`mal-skill-001` — Known Malicious Skill Name (Programmatic Campaign)

Severity: 🔴 Critical | Category: Known Malicious Patterns | Confidence threshold: 30% | Platforms: openclaw

Skill matches a known malicious skill from the zaycv/Aslaep123 campaigns: programmatic malware distribution via ClawHub

Remediation:

Remove this skill immediately. It is a confirmed malicious package from a known attacker campaign. Report to ClawHub/OpenClaw security team.

`mal-skill-002` — Known Malicious Skill (Unicode Contraband / DAN Jailbreaks)

Severity: 🔴 Critical | Category: Known Malicious Patterns | Confidence threshold: 30% | Platforms: openclaw

Skill matches known malicious packages using Unicode contraband and DAN jailbreak techniques

Remediation:

Remove this skill immediately. Uses Unicode contraband to hide malicious instructions and DAN jailbreaks to bypass safety.

`mal-skill-003` — Known Malicious Skill (Credential Harvesting)

Severity: 🔴 Critical | Category: Known Malicious Patterns | Confidence threshold: 30% | Platforms: openclaw

Skill matches known packages that harvest credentials, credit cards, or session data

Remediation:

Remove this skill immediately. It is a confirmed credential-harvesting or data-theft package.

`mal-skill-004` — ClawHavoc Campaign Skills

Severity: 🔴 Critical | Category: Known Malicious Patterns | Confidence threshold: 30% | Platforms: openclaw

Skill matches known ClawHavoc campaign: reverse shells, direct exfiltration, and YouTube imitation skills

Remediation:

Remove this skill immediately. Part of the ClawHavoc malware campaign with reverse shell and exfiltration capabilities.

`mal-skill-005` — ClawHavoc YouTube Imitation Skills

Severity: 🔴 Critical | Category: Known Malicious Patterns | Confidence threshold: 30% | Platforms: openclaw

Skill impersonates YouTube utilities to deliver malware

Remediation:

Remove this skill. It impersonates a YouTube utility to deliver malicious payloads.

`mal-typo-001` — ClawHub Typosquatting Pattern

Severity: 🔴 Critical | Category: Known Malicious Patterns | Confidence threshold: 30% | Platforms: All

Detects typosquatted variations of ‘clawhub’ used in malware campaigns

Remediation:

This is a typosquatted version of ClawHub, a known malware distribution technique. Remove the skill and verify your package sources.

`yara-001` — Obfuscated Base64 Payload

Severity: 🔴 Critical | Category: Known Malicious Patterns | Confidence threshold: 40% | Platforms: All

Detects base64 decode combined with dynamic code execution — multi-layer obfuscation

Remediation:

No remediation guidance available.

`yara-002` — Reverse Shell Pattern

Severity: 🔴 Critical | Category: Known Malicious Patterns | Confidence threshold: 40% | Platforms: All

Detects classic reverse shell byte patterns across languages

Remediation:

No remediation guidance available.

`yara-003` — Credential Stealer Signature

Severity: 🔴 Critical | Category: Known Malicious Patterns | Confidence threshold: 40% | Platforms: All

Detects high-risk credential file access combined with data exfiltration to suspicious targets

Remediation:

No remediation guidance available.

`yara-006` — RAT/Backdoor Pattern

Severity: 🔴 Critical | Category: Known Malicious Patterns | Confidence threshold: 40% | Platforms: All

Detects remote access trojan and backdoor communication patterns

Remediation:

No remediation guidance available.

`mal-author-001` — Known Malicious Author

Severity: 🟠 High | Category: Known Malicious Patterns | Confidence threshold: 30% | Platforms: openclaw

Content authored by a known malicious actor who has published 40+ confirmed malicious skills

Remediation:

Skills by this author should be treated as malicious. Remove immediately and audit your system for compromise.

`mal-updater-001` — Fake Auto-Updater Skill

Severity: 🟠 High | Category: Known Malicious Patterns | Confidence threshold: 40% | Platforms: openclaw

Detects skills masquerading as auto-updaters, a common malware delivery mechanism

Remediation:

Legitimate AI skills do not auto-update themselves. This is likely a malware delivery mechanism. Remove immediately.

`yara-005` — Coin Miner Signature

Severity: 🟠 High | Category: Known Malicious Patterns | Confidence threshold: 40% | Platforms: All

Detects cryptocurrency mining code and configuration patterns

Remediation:

No remediation guidance available.

Malware Distribution

ID	Name	Severity	Confidence	Platforms
`adv-013`	Remote Code Fetch and Execute	🔴 Critical	50%	All
`malware-002`	Password-Protected Archive Extraction	🔴 Critical	50%	All
`malware-003`	Base64-Encoded Command Execution	🔴 Critical	50%	All
`malware-004`	Remote Script Piping	🔴 Critical	40%	All
`malware-001`	Remote Archive Download	🟠 High	60%	All
`malware-005`	System Service Manipulation	🟠 High	60%	All
`malware-006`	Fake Prerequisite Installation Instructions	🟡 Medium	60%	All

Rule Details

`adv-013` — Remote Code Fetch and Execute

Severity: 🔴 Critical | Category: Malware Distribution | Confidence threshold: 50% | Platforms: All

Detects fetching remote content and executing it via eval/Function — delayed payload delivery

Remediation:

Fetching content from a remote URL and executing it via new Function() or eval() is a classic delayed payload delivery mechanism. The code appears clean at scan time but loads malicious payloads from attacker-controlled servers at runtime.

`malware-002` — Password-Protected Archive Extraction

Severity: 🔴 Critical | Category: Malware Distribution | Confidence threshold: 50% | Platforms: All

Extracts password-protected archives — used to evade static analysis

Remediation:

Password-protected archives are commonly used to evade antivirus and static analysis. This is highly suspicious in an AI agent context.

`malware-003` — Base64-Encoded Command Execution

Severity: 🔴 Critical | Category: Malware Distribution | Confidence threshold: 50% | Platforms: All

Executes base64-encoded commands — used to obfuscate malicious payloads

Remediation:

Base64-encoded execution is a classic obfuscation technique. Decode and review the payload before allowing this skill.

`malware-004` — Remote Script Piping

Severity: 🔴 Critical | Category: Malware Distribution | Confidence threshold: 40% | Platforms: All

Pipes remote content directly to shell execution — classic malware delivery

Remediation:

Never pipe remote content directly to a shell interpreter. Download, verify, then execute separately.

`malware-001` — Remote Archive Download

Severity: 🟠 High | Category: Malware Distribution | Confidence threshold: 60% | Platforms: All

Downloads archive files from GitHub releases or remote URLs — common malware delivery vector

Remediation:

Downloading archives from remote URLs is a common malware delivery technique. Verify the source and use package managers instead.

`malware-005` — System Service Manipulation

Severity: 🟠 High | Category: Malware Distribution | Confidence threshold: 60% | Platforms: All

Modifies system services or daemons — potential persistence mechanism

Remediation:

AI agent skills should not manipulate system services. This may indicate a persistence mechanism.

`malware-006` — Fake Prerequisite Installation Instructions

Severity: 🟡 Medium | Category: Malware Distribution | Confidence threshold: 60% | Platforms: All

Skill documentation instructs users to run suspicious installation commands

Remediation:

Review installation instructions carefully. Legitimate skills should not require manual downloads from unknown sources.

Network Abuse

ID	Name	Severity	Confidence	Platforms
`na-007`	Reverse Shell Patterns	🔴 Critical	60%	All
`net-001`	Bind Shell	🔴 Critical	60%	All
`na-006`	DNS Exfiltration via Long Subdomain Queries	🟠 High	55%	All
`na-008`	Cryptocurrency Mining Endpoints	🟠 High	60%	All
`na-011`	MCP SSRF — Internal Network Access via Tool Parameters	🟠 High	55%	mcp, claude, cursor
`na-012`	Unrestricted gatewayUrl Override (SSRF)	🟠 High	65%	openclaw, mcp
`na-013`	Browser CDP Relay Without Auth	🟠 High	60%	openclaw
`na-014`	Dangerous URL Scheme in Browser Navigation	🟠 High	65%	openclaw
`net-002`	Raw Socket Creation	🟠 High	65%	All
`net-003`	SSH Tunneling	🟠 High	60%	All
`net-005`	DNS Covert Channel	🟠 High	60%	All
`na-009`	Tor Network and Anonymizing Proxy Connections	🟡 Medium	55%	All
`net-004`	Proxy and Tor Usage	🟡 Medium	65%	All
`na-010`	Non-Standard Port Usage for HTTP/HTTPS	🟢 Low	45%	All

Rule Details

`na-007` — Reverse Shell Patterns

Severity: 🔴 Critical | Category: Network Abuse | Confidence threshold: 60% | Platforms: All

Detects reverse shell one-liners that connect back to an attacker-controlled host, providing interactive shell access

Remediation:

Reverse shells provide attackers with interactive command execution on compromised systems. These are unambiguous attack payloads — no legitimate use case exists for reverse shell one-liners in agent code. Remove immediately and investigate the source of this code.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM06 Excessive Agency
https://atlas.mitre.org/techniques/AML.T0049

`net-001` — Bind Shell

Severity: 🔴 Critical | Category: Network Abuse | Confidence threshold: 60% | Platforms: All

Detects server-side bind shell patterns that open a listening port for incoming attacker connections

Remediation:

Bind shells open a network listener that an attacker can connect to directly. AI agents should never create raw TCP listeners. Remove all socket.bind/listen and net.createServer patterns unless they are part of a documented, sandboxed service with explicit user consent.

`na-006` — DNS Exfiltration via Long Subdomain Queries

Severity: 🟠 High | Category: Network Abuse | Confidence threshold: 55% | Platforms: All

Detects patterns of DNS exfiltration where data is encoded into unusually long subdomain labels to bypass network monitoring

Remediation:

DNS exfiltration encodes stolen data as subdomains of attacker-controlled domains. Each DNS query carries a fragment of exfiltrated content that bypasses HTTP/HTTPS monitoring. Implement DNS monitoring and block queries with unusually long labels. AI agents must not construct or resolve dynamically-encoded DNS queries.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM02 Insecure Output
https://atlas.mitre.org/techniques/AML.T0049

`na-008` — Cryptocurrency Mining Endpoints

Severity: 🟠 High | Category: Network Abuse | Confidence threshold: 60% | Platforms: All

Detects connections to known cryptocurrency mining pool endpoints and mining-related protocol patterns

Remediation:

Cryptocurrency mining in agent environments consumes unauthorized compute resources and may indicate a broader supply-chain compromise. Remove all mining software, pool connections, and mining algorithm references. Investigate how this code was introduced.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM05 Supply Chain
https://atlas.mitre.org/techniques/AML.T0049

`na-011` — MCP SSRF — Internal Network Access via Tool Parameters

Severity: 🟠 High | Category: Network Abuse | Confidence threshold: 55% | Platforms: mcp, claude, cursor

Detects SSRF patterns in MCP tool parameters where URLs point to internal/localhost/metadata ranges. 36.7% of MCP servers are vulnerable.

Remediation:

MCP tool parameters must not accept URLs pointing to internal networks, localhost, or cloud metadata endpoints. Implement URL validation and allowlisting on the server side. Block private IP ranges (10.x, 172.16-31.x, 192.168.x), localhost, and metadata endpoints (169.254.169.254).

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM06 Excessive Agency

`na-012` — Unrestricted gatewayUrl Override (SSRF)

Severity: 🟠 High | Category: Network Abuse | Confidence threshold: 65% | Platforms: openclaw, mcp

Detects gatewayUrl parameters pointing to private/internal addresses or cloud metadata — CVE-2026-26322

Remediation:

The gatewayUrl parameter must be validated against an explicit allowlist. Block all private IP ranges, localhost, and cloud metadata IPs.

References:

CVE-2026-26322

`na-013` — Browser CDP Relay Without Auth

Severity: 🟠 High | Category: Network Abuse | Confidence threshold: 60% | Platforms: openclaw

Detects /cdp WebSocket endpoints that may lack token validation — GHSA-mr32

Remediation:

CDP relay endpoints must require a shared secret token on every WebSocket upgrade and validate the Origin header. Without both controls, any website can steal session data.

`na-014` — Dangerous URL Scheme in Browser Navigation

Severity: 🟠 High | Category: Network Abuse | Confidence threshold: 65% | Platforms: openclaw

Detects file://, javascript:, or data: URL schemes in browser navigation — GHSA-45cg

Remediation:

Browser navigation guards must reject all URL schemes except http:// and https://. Use a deny-by-default approach for URL scheme validation.

`net-002` — Raw Socket Creation

Severity: 🟠 High | Category: Network Abuse | Confidence threshold: 65% | Platforms: All

Detects creation of raw network sockets that bypass normal OS protocol stacks, enabling packet crafting and sniffing

Remediation:

Raw sockets allow crafting arbitrary network packets and capturing all traffic on an interface. This capability is not required by legitimate AI agents. Remove raw socket usage and use higher-level network APIs instead.

`net-003` — SSH Tunneling

Severity: 🟠 High | Category: Network Abuse | Confidence threshold: 60% | Platforms: All

Detects SSH-based tunneling and port-forwarding patterns used to bypass firewalls or exfiltrate data covertly

Remediation:

SSH tunneling can be used to bypass network controls, exfiltrate data, or grant reverse access to internal systems. AI agents should not establish SSH port-forwards or tunnels. Remove these patterns entirely.

`net-005` — DNS Covert Channel

Severity: 🟠 High | Category: Network Abuse | Confidence threshold: 60% | Platforms: All

Detects DNS-over-HTTPS used as a covert communication channel and DNS tunneling tools that encode data in DNS queries

Remediation:

DNS covert channels encode data in DNS query subdomains or use DoH endpoints to bypass firewalls while exfiltrating data or maintaining C2 communication. AI agents should not use DNS-over-HTTPS programmatically or invoke DNS tunneling tools. Remove all such patterns and use standard HTTPS APIs instead.

`na-009` — Tor Network and Anonymizing Proxy Connections

Severity: 🟡 Medium | Category: Network Abuse | Confidence threshold: 55% | Platforms: All

Detects .onion domain connections and Tor/proxy configurations used to anonymize malicious network activity

Remediation:

Tor and .onion connections are used to anonymize communication with C2 servers and exfiltrate data beyond network monitoring. AI agents must use direct, auditable connections only. Remove all Tor proxy configurations and .onion references.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM02 Insecure Output
https://atlas.mitre.org/techniques/AML.T0049

`net-004` — Proxy and Tor Usage

Severity: 🟡 Medium | Category: Network Abuse | Confidence threshold: 65% | Platforms: All

Detects use of SOCKS proxies, proxy chaining tools, and the Tor network to anonymize or reroute network traffic

Remediation:

Proxy and Tor usage in agent code can be used to anonymize malicious activity or bypass network monitoring. AI agents should use direct connections only. Remove SOCKS proxy configuration and Tor-related dependencies.

`na-010` — Non-Standard Port Usage for HTTP/HTTPS

Severity: 🟢 Low | Category: Network Abuse | Confidence threshold: 45% | Platforms: All

Detects HTTP or HTTPS traffic on non-standard ports, commonly used to bypass firewall rules and evade traffic inspection

Remediation:

Non-standard ports are frequently used to evade port-based firewall rules and network monitoring configured for standard ports (80, 443). Review all network connections using non-standard ports to ensure they are documented and authorized.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM02 Insecure Output

permission-bypass

ID	Name	Severity	Confidence	Platforms
`kc-001`	Agent Config Permission Bypass	🔴 Critical	50%	All
`kc-002`	Agent Instruction File Rewrite	🔴 Critical	50%	All
`kc-003`	MCP Wildcard Permission Grant	🔴 Critical	50%	All
`pbp-001`	YOLO Mode / No-Approval Execution	🔴 Critical	50%	All
`pbp-003`	Sandbox Escape	🔴 Critical	50%	All
`pbp-002`	Full Disk Access Request	🟠 High	50%	All

Rule Details

`kc-001` — Agent Config Permission Bypass

Severity: 🔴 Critical | Category: permission-bypass | Confidence threshold: 50% | Platforms: All

Detects writes to agent configuration files that disable human approval or enable autonomous mode

Remediation:

Tools that write agent configuration files to disable human approval enable autonomous execution without oversight. CVE-2025-53773 demonstrated this attack. Never allow tools to modify approval settings programmatically.

`kc-002` — Agent Instruction File Rewrite

Severity: 🔴 Critical | Category: permission-bypass | Confidence threshold: 50% | Platforms: All

Detects tools that write to CLAUDE.md, copilot-instructions.md, or other agent instruction files

Remediation:

Writing to agent instruction files (CLAUDE.md, copilot-instructions.md) modifies how AI agents behave. This enables persistent injection of malicious instructions that persist across sessions.

`kc-003` — MCP Wildcard Permission Grant

Severity: 🔴 Critical | Category: permission-bypass | Confidence threshold: 50% | Platforms: All

Detects MCP server configurations with wildcard permissions granting unrestricted access

Remediation:

MCP server configurations with wildcard permissions grant unrestricted access to system resources. Always use principle of least privilege with specific allowed paths, commands, and hosts.

`pbp-001` — YOLO Mode / No-Approval Execution

Severity: 🔴 Critical | Category: permission-bypass | Confidence threshold: 50% | Platforms: All

Skill uses —yolo, —full-auto, or similar flags that disable safety confirmations

Remediation:

Disabling safety confirmations removes the human-in-the-loop barrier. If the agent is compromised via prompt injection, bypassed permissions allow arbitrary execution without operator approval.

`pbp-003` — Sandbox Escape

Severity: 🔴 Critical | Category: permission-bypass | Confidence threshold: 50% | Platforms: All

Skill explicitly escapes or disables sandboxing

Remediation:

Escaping the sandbox removes containment boundaries. A compromised agent with host-level access can affect the entire system.

`pbp-002` — Full Disk Access Request

Severity: 🟠 High | Category: permission-bypass | Confidence threshold: 50% | Platforms: All

Skill requires or requests macOS Full Disk Access — overly broad system permission

Remediation:

Full Disk Access grants read access to all files on disk, far exceeding what most tools need. Request only the minimum permissions required for the tool’s function.

Permission Overgrant

ID	Name	Severity	Confidence	Platforms
`perm-002`	Maximum Blast Radius Permission Combo	🔴 Critical	70%	openclaw
`po-005`	Agent Filesystem Write to Sensitive Directories	🔴 Critical	60%	All
`perm-001`	Wildcard Permission	🟠 High	50%	openclaw
`po-004`	MCP Server Wildcard Tool Permissions	🟠 High	55%	mcp, claude, openclaw
`po-007`	Allow-All Network Policy	🟠 High	55%	All
`perm-003`	Dangerous Tool Declarations	🟡 Medium	50%	openclaw
`po-006`	Overly Broad CORS Configuration	🟡 Medium	55%	All

Rule Details

`perm-002` — Maximum Blast Radius Permission Combo

Severity: 🔴 Critical | Category: Permission Overgrant | Confidence threshold: 70% | Platforms: openclaw

Skill requests shell + network + filesystem permissions in a permissions block — maximum attack surface

Remediation:

Skills with shell + network + filesystem access can exfiltrate any data. This combination should be carefully reviewed.

`po-005` — Agent Filesystem Write to Sensitive Directories

Severity: 🔴 Critical | Category: Permission Overgrant | Confidence threshold: 60% | Platforms: All

Detects agent configurations or code requesting write access to sensitive system directories like /etc, /root, or ~/.ssh

Remediation:

Agents must not write to system directories (/etc, /root, /boot, ~/.ssh). Confine filesystem write permissions to the application’s own data directory. Use explicit path allowlists, never path-prefix grants to system locations.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM08 Excessive Permissions
https://atlas.mitre.org/techniques/AML.T0049

`perm-001` — Wildcard Permission

Severity: 🟠 High | Category: Permission Overgrant | Confidence threshold: 50% | Platforms: openclaw

Skill requests wildcard permissions granting unrestricted access

Remediation:

Avoid wildcard permissions. Request only the specific permissions needed (e.g., shell:read, filesystem:home).

`po-004` — MCP Server Wildcard Tool Permissions

Severity: 🟠 High | Category: Permission Overgrant | Confidence threshold: 55% | Platforms: mcp, claude, openclaw

Detects MCP server configurations that request wildcard or all-tools permissions, granting unrestricted tool access

Remediation:

MCP servers must declare the minimum set of tools required. Wildcard tool permissions grant agents access to every registered tool, including dangerous ones. Enumerate the specific tools needed explicitly.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM08 Excessive Permissions
https://atlas.mitre.org/techniques/AML.T0043

`po-007` — Allow-All Network Policy

Severity: 🟠 High | Category: Permission Overgrant | Confidence threshold: 55% | Platforms: All

Detects network policies or firewall rules that permit all inbound or outbound traffic, removing network isolation

Remediation:

Allow-all network policies remove critical isolation for agent environments. Define explicit allowlists for permitted endpoints and ports. Apply zero-trust network principles: deny by default, allow by exception.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM08 Excessive Permissions
https://atlas.mitre.org/techniques/AML.T0049

`perm-003` — Dangerous Tool Declarations

Severity: 🟡 Medium | Category: Permission Overgrant | Confidence threshold: 50% | Platforms: openclaw

Skill declares tools that provide excessive system access

Remediation:

Minimize tool access in skill declarations. Use the most restrictive tools that accomplish the task.

`po-006` — Overly Broad CORS Configuration

Severity: 🟡 Medium | Category: Permission Overgrant | Confidence threshold: 55% | Platforms: All

Detects CORS policies that allow all origins, enabling cross-origin attacks on agent APIs

Remediation:

CORS wildcard (Access-Control-Allow-Origin: *) allows any website to make cross-origin requests to your agent API, enabling data theft and CSRF attacks. Restrict allowed origins to an explicit allowlist of trusted domains.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM08 Excessive Permissions

Privilege Escalation

ID	Name	Severity	Confidence	Platforms
`pe-011`	Sudo/Root Escalation in Agent Config	🔴 Critical	60%	All
`pe-013`	Docker Privileged Container or Capability Addition	🔴 Critical	60%	All
`pe-014`	AWS IAM Wildcard Permission Policy	🔴 Critical	65%	All
`pe-015`	Setuid/Setgid Bit Setting	🔴 Critical	65%	All
`privesc-001`	Sudo/Root Command Execution	🔴 Critical	85%	All
`privesc-002`	Process Injection Patterns	🔴 Critical	90%	All
`privesc-004`	Setuid/Capability Manipulation	🔴 Critical	85%	All
`privesc-007`	Kernel Module Loading	🔴 Critical	90%	All
`privesc-009`	Container Escape Patterns	🔴 Critical	85%	All
`pe-012`	World-Writable File Permission Setting	🟠 High	55%	All
`pe-016`	Crontab and Systemd Persistence Installation	🟠 High	60%	All
`privesc-003`	Shell Escape Sequences	🟠 High	80%	All
`privesc-005`	Cron/Scheduled Task Manipulation	🟠 High	80%	All
`privesc-006`	Service/Daemon Manipulation	🟠 High	80%	All
`privesc-010`	Debugger Attachment	🟠 High	80%	All
`pe-017`	safeBins Trusted Directory in User-Writable Path	🟡 Medium	55%	openclaw
`pe-018`	Unvalidated PID Kill Without Ownership Check	🟡 Medium	60%	openclaw
`privesc-008`	Environment Path Manipulation	🟡 Medium	70%	All

Rule Details

`pe-011` — Sudo/Root Escalation in Agent Config

Severity: 🔴 Critical | Category: Privilege Escalation | Confidence threshold: 60% | Platforms: All

Detects sudo, root, or administrator escalation commands embedded in agent configurations or scripts

Remediation:

AI agents must not execute commands with elevated privileges. Remove sudo, doas, runas, and run-as-root patterns from agent configurations. Use least-privilege service accounts and grant only the minimum permissions needed.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM06 Excessive Agency
https://atlas.mitre.org/techniques/AML.T0043

`pe-013` — Docker Privileged Container or Capability Addition

Severity: 🔴 Critical | Category: Privilege Escalation | Confidence threshold: 60% | Platforms: All

Detects Docker —privileged flag or —cap-add usage that grants host-level capabilities to containers

Remediation:

Privileged containers have unrestricted access to the host kernel. Remove —privileged and high-privilege —cap-add flags. Use specific minimal capabilities only when absolutely required and document the justification.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM06 Excessive Agency
https://atlas.mitre.org/techniques/AML.T0049

`pe-014` — AWS IAM Wildcard Permission Policy

Severity: 🔴 Critical | Category: Privilege Escalation | Confidence threshold: 65% | Platforms: All

Detects IAM policy statements granting wildcard Action or Resource permissions, violating least-privilege

Remediation:

IAM policies with Action: '' or combined Action: '' + Resource: ’*’ grant full AWS account access. Apply least-privilege: enumerate only the specific actions and resources required. Use IAM Access Analyzer to validate policies.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM08 Excessive Permissions
https://atlas.mitre.org/techniques/AML.T0049

`pe-015` — Setuid/Setgid Bit Setting

Severity: 🔴 Critical | Category: Privilege Escalation | Confidence threshold: 65% | Platforms: All

Detects chmod commands that set the setuid or setgid bit, enabling privilege escalation via SUID binaries

Remediation:

SUID/SGID binaries run with the owner’s privileges regardless of who executes them. AI agents must never set the setuid or setgid bit. Remove all chmod +s and numeric setuid/setgid modes. Audit any binary with these bits already set.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM06 Excessive Agency
https://atlas.mitre.org/techniques/AML.T0049

`privesc-001` — Sudo/Root Command Execution

Severity: 🔴 Critical | Category: Privilege Escalation | Confidence threshold: 85% | Platforms: All

Detects attempts to execute commands with elevated privileges

Remediation:

AI agents should never execute commands with elevated privileges. Remove sudo/su usage.

`privesc-002` — Process Injection Patterns

Severity: 🔴 Critical | Category: Privilege Escalation | Confidence threshold: 90% | Platforms: All

Detects process injection or DLL injection patterns

Remediation:

Process injection is a serious security concern. This should never be in an AI agent.

`privesc-004` — Setuid/Capability Manipulation

Severity: 🔴 Critical | Category: Privilege Escalation | Confidence threshold: 85% | Platforms: All

Detects attempts to modify file permissions or capabilities

Remediation:

File permission and capability manipulation can lead to privilege escalation. Remove these.

`privesc-007` — Kernel Module Loading

Severity: 🔴 Critical | Category: Privilege Escalation | Confidence threshold: 90% | Platforms: All

Detects attempts to load kernel modules

Remediation:

Kernel module manipulation is extremely dangerous. This should never be in an AI agent.

`privesc-009` — Container Escape Patterns

Severity: 🔴 Critical | Category: Privilege Escalation | Confidence threshold: 85% | Platforms: All

Detects attempts to escape container environments

Remediation:

Container escape attempts are critical security issues. Remove these patterns.

`pe-012` — World-Writable File Permission Setting

Severity: 🟠 High | Category: Privilege Escalation | Confidence threshold: 55% | Platforms: All

Detects chmod commands that set world-writable or world-executable bits, enabling privilege escalation via file replacement

Remediation:

Overly permissive file modes (777, 666) and root ownership changes are privilege escalation enablers. Use the principle of least privilege: 640 for files, 750 for directories. Never grant world-write permissions on files that could be executed.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM06 Excessive Agency
https://atlas.mitre.org/techniques/AML.T0049

`pe-016` — Crontab and Systemd Persistence Installation

Severity: 🟠 High | Category: Privilege Escalation | Confidence threshold: 60% | Platforms: All

Detects crontab modifications and systemd unit installations used to establish persistent backdoor execution

Remediation:

Crontab and systemd service installation are common persistence mechanisms. AI agents must not modify scheduled tasks or install services without explicit user authorization. Remove crontab edits and systemctl enable calls. Review any @reboot entries — these survive reboots and are hard to detect.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM06 Excessive Agency
https://atlas.mitre.org/techniques/AML.T0043

`privesc-003` — Shell Escape Sequences

Severity: 🟠 High | Category: Privilege Escalation | Confidence threshold: 80% | Platforms: All

Detects attempts to escape restricted shells

Remediation:

Remove shell escape patterns. These attempt to break out of restricted environments.

`privesc-005` — Cron/Scheduled Task Manipulation

Severity: 🟠 High | Category: Privilege Escalation | Confidence threshold: 80% | Platforms: All

Detects modification of scheduled tasks

Remediation:

Scheduled task modification should not be performed by AI agents without explicit permission.

`privesc-006` — Service/Daemon Manipulation

Severity: 🟠 High | Category: Privilege Escalation | Confidence threshold: 80% | Platforms: All

Detects attempts to modify system services

Remediation:

System service manipulation requires careful review. Ensure this is intended behavior.

`privesc-010` — Debugger Attachment

Severity: 🟠 High | Category: Privilege Escalation | Confidence threshold: 80% | Platforms: All

Detects attempts to attach debuggers to processes

Remediation:

Debugger attachment can be used for privilege escalation. Review this carefully.

`pe-017` — safeBins Trusted Directory in User-Writable Path

Severity: 🟡 Medium | Category: Privilege Escalation | Confidence threshold: 55% | Platforms: openclaw

Detects exec-allowlist configs trusting user-writable package manager directories — GHSA-5gj7

Remediation:

safeBins must only trust immutable system directories (/bin, /usr/bin, /sbin). Package manager paths like /opt/homebrew/bin are writable and must not be in the default trusted set.

`pe-018` — Unvalidated PID Kill Without Ownership Check

Severity: 🟡 Medium | Category: Privilege Escalation | Confidence threshold: 60% | Platforms: openclaw

Detects process termination via pattern matching without verifying process ownership — CVE-2026-27486

Remediation:

Before sending SIGKILL, validate that the process is a direct child (ppid == process.pid). Never use pkill/killall as the sole process selector on shared systems.

References:

CVE-2026-27486

`privesc-008` — Environment Path Manipulation

Severity: 🟡 Medium | Category: Privilege Escalation | Confidence threshold: 70% | Platforms: All

Detects PATH or library path manipulation

Remediation:

PATH manipulation can lead to binary hijacking. Review environment variable changes.

Prompt Injection

ID	Name	Severity	Confidence	Platforms
`adv-009`	Hidden Prompt Injection in HTML Content	🔴 Critical	50%	All
`adv-011`	Agent Backstory with External Data Exfiltration	🟠 High	55%	All
`prompt-001`	Instruction Override in Tool Description	🟠 High	75%	All
`prompt-002`	System Prompt Extraction	🟠 High	80%	All
`prompt-004`	Hidden Instructions in Unicode	🟠 High	85%	All
`prompt-009`	Recursive Prompt Injection	🟠 High	80%	All
`prompt-012`	Non-Latin Override Instructions	🟠 High	60%	All
`prompt-013`	Unicode Tag Characters	🟠 High	70%	All
`prompt-003`	Role Manipulation	🟡 Medium	70%	All
`prompt-005`	Delimiter Injection	🟡 Medium	70%	All
`prompt-006`	Encoded Instruction Injection	🟡 Medium	75%	All
`prompt-007`	Context Manipulation	🟡 Medium	75%	All
`prompt-010`	Markdown/HTML Injection	🟡 Medium	70%	All
`prompt-011`	Homoglyph Mixed-Script Attack	🟡 Medium	50%	All
`prompt-014`	Hypothetical Framing Prompt Injection	🟡 Medium	60%	All
`prompt-015`	Unsafe Markdown HTML Rendering (XSS via innerHTML)	🟡 Medium	60%	openclaw, claude, mcp
`prompt-016`	External Metadata Injected into System Prompt	🟡 Medium	60%	openclaw
`prompt-008`	Output Format Manipulation	🟢 Low	65%	All

Rule Details

`adv-009` — Hidden Prompt Injection in HTML Content

Severity: 🔴 Critical | Category: Prompt Injection | Confidence threshold: 50% | Platforms: All

Detects hidden prompt injection payloads in HTML using display:none or invisible text targeting AI agents

Remediation:

Hidden HTML elements containing instructions like [SYSTEM OVERRIDE] are indirect prompt injection attacks. Attackers embed invisible text in web pages that AI agents process, causing the agent to execute unauthorized actions. All tool output from web fetching should be sanitized and hidden content stripped before processing.

`adv-011` — Agent Backstory with External Data Exfiltration

Severity: 🟠 High | Category: Prompt Injection | Confidence threshold: 55% | Platforms: All

Detects CrewAI/agent backstory text that directs the agent to send data to external services

Remediation:

Agent backstory text that instructs sending data to external URLs is a social engineering attack disguised as agent configuration. Backstories should describe the agent’s persona, not contain operational directives to exfiltrate data.

`prompt-001` — Instruction Override in Tool Description

Severity: 🟠 High | Category: Prompt Injection | Confidence threshold: 75% | Platforms: All

Detects prompt injection patterns in tool/skill descriptions

Remediation:

Remove instruction override patterns from descriptions. These are prompt injection attempts.

`prompt-002` — System Prompt Extraction

Severity: 🟠 High | Category: Prompt Injection | Confidence threshold: 80% | Platforms: All

Detects attempts to extract system prompts

Remediation:

Remove prompt extraction attempts. These try to reveal confidential instructions.

`prompt-004` — Hidden Instructions in Unicode

Severity: 🟠 High | Category: Prompt Injection | Confidence threshold: 85% | Platforms: All

Detects hidden instructions using Unicode tricks

Remediation:

Remove invisible Unicode characters. These may hide malicious instructions.

`prompt-009` — Recursive Prompt Injection

Severity: 🟠 High | Category: Prompt Injection | Confidence threshold: 80% | Platforms: All

Detects prompts designed to inject into future contexts

Remediation:

Remove recursive injection patterns. These attempt to persist malicious instructions.

`prompt-012` — Non-Latin Override Instructions

Severity: 🟠 High | Category: Prompt Injection | Confidence threshold: 60% | Platforms: All

Detects override keywords combined with non-Latin script characters

Remediation:

Remove override instructions combined with non-Latin text. This is a multi-lingual injection technique to bypass Latin-only filters.

`prompt-013` — Unicode Tag Characters

Severity: 🟠 High | Category: Prompt Injection | Confidence threshold: 70% | Platforms: All

Detects Unicode tag characters (U+E0001-U+E007F) used to hide invisible markup

Remediation:

Remove Unicode tag characters. These are invisible characters that can hide malicious instructions.

`prompt-003` — Role Manipulation

Severity: 🟡 Medium | Category: Prompt Injection | Confidence threshold: 70% | Platforms: All

Detects attempts to change AI behavior through role play

Remediation:

Remove role manipulation patterns. These attempt to bypass AI safety measures.

`prompt-005` — Delimiter Injection

Severity: 🟡 Medium | Category: Prompt Injection | Confidence threshold: 70% | Platforms: All

Detects attempts to break out of delimiters

Remediation:

Remove fake delimiters that attempt to inject system-level instructions.

`prompt-006` — Encoded Instruction Injection

Severity: 🟡 Medium | Category: Prompt Injection | Confidence threshold: 75% | Platforms: All

Detects encoded or obfuscated instructions

Remediation:

Remove encoded instructions. These attempt to bypass content filtering.

`prompt-007` — Context Manipulation

Severity: 🟡 Medium | Category: Prompt Injection | Confidence threshold: 75% | Platforms: All

Detects false authorization claims and fake privilege escalation in tool descriptions

Remediation:

Remove context manipulation attempts. These try to mislead the AI about user intent.

`prompt-010` — Markdown/HTML Injection

Severity: 🟡 Medium | Category: Prompt Injection | Confidence threshold: 70% | Platforms: All

Detects attempts to inject via markdown or HTML

Remediation:

Sanitize markdown and HTML content. These may execute malicious code.

`prompt-011` — Homoglyph Mixed-Script Attack

Severity: 🟡 Medium | Category: Prompt Injection | Confidence threshold: 50% | Platforms: All

Detects Cyrillic/Greek/Armenian characters mixed with Latin text (homoglyph attacks)

Remediation:

Remove mixed-script text. Homoglyph attacks use visually identical characters from different scripts to bypass filters.

`prompt-014` — Hypothetical Framing Prompt Injection

Severity: 🟡 Medium | Category: Prompt Injection | Confidence threshold: 60% | Platforms: All

Detects hypothetical/imaginative framing used to bypass safety guardrails by asking the AI to imagine having access to restricted resources

Remediation:

Hypothetical framing is a prompt injection technique where the attacker asks the AI to imagine having elevated access. The AI may then act on the hypothetical scenario as if it were real. Tool descriptions should never contain hypothetical prompts or imaginative framing of access to restricted resources.

`prompt-015` — Unsafe Markdown HTML Rendering (XSS via innerHTML)

Severity: 🟡 Medium | Category: Prompt Injection | Confidence threshold: 60% | Platforms: openclaw, claude, mcp

Detects markdown parsers rendering directly to innerHTML without sanitization — GHSA-r294

Remediation:

Never render user-controlled markdown directly to innerHTML without HTML sanitization. Use DOMPurify or override the HTML token renderer in marked.js.

References:

GHSA-r294-2894-92j3

`prompt-016` — External Metadata Injected into System Prompt

Severity: 🟡 Medium | Category: Prompt Injection | Confidence threshold: 60% | Platforms: openclaw

Detects Slack/channel metadata interpolated into system prompts — CVE-2026-24764

Remediation:

External metadata from third-party platforms must never be interpolated into system prompts. This creates a prompt injection channel for anyone with channel edit permissions.

References:

CVE-2026-24764

`prompt-008` — Output Format Manipulation

Severity: 🟢 Low | Category: Prompt Injection | Confidence threshold: 65% | Platforms: All

Detects attempts to control AI output format maliciously

Remediation:

Review output format instructions. Some may attempt to suppress safety warnings.

Secret Detection

ID	Name	Severity	Confidence	Platforms
`sec-007`	Stripe Live Secret Key	🔴 Critical	55%	All
`sec-009`	Stripe Restricted Key	🔴 Critical	55%	All
`sec-010`	Square Application Secret	🔴 Critical	60%	All
`sec-011`	PayPal / Braintree Credentials	🔴 Critical	60%	All
`sec-035`	HashiCorp Vault Token	🔴 Critical	60%	All
`sec-037`	Cloudflare API Token and Key	🔴 Critical	60%	All
`sec-038`	Base64-Encoded Private Key	🔴 Critical	70%	All
`sec-056`	Supabase Service Role Key (Inline)	🔴 Critical	55%	All
`sec-001`	Azure Storage Account Key	🟠 High	70%	All
`sec-002`	Azure SAS Token	🟠 High	65%	All
`sec-003`	Azure Active Directory Client Secret	🟠 High	60%	All
`sec-004`	Azure Subscription Key (Cognitive Services / API Management)	🟠 High	65%	All
`sec-005`	Alibaba Cloud Access Key	🟠 High	70%	All
`sec-006`	IBM Cloud API Key	🟠 High	70%	All
`sec-008`	Stripe Live Publishable Key	🟠 High	60%	All
`sec-012`	Twilio Account SID and Auth Token	🟠 High	65%	All
`sec-013`	SendGrid API Key	🟠 High	60%	All
`sec-014`	Mailgun API Key	🟠 High	65%	All
`sec-016`	Postmark Server Token	🟠 High	65%	All
`sec-017`	Heroku API Key	🟠 High	65%	All
`sec-018`	DigitalOcean Personal Access Token	🟠 High	60%	All
`sec-019`	Terraform Cloud Token	🟠 High	65%	All
`sec-021`	CircleCI API Token	🟠 High	65%	All
`sec-022`	Travis CI API Token	🟠 High	65%	All
`sec-024`	Vercel API Token	🟠 High	60%	All
`sec-025`	Discord Bot Token	🟠 High	65%	All
`sec-027`	Twitch API Credentials	🟠 High	65%	All
`sec-028`	Telegram Bot Token	🟠 High	65%	All
`sec-029`	Facebook / Meta App Secret	🟠 High	60%	All
`sec-030`	Firebase API Key	🟠 High	60%	All
`sec-031`	Algolia Admin API Key	🟠 High	65%	All
`sec-034`	Datadog API and Application Keys	🟠 High	65%	All
`sec-036`	Consul ACL Token	🟠 High	65%	All
`sec-039`	Hardcoded JWT Token	🟠 High	55%	All
`sec-041`	Generic Secret in URL Query Parameter	🟠 High	70%	All
`sec-045`	Shopify Access Token	🟠 High	65%	All
`sec-046`	Okta API Token	🟠 High	65%	All
`sec-048`	Elastic Cloud API Key	🟠 High	65%	All
`sec-052`	Pinecone API Key	🟠 High	65%	All
`sec-053`	Cohere API Key	🟠 High	65%	All
`sec-054`	Hugging Face Token	🟠 High	60%	All
`sec-055`	Replicate API Token	🟠 High	65%	All
`sec-015`	Mailchimp API Key	🟡 Medium	65%	All
`sec-020`	Sentry DSN	🟡 Medium	70%	All
`sec-023`	Codecov Upload Token	🟡 Medium	65%	All
`sec-026`	Discord Webhook URL	🟡 Medium	70%	All
`sec-032`	Segment Write Key	🟡 Medium	65%	All
`sec-033`	Mixpanel Token and Secret	🟡 Medium	65%	All
`sec-040`	Generic API Key Assignment	🟡 Medium	75%	All
`sec-042`	High-Entropy Hex String Assigned to Secret Variable	🟡 Medium	75%	All
`sec-043`	PagerDuty Integration Key	🟡 Medium	65%	All
`sec-044`	Zendesk API Token	🟡 Medium	65%	All
`sec-047`	Atlassian API Token	🟡 Medium	65%	All
`sec-049`	Airtable API Key	🟡 Medium	65%	All
`sec-050`	Linear API Key	🟡 Medium	65%	All
`sec-051`	Notion Integration Token	🟡 Medium	65%	All
`sec-057`	Pusher Application Secret	🟡 Medium	65%	All
`sec-058`	Amplitude API Key and Secret	🟡 Medium	65%	All
`sec-059`	Mapbox Access Token	🟡 Medium	65%	All
`sec-060`	Intercom Access Token	🟡 Medium	65%	All

Rule Details

`sec-007` — Stripe Live Secret Key

Severity: 🔴 Critical | Category: Secret Detection | Confidence threshold: 55% | Platforms: All

Detects Stripe live-mode secret API keys which allow full account access

Remediation:

This is a critical incident. A Stripe live secret key can create charges, access customer data, and perform refunds. Immediately:

Roll the key in the Stripe dashboard (Developers > API keys)
Audit recent API calls for unauthorized activity
Store keys exclusively in environment variables or a secrets manager

`sec-009` — Stripe Restricted Key

Severity: 🔴 Critical | Category: Secret Detection | Confidence threshold: 55% | Platforms: All

Detects Stripe restricted API keys

Remediation:

Roll the restricted key immediately in the Stripe dashboard (Developers > API keys). Even restricted keys can perform sensitive operations within their scope. Store all Stripe keys in a secrets manager, never in source code.

`sec-010` — Square Application Secret

Severity: 🔴 Critical | Category: Secret Detection | Confidence threshold: 60% | Platforms: All

Detects Square OAuth application secrets and access tokens

Remediation:

Revoke the exposed Square credential immediately in the Square Developer dashboard under OAuth > Applications. Square application secrets can be used to impersonate your application. Rotate and store exclusively in a secrets manager or environment variables.

`sec-011` — PayPal / Braintree Credentials

Severity: 🔴 Critical | Category: Secret Detection | Confidence threshold: 60% | Platforms: All

Detects PayPal REST API client secrets and Braintree tokens

Remediation:

Revoke PayPal/Braintree credentials immediately in their respective dashboards. These credentials can process financial transactions. Use environment variables or a secrets manager and enforce secret scanning on your repositories.

`sec-035` — HashiCorp Vault Token

Severity: 🔴 Critical | Category: Secret Detection | Confidence threshold: 60% | Platforms: All

Detects Vault service tokens and batch tokens used for secrets access

Remediation:

Revoke the Vault token immediately using vault token revoke <token> or via the Vault UI. Vault tokens can access any secret in their policy scope. Use short-TTL tokens, AppRole authentication, or Kubernetes auth instead of static tokens.

`sec-037` — Cloudflare API Token and Key

Severity: 🔴 Critical | Category: Secret Detection | Confidence threshold: 60% | Platforms: All

Detects Cloudflare API tokens and global API keys

Remediation:

Revoke the Cloudflare token in My Profile > API Tokens. Cloudflare global API keys have access to your entire account including DNS, WAF, and Workers. Use scoped API tokens (not the global key) and grant only the permissions required for the specific use case.

`sec-038` — Base64-Encoded Private Key

Severity: 🔴 Critical | Category: Secret Detection | Confidence threshold: 70% | Platforms: All

Detects base64-encoded PEM private keys used to obfuscate credentials

Remediation:

A base64-encoded private key is just as sensitive as the raw PEM key. Remove it from code immediately, rotate the key pair, and use a secrets manager or environment variable to provide keys at runtime. Encoding is not encryption and provides no security benefit.

`sec-056` — Supabase Service Role Key (Inline)

Severity: 🔴 Critical | Category: Secret Detection | Confidence threshold: 55% | Platforms: All

Detects full Supabase service role JWT tokens hardcoded inline

Remediation:

The Supabase service role key bypasses Row Level Security on all tables. Rotate it immediately in the Supabase dashboard under Settings > API. Never expose it in client-side code or commit it to version control. Use the anon key for client-side access and apply strict RLS policies.

`sec-001` — Azure Storage Account Key

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 70% | Platforms: All

Detects Azure Storage account access keys embedded in code or config

Remediation:

Never hardcode Azure Storage keys. Use managed identities, Azure Key Vault, or environment variables instead:

Assign the Storage Blob Data Contributor role to your managed identity
Reference secrets via Key Vault references in App Service configuration
Rotate the exposed key immediately in the Azure portal

`sec-002` — Azure SAS Token

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Azure Shared Access Signature tokens which grant time-limited storage access

Remediation:

SAS tokens provide direct access to Azure resources. If exposed:

Revoke the SAS token by regenerating the storage account key it was derived from
Use short-lived SAS tokens generated server-side on demand
Prefer managed identities over SAS tokens for service-to-service access

`sec-003` — Azure Active Directory Client Secret

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 60% | Platforms: All

Detects Azure AD application client secrets used for service principal authentication

Remediation:

Rotate the Azure AD client secret immediately in the Azure portal under App Registrations > Certificates & secrets. Switch to certificate-based authentication or managed identities to avoid secret rotation entirely.

`sec-004` — Azure Subscription Key (Cognitive Services / API Management)

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Azure API Management or Cognitive Services subscription keys

Remediation:

Regenerate the exposed subscription key in Azure API Management or Cognitive Services. Use Azure Key Vault to store and retrieve keys at runtime rather than embedding them in source or config files.

`sec-005` — Alibaba Cloud Access Key

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 70% | Platforms: All

Detects Alibaba Cloud (Aliyun) access key IDs and secrets

Remediation:

Revoke the exposed Alibaba Cloud access key in the RAM console immediately. Use RAM roles with STS temporary credentials or instance RAM roles instead of long-lived access key pairs.

`sec-006` — IBM Cloud API Key

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 70% | Platforms: All

Detects IBM Cloud IAM API keys

Remediation:

Revoke the IBM Cloud API key in the IAM console (Manage > Access > API keys). Use service IDs with IAM policies scoped to the minimum required permissions and generate keys via the API rather than storing static keys.

`sec-008` — Stripe Live Publishable Key

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 60% | Platforms: All

Detects Stripe live-mode publishable keys which can be used to initiate payments

Remediation:

While publishable keys are designed for client-side use, they should not appear in server-side secrets files or VCS history. If the corresponding secret key is also exposed, treat this as a critical incident. Roll both keys in the Stripe dashboard.

`sec-012` — Twilio Account SID and Auth Token

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Twilio account credentials which enable SMS/voice actions and billing

Remediation:

Rotate the Twilio auth token immediately in the Twilio Console under Account > General Settings. Auth tokens can send SMS/calls charged to your account. Use API Keys (more limited scope) instead of auth tokens where possible.

`sec-013` — SendGrid API Key

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 60% | Platforms: All

Detects SendGrid email API keys which allow sending emails on your behalf

Remediation:

Delete the exposed SendGrid API key immediately in Settings > API Keys. Create a replacement key with the minimum required permissions (e.g., Mail Send only). Store in environment variables or a secrets manager.

`sec-014` — Mailgun API Key

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Mailgun email service API keys

Remediation:

Rotate the Mailgun API key in the Mailgun Control Panel under Settings > API Keys. Use domain-level sending keys rather than the primary account API key to limit the blast radius of a leak.

`sec-016` — Postmark Server Token

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Postmark transactional email server API tokens

Remediation:

Regenerate the Postmark server token in the Postmark app under Servers > API Tokens. Use separate tokens per environment and store in a secrets manager.

`sec-017` — Heroku API Key

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Heroku platform API keys which allow full account management

Remediation:

Revoke the Heroku API key in Account Settings > API Key. Heroku API keys grant full control over all your apps and pipelines. Use OAuth tokens with limited scopes for CI/CD automation instead.

`sec-018` — DigitalOcean Personal Access Token

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 60% | Platforms: All

Detects DigitalOcean API tokens for infrastructure management

Remediation:

Delete the token in DigitalOcean API Settings and generate a new one with read-only or scoped access. DigitalOcean tokens with write access can create/destroy Droplets and databases.

`sec-019` — Terraform Cloud Token

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Terraform Cloud / Enterprise API tokens

Remediation:

Revoke the token in Terraform Cloud under User/Organization Settings > Tokens. Terraform tokens can apply infrastructure changes. Use short-lived tokens and machine users (team tokens) for CI pipelines rather than personal tokens.

`sec-021` — CircleCI API Token

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects CircleCI personal or project API tokens

Remediation:

Delete the CircleCI personal API token in User Settings > Personal API Tokens. CircleCI tokens with project access can trigger pipelines and read secrets. Use project-scoped tokens and rotate them regularly.

`sec-022` — Travis CI API Token

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Travis CI authentication tokens

Remediation:

Regenerate the Travis CI token in Profile > Settings > API Authentication. Ensure Travis CI environment variables containing secrets are marked as hidden and not displayed in build logs.

`sec-024` — Vercel API Token

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 60% | Platforms: All

Detects Vercel deployment and management API tokens

Remediation:

Delete the Vercel token in Account Settings > Tokens. Vercel tokens can deploy code and manage projects. Use team-scoped tokens with the minimum required access level for CI/CD pipelines.

`sec-025` — Discord Bot Token

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Discord bot tokens which allow full bot account control

Remediation:

Reset the bot token immediately in the Discord Developer Portal under Applications > Bot > Reset Token. Anyone with the token can act as your bot, join servers, and send messages. Rotate and store in environment variables only.

`sec-027` — Twitch API Credentials

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Twitch application client secrets and OAuth tokens

Remediation:

Revoke the Twitch application secret in the Twitch Developer Console. OAuth tokens should be treated as passwords and stored only in secure server-side secret stores.

`sec-028` — Telegram Bot Token

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Telegram bot API tokens issued by BotFather

Remediation:

Request a new token from Telegram’s BotFather using /revoke. Anyone with the bot token can read all messages sent to the bot and send messages as it. Never commit bot tokens to version control.

`sec-029` — Facebook / Meta App Secret

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 60% | Platforms: All

Detects Facebook/Meta application secrets and access tokens

Remediation:

Rotate the Facebook app secret in the Meta Developer Console under App Settings > Basic. App secrets can generate user access tokens and make server-side API calls. Treat them as passwords and never expose them in client-side code.

`sec-030` — Firebase API Key

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 60% | Platforms: All

Detects Firebase project configuration API keys and service account credentials

Remediation:

Firebase Web API keys are intended for client-side use but should be restricted in the Google Cloud Console to specific HTTP referrers or IP addresses. For server-side access, use Firebase Admin SDK with a service account and store the private key in a secrets manager. Restrict Firebase security rules to prevent unauthorized database/storage access.

`sec-031` — Algolia Admin API Key

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Algolia admin API keys which provide full index management access

Remediation:

Rotate the Algolia admin key in API Keys settings. The admin key can add, delete, and modify all records and indices. Use search-only or restricted API keys for client-side use, and never expose admin keys in frontend code.

`sec-034` — Datadog API and Application Keys

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Datadog API keys and application keys for monitoring access

Remediation:

Revoke and regenerate keys in Datadog Organization Settings > API Keys. Datadog application keys have broad read/write access to metrics, logs, and monitors. Use scoped API keys and rotate them on a schedule.

`sec-036` — Consul ACL Token

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects HashiCorp Consul access control list tokens

Remediation:

Revoke the Consul ACL token via the Consul API or UI. Consul tokens control access to service discovery and KV store. Rotate bootstrap tokens immediately and use scoped service tokens for application access.

`sec-039` — Hardcoded JWT Token

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 55% | Platforms: All

Detects live JWT tokens hardcoded in source code

Remediation:

JWTs contain identity and authorization claims. A hardcoded JWT is valid until it expires or the signing key is rotated. Identify the issuer from the decoded payload, revoke or invalidate the token if possible, and rotate the JWT signing secret/key immediately.

`sec-041` — Generic Secret in URL Query Parameter

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 70% | Platforms: All

Detects secrets embedded directly in URL query strings

Remediation:

Never pass secrets as URL query parameters. They are logged by web servers, proxies, and browsers. Use HTTP Authorization headers or POST body instead. Rotate any exposed secrets immediately.

`sec-045` — Shopify Access Token

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Shopify private app and OAuth access tokens

Remediation:

Revoke the Shopify token in Partners > Apps or in the store admin under Apps > App and sales channel settings. Shopify tokens can read/write orders, customers, and inventory. Rotate immediately if exposed.

`sec-046` — Okta API Token

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Okta identity platform API tokens

Remediation:

Revoke the Okta API token in Security > API > Tokens. Okta tokens with admin privileges can manage users and applications. Use OAuth 2.0 service apps instead of SSWS tokens for non-human access.

`sec-048` — Elastic Cloud API Key

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Elasticsearch/Elastic Cloud API keys and credentials

Remediation:

Invalidate the API key via the Elasticsearch API: DELETE /_security/api_key with the key ID. Create replacement keys with minimal index privileges and source IP restrictions where possible.

`sec-052` — Pinecone API Key

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Pinecone vector database API keys

Remediation:

Revoke the Pinecone API key in the Pinecone console under API Keys. Pinecone keys can upsert, query, and delete vector embeddings. Rotate immediately and store replacements in environment variables or a secrets manager.

`sec-053` — Cohere API Key

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Cohere AI API keys for NLP model access

Remediation:

Revoke the Cohere API key in the Cohere dashboard under API Keys. API keys can be used to invoke paid LLM endpoints. Create a new key and store it exclusively in environment variables or a secrets manager.

`sec-054` — Hugging Face Token

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 60% | Platforms: All

Detects Hugging Face user access tokens for model hub and inference API

Remediation:

Revoke the token in Hugging Face Account Settings > Access Tokens. Tokens with write access can modify model repositories and datasets. Use read-only tokens for inference workloads and store in a secrets manager.

`sec-055` — Replicate API Token

Severity: 🟠 High | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Replicate AI inference platform API tokens

Remediation:

Revoke the Replicate API token in Account Settings > API Tokens. Tokens can be used to run paid model predictions. Create a replacement and store in environment variables or a secrets manager.

`sec-015` — Mailchimp API Key

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Mailchimp marketing API keys

Remediation:

Revoke the Mailchimp API key in Account > Extras > API Keys. Create a new key with read-only access where possible and store in environment variables.

`sec-020` — Sentry DSN

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 70% | Platforms: All

Detects Sentry Data Source Names which expose project identifiers and can ingest events

Remediation:

Sentry DSNs are semi-public (client-side use is expected) but should not appear in server-side secret stores or allow event submission from untrusted sources. Enable rate limiting and trusted domain filtering in Sentry project settings. For server-side Sentry auth tokens, treat as high severity.

`sec-023` — Codecov Upload Token

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Codecov coverage upload tokens

Remediation:

Regenerate the Codecov token in repository settings. Codecov tokens can be used to upload falsified coverage reports; always store them as CI secrets.

`sec-026` — Discord Webhook URL

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 70% | Platforms: All

Detects Discord webhook URLs which allow posting messages to channels

Remediation:

Delete the webhook in Discord channel settings and recreate it. Discord webhooks can be used to spam channels or phish users. Never hardcode webhook URLs in client-side code or public repositories.

`sec-032` — Segment Write Key

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Segment analytics write keys

Remediation:

While write keys are designed for client-side use, server-side write keys should be stored in environment variables. Rotate in the Segment workspace Settings > Sources if you suspect server-side keys were leaked.

`sec-033` — Mixpanel Token and Secret

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Mixpanel project tokens and API secrets

Remediation:

Mixpanel project tokens are semi-public for ingestion but API secrets must be kept server-side. Rotate the API secret in Project Settings if exposed. Restrict data export access via Mixpanel service accounts.

`sec-040` — Generic API Key Assignment

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 75% | Platforms: All

Detects high-entropy strings assigned to variables named key, token, or secret

Remediation:

Replace hardcoded credentials with environment variable references. Rotate any exposed keys/tokens. Use a secrets manager such as HashiCorp Vault, AWS Secrets Manager, or your cloud provider’s equivalent.

`sec-042` — High-Entropy Hex String Assigned to Secret Variable

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 75% | Platforms: All

Detects 32+ character hex strings assigned to secret-sounding variable names

Remediation:

Even if these appear to be test values, they may be real secrets committed by mistake. Rotate any values that may have been used in production and move them to environment variables or a secrets manager.

`sec-043` — PagerDuty Integration Key

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects PagerDuty service integration and API keys

Remediation:

Revoke the PagerDuty key in Integrations > API Access Keys. Leaked integration keys can trigger or silence incidents. Generate minimal-permission API keys and store them in a secrets manager.

`sec-044` — Zendesk API Token

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Zendesk support platform API tokens

Remediation:

Revoke the Zendesk API token in Settings > Apps and Integrations > Zendesk API. Zendesk tokens can access ticket data and customer PII. Rotate and store securely in a secrets manager.

`sec-047` — Atlassian API Token

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Atlassian (Jira/Confluence) API tokens

Remediation:

Revoke the Atlassian API token in Account Settings > Security > API tokens. These tokens authenticate as your user account. Generate tokens with the minimum required permissions and store them in a secrets manager.

`sec-049` — Airtable API Key

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Airtable personal access tokens and legacy API keys

Remediation:

Revoke the Airtable personal access token in Account > Developer hub > PATs. Create replacement tokens scoped to specific bases and operations. Airtable keys can read and modify all base data in scope.

`sec-050` — Linear API Key

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Linear project management API keys

Remediation:

Revoke the Linear API key in Settings > API > Personal API Keys. Create a replacement key and store it in a secrets manager or CI/CD secrets.

`sec-051` — Notion Integration Token

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Notion internal integration tokens

Remediation:

Revoke the Notion integration token in Settings & Members > Integrations. Create a replacement token and limit its access to only the required pages and databases.

`sec-057` — Pusher Application Secret

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Pusher real-time API application secrets

Remediation:

Rotate the Pusher app secret in the Pusher dashboard under App Keys. The app secret is used to sign webhook payloads and authenticate server-side publishing. Store in environment variables only.

`sec-058` — Amplitude API Key and Secret

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Amplitude analytics API keys and secret keys

Remediation:

Rotate keys in Amplitude under Settings > Projects. The secret key is required for server-side event ingestion and export APIs. Store in a secrets manager and use the API key for client-side tracking only.

`sec-059` — Mapbox Access Token

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Mapbox public and secret access tokens

Remediation:

Rotate the Mapbox token in Account > Access Tokens. Secret tokens should never appear in client-side code. Public tokens should be URL-restricted in Mapbox account settings to prevent unauthorized tile requests.

`sec-060` — Intercom Access Token

Severity: 🟡 Medium | Category: Secret Detection | Confidence threshold: 65% | Platforms: All

Detects Intercom customer messaging platform access tokens

Remediation:

Revoke the Intercom access token in Settings > Developers > Access tokens. Intercom tokens can read customer conversations and user data. Store in a secrets manager and scope to the minimum required permissions.

Supply Chain

ID	Name	Severity	Confidence	Platforms
`adv-014`	Self-Replacing Code via writeFileSync	🔴 Critical	50%	All
`supply-001`	Known Malicious NPM Package	🔴 Critical	40%	All
`supply-005`	Known Malicious Python Package	🔴 Critical	40%	All
`supply-006`	Known Malicious NPM Package (Extended)	🔴 Critical	40%	All
`supply-007`	Known Malicious Python Package (Extended)	🔴 Critical	40%	All
`supply-009`	SANDWORM_MODE NPM Worm Packages	🔴 Critical	40%	All
`supply-010`	SANDWORM Git Hook Persistence	🔴 Critical	50%	All
`supply-011`	Vulnerable mcp-remote Package (CVE-2025-6514)	🔴 Critical	50%	mcp, claude, cursor
`yara-004`	Package.json Hijacking	🔴 Critical	40%	All
`supply-002`	NPM Typosquatting Pattern	🟠 High	50%	All
`supply-004`	Dangerous Postinstall Script	🟠 High	50%	All
`supply-008`	Common Typosquatting Heuristics	🟠 High	50%	All
`supply-003`	Overly Permissive Version Range	🟡 Medium	50%	All

Rule Details

`adv-014` — Self-Replacing Code via writeFileSync

Severity: 🔴 Critical | Category: Supply Chain | Confidence threshold: 50% | Platforms: All

Detects code that overwrites its own source files with remote content — rug pull attack

Remediation:

Code that writes to its own source files (especially index.js in __dirname) is a rug pull attack. Version 1.0 is clean, but the auto-update mechanism replaces the source with whatever a remote server returns. Pin dependencies and use lockfiles to prevent silent code replacement.

`supply-001` — Known Malicious NPM Package

Severity: 🔴 Critical | Category: Supply Chain | Confidence threshold: 40% | Platforms: All

Dependency on an npm package known to be malicious or compromised

Remediation:

This dependency has a known security incident. Check if you’re using a patched version or find an alternative package.

`supply-005` — Known Malicious Python Package

Severity: 🔴 Critical | Category: Supply Chain | Confidence threshold: 40% | Platforms: All

Dependency on a Python package known to be malicious

Remediation:

This Python package is known to be malicious. Remove it immediately and audit your system.

`supply-006` — Known Malicious NPM Package (Extended)

Severity: 🔴 Critical | Category: Supply Chain | Confidence threshold: 40% | Platforms: All

Dependency on an npm package known to be malicious or compromised (extended list)

Remediation:

This package is known to be malicious or compromised. Remove it immediately and use the legitimate version.

`supply-007` — Known Malicious Python Package (Extended)

Severity: 🔴 Critical | Category: Supply Chain | Confidence threshold: 40% | Platforms: All

Dependency on a Python package known to be malicious (extended list)

Remediation:

This Python package is known to be malicious. Remove it immediately and audit your system.

`supply-009` — SANDWORM_MODE NPM Worm Packages

Severity: 🔴 Critical | Category: Supply Chain | Confidence threshold: 40% | Platforms: All

Detects typosquatted npm packages from the SANDWORM_MODE worm campaign (Feb 2026) targeting AI coding tools

Remediation:

This package is part of the SANDWORM_MODE npm worm campaign (Feb 2026) that targets AI coding tools. It performs multi-stage attacks: credential harvest, MCP injection, git hook persistence, and self-propagation via npm publish. Remove immediately and audit your system.

References:

https://socket.dev/blog/sandworm-mode-ai-worm

`supply-010` — SANDWORM Git Hook Persistence

Severity: 🔴 Critical | Category: Supply Chain | Confidence threshold: 50% | Platforms: All

Detects git template directory manipulation used by the SANDWORM_MODE worm for persistence across new git repos

Remediation:

Modifying global git template directories or hooks paths is a persistence technique. The SANDWORM worm uses this to inject malicious hooks into every new git repo. Inspect and restore your git config: git config —global —unset init.templateDir

`supply-011` — Vulnerable mcp-remote Package (CVE-2025-6514)

Severity: 🔴 Critical | Category: Supply Chain | Confidence threshold: 50% | Platforms: mcp, claude, cursor

Detects mcp-remote versions 0.0.5-0.1.15 with critical RCE vulnerability (CVSS 9.6)

Remediation:

mcp-remote versions 0.0.5 through 0.1.15 have a critical RCE vulnerability (CVE-2025-6514, CVSS 9.6) allowing arbitrary OS command execution. Upgrade immediately to >= 0.1.16.

References:

CVE-2025-6514

`yara-004` — Package.json Hijacking

Severity: 🔴 Critical | Category: Supply Chain | Confidence threshold: 40% | Platforms: All

Detects preinstall/postinstall scripts with encoded or obfuscated payloads

Remediation:

No remediation guidance available.

`supply-002` — NPM Typosquatting Pattern

Severity: 🟠 High | Category: Supply Chain | Confidence threshold: 50% | Platforms: All

Dependency name appears to be a typosquat of a popular package

Remediation:

Verify the package name is correct. Typosquatting is a common supply chain attack vector.

`supply-004` — Dangerous Postinstall Script

Severity: 🟠 High | Category: Supply Chain | Confidence threshold: 50% | Platforms: All

Package runs scripts during installation that download or execute external code

Remediation:

Inspect install scripts before running. Use —ignore-scripts flag with npm install for untrusted packages.

`supply-008` — Common Typosquatting Heuristics

Severity: 🟠 High | Category: Supply Chain | Confidence threshold: 50% | Platforms: All

Detects common typosquatting patterns of popular packages

Remediation:

Verify the package name is correct. This appears to be a typosquat of a popular package.

`supply-003` — Overly Permissive Version Range

Severity: 🟡 Medium | Category: Supply Chain | Confidence threshold: 50% | Platforms: All

Dependencies use wildcard or overly permissive version ranges

Remediation:

Use exact versions or semver ranges with upper bounds (e.g., ^1.2.3 or ~1.2.3). Never use * or latest in production.

Suspicious Behavior

ID	Name	Severity	Confidence	Platforms
`sus-007`	Keylogging Patterns	🔴 Critical	85%	All
`sus-009`	Data Wiping Patterns	🔴 Critical	85%	All
`sus-010`	Reverse Shell Patterns	🔴 Critical	90%	All
`aaa-001`	Scheduled Task Injection	🟠 High	55%	All
`aaa-002`	Unrestricted Resource Consumption	🟠 High	50%	All
`adv-002`	Function Constructor Code Execution	🟠 High	55%	All
`adv-005`	Dynamic Module Require via String Concatenation	🟠 High	60%	All
`sus-003`	Anti-Debugging Techniques	🟠 High	80%	All
`sus-005`	Persistence Mechanisms	🟠 High	80%	All
`sus-006`	Cryptocurrency Mining Indicators	🟠 High	80%	All
`sus-008`	Camera/Microphone Access	🟠 High	80%	All
`sus-013`	Self-Modification	🟠 High	80%	All
`sus-016`	Python Dangerous Execution with Dynamic Input	🟠 High	65%	crewai, autogpt, mcp
`sus-001`	Obfuscated Code Detection	🟡 Medium	70%	All
`sus-002`	Dynamic Code Execution	🟡 Medium	70%	All
`sus-004`	Network Reconnaissance	🟡 Medium	75%	All
`sus-011`	Timestomping	🟡 Medium	75%	All
`sus-012`	Unusual File Locations	🟡 Medium	70%	All
`sus-014`	Abnormal Process Spawning	🟡 Medium	70%	All
`sus-015`	Encoding Without Clear Purpose	🟢 Low	60%	All

Rule Details

`sus-007` — Keylogging Patterns

Severity: 🔴 Critical | Category: Suspicious Behavior | Confidence threshold: 85% | Platforms: All

Detects keylogging or input capture patterns

Remediation:

Keylogging is highly malicious. This should never be present in an AI agent.

`sus-009` — Data Wiping Patterns

Severity: 🔴 Critical | Category: Suspicious Behavior | Confidence threshold: 85% | Platforms: All

Detects patterns that could wipe data

Remediation:

Data wiping commands are extremely dangerous. These should never be in an AI agent.

`sus-010` — Reverse Shell Patterns

Severity: 🔴 Critical | Category: Suspicious Behavior | Confidence threshold: 90% | Platforms: All

Detects reverse shell creation patterns

Remediation:

Reverse shells are highly malicious. This is a critical security threat.

`aaa-001` — Scheduled Task Injection

Severity: 🟠 High | Category: Suspicious Behavior | Confidence threshold: 55% | Platforms: All

Detects cron jobs, heartbeat configs, or scheduled tasks that can be created or modified by agent tools, enabling persistent autonomous loops

Remediation:

Scheduled tasks and heartbeat configurations must not be modifiable by agent tools or external inputs. Implement rate limits and maximum execution counts for recurring tasks. Require owner approval for any new scheduled task registration.

References:

Agents of Chaos (arXiv:2602.20021) — CS4: Heartbeat/cron injection enabled 9-day infinite resource loop
MITRE ATLAS AML.T0040

`aaa-002` — Unrestricted Resource Consumption

Severity: 🟠 High | Category: Suspicious Behavior | Confidence threshold: 50% | Platforms: All

Detects agent configurations missing rate limits, token limits, or execution timeouts, enabling denial-of-service and runaway cost attacks

Remediation:

All agent tool invocations must have explicit rate limits, token budgets, and execution timeouts. Implement circuit breakers for agent-to-agent relay patterns. Set maximum iteration counts for loops and recursive tool calls.

References:

Agents of Chaos (arXiv:2602.20021) — CS4: Mutual relay loop lasting ~1 hour
Agents of Chaos — CS5: Mass email flooding
OWASP LLM04 (Denial of Service)

`adv-002` — Function Constructor Code Execution

Severity: 🟠 High | Category: Suspicious Behavior | Confidence threshold: 55% | Platforms: All

Detects new Function() used to execute dynamically constructed code — indirect eval that bypasses eval() detection

Remediation:

new Function() is equivalent to eval() and executes arbitrary code. This is commonly used to evade static scanners that only detect direct eval() calls. Never construct functions from untrusted or dynamic strings.

`adv-005` — Dynamic Module Require via String Concatenation

Severity: 🟠 High | Category: Suspicious Behavior | Confidence threshold: 60% | Platforms: All

Detects require() with string concatenation to hide the actual module being loaded

Remediation:

Using string concatenation inside require() (e.g., require(‘node:’ + ‘http’)) hides the actual module being imported from static analysis. This is a common evasion technique to bypass module import detection rules.

`sus-003` — Anti-Debugging Techniques

Severity: 🟠 High | Category: Suspicious Behavior | Confidence threshold: 80% | Platforms: All

Detects attempts to detect or evade debugging

Remediation:

Anti-debugging techniques indicate the code may be trying to hide malicious behavior.

`sus-005` — Persistence Mechanisms

Severity: 🟠 High | Category: Suspicious Behavior | Confidence threshold: 80% | Platforms: All

Detects attempts to establish persistence

Remediation:

Persistence mechanisms should not be created by AI agents. Remove these patterns.

`sus-006` — Cryptocurrency Mining Indicators

Severity: 🟠 High | Category: Suspicious Behavior | Confidence threshold: 80% | Platforms: All

Detects potential cryptocurrency mining code

Remediation:

Cryptocurrency mining should never be present in AI agent code.

`sus-008` — Camera/Microphone Access

Severity: 🟠 High | Category: Suspicious Behavior | Confidence threshold: 80% | Platforms: All

Detects attempts to access camera or microphone

Remediation:

Camera and microphone access requires explicit user consent. Review this carefully.

`sus-013` — Self-Modification

Severity: 🟠 High | Category: Suspicious Behavior | Confidence threshold: 80% | Platforms: All

Detects code that modifies itself

Remediation:

Self-modifying code is suspicious and may be used to hide malicious payloads.

`sus-016` — Python Dangerous Execution with Dynamic Input

Severity: 🟠 High | Category: Suspicious Behavior | Confidence threshold: 65% | Platforms: crewai, autogpt, mcp

Detects dangerous Python execution with user-controlled or dynamic input

Remediation:

Avoid these functions in AI agent code. Use safe alternatives like ast.literal_eval() and yaml.safe_load().

`sus-001` — Obfuscated Code Detection

Severity: 🟡 Medium | Category: Suspicious Behavior | Confidence threshold: 70% | Platforms: All

Detects heavily obfuscated or encoded code

Remediation:

Heavily obfuscated code is suspicious. Deobfuscate and review the actual behavior.

`sus-002` — Dynamic Code Execution

Severity: 🟡 Medium | Category: Suspicious Behavior | Confidence threshold: 70% | Platforms: All

Detects dynamic code execution with user-controlled or variable input

Remediation:

Dynamic code execution can hide malicious behavior. Review the executed code carefully.

`sus-004` — Network Reconnaissance

Severity: 🟡 Medium | Category: Suspicious Behavior | Confidence threshold: 75% | Platforms: All

Detects network scanning or reconnaissance patterns

Remediation:

Network reconnaissance should not be performed by AI agents without explicit permission.

`sus-011` — Timestomping

Severity: 🟡 Medium | Category: Suspicious Behavior | Confidence threshold: 75% | Platforms: All

Detects file timestamp manipulation

Remediation:

Timestamp manipulation is often used to hide malicious activity. Review carefully.

`sus-012` — Unusual File Locations

Severity: 🟡 Medium | Category: Suspicious Behavior | Confidence threshold: 70% | Platforms: All

Detects operations in unusual file locations

Remediation:

Hidden files in unusual locations may indicate attempts to hide malicious activity.

`sus-014` — Abnormal Process Spawning

Severity: 🟡 Medium | Category: Suspicious Behavior | Confidence threshold: 70% | Platforms: All

Detects suspicious process creation patterns

Remediation:

Detached background processes may indicate persistence attempts. Review carefully.

`sus-015` — Encoding Without Clear Purpose

Severity: 🟢 Low | Category: Suspicious Behavior | Confidence threshold: 60% | Platforms: All

Detects unnecessary encoding or weak encryption

Remediation:

Unnecessary encoding or weak encryption may be used to obfuscate malicious code.

third-party-content

ID	Name	Severity	Confidence	Platforms
`tpc-001`	Email Content Ingestion	🟠 High	50%	All
`tpc-002`	Chat Message Ingestion	🟠 High	50%	All
`tpc-003`	Social Media Content Ingestion	🟠 High	50%	All
`tpc-005`	GitHub Content Ingestion	🟠 High	50%	All
`tpc-006`	Registry/Marketplace Content Ingestion	🟠 High	50%	All
`tpc-004`	Web Content Fetch	🟡 Medium	50%	All

Rule Details

`tpc-001` — Email Content Ingestion

Severity: 🟠 High | Category: third-party-content | Confidence threshold: 50% | Platforms: All

Skill reads email content (IMAP, Gmail, Outlook) — untrusted third-party content enters agent context

Remediation:

Email content is untrusted third-party content. Agents processing email bodies are exposed to indirect prompt injection. Consider: content sanitization, untrusted-content wrapping, or operator approval before acting on email content.

`tpc-002` — Chat Message Ingestion

Severity: 🟠 High | Category: third-party-content | Confidence threshold: 50% | Platforms: All

Skill reads chat messages (Discord, Slack, WhatsApp, iMessage, Telegram) — untrusted content

Remediation:

Chat messages are untrusted third-party content. Any contact can send crafted messages containing prompt injection payloads.

`tpc-003` — Social Media Content Ingestion

Severity: 🟠 High | Category: third-party-content | Confidence threshold: 50% | Platforms: All

Skill reads social media content (Twitter/X, Reddit, HN) — untrusted user-generated content

Remediation:

Social media content is untrusted. Posts, comments, and threads can contain prompt injection payloads targeting the agent.

`tpc-005` — GitHub Content Ingestion

Severity: 🟠 High | Category: third-party-content | Confidence threshold: 50% | Platforms: All

Skill reads GitHub issues, PRs, or comments — user-generated content enters agent context

Remediation:

GitHub issues and PR bodies are user-generated content. Attackers can craft issues with embedded prompt injection instructions.

`tpc-006` — Registry/Marketplace Content Ingestion

Severity: 🟠 High | Category: third-party-content | Confidence threshold: 50% | Platforms: All

Skill installs or reads content from skill registries or marketplaces — untrusted code

Remediation:

Skills from registries are untrusted third-party code. Install with verification. Community content can contain prompt injection.

`tpc-004` — Web Content Fetch

Severity: 🟡 Medium | Category: third-party-content | Confidence threshold: 50% | Platforms: All

Skill fetches and processes arbitrary web page content — untrusted external content

Remediation:

Web content from arbitrary URLs is untrusted. A malicious page can contain prompt injection targeting agents that process its content.

Tool Poisoning

ID	Name	Severity	Confidence	Platforms
`tp-001`	Hidden Instructions in Tool Descriptions	🔴 Critical	50%	All
`tp-004`	MCP Server Config Injection	🔴 Critical	50%	All
`tp-006`	Homoglyph Characters in Tool Names	🔴 Critical	70%	All
`adv-003`	Hidden Directive in HTML Comment	🟠 High	55%	All
`adv-008`	Tool Shadowing of AI Agent Built-ins	🟠 High	55%	All
`adv-010`	Cross-Tool Instruction in Tool Description	🟠 High	55%	All
`tp-002`	Prompt Override in Tool Description	🟠 High	55%	All
`tp-003`	Tool Shadowing via Known Trusted Names	🟠 High	55%	All
`tp-007`	Base64-Encoded Payload in Tool Description	🟠 High	65%	All
`tp-008`	Tool Name Shadows Common System Commands	🟠 High	60%	mcp, claude, codex, cursor
`tp-009`	Hidden Markdown or HTML Directives in Tool Descriptions	🟠 High	60%	All
`tp-011`	Cursor MCPoison — MCP Config in Git Repository	🟠 High	50%	cursor, codex, mcp
`tp-005`	Suspicious Sensitive Parameters in Tool Definitions	🟡 Medium	60%	All
`tp-010`	Tool Description Length Anomaly	🟡 Medium	50%	All
`tp-012`	MCP Sampling Attack Vector	🟡 Medium	55%	mcp, claude, cursor

Rule Details

`tp-001` — Hidden Instructions in Tool Descriptions

Severity: 🔴 Critical | Category: Tool Poisoning | Confidence threshold: 50% | Platforms: All

Detects invisible Unicode characters and HTML comments used to hide malicious instructions inside tool or function descriptions

Remediation:

Remove all invisible Unicode characters and HTML comments from tool descriptions. These are used by attackers to smuggle hidden instructions that are processed by AI agents but invisible to human reviewers. Audit any tool description that was fetched from an external or untrusted source.

`tp-004` — MCP Server Config Injection

Severity: 🔴 Critical | Category: Tool Poisoning | Confidence threshold: 50% | Platforms: All

Detects code that writes to MCP configuration files or dynamically adds server entries, which can silently register malicious tools

Remediation:

Code must not write to MCP configuration files at runtime. MCP server registration is an administrative action that should only happen through official, user-approved configuration channels. Dynamic modification of MCP configs is a primary attack vector for silently registering malicious tool servers. Remove any code that constructs or writes mcpServers entries programmatically.

`tp-006` — Homoglyph Characters in Tool Names

Severity: 🔴 Critical | Category: Tool Poisoning | Confidence threshold: 70% | Platforms: All

Detects visually deceptive Unicode characters mixed with Latin text in tool names — homoglyph attacks that impersonate legitimate tools

Remediation:

Tool names containing mixed-script homoglyphs are a visual deception attack. An attacker registers a tool whose name looks identical to a trusted tool but uses different Unicode codepoints. Validate that all tool names contain only standard ASCII characters (U+0020-U+007E). Reject any tool with non-ASCII identifiers.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM02 Insecure Output
https://atlas.mitre.org/techniques/AML.T0043

`adv-003` — Hidden Directive in HTML Comment

Severity: 🟠 High | Category: Tool Poisoning | Confidence threshold: 55% | Platforms: All

Detects HTML comments containing data collection directives like endpoint URLs or include-env flags

Remediation:

HTML comments containing data collection directives are a tool poisoning vector. Attackers embed hidden configuration that instructs the tool to exfiltrate data. Tool descriptions and skill files should not contain HTML comments with directives.

`adv-008` — Tool Shadowing of AI Agent Built-ins

Severity: 🟠 High | Category: Tool Poisoning | Confidence threshold: 55% | Platforms: All

Detects MCP tools registered with names matching AI agent built-in tools (Read, Bash, Edit, Write) — lower confidence to unverified for deep scan triage

Remediation:

A tool registered with the exact name of an AI agent built-in (Read, Bash, Edit, Write) is a tool shadowing attack. The malicious tool intercepts operations intended for the legitimate built-in and can exfiltrate all data passed through it. Tool names must be unique and must not collide with agent built-in names.

`adv-010` — Cross-Tool Instruction in Tool Description

Severity: 🟠 High | Category: Tool Poisoning | Confidence threshold: 55% | Platforms: All

Detects tool descriptions that instruct the AI to pipe output to another tool — cross-server data exfiltration chain

Remediation:

Tool descriptions should describe the tool’s purpose, not instruct the AI to chain calls to other tools. This is a cross-server orchestration attack where a legitimate tool’s description directs the AI to pipe sensitive data to a second tool controlled by the attacker.

`tp-002` — Prompt Override in Tool Description

Severity: 🟠 High | Category: Tool Poisoning | Confidence threshold: 55% | Platforms: All

Detects prompt injection language embedded in tool descriptions or metadata that attempts to override AI instructions

Remediation:

Remove all prompt injection language from tool descriptions and metadata. Tool descriptions should only describe the tool’s legitimate purpose and parameters. Any text attempting to override AI instructions is a tool poisoning attack. Validate all tool descriptions fetched from external MCP servers before use.

`tp-003` — Tool Shadowing via Known Trusted Names

Severity: 🟠 High | Category: Tool Poisoning | Confidence threshold: 55% | Platforms: All

Detects tool registrations that use the names of well-known trusted tools to hijack AI behavior

Remediation:

A tool is being registered under a name that matches a well-known trusted tool. This is a classic tool shadowing attack: a malicious MCP server registers a tool with an identical name to intercept calls intended for the legitimate tool. Audit the source of this tool registration and verify the server’s identity before use.

`tp-007` — Base64-Encoded Payload in Tool Description

Severity: 🟠 High | Category: Tool Poisoning | Confidence threshold: 65% | Platforms: All

Detects base64-encoded content with decode operations or data URIs in tool descriptions, which may hide malicious instructions

Remediation:

Tool descriptions must contain only human-readable text describing the tool’s legitimate purpose. Base64-encoded content in descriptions is used to smuggle hidden instructions that are decoded and executed by the AI agent. Remove all encoded payloads and fetch tool descriptions only from trusted sources.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM01 Prompt Injection
https://atlas.mitre.org/techniques/AML.T0043

`tp-008` — Tool Name Shadows Common System Commands

Severity: 🟠 High | Category: Tool Poisoning | Confidence threshold: 60% | Platforms: mcp, claude, codex, cursor

Detects tool registrations using names of common system commands (ls, cat, curl, wget, bash) to intercept agent shell operations

Remediation:

A tool with the same name as a system command is a tool shadowing attack. The malicious tool intercepts calls intended for the legitimate system command. Tool names must be unique, namespaced (e.g., vendor-toolname), and must not collide with system command names or other registered tools.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM05 Supply Chain
https://atlas.mitre.org/techniques/AML.T0043

`tp-009` — Hidden Markdown or HTML Directives in Tool Descriptions

Severity: 🟠 High | Category: Tool Poisoning | Confidence threshold: 60% | Platforms: All

Detects dangerous HTML elements, suspicious HTML comments, and malicious markdown links in tool descriptions

Remediation:

Tool descriptions must be plain text only. HTML, Markdown with active links, and CSS styles embedded in descriptions are used to hide instructions from human reviewers while remaining visible to AI agents parsing the raw text. Strip all HTML/Markdown formatting from tool descriptions before display.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM01 Prompt Injection
https://atlas.mitre.org/techniques/AML.T0043

`tp-011` — Cursor MCPoison — MCP Config in Git Repository

Severity: 🟠 High | Category: Tool Poisoning | Confidence threshold: 50% | Platforms: cursor, codex, mcp

Detects .cursor/mcp.json or .vscode/mcp.json files committed to a git repository — CVE-2025-54136. Attackers commit benign configs then silently modify them to backdoor.

Remediation:

MCP configuration files (.cursor/mcp.json, .vscode/mcp.json) should not be committed to repositories. CVE-2025-54136 demonstrated that attackers commit benign configs, then silently modify server entries to backdoor the development environment. Add these files to .gitignore and use user-level MCP configuration instead.

References:

CVE-2025-54136

`tp-005` — Suspicious Sensitive Parameters in Tool Definitions

Severity: 🟡 Medium | Category: Tool Poisoning | Confidence threshold: 60% | Platforms: All

Detects tool parameter definitions that request sensitive credentials, keys, or secrets from the user

Remediation:

Tool parameter definitions must not request passwords, tokens, API keys, or private keys. Legitimate tools access credentials through secure environment variables or secrets managers, never by asking the user (or the AI agent) to supply them as tool arguments. A tool that requires credentials as parameters is likely a credential-harvesting attack.

`tp-010` — Tool Description Length Anomaly

Severity: 🟡 Medium | Category: Tool Poisoning | Confidence threshold: 50% | Platforms: All

Detects abnormally long tool descriptions (>5000 characters) which strongly suggest hidden content or embedded instructions

Remediation:

Legitimate tool descriptions are concise (typically under 500 characters). Descriptions over 5000 characters almost always indicate hidden content: invisible text, encoded payloads, or injected instructions. Cap tool description length at 1000 characters and reject over-length descriptions.

References:

https://owasp.org/www-project-top-10-for-large-language-model-applications/ LLM01 Prompt Injection
https://atlas.mitre.org/techniques/AML.T0043

`tp-012` — MCP Sampling Attack Vector

Severity: 🟡 Medium | Category: Tool Poisoning | Confidence threshold: 55% | Platforms: mcp, claude, cursor

Detects MCP servers declaring sampling capability, which enables reverse prompt injection by allowing the server to request the AI generate content

Remediation:

MCP servers with sampling capability can request the AI to generate content, creating a reverse injection channel. The server crafts prompts that manipulate the AI into executing actions the user did not intend. Only grant sampling capability to fully trusted MCP servers. Audit what the server sends via sampling requests.

References:

https://unit42.paloaltonetworks.com/mcp-security-risks/

unsupervised-execution

ID	Name	Severity	Confidence	Platforms
`kc-004`	Autonomous Agent Spawn Without Oversight	🟠 High	55%	All
`uex-001`	Background Agent Spawning	🟠 High	50%	All
`uex-002`	Persistent Daemon / Service	🟠 High	50%	All
`uex-003`	Multi-Agent Orchestration	🟡 Medium	50%	All

Rule Details

`kc-004` — Autonomous Agent Spawn Without Oversight

Severity: 🟠 High | Category: unsupervised-execution | Confidence threshold: 55% | Platforms: All

Detects spawning of detached/autonomous agent processes that bypass approval mechanisms

Remediation:

Spawning autonomous agent processes with approval bypass enables uncontrolled execution. Always require human-in-the-loop approval for agent tool invocations.

`uex-001` — Background Agent Spawning

Severity: 🟠 High | Category: unsupervised-execution | Confidence threshold: 50% | Platforms: All

Spawns agents or processes in the background without per-action human oversight

Remediation:

Background agents run without per-action human oversight. If compromised, they can execute arbitrary actions until manually stopped. Require periodic check-ins or approval gates for long-running background agents.

`uex-002` — Persistent Daemon / Service

Severity: 🟠 High | Category: unsupervised-execution | Confidence threshold: 50% | Platforms: All

Installs a persistent daemon or system service that runs autonomously

Remediation:

Persistent daemons run indefinitely without human oversight. Combined with agent capabilities, a compromised daemon can continuously process and act on untrusted input.

`uex-003` — Multi-Agent Orchestration

Severity: 🟡 Medium | Category: unsupervised-execution | Confidence threshold: 50% | Platforms: All

Spawns multiple parallel agents with tool access — amplified blast radius

Remediation:

Multi-agent orchestration amplifies the blast radius of any single compromise. If one agent is compromised, it can influence outputs consumed by other agents.

Built-in Rules

Summary

Access Control

Rule Details

ac-002 — Authentication Bypass Patterns

ac-003 — JWT None Algorithm or Weak Signing

ac-001 — API Key or Token in URL Query Parameter

Agent Memory Poisoning

Rule Details

kc-006 — SpAIware Persistent Memory Injection

mem-003 — Agent Config File Modification

aci-002 — Agent Memory Injection via External Write

mem-001 — Agent Memory File Write

mem-002 — Session/Conversation File Access

mem-005 — Copilot Instructions Manipulation

mem-006 — OpenAI Agents Memory Manipulation

mem-007 — Aider Agent Config Manipulation

mem-008 — Memory Injection via Instruction-Like Content (MINJA)

mem-004 — Time-Delayed Execution

mem-009 — Inter-Session Message Without Provenance

credential-extraction

Rule Details

cex-001 — Browser Cookie Extraction

cex-002 — Password Manager Access

cex-003 — Credential File Enumeration

Credential Harvesting

Rule Details

cred-002 — SSH Private Key Access

cred-005 — Browser Cookie/Credential Access

cred-006 — Keychain/Credential Manager Access

cred-015 — Container Environment Variable Theft

cred-018 — Python Subprocess Credential Theft

cred-020 — Service Role Keys in MCP Config

cred-001 — AWS Credentials Access

cred-003 — GCP Service Account Key

cred-007 — Git Credentials Access

cred-008 — NPM Token Access

cred-009 — Docker Credentials Access

cred-010 — Kubernetes Credentials Access

cred-011 — API Key in Config

cred-012 — Azure CLI Credentials Access

cred-013 — AWS SSO Token Cache Access

cred-014 — Vault Token File Access

cred-016 — Python Pathlib Credential Access

cred-017 — Python Open Credential File

cred-019 — API Base URL Override for Key Exfiltration

kc-011 — Environment Variable Serialization to File

kc-012 — Credential Staging to Temp File

adv-004 — Credential Path via path.join or homedir

cred-004 — Environment Variable Harvesting

cross-agent-propagation

Rule Details

kc-007 — Cross-Repo Agent Propagation

mat-002 — Missing Authority Verification

mat-001 — Cross-Agent Trust Without Verification

Data Exfiltration

Rule Details

adv-001 — Process Environment in HTTP Body

exfil-011 — Cloud Metadata Service Access (IMDS/SSRF)

adv-006 — Base64-Decoded Network Hostname

adv-015 — Suspicious MCP Server Environment Variables

exfil-001 — Suspicious External HTTP Request

exfil-003 — File Upload to External Service

exfil-004 — DNS Exfiltration Pattern

exfil-006 — Screenshot Capture

exfil-008 — Archive Creation Before Upload

exfil-012 — WebSocket Exfiltration

kc-008 — DNS Exfiltration via Encoded Subdomain

kc-009 — Render-Based Data Exfiltration

kc-010 — Clipboard Content Exfiltration

adv-012 — String Concatenation URL Construction

exfil-002 — Base64 Encoded Data Transmission

exfil-005 — Clipboard Data Access

exfil-007 — Bulk File Read Pattern

exfil-009 — Webhook Data Transmission

exfil-010 — Email Data Transmission

File System Abuse

Rule Details

fs-003 — System Account File Access

fs-005 — Kernel Memory Access

`ac-002` — Authentication Bypass Patterns

`ac-003` — JWT None Algorithm or Weak Signing

`ac-001` — API Key or Token in URL Query Parameter

`kc-006` — SpAIware Persistent Memory Injection

`mem-003` — Agent Config File Modification

`aci-002` — Agent Memory Injection via External Write

`mem-001` — Agent Memory File Write

`mem-002` — Session/Conversation File Access

`mem-005` — Copilot Instructions Manipulation

`mem-006` — OpenAI Agents Memory Manipulation

`mem-007` — Aider Agent Config Manipulation

`mem-008` — Memory Injection via Instruction-Like Content (MINJA)

`mem-004` — Time-Delayed Execution

`mem-009` — Inter-Session Message Without Provenance

`cex-001` — Browser Cookie Extraction

`cex-002` — Password Manager Access

`cex-003` — Credential File Enumeration

`cred-002` — SSH Private Key Access

`cred-005` — Browser Cookie/Credential Access

`cred-006` — Keychain/Credential Manager Access

`cred-015` — Container Environment Variable Theft

`cred-018` — Python Subprocess Credential Theft

`cred-020` — Service Role Keys in MCP Config

`cred-001` — AWS Credentials Access

`cred-003` — GCP Service Account Key

`cred-007` — Git Credentials Access

`cred-008` — NPM Token Access

`cred-009` — Docker Credentials Access

`cred-010` — Kubernetes Credentials Access

`cred-011` — API Key in Config

`cred-012` — Azure CLI Credentials Access

`cred-013` — AWS SSO Token Cache Access

`cred-014` — Vault Token File Access

`cred-016` — Python Pathlib Credential Access

`cred-017` — Python Open Credential File

`cred-019` — API Base URL Override for Key Exfiltration

`kc-011` — Environment Variable Serialization to File

`kc-012` — Credential Staging to Temp File

`adv-004` — Credential Path via path.join or homedir

`cred-004` — Environment Variable Harvesting

`kc-007` — Cross-Repo Agent Propagation

`mat-002` — Missing Authority Verification

`mat-001` — Cross-Agent Trust Without Verification

`adv-001` — Process Environment in HTTP Body

`exfil-011` — Cloud Metadata Service Access (IMDS/SSRF)

`adv-006` — Base64-Decoded Network Hostname

`adv-015` — Suspicious MCP Server Environment Variables

`exfil-001` — Suspicious External HTTP Request

`exfil-003` — File Upload to External Service

`exfil-004` — DNS Exfiltration Pattern

`exfil-006` — Screenshot Capture

`exfil-008` — Archive Creation Before Upload

`exfil-012` — WebSocket Exfiltration

`kc-008` — DNS Exfiltration via Encoded Subdomain

`kc-009` — Render-Based Data Exfiltration

`kc-010` — Clipboard Content Exfiltration

`adv-012` — String Concatenation URL Construction

`exfil-002` — Base64 Encoded Data Transmission

`exfil-005` — Clipboard Data Access

`exfil-007` — Bulk File Read Pattern

`exfil-009` — Webhook Data Transmission

`exfil-010` — Email Data Transmission

`fs-003` — System Account File Access

`fs-005` — Kernel Memory Access

`fs-008` — Temp Directory Code Execution

`fs-010` — Recursive Directory Deletion

`fs-001` — /proc Filesystem Enumeration

`fs-002` — System Log Manipulation

`fs-004` — Symlink Attack

`fs-007` — Symlink Attack to Sensitive Files

`fs-009` — Audit Log Manipulation

`fs-011` — Config Include Path Traversal

`fs-012` — Local File Path in Media URL Parameter

`fs-006` — Insecure File Permissions

`aci-001` — Agent Identity File Tampering

`ic-002` — SSL/TLS Verification Disabled

`ic-004` — Claude Code RCE via Malicious Hooks

`kc-005` — MCP Config File Injection

`adv-007` — Wildcard Permission in Skill Definition

`ic-003` — Default or Hardcoded Credentials in Config Files