Open-Source AI Red Teaming

Find what your AI guardrails miss.

32 attack modules. Genetic prompt evolution.
Multi-turn exploitation. Test GPT-4, Claude,
Gemini, Grok, and any LLM.

32 Modules
9 Providers
15 Mutations
5 Formats

basilisk scan

$ basilisk scan -t https://api.target.com/chat

[*] Basilisk v1.1.0

[*] Recon: Fingerprinting target...

[+] Model: GPT-4o (OpenAI)

[*] Guardrails: 6/8 blocked

[*] SPE-NL: Evolving — Gen 1/5...

[!] System prompt extracted

[!] Multi-turn drift: 0.89

[+] Breakthrough — fitness 0.94

[+] 9 findings (3C, 4H, 2M)

Works with every LLM provider

OpenAI GPT-4o
Anthropic Claude
Google Gemini
xAI Grok
Groq Fast
Azure OpenAI
GitHub Models
Ollama Local
Custom HTTP/WS

Capabilities

Everything you need to red team AI systems.

Smart Prompt Evolution

Genetic algorithm that evolves attack payloads across generations. 15 mutation operators, 5 crossover strategies, and multi-signal fitness scoring.
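
To make the idea concrete, here is a minimal sketch of a genetic loop over text candidates. It is not Basilisk's SPE-NL engine: the three mutation operators, the single crossover strategy, and the `keyword_fitness` scorer are hypothetical stand-ins chosen for illustration. A real harness would score candidates from the target's responses rather than from the candidate text itself.

```python
import random

# Hypothetical mutation operators: simple word-level transforms for illustration.
def swap_words(text, rng):
    words = text.split()
    if len(words) < 2:
        return text
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)

def drop_word(text, rng):
    words = text.split()
    if len(words) < 2:
        return text
    del words[rng.randrange(len(words))]
    return " ".join(words)

def duplicate_word(text, rng):
    words = text.split()
    if not words:
        return text
    i = rng.randrange(len(words))
    words.insert(i, words[i])
    return " ".join(words)

MUTATIONS = [swap_words, drop_word, duplicate_word]

def crossover(a, b, rng):
    """Single-point crossover on word boundaries: head of a, tail of b."""
    wa, wb = a.split(), b.split()
    child = wa[:rng.randint(0, len(wa))] + wb[rng.randint(0, len(wb)):]
    return " ".join(child) if child else a

# Hypothetical fitness: fraction of target words present in the candidate.
TARGET_WORDS = {"summarize", "translate", "document"}

def keyword_fitness(text):
    return len(TARGET_WORDS & set(text.split())) / len(TARGET_WORDS)

def evolve(seeds, fitness, generations=5, population=20, elite=4, seed=0):
    """Evolve candidates; the top `elite` survive each generation unchanged."""
    rng = random.Random(seed)
    pop = list(seeds)
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[:elite]
        children = list(parents)  # elitism keeps the best score from regressing
        while len(children) < population:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            children.append(rng.choice(MUTATIONS)(child, rng))
        pop = children
    return max(pop, key=fitness)
```

Calling `evolve(["please summarize the document", "translate the text to french"], keyword_fitness)` returns the highest-scoring candidate after five generations. Because the elite parents are carried over unchanged, the best score never drops below that of the best seed.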

32 Attack Modules

Full OWASP LLM Top 10 coverage across 8 categories — injection, extraction, exfiltration, tool abuse, guardrail bypass, DoS, multi-turn, and RAG.

Multi-Turn Attacks

Prompt cultivation, authority escalation, and sycophancy exploitation. Social-engineer LLMs across entire conversations while monitoring guardrail drift.

Reconnaissance

Fingerprint GPT-4, Claude, Gemini, Grok, and Llama. Profile guardrails, discover tools, measure context windows, and detect RAG pipelines.

9 LLM Providers

OpenAI, Anthropic, Google, xAI Grok, Groq, Azure, GitHub Models, Ollama, and any custom HTTP or WebSocket endpoint.

5 Report Formats

HTML with conversation replay, SARIF 2.1.0 for CI/CD, JSON for automation, Markdown for docs, and PDF for client deliverables.

Desktop App

Electron GUI with real-time scan visualization, module browser, session replay, and one-click export. Windows, macOS, Linux.

CI/CD Integration

GitHub Actions and GitLab CI with SARIF output. Fail pipelines on critical findings. Baseline regression detection.
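
As a rough sketch of how a pipeline gate on SARIF output might work, the snippet below counts results by SARIF `level` and returns a non-zero exit code when any result is at `error`. The `runs[].results[].level` layout comes from SARIF 2.1.0. The assumption that critical findings are reported at level `error`, and the function names, are illustrative and not taken from Basilisk.

```python
import json

def count_by_level(sarif):
    """Count SARIF results per level across all runs."""
    counts = {}
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: a result without an explicit level is counted as "warning".
            level = result.get("level", "warning")
            counts[level] = counts.get(level, 0) + 1
    return counts

def should_fail(sarif, fail_levels=("error",)):
    """True when any result has a level listed in fail_levels."""
    counts = count_by_level(sarif)
    return any(counts.get(level, 0) > 0 for level in fail_levels)

def gate(path):
    """Return a process exit code for a SARIF file: 1 to fail the job, 0 to pass."""
    with open(path, encoding="utf-8") as fh:
        sarif = json.load(fh)
    print(count_by_level(sarif))
    return 1 if should_fail(sarif) else 0
```

In a CI step, a wrapper script could call `sys.exit(gate("report.sarif"))`, where `report.sarif` is a placeholder file name, so the job fails whenever an `error`-level result is present.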

Native Extensions

15 mutation operators in Go. Token analysis and entropy in C. Aho-Corasick pattern matching. Full ctypes bridge with Python fallbacks.

Attack Modules

See Basilisk In Action

Real attack scenarios demonstrating how Basilisk discovers AI vulnerabilities.

basilisk --console
$ basilisk scan -t https://api.target.com/chat -p openai --mode quick

[*] Basilisk v1.1.0 — Quick Scan Mode
[*] Recon: Fingerprinting target...
[+] Model: GPT-4o (OpenAI) | Context: 128K tokens
[*] Running top 50 payloads per module (no evolution)
[!] CRITICAL: Direct prompt injection succeeded
[!] HIGH: System prompt extracted via translation trick
[+] Scan complete in 47s. 4 findings.
Basilisk Engine v1.1.0 ● Ready

Quick Start

Up and running in 30 seconds

Install via pip, set your API key, scan. That's it.

terminal

$ pip install basilisk-ai

Installing... done ✓

$ export OPENAI_API_KEY="sk-..."

$ basilisk scan -t https://api.target.com/chat

[+] 7 findings (2 Critical, 3 High, 2 Medium)

Why You Need Automated AI Security

Large Language Models power critical applications — from customer service chatbots to financial advisors and healthcare assistants. These AI systems are vulnerable to prompt injection, system prompt extraction, data exfiltration, multi-turn social engineering, and guardrail bypass attacks.

Basilisk is the first open-source AI red teaming framework that uses genetic prompt evolution to discover these vulnerabilities. Unlike static testing tools, Basilisk's SPE-NL engine mutates attack payloads based on how the target responds — evolving increasingly effective attacks across generations.

Multi-Turn LLM Exploitation

Real-world attackers don't send a single jailbreak prompt. They build trust over multiple conversation turns, gradually eroding guardrails until the model complies. Basilisk v1.1.0 automates this with prompt cultivation, authority escalation, and sycophancy exploitation — three specialist modules that detect and exploit guardrail drift across multi-turn conversations.
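
One simple way to think about guardrail drift is to compare how often the model refuses early in a conversation with how often it refuses late. The sketch below assumes a keyword-based refusal heuristic and a drift score defined as early refusal rate minus late refusal rate. Both are illustrative assumptions and not Basilisk's actual metric.

```python
# Hypothetical refusal heuristic: substrings that often appear in refusals.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "i won't")

def is_refusal(reply: str) -> bool:
    text = reply.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def drift_score(replies, window=3):
    """Early refusal rate minus late refusal rate, floored at 0.

    0.0 means the model refuses as often at the end as at the start;
    1.0 means it refused every early turn and none of the late ones.
    """
    if len(replies) < 2 * window:
        raise ValueError("need at least 2 * window replies")
    early_rate = sum(is_refusal(r) for r in replies[:window]) / window
    late_rate = sum(is_refusal(r) for r in replies[-window:]) / window
    return max(0.0, early_rate - late_rate)
```

For a six-turn conversation where the first three replies are refusals and the last three are not, `drift_score` returns 1.0. A conversation where every reply is a refusal scores 0.0.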

32 Attack Modules
10 OWASP Categories
15 Mutation Operators
9 LLM Providers
5 Report Formats
3 Multi-Turn Modules

Built for Security Professionals

Whether you're a penetration tester, bug bounty hunter, AI engineer, or security team lead, Basilisk integrates into your existing workflow. Test OpenAI GPT-4, Anthropic Claude, Google Gemini, xAI Grok, Groq, and any custom endpoint. Export findings as SARIF for CI/CD, HTML for stakeholders, or JSON for automation.