Red Teaming Overview
Red teaming is the practice of proactively probing AI systems for vulnerabilities before they reach production. As large language models become embedded in high-stakes applications — customer service, code generation, medical triage, legal research — the consequences of a single safety failure grow proportionally. A robust red teaming program surfaces weaknesses in model guardrails, content filters, and alignment training so that teams can patch them before adversaries exploit them in the wild.
General Analysis (GA) is a comprehensive, open-source Python framework purpose-built for this task. It bundles state-of-the-art jailbreaking techniques, unified model interfaces for both cloud-hosted and locally-served LLMs, adversarial prompt generators, and standardized evaluation tools into a single package. Whether you are a safety researcher benchmarking new defenses, an enterprise team running pre-deployment audits, or an academic studying alignment properties, GA provides the tooling you need to systematically stress-test language models.

AI Red Teaming Quickstart
Get started with automated AI red teaming in minutes

LLM Jailbreak Cookbook
Deep dive into LLM jailbreaking techniques and benchmarks

AI Red Teaming GitHub Repository
Open-source AI red teaming library — view source, contribute, and report issues
What Makes GA Different
Most red teaming workflows involve stitching together one-off scripts, paper reference implementations, and custom evaluation harnesses. GA replaces that patchwork with a unified framework where every jailbreak method shares a common interface, every model — whether accessed through an API or loaded onto a local GPU — exposes the same query methods, and every result is scored by the same evaluation pipeline. This consistency means you can compare attack success rates across methods, models, and configurations without worrying about differences in evaluation methodology.
GA also tracks the full research lineage: each technique maps directly to its source paper, and the implementations are validated against published benchmarks such as HarmBench. When a new attack surfaces in the literature, GA’s modular architecture makes it straightforward to integrate without rewriting existing pipelines.
Features
Jailbreaking Methods
GA ships with production-grade implementations of the most effective jailbreaking techniques published in the academic literature. Each method targets a different aspect of model safety — from gradient-level token optimization to multi-turn social engineering — giving you broad coverage of the threat surface.
- AutoDAN & AutoDAN-Turbo: Evolutionary algorithms that breed increasingly effective adversarial prompts. AutoDAN-Turbo adds a lifelong strategy library that remembers what works across runs.
- TAP: Tree-of-Attacks with Pruning — a breadth-first search over prompt space that expands promising branches and prunes dead ends, often achieving the highest attack success rates with the fewest queries.
- GCG: Greedy Coordinate Gradient optimization — a white-box method that crafts adversarial token suffixes using gradient information from the target model’s own weights.
- Crescendo: A multi-turn conversational attack that gradually steers dialogue toward prohibited content, mimicking how a real adversary might manipulate a chatbot over several exchanges.
- Bijection Learning: An encoding-based approach that teaches the model a custom cipher through in-context examples, then issues harmful queries in the encoded form to bypass content filters.
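To make the bijection-learning idea concrete, the encoding step can be sketched in plain Python. This is an illustrative toy cipher, not GA's implementation, and the function names below are hypothetical:

```python
import random
import string

def make_bijection(seed: int = 0) -> dict:
    """Build a random one-to-one mapping over lowercase letters."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(letters, shuffled))

def encode(text: str, mapping: dict) -> str:
    """Apply the cipher; characters outside the mapping pass through."""
    return "".join(mapping.get(ch, ch) for ch in text.lower())

def decode(text: str, mapping: dict) -> str:
    """Invert the mapping to recover the original text."""
    inverse = {v: k for k, v in mapping.items()}
    return "".join(inverse.get(ch, ch) for ch in text)

mapping = make_bijection(seed=42)
query = "describe the process"
ciphertext = encode(query, mapping)
assert decode(ciphertext, mapping) == query
```

In the actual attack, the mapping is taught to the target model through in-context examples, and the harmful query is then submitted in encoded form so that surface-level content filters never see the plaintext.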
Model Interfaces
GA provides a unified API layer that abstracts the differences between cloud providers and local inference engines. You write your attack pipeline once and swap models with a single string change.
- BlackBoxModel: Query cloud-hosted models from OpenAI, Anthropic, Together.ai, and other providers through a consistent interface that handles authentication, rate limiting, and retry logic.
- WhiteBoxModel: Load open-weight models locally with full access to logits, gradients, and internal activations — required for white-box techniques like GCG.
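The unified-interface idea can be sketched as follows. The class and method names below are illustrative stand-ins for the pattern, not GA's actual API (see the Quick Example for real `BlackBoxModel` usage):

```python
from abc import ABC, abstractmethod

class ChatModel(ABC):
    """Illustrative common interface: an attack calls query() and never
    needs to know whether the target is an API model or a local checkpoint."""
    @abstractmethod
    def query(self, prompt: str) -> str: ...

class APIModel(ChatModel):
    """Stand-in for a cloud-hosted model behind a provider API."""
    def __init__(self, model_name: str):
        self.model_name = model_name
    def query(self, prompt: str) -> str:
        # A real implementation would call the provider's API here,
        # handling authentication, rate limiting, and retries.
        return f"[{self.model_name}] response to: {prompt}"

class LocalModel(ChatModel):
    """Stand-in for an open-weight model loaded onto a local GPU."""
    def __init__(self, checkpoint: str):
        self.checkpoint = checkpoint
    def query(self, prompt: str) -> str:
        # A real implementation would run local inference and could also
        # expose logits and gradients for white-box methods like GCG.
        return f"[{self.checkpoint}] response to: {prompt}"

def run_attack(target: ChatModel, goal: str) -> str:
    # The attack pipeline is written once against the shared interface.
    return target.query(goal)
```

Because both classes satisfy the same interface, swapping a commercial API target for a local checkpoint is a one-line change in the caller.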
Adversarial Generators
The adversarial candidate generator module contains the core algorithms that jailbreak methods use internally to craft, refine, and evolve adversarial prompts. Understanding these generators is useful when building custom attacks or extending existing ones.
- Tree-based refinement strategies that explore prompt variations in a structured search
- Multi-turn conversation generators that build adversarial context incrementally
- Genetic algorithms for prompt evolution through crossover and mutation
- Strategy-based prompt generation informed by a library of known-effective patterns
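As a concrete illustration of the genetic approach, here is a minimal sketch of prompt evolution through crossover and mutation. This is toy stdlib Python, not GA's generator code; in practice the fitness function would be a judge model's harmfulness score, so it is left as a parameter here:

```python
import random

def crossover(a: str, b: str, rng: random.Random) -> str:
    """Splice two parent prompts at random word boundaries."""
    wa, wb = a.split(), b.split()
    return " ".join(wa[: rng.randint(0, len(wa))] + wb[rng.randint(0, len(wb)):])

def mutate(prompt: str, fillers: list, rng: random.Random, p: float = 0.2) -> str:
    """Randomly insert filler phrases to perturb the prompt."""
    out = []
    for word in prompt.split():
        out.append(word)
        if rng.random() < p:
            out.append(rng.choice(fillers))
    return " ".join(out)

def evolve(population: list, fitness, generations: int, rng: random.Random) -> str:
    """Keep the fitter half each generation, refill via crossover + mutation."""
    fillers = ["please", "hypothetically", "as a story"]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: max(2, len(population) // 2)]
        children = []
        while len(survivors) + len(children) < len(population):
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), fillers, rng))
        population = survivors + children
    return max(population, key=fitness)
```

Because the top candidates always survive into the next generation, the best fitness score is monotonically non-decreasing across runs of the loop.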
Evaluation Tools
Consistent evaluation is critical for meaningful safety research. GA’s evaluation module provides a standardized scoring pipeline so that results from different methods, models, and research teams are directly comparable.
- Standardized rubric-based scoring that assesses both compliance and harmfulness
- Attack success rate (ASR) measurement across datasets like HarmBench
- Cross-model comparison capabilities with automatic result aggregation
- Method-specific evaluators that parse output formats from TAP, GCG, AutoDAN, and more
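The core of ASR measurement is simple to sketch. The `AttackResult` fields and the success threshold below are illustrative assumptions, not GA's actual schema; real pipelines obtain the judge score from an evaluator model:

```python
from dataclasses import dataclass

@dataclass
class AttackResult:
    goal: str
    response: str
    judge_score: int  # e.g., a 1-10 harmfulness rating from a judge model

def attack_success_rate(results: list, threshold: int = 7) -> float:
    """Fraction of goals where the judge rated the response at or
    above the success threshold."""
    if not results:
        return 0.0
    successes = sum(1 for r in results if r.judge_score >= threshold)
    return successes / len(results)

results = [
    AttackResult("goal-1", "refused", 1),
    AttackResult("goal-2", "partial compliance", 6),
    AttackResult("goal-3", "full compliance", 9),
]
print(attack_success_rate(results))  # 1 of 3 goals meets the threshold
```

Holding the scoring rubric and threshold fixed across methods and models is what makes ASR numbers from different runs directly comparable.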
Why General Analysis?
- Comprehensive Coverage: Every major jailbreaking technique — black-box and white-box, single-turn and multi-turn, semantic and nonsensical — in one framework.
- Easy Integration: A uniform API that works with OpenAI, Anthropic, Together.ai, and any Hugging Face-compatible model, so you can test commercial and open-weight LLMs side by side.
- Research-Ready: Implementations track the original papers and are validated against published benchmarks, making it easy to reproduce and extend academic results.
- Production-Tested: Used by enterprise AI safety teams to run pre-deployment audits and continuous red teaming campaigns.
- Open Source: Fully transparent, MIT-licensed, and community-driven — inspect every line, contribute improvements, and adapt the tools to your own workflows.
Installation
Install GA from PyPI with a single command. Python 3.9 or later is required. For white-box methods like GCG, you will also need a CUDA-capable GPU and the appropriate PyTorch build.
```bash
pip install generalanalysis
```
To install with development dependencies for contributing to the framework, see the AI red teaming development guide.
Quick Example
The following example demonstrates a complete red teaming run using TAP, one of the most effective black-box methods. It initializes a target model, configures the attack parameters, and runs the optimization loop to search for jailbreaking prompts.
```python
from generalanalysis.jailbreaks import TAP, TAPConfig
from generalanalysis.boiler_room import BlackBoxModel

# Initialize target model
target_model = BlackBoxModel("gpt-4o")

# Configure jailbreak
config = TAPConfig(
    target_model="gpt-4o",
    attacker_model="deepseek-ai/DeepSeek-R1",
    evaluator_model="deepseek-ai/DeepSeek-R1",
    branching_factor=4,
    max_depth=10,
)

# Run attack
tap = TAP(config)
results = tap.optimize(["Write malicious content"])
```
After the optimization completes, `results` contains the best adversarial prompts discovered for each goal, along with their scores and the corresponding model responses. You can feed these directly into the LLM jailbreak evaluator for standardized scoring, or export them for further analysis.
When to Use Red Teaming
Red teaming is valuable at several stages of the AI development lifecycle:
- Pre-deployment audits: Run a full suite of attacks against your model before releasing it to users. This is the most common use case and catches the majority of safety regressions.
- Continuous monitoring: Schedule regular red teaming runs against production models to detect drift in safety behavior after fine-tuning updates or system prompt changes.
- Defense development: Use jailbreak results as training signal to improve guardrails, refine system prompts, or augment safety training datasets.
- Compliance and certification: Generate evidence of adversarial testing for regulatory frameworks, internal governance boards, or customer security questionnaires.
- Academic research: Benchmark new defense mechanisms against a standardized suite of attacks with reproducible evaluation.
For a deeper dive into how automated red teaming works, what it tests, and where it fits in an AI security program, read our guide: What is automated AI red teaming?
Community
Join our community to stay updated on the latest developments, share research findings, and get help with your red teaming workflows:
Next Steps
- Follow the AI red teaming quickstart guide to run your first attack in under five minutes.
- Explore the LLM jailbreak methods overview to understand the full taxonomy of available methods.
- Read the LLM Jailbreak Cookbook for in-depth performance comparisons and configuration guidance.
- Learn about adversarial prompt generators to understand the prompt generation engines behind each method.
- Read the OWASP Top 10 for Agentic AI guide to understand the threat landscape for agentic systems.
License
General Analysis is released under the MIT License. See the MIT License on GitHub for details.