A production-grade autonomous simulation platform that generates diverse user personas, executes multi-step journeys with chaos injection, detects anomalies across three analytical layers, and discovers novel edge cases — all orchestrated as a 12-agent LangGraph pipeline.
The system is organized into four planes, each containing specialized agents that communicate through a shared CampaignState.
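The shared state can be pictured as one record that every agent reads from and writes to as the campaign flows through the planes. A minimal sketch follows; the field names are illustrative assumptions, not the actual CampaignState schema defined in the codebase:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the shared state passed between agents.
# Field names are illustrative; the real CampaignState lives in the codebase.
@dataclass
class CampaignState:
    execution_plan: dict = field(default_factory=dict)    # Control plane
    personas: list = field(default_factory=list)
    vulnerability_map: dict = field(default_factory=dict)
    journeys: list = field(default_factory=list)          # Execution plane
    simulation_results: list = field(default_factory=list)
    anomalies: list = field(default_factory=list)         # Observation plane
    edge_cases: list = field(default_factory=list)
    resilience_score: float = 0.0                         # Reporting plane
    report: dict = field(default_factory=dict)

state = CampaignState()
state.personas.append({"persona_class": "POWER_USER"})
```

Each agent reads the fields produced upstream and appends its own outputs, which is what lets the planes stay loosely coupled.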
| Persona Class | Behavior Profile | Typical Session Depth | API Preference |
|---|---|---|---|
| POWER_USER | Deep navigation, complex workflows, high feature coverage | 10+ steps | Mixed |
| CASUAL_USER | Shallow browsing, quick tasks, low error tolerance | 2-4 steps | UI only |
| EDGE_CASE_USER | Unusual inputs, rare paths, boundary exploration | 5-8 steps | Mixed |
| ADVERSARIAL_ACTOR | Injection attempts, auth bypass, rate limit testing | 6-12 steps | API heavy |
| ACCESSIBILITY_USER | Screen reader paths, keyboard-only navigation, high-contrast | 4-7 steps | UI only |
| LEGACY_MIGRANT | Outdated patterns, deprecated endpoints, version mismatches | 3-6 steps | API heavy |
| CONCURRENCY_USER | Parallel requests, race conditions, session overlap | 8-15 steps | API only |
Personas are generated via K-Means clustering on clickstream data (real or synthetic), then enriched by LLM synthesis per cluster, and finally mutated deterministically (Unicode injection, boundary inputs, session timing variants) — no LLM needed for mutations.
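The deterministic mutation step needs no LLM at all. The sketch below illustrates the idea with a small, simplified mutation set; the real mutator (`persona/mutator.py`) covers more variants, and the persona fields used here are illustrative:

```python
import copy

# Simplified sketch of deterministic persona mutation (no LLM involved).
# Payload lists and persona fields are illustrative assumptions.
UNICODE_PAYLOADS = ["caf\u00e9", "\u202etest", "na\u00efve\u200b"]  # accent, RTL override, zero-width
BOUNDARY_INPUTS = ["", "a" * 10_000, "0", "-1", str(2**31)]

def mutate_persona(persona: dict) -> list[dict]:
    """Return deterministic mutants of a base persona."""
    mutants = []
    for payload in UNICODE_PAYLOADS:
        m = copy.deepcopy(persona)
        m["display_name"] = payload
        m["mutation"] = "unicode_injection"
        mutants.append(m)
    for value in BOUNDARY_INPUTS:
        m = copy.deepcopy(persona)
        m["search_query"] = value
        m["mutation"] = "boundary_input"
        mutants.append(m)
    # Session-timing variant: same persona, pathological think time
    fast = copy.deepcopy(persona)
    fast["think_time_ms"] = 0  # no pauses between steps
    fast["mutation"] = "timing_variant"
    mutants.append(fast)
    return mutants

base = {"persona_class": "EDGE_CASE_USER", "display_name": "user1"}
print(len(mutate_persona(base)))  # 3 unicode + 5 boundary + 1 timing = 9
```

Because the mutations are pure functions of the base persona, the same mutant set is reproduced on every run, which keeps campaigns comparable.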
| # | Agent | Plane | Input | Output |
|---|---|---|---|---|
| 1 | CampaignPlannerAgent | Control | Campaign spec | ExecutionPlan |
| 2 | PersonaGeneratorAgent | Control | ExecutionPlan + clickstream | List[Persona] |
| 3 | VulnScanAgent | Control | OpenAPI spec + prod metrics | VulnerabilityMap |
| 4 | JourneyGeneratorAgent | Execution | Personas + VulnerabilityMap | List[UserJourney] |
| 5 | SimulatorAgent (x N) | Execution | UserJourney | SimulationResult |
| 6 | ChaosAgent | Execution | Journeys + VulnerabilityMap | List[chaos events] |
| 7 | ObserverAgent | Observation | SimulationResults + traces | List[AnomalyEvent] (Layer 1) |
| 8 | AnomalyDetectorAgent | Observation | Results + L1 anomalies | Enriched List[AnomalyEvent] (L2+L3) |
| 9 | EdgeCaseAgent | Observation | Anomalies | List[EdgeCase] + pytest stubs |
| 10 | ResilienceScorerAgent | Reporting | Results + EdgeCases + VulnMap | ResilienceScore (0-100) |
| 11 | ReportGeneratorAgent | Reporting | Score + EdgeCases + Anomalies | CampaignReport |
| 12 | QABridgeAgent | Reporting | EdgeCases + Report | Push to QA pipeline |
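The twelve agents run as nodes in a LangGraph graph, but the plane ordering itself can be illustrated with a dependency-free sketch. The agent callables here are stand-ins that only record a trace, not the real implementations:

```python
# Dependency-free sketch of the four-plane, 12-agent ordering.
# Each "agent" is a stand-in that reads and updates a shared state dict,
# mirroring how the real LangGraph nodes share a CampaignState.
PIPELINE = {
    "control":     ["campaign_planner", "persona_generator", "vuln_scan"],
    "execution":   ["journey_generator", "simulator", "chaos"],
    "observation": ["observer", "anomaly_detector", "edge_case"],
    "reporting":   ["resilience_scorer", "report_generator", "qa_bridge"],
}

def run_campaign(state: dict) -> dict:
    for plane, agents in PIPELINE.items():
        for agent in agents:
            # A real node would mutate typed state fields; we just log the order.
            state.setdefault("trace", []).append(f"{plane}:{agent}")
    return state

result = run_campaign({})
print(len(result["trace"]))  # 12 agents, control through reporting
```

The real graph adds conditional edges (for example, chaos auto-halt), but the happy path follows this strict plane order.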
```shell
# Clone
git clone <repo-url>
cd simulation-chaos-lab

# Create virtual environment
python -m venv .venv
source .venv/bin/activate   # Linux/macOS
.venv\Scripts\activate      # Windows

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers (optional — for UI simulation)
playwright install chromium
```

Create a `.env` file in the project root:
```shell
# LLM — required
OPENAI_API_KEY=voc-your-key-here
OPENAI_BASE_URL=https://openai.vocareum.com/v1

# External services — all optional (graceful fallback)
KAFKA_BOOTSTRAP_SERVERS=localhost:9092
DATABASE_URL=postgresql://user:pass@localhost/simulation_lab
REDIS_URL=redis://localhost:6379

# Chaos tuning
CHAOS_HALT_THRESHOLD=0.40
CHAOS_INTENSITY=medium

# Cost governance
MAX_LLM_SPEND_USD_PER_CAMPAIGN=10.0

# Simulation
MAX_CONCURRENT_SIMULATORS=10
TARGET_BASE_URL=http://localhost:3000
SLA_RESPONSE_TIME_MS=2000

# QA Bridge
QA_AGENT_API_URL=http://localhost:8001
ADIP_API_URL=http://localhost:8002
```

Run in mock mode:

```shell
python main.py --mock
```

Runs the entire 12-agent pipeline with:
- Synthetic clickstream data for persona generation
- Mock OpenAPI spec + production metrics for vulnerability scanning
- Simulated HTTP responses (no real network calls)
- In-memory fallbacks for Kafka, PostgreSQL, Redis, pgvector
- Real LLM calls to GPT-4o (only external dependency)
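The `MAX_LLM_SPEND_USD_PER_CAMPAIGN` cap implies a governor that accumulates per-agent LLM cost and halts once the budget would be exceeded. A minimal in-memory sketch follows, assuming the real implementation (`chaos/budget_governor.py`) does the same accounting with Redis atomic counters instead of a lock:

```python
import threading

class BudgetExceeded(RuntimeError):
    pass

# In-memory sketch of a per-campaign cost governor. The real version
# uses Redis atomic counters so concurrent agents share one budget.
class CostGovernor:
    def __init__(self, max_spend_usd: float):
        self.max_spend_usd = max_spend_usd
        self.spent_usd = 0.0
        self._lock = threading.Lock()

    def charge(self, agent: str, cost_usd: float) -> None:
        """Record a cost; raise if the campaign cap would be exceeded."""
        with self._lock:
            if self.spent_usd + cost_usd > self.max_spend_usd:
                raise BudgetExceeded(
                    f"{agent}: ${self.spent_usd + cost_usd:.2f} would exceed "
                    f"${self.max_spend_usd:.2f} cap"
                )
            self.spent_usd += cost_usd

governor = CostGovernor(max_spend_usd=10.0)
governor.charge("campaign_planner", 0.03)
governor.charge("persona_generator", 0.09)
print(f"spent: ${governor.spent_usd:.2f} / ${governor.max_spend_usd:.2f}")
```

Checking before incrementing means a campaign can never overshoot the cap, at the cost of rejecting the call that would cross it.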
Run against real services:

```shell
# Start services
docker compose up -d   # Kafka, PostgreSQL, Redis
python main.py
```

Run as an API server:

```shell
python main.py --server
# Server starts at http://localhost:8000
```
```shell
# Trigger a campaign
curl -X POST http://localhost:8000/api/campaigns \
  -H "Content-Type: application/json" \
  -d '{
    "target_workflows": ["login", "checkout", "search"],
    "chaos_intensity": "medium",
    "max_llm_spend_usd": 5.0
  }'

# Health check
curl http://localhost:8000/api/health
```

Write the campaign report to a file:

```shell
python main.py --mock --output report.json
```

A campaign execution proceeds as follows:
```
$ python main.py --mock
[INFO] Starting simulation campaign (mock=True, model=gpt-4o)

# 1. CONTROL PLANE
[INFO] LLM call by campaign_planner | spent: $0.00 / $10.00
[INFO] LLM call by campaign_planner complete | cost: $0.03
[INFO] LLM call by persona_generator x3 clusters | cost: $0.09
[INFO] Generated 6 personas across 3 clusters
[INFO] Static analysis flagged 7 endpoints
[INFO] Dynamic analysis flagged 7 endpoints
[INFO] LLM call by vuln_scan_agent | cost: $0.03

# 2. EXECUTION PLANE
[INFO] LLM call by journey_generator x6 journeys | cost: $0.18
[INFO] Generated 6 journeys
[INFO] Journey j1: CHAOS_TRIGGERED (6 steps, 24ms)
[INFO] Journey j2: CHAOS_TRIGGERED (5 steps, 33ms)
...
[INFO] Collected 6 simulation results

# 3. OBSERVATION PLANE
[INFO] Kafka stream processor initialized
[INFO] ObserverAgent identified 12 rule-based anomalies
[INFO] AnomalyDetectorAgent enriched 12 anomalies (3 statistical, 2 LLM semantic)
[INFO] EdgeCaseAgent discovered 4 novel edge cases

# 4. REPORTING PLANE
[INFO] ResilienceScorerAgent calculated score: 85/100
[INFO] ReportGeneratorAgent created campaign report
[INFO] QABridgeAgent pushed 4 edge cases to QA pipeline

[INFO] Campaign completed in 120 seconds.
```
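The Layer 1 anomalies in the log come from rule checks such as the SLA response-time bound (`SLA_RESPONSE_TIME_MS=2000`). A hedged sketch of one such rule follows; the fields on the result dicts are illustrative, not the actual SimulationResult schema:

```python
SLA_RESPONSE_TIME_MS = 2000  # mirrors the .env setting

# Sketch of a Layer-1 rule: flag steps that breach the SLA or return 5xx.
# Result/step field names are illustrative assumptions.
def check_sla(results: list[dict]) -> list[dict]:
    anomalies = []
    for r in results:
        for step in r.get("steps", []):
            if step["latency_ms"] > SLA_RESPONSE_TIME_MS:
                anomalies.append(
                    {"journey": r["id"], "rule": "sla_breach", "step": step["name"]}
                )
            if step.get("status", 200) >= 500:
                anomalies.append(
                    {"journey": r["id"], "rule": "server_error", "step": step["name"]}
                )
    return anomalies

results = [{"id": "j1", "steps": [
    {"name": "login",    "latency_ms": 150,  "status": 200},
    {"name": "checkout", "latency_ms": 3400, "status": 200},
    {"name": "search",   "latency_ms": 90,   "status": 503},
]}]
print(len(check_sla(results)))  # 2 anomalies: one SLA breach, one 5xx
```

Layers 2 and 3 then enrich these rule hits with statistical outlier detection and LLM semantic analysis, as shown in the log above.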
```
.github/                          # GitHub Actions workflows
├── workflows/
│   └── ci.yml
├── docs/                         # Documentation and diagrams
├── agents/                       # Core agent implementations
│   ├── control/
│   │   ├── campaign_planner.py
│   │   ├── persona_generator.py
│   │   └── vuln_scan_agent.py
│   ├── execution/
│   │   ├── journey_generator.py
│   │   ├── simulator.py
│   │   └── chaos_agent.py
│   ├── observation/
│   │   ├── observer.py
│   │   ├── anomaly_detector.py
│   │   └── edge_case_agent.py
│   └── reporting/
│       ├── resilience_scorer.py
│       ├── report_generator.py
│       └── qa_bridge_agent.py
├── persona/
│   ├── clustering.py             # K-Means + elbow method + PII scrubber
│   ├── models.py                 # PersonaPopulation + diversity scoring
│   └── mutator.py                # Unicode/boundary/timing mutations
├── chaos/
│   ├── vulnerability_map.py      # 3-stage OpenAPI + metrics analysis
│   ├── injection_engine.py       # mitmproxy + MockChaosEngine fallback
│   └── budget_governor.py        # Redis atomic counters + auto-halt
├── observation/
│   ├── stream_processor.py       # Kafka consumer + mock list fallback
│   ├── anomaly_rules.py          # Layer 1 rule definitions
│   └── edge_case_discovery.py    # DBSCAN + novelty scoring
├── persistence/
│   ├── campaign_db.py            # PostgreSQL/JSONB -> SQLite fallback
│   └── knowledge_base.py         # pgvector -> numpy fallback
├── llm/
│   ├── prompts.py                # 12 prompt templates (6 system/user pairs)
│   └── factory.py                # LLM factory + CostGovernor wrapper
├── observability/
│   ├── metrics.py                # Prometheus counters/histograms/gauges
│   └── tracing.py                # OpenTelemetry setup
├── api/app.py                    # FastAPI (POST /api/campaigns, GET /health)
├── main.py                       # CLI: --mock, --server, --output
├── requirements.txt
└── tests/
    ├── unit/                     # 24 unit tests
    └── integration/              # 3 integration tests (full pipeline)
```
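EdgeCaseAgent emits a pytest stub alongside each discovered edge case so it can graduate into the regression suite. A hypothetical example of what such a stub might look like; the helper function, inputs, and assertions below are illustrative, not actual generated output:

```python
# Hypothetical example of a generated pytest stub for a discovered edge case.
# The normalize_display_name helper stands in for the system under test.
import unicodedata

def normalize_display_name(name: str) -> str:
    """Stand-in SUT: strip format (Cf) characters, then NFC-normalize."""
    cleaned = "".join(ch for ch in name if unicodedata.category(ch) != "Cf")
    return unicodedata.normalize("NFC", cleaned)

def test_edge_case_unicode_display_name():
    """Edge case: zero-width and RTL-override characters in display names."""
    # Zero-width space (U+200B) should be stripped
    assert normalize_display_name("na\u00efve\u200b") == "na\u00efve"
    # RTL override (U+202E) should never survive normalization
    assert "\u202e" not in normalize_display_name("\u202etest")
```

Stubs like this are plain pytest functions, so the "auto-generated regression suite" item in the roadmap below is a matter of exporting them as permanent fixtures.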
- Playwright UI simulation: execute browser-based journeys alongside API journeys
- Real Kafka integration: production trace ingestion with Avro schema registry
- Checkpoint replay: LangGraph checkpoint-based edge case reproduction validation
- Sentence-transformers embeddings: replace hash-based embeddings with SBERT for higher-fidelity novelty scoring
- Multi-campaign comparison: track resilience score trends across campaigns over time
- Grafana dashboard: pre-built dashboards consuming Prometheus metrics
- CI/CD integration: GitHub Actions workflow running the `--mock` pipeline on every PR
- Distributed simulation: Celery or Ray-based worker pool for large persona populations
- A/B chaos profiles: compare resilience under different chaos intensity configurations
- Auto-generated regression suite: export validated edge cases as permanent pytest fixtures
MIT
