# ZeroWall

NVIDIA DGX Spark Hackathon Submission

Autonomous, GPU-accelerated AI security that detects attacks, generates hardened code variants, and deploys winning candidates — all locally on DGX Spark.
- Overview
- Problem Statement
- Architecture
- NVIDIA Stack Usage
- NVIDIA Requirement Mapping
- Demo Instructions
- Benchmark Evidence
- Screenshot Checklist
- Disclaimer
## Overview

ZeroWall is a multi-agent AI Moving Target Defense (MTD) system that runs entirely on NVIDIA DGX Spark.
When an attack is detected, ZeroWall:
- Generates 8–20 behavior-preserving code mutation candidates via a safe deterministic transformer
- Replays known exploits against each candidate in parallel
- Runs test suites to verify functional correctness is preserved
- Scores risk using a weighted confidence model (served via Triton)
- Deploys the winning variant and rolls back if post-deploy checks fail
The result: the attack surface keeps moving, and attackers can't reuse the same exploit.
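The control flow of one cycle fits in a short, runnable skeleton. Every agent call below is a trivial random stand-in; only the shape of the loop mirrors `core/orchestrator/defense_loop.py`.

```python
# Skeleton of one defense cycle. Agent behavior is stubbed with random
# stand-ins; only the control flow mirrors defense_loop.py.
import random

def plan_mutations(n=10):        # 1-2. Mutation Agent + safe transforms
    return [f"candidate_{i}" for i in range(n)]

def passes_tests(candidate):     # 3. Verifier Agent (pytest + bandit)
    return random.random() > 0.2

def blocks_exploits(candidate):  # 4. Exploit Agent (parallel HTTP replay)
    return random.random() > 0.3

def confidence(candidate):       # 5. Risk Agent (Triton risk-scorer)
    return random.random()

def run_defense_cycle():
    survivors = [c for c in plan_mutations()
                 if passes_tests(c) and blocks_exploits(c)]
    if not survivors:
        return None              # nothing safe to deploy; keep current version
    best = max(survivors, key=confidence)
    print(f"deploying {best} (blue/green); rollback if post-deploy checks fail")
    return best

run_defense_cycle()
```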
## Problem Statement

Static defenses fail against adaptive attackers, and patch cycles are slow. ZeroWall makes the defender's codebase a moving target — automatically, continuously, and safely.
## Architecture

```text
┌──────────────────────────────────────────────────┐
│ NVIDIA DGX Spark │
│ │
┌──────────┐ │ ┌─────────────────────────────────────────────┐ │
│ Attacker │──exploit──► │ ZeroWall Core Engine │ │
└──────────┘ │ │ │ │
│ │ ┌──────────────────────────────────────┐ │ │
┌──────────────┐ │ │ │ Defense Loop Orchestrator │ │ │
│ OpenClaw CLI │──────► │ │ │ (defense_loop.py) │ │ │
│ /defend │ │ │ │ 1. Mutation Agent ─────────────────┼──►│ │
│ /replay │ │ │ │ 2. Safe Transform Engine (libcst) │ │ │
│ /status │ │ │ │ 3. Verifier Agent (pytest+bandit) │ │ │
│ /benchmark │ │ │ │ 4. Exploit Agent (HTTP replay) │ │ │
│ /alert │ │ │ │ 5. Risk Agent ─────────────────────┼──►│ │
└──────────────┘ │ │ │ 6. Explanation Agent │ │ │
│ │ └──────────────────────────────────────┘ │ │
│ │ │ │
┌──────────────┐ │ │ ┌──────────────┐ ┌───────────────────┐ │ │
│ Streamlit │◄───────├──┤ │ Triton │ │ vLLM Server │ │ │
│ Dashboard │ │ │ │ Inference │ │ (local LLM │ │ │
└──────────────┘ │ │ │ Server │ │ reasoning) │ │ │
│ │ │ • mutation- │ │ │ │ │
┌──────────────┐ │ │ │ planner │ └───────────────────┘ │ │
│ RAPIDS │◄───────├──┤ │ • risk- │ │ │
│ Analytics │ │ │ │ scorer │ ┌───────────────────┐ │ │
│ (cuDF GPU) │ │ │ └──────────────┘ │ Deploy Controller│ │ │
└──────────────┘ │ │ │ (blue/green swap) │ │ │
│ │ └───────────────────┘ │ │
│ └─────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────┐ │
│ │ Target FastAPI App (attack surface) │ │
│ │ v1: vulnerable → vN: hardened │ │
│ └──────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
```
## NVIDIA Stack Usage

### Triton Inference Server

What we use it for: serving two GPU-accelerated model endpoints:

- `mutation-planner` — selects transform types for each defense cycle
- `risk-scorer` — scores candidate confidence with batched GPU inference

Evidence in code:

- `inference/triton-model-repo/mutation-planner/config.pbtxt` + `1/model.py`
- `inference/triton-model-repo/risk-scorer/config.pbtxt` + `1/model.py`
- `inference/clients/triton_client.py` — all agent calls route through the Triton HTTP API
- Triton inference latency logged in telemetry and displayed on the dashboard

Docker service: `docker-compose.yml` → service `triton`
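For orientation, this is roughly what an agent call through Triton's HTTP (KServe v2) API looks like. A minimal sketch: the tensor name `CANDIDATE_FEATURES` and the port are assumptions; the real contract lives in each model's `config.pbtxt` behind `inference/clients/triton_client.py`.

```python
# Sketch of a risk-scorer call via Triton's KServe v2 HTTP API.
# Tensor name and port are illustrative, not the real contract.
import requests

TRITON_URL = "http://localhost:8000"  # Triton's default HTTP port; see docker-compose.yml

def score_candidate(features: list[float]) -> float:
    payload = {
        "inputs": [{
            "name": "CANDIDATE_FEATURES",   # hypothetical input tensor name
            "shape": [1, len(features)],
            "datatype": "FP32",
            "data": features,
        }]
    }
    resp = requests.post(f"{TRITON_URL}/v2/models/risk-scorer/infer",
                         json=payload, timeout=5)
    resp.raise_for_status()
    # First output tensor carries the confidence score
    return resp.json()["outputs"][0]["data"][0]
```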
### vLLM

What we use it for: local GPU LLM inference for:

- Mutation Agent narrative reasoning
- Explanation Agent judge-facing summaries

Why vLLM and not TRT-LLM directly: vLLM ships with an OpenAI-compatible API that integrates in minutes. The system uses a thin client layer (`inference/clients/vllm_client.py`), so swapping to TRT-LLM only requires pointing the base URL at a TRT-LLM server — both expose the same API. Given more time on a production DGX Spark, TRT-LLM would provide higher token/s throughput.

Evidence in code:

- `inference/clients/vllm_client.py`
- `core/agents/explanation_agent.py` — calls vLLM for generation
- Per-call latency logged and shown in benchmark output

Docker service: `docker-compose.yml` → service `vllm`
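Because the server speaks the OpenAI protocol, the client layer stays tiny. A minimal sketch: the port and model name are assumptions (the real model comes from `VLLM_MODEL` in `.env`).

```python
# Sketch of the thin vLLM client. Swapping to TRT-LLM is only a
# base_url change, since both expose an OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8001/v1",  # assumed port; see docker-compose.yml
    api_key="unused-for-local",
)

def explain_cycle(candidate_summary: str) -> str:
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # hypothetical; set via VLLM_MODEL
        messages=[
            {"role": "system",
             "content": "Summarize this defense cycle for a hackathon judge."},
            {"role": "user", "content": candidate_summary},
        ],
        max_tokens=256,
    )
    return resp.choices[0].message.content
```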
### RAPIDS cuDF

What we use it for: GPU-accelerated telemetry analytics:
- Exploit success rate before vs after defense cycles
- Defense cycle latency statistics (mean, p95)
- Candidate evaluation counts
- Rolling exploit rate trends for dashboard
- Inference latency breakdown per agent
Evidence in code:
- `core/telemetry/rapids_analytics.py` — primary cuDF path with pandas fallback
- Uses `cudf.DataFrame` for all analytics operations on DGX Spark
- Backend shown in dashboard ("cuDF-GPU" vs "pandas-CPU")
- Analytics output feeds real-time Streamlit charts
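The cuDF-first / pandas-fallback pattern is simple enough to show inline. A minimal sketch: the column names are illustrative, not the real telemetry schema in `rapids_analytics.py`.

```python
# cuDF on the GPU when available, pandas otherwise. Both expose the
# same DataFrame API for these operations.
try:
    import cudf as xdf          # GPU path on DGX Spark
    BACKEND = "cuDF-GPU"
except ImportError:
    import pandas as xdf        # CPU fallback for dev machines
    BACKEND = "pandas-CPU"

def cycle_stats(success: list[bool], latency_ms: list[float]) -> dict:
    df = xdf.DataFrame({"success": success, "latency_ms": latency_ms})
    return {
        "backend": BACKEND,
        "exploit_success_rate": float(df["success"].astype("float64").mean()),
        "latency_mean_ms": float(df["latency_ms"].mean()),
        "latency_p95_ms": float(df["latency_ms"].quantile(0.95)),
    }

print(cycle_stats([True, False, False], [120.0, 95.5, 143.2]))
```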
### Why DGX Spark is required

| Reason | Detail |
|---|---|
| Triton GPU models | Requires KIND_GPU instance groups — CUDA mandatory |
| vLLM LLM inference | Requires GPU; float16 models won't fit in CPU RAM |
| RAPIDS cuDF | CUDA 12+ required; no CPU fallback for real cuDF |
| Parallel agent execution | 8–20 candidates + parallel exploit replay saturates GPU |
| Benchmark mode | 50+ concurrent HTTP requests + defense cycle timing requires NVLink bandwidth |
## NVIDIA Requirement Mapping

| Component | Role | Evidence |
|---|---|---|
| Triton Inference Server | Multi-model serving | `inference/triton-model-repo/` |
| vLLM | Local LLM inference (TRT-LLM-compatible path) | `inference/clients/vllm_client.py` |
| RAPIDS cuDF | GPU DataFrame analytics | `core/telemetry/rapids_analytics.py` |
What makes ZeroWall agentic:

- Multi-agent pipeline: 5 specialized agents (Mutation, Exploit, Verifier, Risk, Explanation)
- Not a chatbot: no human in the loop during a defense cycle
- Autonomous decision-making: the Risk Agent recommends deploy/reject/rollback
- Safe code generation: deterministic AST transforms controlled by AI model output (see the sketch below)
- Continuous adaptation: each cycle produces a different hardened variant
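The "safe code generation" point deserves a concrete picture. A minimal sketch of a deterministic, behavior-preserving transform using libcst, in the spirit of `core/transforms/rename_identifiers.py`; the target identifier and suffix are invented for illustration.

```python
# Deterministic identifier rename with libcst: same behavior, different
# surface. The real transforms live in core/transforms/.
import libcst as cst

class RenameIdentifier(cst.CSTTransformer):
    def __init__(self, old: str, new: str):
        super().__init__()
        self.old, self.new = old, new

    def leave_Name(self, original_node, updated_node):
        if original_node.value == self.old:
            return updated_node.with_changes(value=self.new)
        return updated_node

source = "token = make_token()\nresult = check(token)\n"
module = cst.parse_module(source)
print(module.visit(RenameIdentifier("token", "token_v2")).code)
# token_v2 = make_token()
# result = check(token_v2)
```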
## Demo Instructions

### Option A: Docker Compose (all services)

```bash
# Clone and setup
cp .env.example .env
# Edit .env: set HF_TOKEN, VLLM_MODEL, GPU counts

# Install local deps (for running outside Docker)
pip install -r requirements.core.txt
cd apps/target-fastapi && pip install -r requirements.txt && cd ../..

# Start all services
docker-compose up -d

# Wait for health checks (Triton takes ~30s)
docker-compose ps

# Run the demo flow
bash scripts/run_demo.sh

# Open dashboard
open http://localhost:8501
```

### Option B: Standalone (no Docker)

```bash
# 1. Start target app
cd apps/target-fastapi
uvicorn main:app --port 8000 &

# 2. Verify normal request
curl http://localhost:8000/health

# 3. Seed exploit attack
bash scripts/seed_attack.sh

# 4. Open OpenClaw CLI (interactive)
python -m core.orchestrator.openclaw_cli interactive
# Then type: /simulate-alert
# Then type: /defend
# Then type: /status

# 5. Open dashboard
streamlit run dashboard/streamlit_app.py
```

### OpenClaw CLI commands

| Command | Description |
|---|---|
| `/defend` | Start defense cycle |
| `/replay` | Replay exploit against active version |
| `/status` | Show system status |
| `/benchmark` | Run burst benchmark |
| `/simulate-alert` | Inject mock IDS alert |
## Benchmark Evidence

Run the benchmark to produce hard numbers:

```bash
bash scripts/run_benchmark.sh
# Or: python -m core.orchestrator.openclaw_cli benchmark --burst-size 50 --with-defense
```

Output files:

- `artifacts/benchmark/benchmark_summary.json`
- `artifacts/benchmark/benchmark_summary.csv`
- Rich terminal table printed automatically
Expected metrics on DGX Spark:
| Metric | Expected (DGX Spark) | Notes |
|---|---|---|
| Mutation candidates / cycle | 10 | Configurable 8–20 |
| Defense cycle latency | 5–15s | Depends on test suite size |
| Exploit replays / cycle | 10×5 = 50 | 10 candidates × 5 payloads |
| Triton inference latency | <50ms | Per model call |
| vLLM inference latency | <2s | Explanation generation |
| Burst throughput | 100+ rps | With cuDF analytics |
| Exploit success rate delta | >80% reduction | Vulnerable → hardened |
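To make the burst-throughput row concrete, the core of a burst test fits in a few lines. A minimal sketch of the idea behind `core/benchmark/burst_sim.py`, using `httpx` for brevity; the endpoint and burst size are illustrative.

```python
# Fire n concurrent requests at the target and report latency stats.
import asyncio, time
import httpx

async def burst(url: str = "http://localhost:8000/health", n: int = 50):
    async with httpx.AsyncClient() as client:
        async def one():
            t0 = time.perf_counter()
            resp = await client.get(url)
            return (time.perf_counter() - t0) * 1000, resp.status_code
        results = await asyncio.gather(*[one() for _ in range(n)])
    lat = sorted(ms for ms, _ in results)
    p95 = lat[max(0, int(0.95 * n) - 1)]
    print(f"n={n} mean={sum(lat) / n:.1f}ms p95={p95:.1f}ms")

asyncio.run(burst())
```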
## Screenshot Checklist

For live demo evidence, capture and place in `artifacts/`:

- `artifacts/gpu_screenshot.png` — DGX dashboard GPU utilization during benchmark
- `artifacts/dashboard_screenshot.png` — Streamlit dashboard with real data
- `artifacts/benchmark_terminal.png` — Rich terminal benchmark table
- `artifacts/openclaw_defend.png` — OpenClaw CLI defense cycle output
- `artifacts/exploit_before.png` — Exploit success before defense cycle
- `artifacts/exploit_after.png` — Exploit blocked after deployment
## Repository Structure

```text
/ZeroWall
├── apps/target-fastapi/ # Vulnerable demo FastAPI app
│ ├── main.py # 3 simulated vulnerable endpoints
│ ├── test_app.py # 25+ unit tests (verifier uses these)
│ ├── requirements.txt
│ └── Dockerfile
├── core/
│ ├── agents/ # 5 ZeroWall agents
│ │ ├── mutation_agent.py # Generates 8–20 candidate plans
│ │ ├── exploit_agent.py # Replays known attack payloads
│ │ ├── verifier_agent.py # Runs pytest + bandit
│ │ ├── risk_agent.py # Scores + recommends action
│ │ └── explanation_agent.py # Judge-facing summaries
│ ├── transforms/ # Safe deterministic AST transforms
│ │ ├── base.py # Transform registry
│ │ ├── rename_identifiers.py
│ │ ├── reorder_blocks.py
│ │ ├── split_helpers.py
│ │ ├── swap_validators.py # PRIMARY security hardening transform
│ │ └── route_rotation.py
│ ├── orchestrator/
│ │ ├── defense_loop.py # Main multi-agent coordinator
│ │ └── openclaw_cli.py # OpenClaw command interface
│ ├── deploy/
│ │ └── controller.py # Blue/green deploy + rollback
│ ├── telemetry/
│ │ ├── collector.py # Event collection → JSONL
│ │ └── rapids_analytics.py # cuDF GPU analytics
│ └── benchmark/
│ └── burst_sim.py # Burst attack benchmark suite
├── inference/
│ ├── triton-model-repo/ # Triton model repository
│ │ ├── mutation-planner/ # Transform type selector model
│ │ └── risk-scorer/ # Candidate risk scoring model
│ └── clients/
│ ├── triton_client.py # Triton HTTP v2 client
│ └── vllm_client.py # vLLM OpenAI-compatible client
├── dashboard/
│ └── streamlit_app.py # Judge-facing metrics dashboard
├── scripts/
│ ├── run_demo.sh # End-to-end demo flow
│ ├── run_benchmark.sh # Benchmark runner
│ └── seed_attack.sh # Seed known exploit payloads
├── docker-compose.yml # All services: target, triton, vllm, core, dashboard
├── Dockerfile.core # ZeroWall core engine container
├── requirements.core.txt
└── .env.example
```
## Disclaimer

⚠️ **HACKATHON SAFETY NOTICE**

ZeroWall is built for safe, controlled demonstration purposes only.
- The "vulnerable" endpoints do NOT expose real system resources, execute real commands, or perform any harmful operations
- All simulated vulnerabilities are sandboxed in an in-memory dictionary
- Exploit payloads target ONLY the local demo FastAPI container — no external network probing
- No real offensive tooling is included in this project
- The deploy controller only modifies the local demo app source file
- All "exploits" are pre-defined, non-harmful HTTP requests that trigger simulated response patterns
This project demonstrates the architecture and decision-making pipeline of a moving target defense system. A real deployment would integrate actual vulnerability detection; for this demonstration, safety is the top priority.