Automatically discover adversarial failures in AI APIs before users exploit them.
Built for the DigitalOcean Gradient AI Hackathon ๐
Adversarial testing platform powered by DigitalOcean Gradientโข AI for attack generation, deep vulnerability analysis, iterative refinement, and developer-friendly fix suggestions.
๐ Live app: https://shadowlab-h9yu6.ondigitalocean.app/
(Gradient AI is pre-configured โ scans use AI-generated attacks and AI-powered analysis)
# 1. Clone and configure
git clone https://github.com/prabhakaran-jm/shadowlab-ai.git
cd shadowlab-ai
# 2. Backend: set GRADIENT_MODEL_ACCESS_KEY in backend/.env (see docs/GRADIENT_SETUP.md)
cd backend
pip install -r requirements.txt
cp .env.example .env
# Edit .env: GRADIENT_MODEL_ACCESS_KEY=<your key>, ALLOW_LOCALHOST_TARGET=1 for local demo
uvicorn app.main:app --reload# 3. Frontend (new terminal)
cd frontend
npm install
npm run dev4. Open http://localhost:3000
5. Try the mock vulnerable API:
- API Endpoint:
http://localhost:8000/mock-vulnerable-api - Ensure
ALLOW_LOCALHOST_TARGET=1inbackend/.env - Click Start Scan โ youโll see failures, a lower safety score, and AI-generated fix suggestions
AI APIs often fail under adversarial or edge-case inputs. Common failure modes include:
| Risk | Description |
|---|---|
| Prompt injection | Users override system instructions or inject malicious prompts |
| System prompt leakage | Internal instructions or system prompts exposed in responses |
| Policy bypass | Guardrails circumvented via hypotheticals, roleplay, or phrasing |
| Edge-case inputs | Unexpected or malformed inputs that trigger unsafe behavior |
Developers lack tools to proactively test these vulnerabilities before they are exploited in production.
ShadowLab is an adversarial testing platform that:
- Generates adversarial prompts via DigitalOcean Gradient AI or a curated seed set (15 payloads)
- Runs automated red-team scans against HTTP AI APIs (POST with JSON:
messageor OpenAI-stylemessages) - Detects vulnerabilities using heuristic rules and Gradient AIโpowered deep analysis on every response
- Iteratively refines attacks โ when a target defends successfully, Gradient generates follow-up bypass attempts
- Suggests fixes with developer-friendly remediation (including Gradient-generated suggestions)
- Computes a safety score (0โ100) so you can track and compare API robustness over time
ShadowLab uses DigitalOcean Gradientโข AI in four distinct ways:
| Use case | Model | What it does |
|---|---|---|
| Attack generation | GPT-OSS-20B | Generates targeted adversarial prompts from the target API description |
| Vulnerability detection | Llama 3.3 70B | Analyzes every API response for security failures (paraphrased leakage, roleplay compliance, tone shifts) |
| Attack refinement | GPT-OSS-20B | Generates follow-up attacks that bypass the targetโs specific defenses (adaptive multi-round testing) |
| Fix suggestions | Llama 3.3 70B | Provides developer-friendly remediation for each finding |
This two-model design optimizes performance and cost. Without a Model Access Key, the app falls back to seed attacks and heuristic-only judging.
๐ Setup: docs/GRADIENT_SETUP.md for GRADIENT_MODEL_ACCESS_KEY and optional overrides.
| Component | Description |
|---|---|
| Frontend | Next.js dashboard โ scan form, Gradient status indicator, security report, filterable results table |
| Backend | FastAPI scan engine โ /scan, /scan/demo, /gradient/status, health check |
| DigitalOcean Gradient AI | GPT-OSS-20B (prompt generation + refinement); Llama 3.3 70B (vulnerability detection + fix suggestions) |
| Attack generator | Gradient AI when GRADIENT_MODEL_ACCESS_KEY is set; else 15 seed attacks (JSON) |
| Target runner | POST with message or OpenAI-style messages body; returns response text for judging |
| Response judge | Two-layer: heuristic rules + Gradient AI deep analysis (either can flag a failure) |
| Iterative refinement | When some attacks pass, Gradient generates targeted follow-up attacks (up to 2 rounds) |
| Safety scoring | 0โ100; only failed tests reduce the score |
| Persistence | SQLite-backed storage for targets and recent reports (survives restarts) |
| Deployment | DigitalOcean App Platform (optional); storage: DigitalOcean Spaces (optional) |
- Attack generation โ Gradient AIโgenerated or seed-based (15 payloads: prompt injection, system prompt extraction, policy bypass, encoding bypass, multi-language, and more)
- AI-powered detection โ Gradient AI analyzes all responses, not only heuristic matches (subtle leakage, compliance, tone)
- Iterative refinement โ Multi-round adaptive testing that learns from the targetโs defenses
- Safety score โ 0โ100 derived from severity of findings
- Developer-friendly fix recommendations โ Actionable suggestions (including Gradient AIโenhanced)
- Security report dashboard โ Summary, vulnerability counts, filterable/sortable results table, recommended fixes
- Gradient connectivity indicator โ Real-time badge showing whether Gradient AI is connected
- Honest loading state โ Real scan progress with status indicator (no fake logs)
- Persistent data โ Targets and recent reports stored in SQLite (not in-memory only)
- Target URL guard โ Private and localhost URLs rejected unless
ALLOW_LOCALHOST_TARGET=1(for local demo)
cd backend && pip install -r requirements.txt && cp .env.example .env && uvicorn app.main:app --reload
# In another terminal:
cd frontend && npm install && npm run devThen open http://localhost:3000.
- Optional: Set
GRADIENT_MODEL_ACCESS_KEY(orGRADIENT_API_KEY) inbackend/.envfor Gradient AI โ docs/GRADIENT_SETUP.md - Local mock demo: Set
ALLOW_LOCALHOST_TARGET=1inbackend/.envand use targethttp://localhost:8000/mock-vulnerable-api
cd backend
pip install -r requirements.txt
cp .env.example .env # then set GRADIENT_MODEL_ACCESS_KEY or GRADIENT_API_KEY for Gradient AI
uvicorn app.main:app --reload- API: http://localhost:8000
- Docs: http://localhost:8000/docs
Target URL guard: Private and localhost URLs are rejected unless ALLOW_LOCALHOST_TARGET=1 (use for the mock-vulnerable-api demo).
cd frontend
npm install
npm run dev- Dashboard: http://localhost:3000
- Production: Set
NEXT_PUBLIC_API_URLto your backend URL. - Production build locally:
npm run buildthennpm start(port 3000 unlessPORTis set).
Backend (pytest):
cd backend
pip install -r requirements.txt
pytest tests/ -v
# CI: pytest tests/ -v --timeout=10Frontend (Jest + React Testing Library):
cd frontend
npm install
npm run test- Push this repo to GitHub and connect it in the Apps dashboard (or use
doctl apps create --spec .do/app.yamlafter setting your repo in.do/app.yaml). - Add two services: backend (source dir
backend, runsh run.sh, port 8080) and frontend (source dirfrontend,npm run build/npm start, port 8080). - Backend env:
CORS_ORIGINS= your frontend Live URL; optionallyGRADIENT_MODEL_ACCESS_KEY. - Frontend env:
NEXT_PUBLIC_API_URL= your backend Live URL, then redeploy the frontend.
๐ Full guide: docs/DEPLOYMENT.md
| Doc | Description |
|---|---|
| docs/GRADIENT_SETUP.md | Gradient AI API key and model configuration |
| docs/DEPLOYMENT.md | DigitalOcean App Platform deployment |
| Layer | Technologies |
|---|---|
| Frontend | Next.js, TypeScript, Tailwind CSS |
| Backend | Python, FastAPI, Pydantic, httpx |
| AI | DigitalOcean Gradientโข AI (GPT-OSS-20B, Llama 3.3 70B) |
| Storage | SQLite (targets + reports) |
| Deploy | DigitalOcean App Platform (optional) |
- Start the stack โ run backend (
uvicorn) and frontend (npm run dev). - Open the dashboard โ go to the frontend URL (e.g.
http://localhost:3000). - Check Gradient status โ scan form shows whether Gradient AI is connected.
- Enter a target โ API endpoint URL and optional target description.
- Start scan โ click Start Scan; status indicator shows real progress. With Gradient configured, attacks are generated by DigitalOcean Gradient AI and responses are analyzed by Gradient AI.
- View report โ Safety Score (with round count if refinement ran), vulnerability summary, filterable results table, and recommended fixes.
- Filter findings โ use severity filter (All / High / Medium / Low) to focus on issues.
- Optional โ try
GET /scan/demofor a quick scan against a mock endpoint.
Built for the DigitalOcean Gradient AI Hackathon. The app integrates Gradient AI for attack generation, deep vulnerability detection, iterative refinement, and fix suggestions as described above.
Before submitting:
- Demo video: https://youtu.be/kv1Ye9RTBD8
- Live demo: https://shadowlab-h9yu6.ondigitalocean.app/
For judges: Set GRADIENT_MODEL_ACCESS_KEY in the backend (docs/GRADIENT_SETUP.md) so scans use Gradient. Without it, the app uses seed attacks and heuristic-only judging; the report should show "Adversarial attacks generated by DigitalOcean Gradientโข AI."
Note: Targets and reports are persisted in SQLite (backend/shadowlab.db by default; SHADOWLAB_DB_PATH). Up to 50 recent reports are retained automatically.
- Real-time attack streaming โ stream attack events and judge results as they complete
- CI/CD integration โ fail builds or block deploys when safety score or critical findings exceed thresholds
- Advanced adversarial mutation โ multi-generation attack evolution for broader coverage
- Comparative reporting โ track safety score trends across scan history
Built with โค๏ธ for the DigitalOcean Gradient AI Hackathon
Chaos engineering for AI APIs โ find vulnerabilities before attackers do.