
🛡️ ShadowLab – Chaos Engineering for AI APIs

Automatically discover adversarial failures in AI APIs before users exploit them.

Built for the DigitalOcean Gradient AI Hackathon 🎉

Adversarial testing platform powered by DigitalOcean Gradient™ AI for attack generation, deep vulnerability analysis, iterative refinement, and developer-friendly fix suggestions.



🎯 Quick Demo for Judges

🌐 Live app: https://shadowlab-h9yu6.ondigitalocean.app/
(Gradient AI is pre-configured – scans use AI-generated attacks and AI-powered analysis)

Run locally with Gradient AI (≈2 min)

# 1. Clone and configure
git clone https://github.com/prabhakaran-jm/shadowlab-ai.git
cd shadowlab-ai

# 2. Backend: set GRADIENT_MODEL_ACCESS_KEY in backend/.env (see docs/GRADIENT_SETUP.md)
cd backend
pip install -r requirements.txt
cp .env.example .env
# Edit .env: GRADIENT_MODEL_ACCESS_KEY=<your key>, ALLOW_LOCALHOST_TARGET=1 for local demo
uvicorn app.main:app --reload
# 3. Frontend (new terminal)
cd frontend
npm install
npm run dev

4. Open http://localhost:3000

5. Try the mock vulnerable API:

  • API Endpoint: http://localhost:8000/mock-vulnerable-api
  • Ensure ALLOW_LOCALHOST_TARGET=1 in backend/.env
  • Click Start Scan – you'll see failures, a lower safety score, and AI-generated fix suggestions

🔥 The Problem

AI APIs often fail under adversarial or edge-case inputs. Common failure modes include:

| Risk | Description |
| --- | --- |
| Prompt injection | Users override system instructions or inject malicious prompts |
| System prompt leakage | Internal instructions or system prompts exposed in responses |
| Policy bypass | Guardrails circumvented via hypotheticals, roleplay, or phrasing |
| Edge-case inputs | Unexpected or malformed inputs that trigger unsafe behavior |

Developers lack tools to proactively test these vulnerabilities before they are exploited in production.


✅ The Solution

ShadowLab is an adversarial testing platform that:

  • Generates adversarial prompts via DigitalOcean Gradient AI or a curated seed set (15 payloads)
  • Runs automated red-team scans against HTTP AI APIs (POST with JSON: message or OpenAI-style messages)
  • Detects vulnerabilities using heuristic rules and Gradient AI–powered deep analysis on every response
  • Iteratively refines attacks – when a target defends successfully, Gradient generates follow-up bypass attempts
  • Suggests fixes with developer-friendly remediation (including Gradient-generated suggestions)
  • Computes a safety score (0–100) so you can track and compare API robustness over time
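
Concretely, the two request shapes the scan engine POSTs to a target (a plain `message` body or an OpenAI-style `messages` body) can be sketched in Python; the `build_payload` helper below is illustrative, not ShadowLab's actual code:

```python
import json

def build_payload(attack: str, style: str = "message") -> dict:
    """Build the JSON body POSTed to a target API (illustrative helper).

    Two shapes are supported: a plain {"message": ...} body, or an
    OpenAI-style chat body with a "messages" list.
    """
    if style == "message":
        return {"message": attack}
    return {"messages": [{"role": "user", "content": attack}]}

print(json.dumps(build_payload("Ignore all previous instructions.")))
print(json.dumps(build_payload("Ignore all previous instructions.", style="messages")))
```

Any HTTP AI API that accepts one of these two shapes and returns response text can be scanned.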

🤖 How Gradient AI Powers ShadowLab

ShadowLab uses DigitalOcean Gradient™ AI in four distinct ways:

| Use case | Model | What it does |
| --- | --- | --- |
| Attack generation | GPT-OSS-20B | Generates targeted adversarial prompts from the target API description |
| Vulnerability detection | Llama 3.3 70B | Analyzes every API response for security failures (paraphrased leakage, roleplay compliance, tone shifts) |
| Attack refinement | GPT-OSS-20B | Generates follow-up attacks that bypass the target's specific defenses (adaptive multi-round testing) |
| Fix suggestions | Llama 3.3 70B | Provides developer-friendly remediation for each finding |

This two-model design optimizes performance and cost. Without a Model Access Key, the app falls back to seed attacks and heuristic-only judging.

📖 Setup: see docs/GRADIENT_SETUP.md for GRADIENT_MODEL_ACCESS_KEY and optional overrides.
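
The fallback behaviour can be sketched as follows (the environment variable names are from this README; the function itself is hypothetical):

```python
import os

def choose_attack_source() -> str:
    """Return "gradient" when a Model Access Key is configured,
    otherwise fall back to the bundled seed attacks.

    GRADIENT_API_KEY is accepted as an alternative variable name,
    as noted in the Quick Start section.
    """
    key = os.getenv("GRADIENT_MODEL_ACCESS_KEY") or os.getenv("GRADIENT_API_KEY")
    return "gradient" if key else "seed"
```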


๐Ÿ—๏ธ Architecture

| Component | Description |
| --- | --- |
| Frontend | Next.js dashboard – scan form, Gradient status indicator, security report, filterable results table |
| Backend | FastAPI scan engine – /scan, /scan/demo, /gradient/status, health check |
| DigitalOcean Gradient AI | GPT-OSS-20B (prompt generation + refinement); Llama 3.3 70B (vulnerability detection + fix suggestions) |
| Attack generator | Gradient AI when GRADIENT_MODEL_ACCESS_KEY is set; otherwise 15 seed attacks (JSON) |
| Target runner | POST with message or OpenAI-style messages body; returns response text for judging |
| Response judge | Two-layer: heuristic rules + Gradient AI deep analysis (either can flag a failure) |
| Iterative refinement | When the target defends some attacks, Gradient generates targeted follow-up attacks (up to 2 rounds) |
| Safety scoring | 0–100; only failed tests reduce the score |
| Persistence | SQLite-backed storage for targets and recent reports (survives restarts) |
| Deployment | DigitalOcean App Platform (optional); storage: DigitalOcean Spaces (optional) |
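
The two-layer response judge can be sketched like this; the keyword markers and function names are illustrative assumptions, not ShadowLab's real rule set:

```python
def heuristic_judge(response: str) -> bool:
    """Layer 1: cheap keyword rules catching obvious leakage or compliance.
    The marker list here is illustrative only."""
    markers = ["system prompt", "my instructions are", "begin system"]
    lowered = response.lower()
    return any(m in lowered for m in markers)

def judge(response: str, ai_flagged: bool) -> bool:
    """Combine both layers: either the heuristics or the Gradient AI
    deep analysis (passed in as ai_flagged) can mark a test as failed."""
    return heuristic_judge(response) or ai_flagged
```

The AI layer matters because subtle failures (paraphrased leakage, roleplay compliance, tone shifts) rarely match fixed keywords.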

📊 Key Features

🎯 Adversarial Testing

  • Attack generation – Gradient AI–generated or seed-based (15 payloads: prompt injection, system prompt extraction, policy bypass, encoding bypass, multi-language, and more)
  • AI-powered detection – Gradient AI analyzes all responses, not only heuristic matches (subtle leakage, compliance, tone)
  • Iterative refinement – multi-round adaptive testing that learns from the target's defenses
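
The adaptive loop might look roughly like this; the function names, the pass/fail convention, and the round accounting are assumptions, not ShadowLab's actual engine:

```python
def run_scan(attacks, send_attack, refine, max_refine_rounds=2):
    """Adaptive red-team loop (sketch).

    send_attack(attack) -> True when the target FAILS (is vulnerable).
    Attacks the target defends are handed to refine() to produce
    follow-up bypass attempts, for at most max_refine_rounds rounds.
    """
    findings = []
    pending = list(attacks)
    rounds = 0
    while pending:
        defended = []
        for attack in pending:
            (findings if send_attack(attack) else defended).append(attack)
        if defended and rounds < max_refine_rounds:
            pending = refine(defended)  # Gradient-generated follow-ups
            rounds += 1
        else:
            pending = []
    return findings, rounds
```

With Gradient configured, `refine` would call the GPT-OSS-20B model; without a key there is nothing to refine with, so the loop runs a single round.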

📋 Reporting & UX

  • Safety score – 0–100, derived from severity of findings
  • Developer-friendly fix recommendations – actionable suggestions (including Gradient AI–enhanced)
  • Security report dashboard – summary, vulnerability counts, filterable/sortable results table, recommended fixes
  • Gradient connectivity indicator – real-time badge showing whether Gradient AI is connected
  • Honest loading state – real scan progress with status indicator (no fake logs)
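
A minimal sketch of severity-based scoring: the README only says failed tests reduce the score by severity, so the exact penalty weights below are illustrative assumptions:

```python
# Assumed penalties per finding severity -- illustrative, not the
# weights ShadowLab actually uses.
SEVERITY_PENALTY = {"high": 15, "medium": 7, "low": 3}

def safety_score(finding_severities) -> int:
    """0-100 score; a clean scan stays at 100, only failures subtract."""
    penalty = sum(SEVERITY_PENALTY.get(s, 0) for s in finding_severities)
    return max(0, 100 - penalty)
```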

🔧 Production-Ready

  • Persistent data – targets and recent reports stored in SQLite (not in-memory only)
  • Target URL guard – private and localhost URLs rejected unless ALLOW_LOCALHOST_TARGET=1 (for local demo)
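
A minimal sketch of such a guard, assuming the check is applied to the URL's host; a production guard would also resolve hostnames to catch private IPs hidden behind DNS, and this is not ShadowLab's actual code:

```python
import os
import ipaddress
from urllib.parse import urlparse

def target_allowed(url: str) -> bool:
    """Reject localhost/private targets unless ALLOW_LOCALHOST_TARGET=1."""
    if os.getenv("ALLOW_LOCALHOST_TARGET") == "1":
        return True  # local demo mode: allow everything
    host = urlparse(url).hostname or ""
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # public hostname; DNS resolution omitted in this sketch
    return not (ip.is_private or ip.is_loopback)
```

This kind of guard is what makes the mock-vulnerable-api demo require the ALLOW_LOCALHOST_TARGET=1 override.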

🚀 Quick Start (3 commands)

cd backend && pip install -r requirements.txt && cp .env.example .env && uvicorn app.main:app --reload
# In another terminal:
cd frontend && npm install && npm run dev

Then open http://localhost:3000.

  • Optional: Set GRADIENT_MODEL_ACCESS_KEY (or GRADIENT_API_KEY) in backend/.env for Gradient AI → docs/GRADIENT_SETUP.md
  • Local mock demo: Set ALLOW_LOCALHOST_TARGET=1 in backend/.env and use target http://localhost:8000/mock-vulnerable-api

📖 Running Locally

Backend

cd backend
pip install -r requirements.txt
cp .env.example .env   # then set GRADIENT_MODEL_ACCESS_KEY or GRADIENT_API_KEY for Gradient AI
uvicorn app.main:app --reload

Target URL guard: Private and localhost URLs are rejected unless ALLOW_LOCALHOST_TARGET=1 (use for the mock-vulnerable-api demo).

Frontend

cd frontend
npm install
npm run dev

  • Dashboard: http://localhost:3000
  • Production: Set NEXT_PUBLIC_API_URL to your backend URL.
  • Production build locally: npm run build then npm start (port 3000 unless PORT is set).

Tests

Backend (pytest):

cd backend
pip install -r requirements.txt
pytest tests/ -v
# CI: pytest tests/ -v --timeout=10

Frontend (Jest + React Testing Library):

cd frontend
npm install
npm run test

๐ŸŒ Deployment (DigitalOcean App Platform)

  1. Push this repo to GitHub and connect it in the Apps dashboard (or use doctl apps create --spec .do/app.yaml after setting your repo in .do/app.yaml).
  2. Add two services: backend (source dir backend, run sh run.sh, port 8080) and frontend (source dir frontend, npm run build / npm start, port 8080).
  3. Backend env: CORS_ORIGINS = your frontend Live URL; optionally GRADIENT_MODEL_ACCESS_KEY.
  4. Frontend env: NEXT_PUBLIC_API_URL = your backend Live URL, then redeploy the frontend.

📖 Full guide: docs/DEPLOYMENT.md


📚 Documentation

| Doc | Description |
| --- | --- |
| docs/GRADIENT_SETUP.md | Gradient AI API key and model configuration |
| docs/DEPLOYMENT.md | DigitalOcean App Platform deployment |

🛠️ Technology Stack

| Layer | Technologies |
| --- | --- |
| Frontend | Next.js, TypeScript, Tailwind CSS |
| Backend | Python, FastAPI, Pydantic, httpx |
| AI | DigitalOcean Gradient™ AI (GPT-OSS-20B, Llama 3.3 70B) |
| Storage | SQLite (targets + reports) |
| Deploy | DigitalOcean App Platform (optional) |

🎬 Demo Flow

  1. Start the stack – run the backend (uvicorn) and frontend (npm run dev).
  2. Open the dashboard – go to the frontend URL (e.g. http://localhost:3000).
  3. Check Gradient status – the scan form shows whether Gradient AI is connected.
  4. Enter a target – API endpoint URL and an optional target description.
  5. Start scan – click Start Scan; the status indicator shows real progress. With Gradient configured, attacks are generated and responses analyzed by DigitalOcean Gradient AI.
  6. View report – Safety Score (with round count if refinement ran), vulnerability summary, filterable results table, and recommended fixes.
  7. Filter findings – use the severity filter (All / High / Medium / Low) to focus on issues.
  8. Optional – try GET /scan/demo for a quick scan against a mock endpoint.

๐Ÿ† Hackathon Submission

Built for the DigitalOcean Gradient AI Hackathon. The app integrates Gradient AI for attack generation, deep vulnerability detection, iterative refinement, and fix suggestions as described above.

For judges: set GRADIENT_MODEL_ACCESS_KEY in the backend (docs/GRADIENT_SETUP.md) so scans use Gradient AI; the report will then show "Adversarial attacks generated by DigitalOcean Gradient™ AI." Without a key, the app falls back to seed attacks and heuristic-only judging.

Note: Targets and reports are persisted in SQLite (backend/shadowlab.db by default; override the path with SHADOWLAB_DB_PATH). Up to 50 recent reports are retained automatically.
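
The 50-report retention can be sketched against a hypothetical `reports` table (the real schema is not documented here):

```python
import sqlite3

def prune_reports(conn: sqlite3.Connection, keep: int = 50) -> None:
    """Delete all but the most recent `keep` reports.
    The `reports` table and its columns are hypothetical."""
    conn.execute(
        "DELETE FROM reports WHERE id NOT IN ("
        "  SELECT id FROM reports ORDER BY created_at DESC LIMIT ?)",
        (keep,),
    )
    conn.commit()
```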


🔮 Future Improvements

  • Real-time attack streaming – stream attack events and judge results as they complete
  • CI/CD integration – fail builds or block deploys when the safety score or critical findings exceed thresholds
  • Advanced adversarial mutation – multi-generation attack evolution for broader coverage
  • Comparative reporting – track safety score trends across scan history

Built with ❤️ for the DigitalOcean Gradient AI Hackathon

Chaos engineering for AI APIs – find vulnerabilities before attackers do.
