Open-source content moderation for group chats. Protects communities — especially those with minors — from violence, NSFW content, sexual violence, cyberbullying, and deepfakes.
Works with Telegram and WhatsApp. More platforms planned.
- What It Does
- Quick Start
- Deepfake Detection Setup
- How It Works (Under the Hood)
- Configuration Reference
- API Reference
- GDPR & Privacy
- Running Tests
- Roadmap
- Contributing
- License
A bot sits in your group chat. When someone sends a message, image, or video, the bot checks it using ML models and returns one of three verdicts:
| Verdict | What happens |
|---|---|
| allow | Nothing — content is safe |
| flag | Admins get notified for review |
| delete | Message is removed (or admins are pinged if bot lacks permissions) |
What are ML models? ML (machine learning) models are pre-trained programs that can understand content — like recognising violent images or toxic language. Deepfake Guardian downloads these models automatically the first time they're needed. They run locally on your server, so no message content is sent to external services (unless you choose a cloud-based deepfake provider).
Categories detected: violence, sexual violence, NSFW, cyberbullying, deepfakes
Who is this for?
- Schools and class chats
- Youth organisations and sports clubs
- Companies and teams
- Any community group that needs moderation
Use
MODERATION_PROFILE=minors_strictfor groups with children — it lowers all thresholds automatically.
- Docker (recommended) — or Python 3.11+ and Node.js 18+
- A Telegram bot token from @BotFather
- ~4 GB disk and ~4 GB RAM (models are downloaded on demand, see below)
# 1. Clone and copy config files
git clone https://github.com/richardkfm/deepfake-guardian.git
cd deepfake-guardian
cp engine/.env.example engine/.env
cp telegram-bot/.env.example telegram-bot/.env
# 2. Set your Telegram bot token
# Open telegram-bot/.env and paste your token:
# TELEGRAM_BOT_TOKEN=123456:ABC-DEF...
# 3. Start everything
docker compose up --buildThat's it. Add the bot to a Telegram group, give it admin rights, and it starts moderating.
Without admin rights the bot will @-mention admins instead of deleting messages.
The engine starts immediately — no upfront download required. ML models are downloaded on demand when the first message of each type arrives:
| What triggers it | Model downloaded | Size |
|---|---|---|
| First English text message | BART language model | ~1.6 GB |
| First German text message | German toxic-comments model | ~500 MB |
| First image | NSFW detector + CLIP violence model | ~750 MB |
| Deepfake detection (if enabled) | Depends on provider | 0–50 MB |
After the first download, models are cached. Docker volumes persist the cache across container restarts, so subsequent starts are instant.
Tip: To pre-download all models before going live, send one test request per type after starting the engine:
# Pre-download text models curl -X POST http://localhost:8000/moderate_text \ -H "Content-Type: application/json" -d '{"text": "test"}' # Pre-download image models curl -X POST http://localhost:8000/moderate_image \ -H "Content-Type: application/json" -d '{"image_url": "https://placehold.co/10x10.png"}'
Set ENABLED_LANGUAGES=en (or =de) in engine/.env to skip loading the
other language model entirely — saves ~500 MB–1.6 GB of RAM and disk.
# Terminal 1 — Engine
cd engine && pip install -r requirements.txt && python main.py
# Terminal 2 — Telegram bot
cd telegram-bot && pip install -r requirements.txt && python main.py
# Terminal 3 — WhatsApp bot (optional)
cd whatsapp-bot && npm install && npm run build && npm startBare-metal also needs ffmpeg installed (apt install ffmpeg or
brew install ffmpeg).
Deepfake detection is off by default (stub mode — no models downloaded,
no API calls). To enable it, pick a provider and set the config in
engine/.env.
DEEPFAKE_PROVIDER=openai
OPENAI_API_KEY=sk-your-key-hereGet a key at platform.openai.com/api-keys. Uses GPT-4o vision. No local model download needed. Face images are sent to OpenAI — consider GDPR implications for groups with minors.
DEEPFAKE_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llavaInstall Ollama, then ollama pull llava. All data stays
on your machine. No additional downloads by the engine itself.
DEEPFAKE_PROVIDER=localRuns an EfficientNet-B0 model on CPU. Downloads a ~50 MB ONNX model on first
use. Needs ~500 MB extra RAM and the onnxruntime, mediapipe,
opencv-python-headless packages (already included in requirements.txt).
Nothing leaves the server.
# SightEngine
DEEPFAKE_PROVIDER=sightengine
SIGHTENGINE_API_USER=your-user
SIGHTENGINE_API_SECRET=your-secret
# Or any HTTP endpoint
DEEPFAKE_PROVIDER=api
DEEPFAKE_API_URL=https://your-api.com/detect
DEEPFAKE_API_KEY=your-key┌─────────────┐ ┌────────────────────┐
│ Telegram Bot │──── HTTP/JSON ───▶│ │
└─────────────┘ │ Moderation Engine │
┌──────────────┐ │ (FastAPI) │
│ WhatsApp Bot │──── HTTP/JSON ───▶│ │
└──────────────┘ └────────────────────┘
| Directory | Stack | Role |
|---|---|---|
engine/ |
Python 3.11, FastAPI | ML classification + verdict logic |
telegram-bot/ |
Python, python-telegram-bot | Listens to Telegram groups |
whatsapp-bot/ |
Node.js, TypeScript, Baileys | Listens to WhatsApp groups |
The engine is the brain. Bots are thin clients — they forward content, get a verdict, and act on it.
- Bot receives a message (text / image / video) in a group chat
- Forwards content to the engine API
- Engine auto-detects language, runs ML classifiers + pattern matchers
- Engine returns verdict + confidence scores
- Bot deletes, flags, or allows the message
| Category | Model | Downloads on demand |
|---|---|---|
| Violence (EN) | BART zero-shot | ~1.6 GB (shared with other EN categories) |
| Violence (DE) | German toxic model | ~500 MB (shared with other DE categories) |
| Sexual violence | Same as above | (shared) |
| NSFW (text) | Same as above | (shared) |
| NSFW (images) | Falconsai/nsfw_image_detection | ~400 MB |
| Violence (images) | CLIP zero-shot | ~350 MB |
| Cyberbullying | Language-specific patterns + ML | (shared with text model) |
| Deepfake | Depends on DEEPFAKE_PROVIDER |
0 MB (API) to ~50 MB (local) |
| Video | Frame extraction + above models | No extra download |
Auto-detects language per message. Each language has its own ML model, patterns, and educational messages.
| Language | Pack | Model |
|---|---|---|
| English | en |
facebook/bart-large-mnli + EN patterns |
| German | de |
ml6team/distilbert-base-german-cased-toxic-comments + DE patterns |
Add a new language by creating one file in engine/i18n/packs/. Enable with
ENABLED_LANGUAGES=en,de,fr.
All config lives in .env files. Copy from .env.example and edit.
Set all thresholds at once with MODERATION_PROFILE:
| Profile | Violence | Sexual V. | NSFW | Deepfake | Cyberbullying | Best for |
|---|---|---|---|---|---|---|
minors_strict |
0.5 | 0.3 | 0.4 | 0.6 | 0.4 | Schools, youth groups |
default |
0.5 | 0.5 | 0.8 | 0.7 | 0.65 | General communities |
permissive |
0.85 | 0.7 | 0.75 | 0.9 | 0.8 | Adult groups |
How thresholds work: each threshold is the minimum confidence score that triggers an automatic deletion. A lower value means stricter — the engine deletes at lower model confidence, catching the category earlier and more aggressively. A higher value means the model must be very confident before acting, reducing false positives.
Individual THRESHOLD_* vars override specific values within the profile.
| Variable | Default | Description |
|---|---|---|
HOST |
0.0.0.0 |
Bind address |
PORT |
8000 |
Listen port |
LOG_LEVEL |
info |
Logging verbosity |
API_KEY |
(empty) | Auth key for X-API-Key header; empty = no auth (dev only) |
RATE_LIMIT |
60/minute |
Rate limit per IP on moderation endpoints |
ENABLED_LANGUAGES |
en,de |
Active language packs (controls which text models load) |
MODERATION_PROFILE |
default |
Threshold profile (see above) |
THRESHOLD_VIOLENCE |
from profile | Override violence threshold (0–1) |
THRESHOLD_SEXUAL_VIOLENCE |
from profile | Override sexual violence threshold |
THRESHOLD_NSFW |
from profile | Override NSFW threshold |
THRESHOLD_DEEPFAKE |
from profile | Override deepfake threshold |
THRESHOLD_CYBERBULLYING |
from profile | Override cyberbullying threshold |
DEEPFAKE_PROVIDER |
stub |
openai | ollama | local | sightengine | api | stub |
OPENAI_API_KEY |
(empty) | OpenAI key (for openai provider) |
OPENAI_MODEL |
gpt-4o |
OpenAI vision model |
OPENAI_API_BASE |
https://api.openai.com/v1 |
OpenAI-compatible base URL (works with Azure, LiteLLM) |
OLLAMA_BASE_URL |
http://localhost:11434 |
Ollama server (for ollama provider) |
OLLAMA_MODEL |
llava |
Ollama vision model |
DEEPFAKE_MODEL_PATH |
(auto) | ONNX model path (for local provider) |
SIGHTENGINE_API_USER |
(empty) | SightEngine user (for sightengine provider) |
SIGHTENGINE_API_SECRET |
(empty) | SightEngine secret |
DEEPFAKE_API_URL |
(empty) | Custom endpoint (for api provider) |
DEEPFAKE_API_KEY |
(empty) | Bearer token for custom API |
FRAME_INTERVAL |
2.0 |
Seconds between sampled video frames |
MAX_FRAMES |
10 |
Max frames per video |
MAX_VIDEO_DURATION |
300 |
Max video length in seconds |
DATABASE_URL |
sqlite+aiosqlite:///./deepfake_guardian.db |
Database URL |
GDPR_SALT |
(change me) | Secret salt for hashing user IDs |
DATA_RETENTION_DAYS |
30 |
Days before auto-deletion of moderation events |
| Variable | Required | Description |
|---|---|---|
TELEGRAM_BOT_TOKEN |
Yes | Token from @BotFather |
ENGINE_URL |
No | Engine URL (default: http://engine:8000) |
ENGINE_API_KEY |
No | Must match engine's API_KEY |
BOT_LANGUAGE |
No | Admin notification language: en or de |
| Variable | Default | Description |
|---|---|---|
ENGINE_URL |
http://engine:8000 |
Engine URL |
POST /moderate_text {"text": "...", "language": "en"}
POST /moderate_image {"image_base64": "..." | "image_url": "..."}
POST /moderate_video {"video_base64": "..." | "video_url": "..."}
GET /health
{
"verdict": "allow",
"reasons": [],
"scores": {
"violence": 0.02,
"sexual_violence": 0.01,
"nsfw": 0.01,
"deepfake_suspect": 0.0,
"cyberbullying": 0.01
},
"language": "en"
}curl -s -X POST http://localhost:8000/moderate_text \
-H "Content-Type: application/json" \
-d '{"text": "hello world"}' | python3 -m json.toolPOST /gdpr/export — export all data for a user (Article 15)
POST /gdpr/delete_request — submit erasure request (Article 17)
GET /gdpr/delete_request/{id} — check erasure status
POST /warnings/record — record a violation, returns escalation action
GET /warnings/{user_id_hash} — fetch warning history
Escalation: 1st = educational notice, 2nd = admin notification, 3rd+ = supervisor escalation.
| Data | Stored? | Details |
|---|---|---|
| Message text / images / video | Never | Processed in memory, not persisted |
| Moderation verdicts + scores | Yes (metadata) | Auto-deleted after DATA_RETENTION_DAYS |
| User/group IDs | Yes (hashed) | SHA-256 with secret salt — not reversible |
| Warning counts | Yes | Deleted on erasure request |
Telegram bot commands:
/privacy— shows privacy notice in the group/delete_my_data— submits an Article 17 erasure request
Privacy notes:
- Content is never stored — only an 80-char preview in debug logs (never in DB)
- User IDs are hashed before storage
DEEPFAKE_PROVIDER=ollamaorlocalkeeps all data on your serverDEEPFAKE_PROVIDER=openai,sightengine, orapisends face crops externally
cd engine && pip install -r requirements.txt && pytest
cd telegram-bot && pip install -r requirements.txt && pytest| Phase | Focus | Status |
|---|---|---|
| 1 | Tests, CI/CD, API auth, resilience | Done |
| 2 | i18n, language packs, cyberbullying | Done |
| 3 | GDPR compliance, warnings, escalation | Done |
| 4 | Deepfake detection, video analysis | Done |
| 5 | Admin dashboard, bot commands | Planned |
| 6 | WhatsApp parity, Signal, Discord | Planned |
See ROADMAP.md for full details.
Contributions welcome. Open an issue before starting large changes.
Help needed with:
- Language packs (French, Spanish, Turkish, Arabic)
- Tests and documentation
- WhatsApp bot features
See CLAUDE.md for a detailed technical guide to the codebase.
MIT — see LICENSE.