A multi-agent AI swarm that generates images, video, music, streams to YouTube, posts to Instagram, runs a website, and learns from every interaction — all from Discord.
DiscoClaw is the orchestration layer that wires NVIDIA NemoClaw and HermesClaw into a single production system. Four AI agents run inside NVIDIA OpenShell sandboxes on a local GPU, coordinating through Discord to handle every stage of content creation: ideation, generation, editing, publishing, and memory.
It is not a chatbot. It is not a wrapper around one API. It is an autonomous creative pipeline where:
- Pipes (team lead) handles 28 Discord commands, generates images/video/music, manages button flows, and runs inside a Landlock+seccomp sandboxed K3s pod
- Candy (creative director) reacts to every generation with aesthetic critique, writes captions, and directs style
- MaoMao (logic auditor) catches hallucinations, flags bad reasoning, and keeps the crew honest
- Hermes (deep research) synthesizes long documents, reviews code, generates media through tool calls, and learns from every tool outcome via its own vector memory
All four agents talk to each other, share memory through Qdrant, and publish to Instagram, YouTube, and Facebook — without any human touching a browser.
Image gen with button flow · Suno music + video gen
9-clip video edit with beat-sync, transitions & effects
| Capability | Details |
|---|---|
| Image generation | 6 providers: ZImage Turbo (local GPU, 19 styles), Imagen 4 Fast (Google), Grok Aurora (xAI), NVIDIA NIM (SDXL/Flux), Freepik (Kontext Pro), Runware (Kontext Dev) |
| Video generation | LTX 2.3 text-to-video (fast 2-pass + standard), image-to-video, first+last frame interpolation, multi-segment story chains, Grok video |
| Video editing | Full FFmpeg render engine: 6 presets, 8 styles, 12 crossfade transitions, beat-sync via RMS analysis, Ken Burns, per-segment effects (grain, RGB shift, vignette pulse), lyrics/captions via faster-whisper |
| Music generation | ACE-Step (local ComfyUI) + Suno AI v5.5 (custom lyrics, instrumental, gen_lyrics) |
| Social media | Instagram (image posts + video reels via Graph API), YouTube (Data API v3 + live streaming + chat bridge), Facebook (Graph API) |
| Deep research | Hermes agent: grounded search, arxiv, GitHub inspection, long-context synthesis, code review |
| Memory | Qdrant vector DB with per-user + global recall, cross-agent shared knowledge, Hermes learning memory, Google Drive backup every 5 min |
| Website | PIPEBOX — Next.js 15, Holo-Diamond TCG, CRT Vault, AI gallery with dithering, retro themes |
| Crew system | Conditional multi-agent reactions: Pipes responds, Candy + MaoMao react in parallel, skip factual/short messages |
Windows 11 + RTX 5080 (16 GB VRAM)
│
├── ComfyUI :8188 ─── image / video / music gen (GPU-bound)
│
└── WSL2 Ubuntu
│
├── PM2 (12 processes)
│ │
│ ├── discord-bridge.js ── Pipes_AI gateway (:9339 :9340 :9341)
│ │ │ 28 slash commands, buttons, crew reactions
│ │ │
│ │ ├── candy.js ─────── Mistral Large 3 via NVIDIA NIM (:7701)
│ │ ├── maomai.js ────── Qwen 3.6+ via OpenRouter (:7702)
│ │ └── grok-server.js ─ Playwright headless Grok Aurora (:3091)
│ │
│ ├── hermes-bridge.js ─── Hermes gateway (:9350)
│ │ │ /ask endpoint, /generate tools, file upload
│ │ │
│ │ ├── hermes-inference.js ── OpenAI → Vertex AI Gemini (:9351)
│ │ │ context caching, agentic tool loop
│ │ │ hermes_memory (learns from tool outcomes)
│ │ │
│ │ └── hermes-mcp.js ──────── MCP server (:9360)
│ │ 17 tools: gen, post, search, stream
│ │
│ ├── memory-server (:7338) ──── Qdrant MCP, shared crew memory
│ ├── monitor.js (:7337) ─────── web dashboard
│ └── agent-watchdog.sh ──────── 3-min healthcheck
│
└── Docker Desktop
│
├── Redis :6379 ─── pub/sub (ai:events, ai:help-wanted, ai:teaching)
│
├── Qdrant :6333 ── vector DB (nemoclaw_memory, hermes_memory, shared_knowledge)
│
├── K3s cluster
│ └── OpenShell sandbox pod (Landlock + seccomp + netns)
│ └── Pipes_AI agent ── network-isolated, fs-controlled
│ ├── SOUL.md (persona)
│ ├── policy.yaml (allowlist)
│ └── custom skills
│
└── hermesclaw container (debian + hermes-agent)
2 CPU / 4 GB RAM limits, fallback for agentic tasks
| Command | What it does |
|---|---|
/chat |
Talk to Pipes (sandbox agent) |
/grok |
Text-to-image via Grok Aurora (xAI) |
/grok-img2img |
Upload image, get AI variations |
/grok-img2vid |
Upload image, animate it |
/imagine |
Text-to-image via Imagen 4 Fast (Google) |
/zturbo |
Text-to-image local GPU (19 styles, ~10s) |
/video |
Text-to-video via LTX 2.3 |
/combi |
First + last frame video (2 images) |
/story |
Multi-segment chained story video |
/music |
Generate song via ACE-Step (local) |
/suno |
Generate song via Suno AI (v5.5, lyrics, instrumental) |
/combine |
Replace video audio track |
/edit |
Compose video (6 presets, 8 styles, beat-sync, lyrics) |
/edit-add |
Add media to edit queue |
/edit-go |
Render edit queue |
/edit-queue |
Show edit queue |
/edit-clear |
Clear edit queue |
/capcut |
Video with effects & transitions |
/create |
Media picker: type + model + params |
/create2 |
YouTube stream + IG poster + Breakroom |
/post |
Post last media to Instagram / YouTube |
/yt |
Search YouTube channel videos |
/transcript |
YouTube video transcript + summary |
/analyze |
Batch analyze channel streams |
/ask |
Grounded doc search (Vertex AI) |
/model |
Show current models |
/queue |
ComfyUI render queue status |
/help |
Show all commands |
Every generation spawns context-aware buttons:
- Image: Regenerate, Enhance, Img2Img, Make Video, Grok I2I, Grok I2V, Post IG
- Video: Regenerate, Chain, Chain+Prompt, Chain+Enhance, GIF, Post IG, Post YT, Stitch All
- Grok: Select 1-4, Regen, Img2Img, Img2Vid, Edit, Video, Post, Back
- Music: Suno Video, Post Music
- GIF: Loop, GIF→MP4, Post IG Image, Post IG Reel
- Enhance flow: Enhanced / Original / Cancel (LLM prompt comparison)
| Agent | Model | Provider | Role |
|---|---|---|---|
| Pipes | Gemini 2.5 Flash | Vertex AI | Team lead. 28 commands, all generation, sandbox agent, button flows |
| Candy | Mistral Large 3 | NVIDIA NIM | Creative director. Aesthetic critique, captions, style direction |
| MaoMao | Qwen 3.6+ | OpenRouter | Logic auditor. Catches errors, flags hallucinations, cat energy |
| Hermes | Gemini 2.5 Flash | Vertex AI | Deep research. Synthesis, code review, tool-calling with learning memory |
Crew reactions fire after Pipes responds. Candy and MaoMao react in parallel. Skips factual lookups, YouTube queries, and short messages.
| Layer | Details |
|---|---|
| Primary LLM | Gemini 2.5 Flash (Vertex AI, service account OAuth2) |
| Fallback | Mistral Large 3 675B via NVIDIA NIM (on Gemini 429) |
| Candy | Mistral Large 3 (NVIDIA NIM) |
| MaoMao | Qwen 3.6+ (OpenRouter, reasoning enabled) |
| Hermes | Gemini 2.5 Flash (Vertex AI, context caching, 30min TTL) |
| Gemini proxy | Port 9340 — OpenAI-to-Vertex translation |
| Embeddings | gemini-embedding-001 (3072-dim) |
| Cost optimization | Vertex context caching — 75% reduction on repeated system prompts |
- Qdrant vector DB (port 6333) with 3 collections:
nemoclaw_memory,hermes_memory,shared_knowledge [REMEMBER:]tokens in any agent response store per-user memories- Pre-message recall merges user-specific + global context before every LLM call
- Hermes learning stores tool success/failure outcomes, injects past experience on future requests
- Google Drive backup every 5 minutes
- MCP server (port 7338) exposes memory to Claude Code and external tools
| Platform | What |
|---|---|
| Image posts via Google Drive + Graph API. Video reels via Catbox + Graph API. Auto-upload for files >8MB | |
| YouTube | Data API v3: search, transcripts, batch analysis, live streaming, chat bridge to Discord, scene transitions |
| Graph API posting |
Owner-gated. No accidental posts.
- All secrets in
~/.nemoclaw_env(gitignored, never committed) - No hardcoded credentials, user IDs, or server IDs in this repo
- Sandbox agent isolated via OpenShell (Landlock + seccomp + network namespace)
- Hermes container: 2 CPU, 4 GB RAM resource limits
- Owner-gated social posting and admin commands
- Rate limiting: 7 video gens/hr per user, 20 Hermes requests/hr per user
- GPU: NVIDIA RTX with 16GB+ VRAM (tested on RTX 5080)
- OS: Windows 11 with WSL2 + Docker Desktop
- RAM: 32GB+ recommended
- Runtime: Node.js 22, PM2, FFmpeg, Playwright, faster-whisper
# 1. Install upstream frameworks
git clone https://github.com/NVIDIA/NemoClaw ~/.nemoclaw/source
git clone https://github.com/TheAiSingularity/hermesclaw ~/hermesclaw
# 2. Clone DiscoClaw
git clone https://github.com/dknos/DiscoClaw ~/discoclaw
cd ~/discoclaw
# 3. Run installer (creates symlinks; existing files get .bak)
bash install.sh
# 4. Add secrets to ~/.nemoclaw_env
# Requires: DISCORD_BOT_TOKEN, HERMES_DISCORD_TOKEN, HERMES_GCP_PROJECT,
# HERMES_GCP_SA_KEY, NVIDIA_API_KEY, OPENROUTER_API_KEY, SUNO_COOKIE,
# INSTAGRAM_ACCESS_TOKEN, YOUTUBE_API_KEY, etc.
# 5. Start
bash ~/start-both.sh # NemoClaw + HermesClaw, crash-isolatedpm2 list # all processes
docker ps # all containers
curl -s http://127.0.0.1:9351/metrics | jq # Hermes inference stats + cost
curl -s http://127.0.0.1:9351/health # inference shim
curl -s http://127.0.0.1:9350/health # hermes bridge
curl http://172.20.224.1:8188/queue # ComfyUI render queue- NVIDIA NemoClaw — OpenClaw agent sandbox orchestration (Apache 2.0)
- NVIDIA OpenShell — Landlock/seccomp/netns container isolation
- HermesClaw — Hermes agent framework (MIT)
Custom integration, agent orchestration, and all bridge code by dknos.
Apache 2.0 — see LICENSE


