Neural image synthesizer — text-to-image and animation powered by CLIP + VQGAN/Limited Palette
PyTTI Portable is a self-contained distribution of pytti-core with a Gradio web UI. Everything is bootstrapped from a single install.bat — no system-wide Python installation required.
- Gradio web UI with live preview, config save/load, and render controls
- 3D animation with camera transforms (translate, rotate, zoom) and depth estimation
- Multiple image models: Limited Palette, VQGAN (multiple checkpoints)
- CLIP-guided rendering with multi-model ensemble (ViT-B/32, ViT-B/16, RN50x4, etc.)
- Video Source mode for style transfer onto existing video
- Breath mode — a linear crossfade from the init image to the CLIP-optimized output over the full render: frame 1 is your original image, and the final frame is fully transformed by CLIP, producing smooth "emergence" animations in which the AI's interpretation gradually takes over
- Video encoding — built-in ffmpeg encoding (MP4 H.264, ProRes 4444, ProRes HQ) directly from the Output tab
- Audioreactive animation support
- Config system powered by Hydra — save, load, and share render presets as YAML files
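Because presets are plain YAML, they can be versioned and shared like any other text file. A minimal sketch of what a saved preset might contain (the field names below are illustrative assumptions, not the exact pytti-core schema — see the pytti-book parameter guide for the real keys):

```yaml
# Hypothetical preset sketch — field names are illustrative, not the exact schema
scenes: "a misty forest at dawn | oil painting"
image_model: Limited Palette
width: 512
height: 512
steps_per_frame: 50
animation_mode: 3D
clip_models:
  - ViT-B/32
  - RN50x4
```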
- Windows 10 or 11
- NVIDIA GPU (GTX 10xx through RTX 40xx; RTX 50xx/Blackwell is not supported)
- Up-to-date NVIDIA drivers
- Git installed and on PATH
- ~8 GB disk space (after install)
git clone https://github.com/pxl-pshr/pytti.git
cd pytti
- Double-click install.bat (one-time, ~30-60 min)
- Double-click launch.bat
- A browser window opens — start rendering
To update: run git pull from the pytti folder.
The first render will download CLIP and depth models (~1-4 GB); they are cached for subsequent runs.
pytti/
├── install.bat # One-time installer (downloads Python, PyTorch, deps)
├── launch.bat # Starts the Gradio UI
├── app/
│ ├── ui.py # Gradio web UI
│ ├── patch_gradio.py # Post-install dependency patches
│ └── config/
│ ├── default.yaml # Default render settings
│ └── conf/ # User-saved presets
└── examples/ # Sample renders
PyTTI uses CLIP to guide an image generator (Limited Palette or VQGAN) toward text prompts. In animation mode, each frame is warped via 2D/3D transforms with AdaBins depth estimation, then re-optimized toward the prompt — producing dreamlike, evolving visuals.
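The guidance loop above can be sketched abstractly: embed the current image, score it against the prompt embedding, and step the image parameters to increase that score. Here is a toy NumPy version using raw vectors and a closed-form cosine-similarity gradient — everything is illustrative; the real pipeline uses CLIP embeddings, a generator's latent parameters, and PyTorch autograd:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def guide(image, prompt, steps=200, lr=0.5):
    """Nudge 'image' toward 'prompt' by gradient ascent on cosine similarity.

    Cosine similarity has a closed-form gradient w.r.t. the image vector,
    so this toy version needs no autograd.
    """
    for _ in range(steps):
        ni, npr = np.linalg.norm(image), np.linalg.norm(prompt)
        # d/d_image of cos(image, prompt)
        grad = prompt / (ni * npr) - (image @ prompt) * image / (ni**3 * npr)
        image = image + lr * grad
    return image

rng = np.random.default_rng(0)
prompt = rng.normal(size=8)   # stand-in for a CLIP text embedding
image = rng.normal(size=8)    # stand-in for image parameters
out = guide(image, prompt)
print(cosine(image, prompt), "->", cosine(out, prompt))
```

In animation mode, the warped previous frame plays the role of `image` at the start of each step, which is why motion and prompt guidance blend into one evolving sequence.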
- Demo video — pytti in action
- pytti-book — documentation, tutorials, and parameter guide
- pytti-motion-preview — camera motion preview tool
- pytti-notebook — the original Colab notebook this project is based on
- David Marx & sportsracer48 — original pytti creators and maintainers
- pytti-core — the rendering engine
- CLIP — OpenAI's vision-language model
- taming-transformers — VQGAN
- AdaBins — monocular depth estimation (3D mode)
- GMA — optical flow for video mode
- Gradio — web UI framework
MIT — see LICENSE for details.