Neural image synthesizer — text-to-image and animation powered by CLIP + VQGAN/Limited Palette
PyTTI Portable is a self-contained distribution of pytti-core with a Gradio web UI. Everything is bootstrapped from a single install.bat — no system-wide Python installation required.
- Gradio web UI with live preview, config save/load, and render controls
- 3D animation with camera transforms (translate, rotate, zoom) and depth estimation
- Multiple image models: Limited Palette, VQGAN (multiple checkpoints)
- CLIP-guided rendering with multi-model ensemble (ViT-B/32, ViT-B/16, RN50x4, etc.)
- Video Source mode for style transfer onto existing video
- Breath mode — a linear crossfade from the init image to the CLIP-optimized output over the full render: frame 1 is your original image, and the final frame is fully transformed by CLIP, producing smooth "emergence" animations in which the AI's interpretation gradually takes over
- Video encoding — built-in ffmpeg encoding (MP4 H.264, ProRes 4444, ProRes HQ) directly from the Output tab
- Audioreactive animation support
- Config system powered by Hydra — save, load, and share render presets as YAML files
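Because presets are plain YAML, they can be versioned and shared like any other text file. A minimal sketch of what a saved preset might contain (the field names below are illustrative assumptions, not the exact pytti-core schema — see the pytti-book parameter guide for the real keys):

```yaml
# Hypothetical preset sketch — field names are illustrative, not the exact schema
scenes: "a misty forest at dawn | oil painting"
image_model: Limited Palette
width: 512
height: 512
steps_per_frame: 50
animation_mode: 3D
clip_models:
  - ViT-B/32
  - RN50x4
```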
- Windows 10 or 11
- NVIDIA GPU (GTX 10xx through RTX 40xx; RTX 50xx/Blackwell is not supported)
- Up-to-date NVIDIA drivers
- Git installed and on PATH
- ~8 GB disk space (after install)
git clone https://github.com/pxl-pshr/pytti.git
cd pytti
- Double-click install.bat (one-time, ~30-60 min)
- Double-click launch.bat
- A browser window opens — start rendering
To update: run git pull from the pytti folder.
The first render will download CLIP and depth models (~1-4 GB); they are cached for subsequent runs.
pytti/
├── install.bat # One-time installer (downloads Python, PyTorch, deps)
├── launch.bat # Starts the Gradio UI
├── app/
│ ├── ui.py # Gradio web UI
│ ├── patch_gradio.py # Post-install dependency patches
│ └── config/
│ ├── default.yaml # Default render settings
│ └── conf/ # User-saved presets
└── examples/ # Sample renders
PyTTI uses CLIP to guide an image generator (Limited Palette or VQGAN) toward text prompts. In animation mode, each frame is warped via 2D/3D transforms with AdaBins depth estimation, then re-optimized toward the prompt — producing dreamlike, evolving visuals.
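The guidance loop above can be sketched abstractly: embed the current image, score it against the prompt embedding, and step the image parameters to increase that score. Here is a toy NumPy version using raw vectors and a closed-form cosine-similarity gradient — everything is illustrative; the real pipeline uses CLIP embeddings, a generator's latent parameters, and PyTorch autograd:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def guide(image, prompt, steps=200, lr=0.5):
    """Nudge 'image' toward 'prompt' by gradient ascent on cosine similarity.

    Cosine similarity has a closed-form gradient w.r.t. the image vector,
    so this toy version needs no autograd.
    """
    for _ in range(steps):
        ni, npr = np.linalg.norm(image), np.linalg.norm(prompt)
        # d/d_image of cos(image, prompt)
        grad = prompt / (ni * npr) - (image @ prompt) * image / (ni**3 * npr)
        image = image + lr * grad
    return image

rng = np.random.default_rng(0)
prompt = rng.normal(size=8)   # stand-in for a CLIP text embedding
image = rng.normal(size=8)    # stand-in for image parameters
out = guide(image, prompt)
print(cosine(image, prompt), "->", cosine(out, prompt))
```

In animation mode, the warped previous frame plays the role of `image` at the start of each step, which is why motion and prompt guidance blend into one evolving sequence.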
- Demo video — pytti in action
- pytti-book — documentation, tutorials, and parameter guide
- pytti-motion-preview — camera motion preview tool
- pytti-notebook — the original Colab notebook this project is based on
- David Marx & sportsracer48 — original pytti creators and maintainers
- pytti-core — the rendering engine
- CLIP — OpenAI's vision-language model
- taming-transformers — VQGAN
- AdaBins — monocular depth estimation (3D mode)
- GMA — optical flow for video mode
- Gradio — web UI framework
MIT — see LICENSE for details.