Qianxun Xu1, 2 • Chenxi Song1 • Yujun Cai3 • Chi Zhang1*
1AGI Lab, Westlake University • 2Duke Kunshan University • 3The University of Queensland
Recent advances in text-to-video diffusion models have enabled high-fidelity and temporally coherent video synthesis. However, current models are predominantly optimized for single-event generation. When handling multi-event prompts without explicit temporal grounding, such models often produce blended or collapsed scenes that break the intended narrative. To address this limitation, we present SwitchCraft, a training-free framework for multi-event video generation. Our key insight is that uniform prompt injection across time ignores the correspondence between events and frames. To this end, we introduce Event-Aligned Query Steering (EAQS), which steers frame-level attention to align with the relevant event prompts. Furthermore, we propose the Auto-Balance Strength Solver (ABSS), which adaptively balances steering strength to preserve temporal consistency and visual fidelity. Extensive experiments demonstrate that SwitchCraft substantially improves prompt alignment, event clarity, and scene consistency compared with existing baselines, offering a simple yet effective solution for multi-event video generation.
🎯 Event Alignment: Aligns each event description with its intended time span during inference, fundamentally addressing event blending and event collapse.
⚡ Training-Free Query Steering: Employs a novel projection-based query steering framework to map frames to their intended event prompts.
⚖️ Adaptive Balancing: Dynamically balances steering strength throughout the generation process to ensure smooth, coherent multi-event videos.
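The two mechanisms above can be sketched in a few lines. Note this is a minimal, hypothetical illustration rather than the actual SwitchCraft implementation: the function names, the specific projection rule, the frame-to-event mapping, and the displacement-based strength heuristic are all assumptions for exposition; the real EAQS and ABSS operate inside the diffusion model's attention layers during denoising.

```python
import numpy as np

def event_aligned_query_steering(queries, event_keys, frame_to_event, strength):
    """Steer each frame's attention queries toward its assigned event prompt.

    queries:        (num_frames, d) frame-level query vectors
    event_keys:     (num_events, d) text-key vectors, one per event prompt
    frame_to_event: (num_frames,)   index of the event each frame belongs to
    strength:       scalar in [0, 1] blending original and steered queries
    """
    steered = queries.copy()
    for f, e in enumerate(frame_to_event):
        k = event_keys[e]
        # Project the frame's query onto its event key's direction, so the
        # frame attends preferentially to its own event prompt tokens.
        proj = (queries[f] @ k) / (k @ k) * k
        steered[f] = (1.0 - strength) * queries[f] + strength * proj
    return steered

def adaptive_strength(queries, steered, max_shift=0.5):
    """Toy stand-in for adaptive balancing: cap the steering strength so the
    average query displacement stays below max_shift, trading steering for
    temporal consistency."""
    shift = np.linalg.norm(steered - queries, axis=-1).mean()
    return min(1.0, max_shift / (shift + 1e-8))
```

With strength = 0 the queries are untouched; with strength = 1 each frame's query is fully projected onto its event's key direction, so interior strengths interpolate between fidelity to the original attention and alignment with the event schedule.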
# Clone the repository
git clone https://github.com/CeciliaTheBirb/SwitchCraft.git
cd SwitchCraft
# Create environment
conda create -n switchcraft python=3.10
conda activate switchcraft
# Install dependencies
pip install -r requirements.txt
Generate multi-event videos with 👇
bash gen_multi.sh
This project is built upon the Wan 2.1 pipeline.
If you find this code or our paper useful for your research, please consider citing:
@inproceedings{xu2026switchcraft,
title={SwitchCraft: Training-Free Multi-Event Video Generation with Attention Controls},
author={Xu, Qianxun and Song, Chenxi and Cai, Yujun and Zhang, Chi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2026}
}