
🧙‍♀️🪄SwitchCraft: Training-Free Multi-Event Video Generation with Attention Controls

Qianxun Xu1, 2  •  Chenxi Song1  •  Yujun Cai3  •  Chi Zhang1*

1AGI Lab, Westlake University  •  2Duke Kunshan University  •  3The University of Queensland

arXiv Project Page

SwitchCraft Teaser Figure

🧹 Abstract

Recent advances in text-to-video diffusion models have enabled high-fidelity, temporally coherent video synthesis. However, current models are predominantly optimized for single-event generation. When given multi-event prompts without explicit temporal grounding, such models often produce blended or collapsed scenes that break the intended narrative. To address this limitation, we present SwitchCraft, a training-free framework for multi-event video generation. Our key insight is that uniform prompt injection across time ignores the correspondence between events and frames. To this end, we introduce Event-Aligned Query Steering (EAQS), which steers frame-level attention to align with the relevant event prompts. Furthermore, we propose the Auto-Balance Strength Solver (ABSS), which adaptively balances steering strength to preserve temporal consistency and visual fidelity. Extensive experiments demonstrate that SwitchCraft substantially improves prompt alignment, event clarity, and scene consistency compared with existing baselines, offering a simple yet effective solution for multi-event video generation.


| Prompt | ❌ Without Controls | ✅ With Controls |
| --- | --- | --- |
| Look and extend arms (up → to the side → down) | (video) | (video) |
| In a park (jump → look at watch) | (video) | (video) |

🎯 Event Alignment: Aligns each event description with its intended time span during inference, fundamentally addressing event blending and event collapse.

🪄 Training-Free Query Steering: Employs a novel projection-based query steering framework to map frames to their intended event prompts.

⚖️ Adaptive Balancing: Dynamically balances steering strength throughout the generation process to ensure smooth, coherent multi-event videos.
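To make the two controls concrete, here is a minimal, hypothetical sketch of the idea in NumPy. It is not the paper's implementation: the function name, the pooled per-event text keys, the additive steering update, and the cosine-based adaptive strength rule are all illustrative assumptions standing in for EAQS and ABSS.

```python
import numpy as np

def steer_queries(queries, event_keys, spans, base_strength=0.5):
    """Toy sketch of SwitchCraft's two controls (illustrative, not official):
    - Event-Aligned Query Steering: nudge each frame's attention queries
      toward the pooled text key of the event assigned to that frame.
    - Auto-Balance Strength Solver: scale the nudge per frame, steering
      already-aligned frames less (a simple stand-in for ABSS).

    queries:    (F, D) frame-level query vectors
    event_keys: (E, D) one pooled text-key vector per event prompt
    spans:      list of (start, end) frame ranges, one per event
    """
    steered = queries.astype(float).copy()
    for e, (start, end) in enumerate(spans):
        # unit direction of this event's text key
        k = event_keys[e] / (np.linalg.norm(event_keys[e]) + 1e-8)
        q = steered[start:end]
        q_norm = np.linalg.norm(q, axis=1, keepdims=True) + 1e-8
        # cosine alignment between each frame query and its event key
        cos = (q @ k)[:, None] / q_norm
        # adaptive strength: steer less when already aligned (cos -> 1)
        lam = np.clip(base_strength * (1.0 - cos), 0.0, 1.0)
        # additive steering toward the event-key direction
        steered[start:end] = q + lam * q_norm * k[None, :]
    return steered
```

With this update, each steered query's alignment with its assigned event key is non-decreasing, while frames that already match their event are left nearly untouched, which is the balance the ABSS bullet above describes.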


📦 Installation

# Clone the repository
git clone https://github.com/CeciliaTheBirb/SwitchCraft.git
cd SwitchCraft

# Create environment
conda create -n switchcraft python=3.10
conda activate switchcraft

# Install dependencies
pip install -r requirements.txt

🧪 Quickstart

Generate multi-event videos with 👇

bash gen_multi.sh

💡 Acknowledgement

This project is built upon the Wan 2.1 pipeline.

✒️ Citation

If you find this code or our paper useful for your research, please consider citing:

@inproceedings{xu2026switchcraft,
  title={SwitchCraft: Training-Free Multi-Event Video Generation with Attention Controls},
  author={Xu, Qianxun and Song, Chenxi and Cai, Yujun and Zhang, Chi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}

About

Official Implementation of SwitchCraft: Training-Free Multi-Event Video Generation with Attention Controls [CVPR 2026]
