kinam0252/TIC-FT

🚀[NeurIPS 2025] Temporal In-Context Fine-Tuning with Temporal Reasoning for Versatile Control of Video Diffusion Models✨

📑Paper

🌐Project Page

📰 News

  • [2025.09.19] 🏆 TIC-FT officially accepted to NeurIPS 2025!
    🎤 Stay tuned for our presentation at the conference.

⚙️ Requirements

We recommend using conda to manage the environment:

# Create a new conda environment
conda create -n tic-ft python=3.10

# Activate the environment
conda activate tic-ft

# Install python dependencies
pip install -r requirements.txt

# Install ftfy package
pip install ftfy

# Install ffmpeg via conda-forge
conda install -c conda-forge ffmpeg

# Additional packages for video processing
pip install imageio imageio-ffmpeg
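After installing, a quick sanity check can confirm the extra dependencies are importable and ffmpeg is on the PATH. This is a hypothetical helper, not part of the repo:

```python
import importlib.util
import shutil

def check_env():
    """Return availability of the extra dependencies this README installs."""
    pkgs = ["ftfy", "imageio", "imageio_ffmpeg"]
    status = {p: importlib.util.find_spec(p) is not None for p in pkgs}
    # ffmpeg installed via conda-forge should be discoverable on the PATH
    status["ffmpeg"] = shutil.which("ffmpeg") is not None
    return status

if __name__ == "__main__":
    for name, ok in check_env().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```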

🚀 Try It Yourself!

Follow these steps to easily test the I2V pipeline:

  1. Prepare Your Image
    Convert your face image into either Cartoon or 3D Animation style with a white background using an image generation tool such as ChatGPT.

  2. Save the Image
    Save your generated image to:
    dataset/custom/{mode}/images

    • {mode} could be either Cartoon or 3DAnimation.
    • By default, an example 1.png is provided. You can:
      • Add new images as 2.png, 3.png, etc.
      • Or replace 1.png directly.
  3. Convert Image to Reference Video
    Use the following script to duplicate the image into 49 frames and generate a condition video:

    python dataset/utils/make_video_by_copying_image.py {image_path}

    Save the generated condition video into: dataset/custom/{mode}/videos
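    The duplication step presumably amounts to reading one image and repeating it 49 times before encoding. A minimal sketch of that logic — the imageio calls, paths, and fps value are assumptions, not the repo's actual code:

    ```python
    def repeat_frames(frame, num_frames=49):
        """Duplicate a single frame; 49 matches the video length TIC-FT expects."""
        return [frame] * num_frames

    # With imageio installed (pip install imageio imageio-ffmpeg), the script
    # could encode the repeated frames roughly like this (paths/fps hypothetical):
    #   import imageio
    #   frame = imageio.imread("dataset/custom/Cartoon/images/1.png")
    #   imageio.mimwrite("dataset/custom/Cartoon/videos/1.mp4",
    #                    repeat_frames(frame), fps=8)
    ```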

  4. Prepare Dataset Files

    • In dataset/custom/{mode}/videos.txt, list the relative video paths (one per line).
    • In dataset/custom/{mode}/prompt.txt, write the corresponding text prompts (one per line).
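    The two files must stay aligned line-by-line: line N of prompt.txt describes line N of videos.txt. A small, hypothetical validation helper (not part of the repo) that catches mismatches before training:

    ```python
    from pathlib import Path

    def load_pairs(dataset_dir):
        """Zip each line of videos.txt with the same-numbered line of prompt.txt."""
        root = Path(dataset_dir)
        videos = root.joinpath("videos.txt").read_text().splitlines()
        prompts = root.joinpath("prompt.txt").read_text().splitlines()
        if len(videos) != len(prompts):
            raise ValueError("videos.txt and prompt.txt must have the same number of lines")
        return list(zip(videos, prompts))
    ```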
  5. Download Pretrained Weights
    Download the safetensors weights for your selected mode from:
    Google Drive

  6. Run Inference
    Example command:

    python validate_repeat.py \
    --model_name wan \
    --model_id Wan-AI/Wan2.1-T2V-14B-Diffusers \
    --lora_weight_path /data/kinamkim/TIC-FT/outputs/wan/Cartoon/pytorch_lora_weights.safetensors \
    --latent_partition_mode c1b3t9 \
    --dataset_dir /data/kinamkim/TIC-FT/dataset/custom/Cartoon

    This command will generate multiple videos with different random seeds and save them under validation_videos/ in your weight directory.

  7. ⚠ Note (Wan Model Specific)

    • Due to a known issue, the first generated sample may appear noisy.
    • Valid results typically start from the second sample.
  8. Now you have your own video featuring your character!

cartoon.mp4
3DAnimation.mp4

🚧 Progress

✅ Completed

  • Implement I2V code for both CogVideoX and Wan

🔄 In Progress

  • Prepare model weights for various I2V applications
  • Implement V2V code for CogVideoX

🔜 Upcoming

  • Implement remaining features: Multiple Conditions, Action Transfer, and Video Interpolation

🗺️Start Guide

🔗 Weights

  • Download pretrained weights from here: Drive

📂 Dataset

To prepare your dataset, follow the structure provided in dataset/example/.

  • Each video should have 49 frames in total:
    • 13 condition image frames
    • 36 target video frames

When the 49 frames are encoded into latent representations (the VAE applies 4× temporal compression, so 49 pixel frames become 13 latent frames), the 13 latent frames are split into:

  • The first 4 latent frames → condition frames (covering the 13 condition image frames)
  • The next 9 latent frames → target frames

During training:

  • Only the first condition frame is kept as a pure condition.
  • The remaining 3 condition frames are progressively noised and used as buffer frames.

In the training scripts, you will find that latent_partition_mode is set to c1b3t9, which means:

  • c1b3t9 → 1 pure condition frame, 3 buffer frames, and 9 target frames.
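The arithmetic behind c1b3t9 can be sketched as follows. `parse_partition_mode` and `latent_frame_count` are hypothetical helpers, not the repo's actual functions; the 4× factor is the temporal compression used by the CogVideoX/Wan VAEs:

```python
import re

def parse_partition_mode(mode):
    """Split a mode string like 'c1b3t9' into (condition, buffer, target) counts."""
    m = re.fullmatch(r"c(\d+)b(\d+)t(\d+)", mode)
    if m is None:
        raise ValueError(f"unrecognized partition mode: {mode!r}")
    return tuple(int(g) for g in m.groups())

def latent_frame_count(pixel_frames, temporal_compression=4):
    """The VAE keeps the first frame and compresses the rest 4x in time,
    so 49 pixel frames become (49 - 1) // 4 + 1 = 13 latent frames."""
    return (pixel_frames - 1) // temporal_compression + 1

cond, buf, tgt = parse_partition_mode("c1b3t9")
assert (cond, buf, tgt) == (1, 3, 9)
assert cond + buf + tgt == latent_frame_count(49)  # 13 latent frames total
```

The same formula explains the split above: the 13 condition image frames compress to (13 − 1) // 4 + 1 = 4 latent frames (1 condition + 3 buffer), and the 36 target frames fill the remaining 9.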

🚀 Train

  • For CogVideoX:
    scripts/cogvideox/I2V/train.sh

  • For Wan:
    scripts/wan/I2V/train.sh

🔎 Inference

python validate.py \
  --model_name wan \
  --model_id {checkpoint path} \
  --lora_weight_path {safetensors path} \
  --latent_partition_mode c1b3t9 \
  --dataset_dir {dataset dir}

🎥 Video Examples

Below are example videos showcasing various applications of TIC-FT.


🖼️ I2V

We emphasize that our I2V approach does not simply animate the first frame; it leverages the identity captured in the first frame to generate diverse yet coherent videos.

i2v-1.mp4
i2v-2.mp4
i2v-3.mp4
i2v-4.mp4

🔁 V2V

v2v-1.mp4
v2v-2.mp4

🖼️ Multiple Conditions

MC-1.mp4
MC-2.mp4

🎯 Action Transfer

ActionTransfer-1.mp4

🕰️ Keyframe Interpolation

Interpolation-1.mp4

🙏Acknowledgements

This project is built upon the following works:

📖 BibTeX

@article{kim2025temporal,
  title={Temporal In-Context Fine-Tuning for Versatile Control of Video Diffusion Models},
  author={Kim, Kinam and Hyung, Junha and Choo, Jaegul},
  journal={arXiv preprint arXiv:2506.00996},
  year={2025}
}
