OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models

Jinshu Chen^*, Xinghui Li^{* †}, Xu Bai^*, Tianxiang Ma, Pengze Zhang,
Zhuowei Chen, Gen Li, Lijie Liu, Songtao Zhao^†, Bingchuan Li^‡, Qian He

Accepted by CVPR Findings 2026

^*Equal contribution, ^†Corresponding author, ^‡Project lead

Intelligent Creation Lab, ByteDance

📃 Abstract

Recent advances in video insertion based on diffusion models are impressive. However, existing methods rely on complex control signals yet still struggle with subject consistency, which limits their practical applicability. In this paper, we focus on the task of Mask-free Video Insertion and aim to resolve three key challenges: data scarcity, subject-scene equilibrium, and insertion harmonization. To address data scarcity, we propose a new data pipeline, InsertPipe, which constructs diverse cross-pair data automatically. Building upon this data pipeline, we develop OmniInsert, a novel unified framework for mask-free video insertion from both single and multiple subject references. Specifically, to maintain subject-scene equilibrium, we introduce a simple yet effective Condition-Specific Feature Injection mechanism to distinctly inject multi-source conditions, and propose a novel Progressive Training strategy that enables the model to balance feature injection from the subjects and the source video. Meanwhile, we design the Subject-Focused Loss to improve the detailed appearance of the subjects. To further enhance insertion harmonization, we propose an Insertive Preference Optimization methodology that optimizes the model by simulating human preferences, and incorporate a Context-Aware Rephraser module during inference to seamlessly integrate the subject into the original scenes. To address the lack of a benchmark for the field, we introduce InsertBench, a comprehensive benchmark comprising diverse scenes with meticulously selected subjects. Evaluation on InsertBench indicates that OmniInsert outperforms state-of-the-art closed-source commercial solutions.
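To give a rough intuition for what a subject-focused objective could look like, the sketch below adds an extra error term restricted to the subject region on top of a plain mean squared error. This is only an illustration under our own assumptions: the function name `subject_focused_loss`, the `subject_mask` input, the `subject_weight` parameter, and the weighting scheme are all hypothetical and are not taken from the paper or from any released code.

```python
from typing import Sequence


def subject_focused_loss(
    pred: Sequence[float],
    target: Sequence[float],
    subject_mask: Sequence[float],
    subject_weight: float = 1.0,
) -> float:
    """Plain MSE plus an extra MSE term computed only over the subject region.

    Hypothetical illustration: names and weighting are assumptions,
    not the formulation used by OmniInsert.
    """
    if not (len(pred) == len(target) == len(subject_mask)):
        raise ValueError("pred, target and subject_mask must have equal length")
    if not pred:
        raise ValueError("inputs must be non-empty")

    # Element-wise squared error between prediction and target.
    sq_err = [(p - t) ** 2 for p, t in zip(pred, target)]
    base = sum(sq_err) / len(sq_err)

    mask_sum = sum(subject_mask)
    if mask_sum == 0:
        # No subject elements: fall back to the plain loss.
        return base

    # Average error over the masked (subject) elements only.
    focused = sum(m * e for m, e in zip(subject_mask, sq_err)) / mask_sum
    return base + subject_weight * focused
```

With a four-element toy input whose mask marks only the first element as subject, a unit error on that element yields a loss of 1.25, while the same error on a background element yields 0.25, so mistakes on the subject are penalized more heavily.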

🔥 Latest News

Sep 23, 2025: We release the Project Page and Technical Report of OmniInsert.

📑 Todo List

Release Paper
Release InsertBench
Release Inference Code

⭐ Citation

If OmniInsert is helpful, please ⭐ the repo.

If you find this project useful for your research, please consider citing our paper.

BibTeX

@misc{chen2025omniinsert,
      title={OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models}, 
      author={Jinshu Chen and Xinghui Li and Xu Bai and Tianxiang Ma and Pengze Zhang and Zhuowei Chen and Gen Li and Lijie Liu and Songtao Zhao and Bingchuan Li and Qian He},
      year={2025},
      eprint={2509.17627},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.17627}, 
}

📧 Contact

If you have any comments or questions regarding this open-source project, please open a new issue or contact Jinshu Chen and Xinghui Li.
