Kaixin Ding1 †,
Xi Chen1,
Sihui Ji1,
Yuan Gao2,
Liang Hou2,
Xin Tao2,
Pengfei Wan2,
Hengshuang Zhao1 ✉
1The University of Hong Kong
2Kling Team, Kuaishou Technology
†: Intern at Kling Team, Kuaishou Technology, ✉: Corresponding Author
- [2025.10.14] Released arXiv paper.
High-resolution video generation is slow: for example, Wan 2.1 takes over 50 minutes to generate a 720p video. Existing acceleration methods often compromise the model's priors (layout, semantics, motion). We propose SURF, a two-stage framework: first, a pretrained model produces a fast low-resolution preview; second, a Refiner upscales the preview while preserving those priors. Key techniques include noise reshifting to reduce prior loss and shifting windows with careful training design. SURF is simple, efficient, and compatible with various base models, achieving a 12.5× speedup for generating 5-second, 16fps, 720p Wan 2.1 videos and an 8.7× speedup for generating 5-second, 24fps, 720p HunyuanVideo videos.
| Method | QS↑ | AQ↑ | DD↑ | MS↑ | OC↑ | SA↑ | PC↑ | Time (s)↓ | Speed↑ | PFLOPs↓ |
|---|---|---|---|---|---|---|---|---|---|---|
| Wan 2.1 | 83.31 | 66.9 | 63.89 | 97.65 | 27.08 | 41.82 | 45.45 | 3497 (58min) | 1× | 658.46 |
| 30% Step | 77.92 | 58.43 | 56.94 | 96.95 | 24.56 | 18.18 | 16.36 | 1049 | 3.34× | 197.54 |
| 50% Step | 81.51 | 63.52 | 66.67 | 96.99 | 25.90 | 25.45 | 23.64 | 1748 | 2× | 329.23 |
| SVG | 83.36 | 65.6 | 68.06 | 97.69 | 27.32 | 25.45 | 20.00 | 2712 | 1.29× | 429.86 |
| DMD | 83.31 | 66.11 | 52.78 | 98.96 | 26.77 | 34.55 | 30.91 | 282 | 12.40× | 39.51 |
| Ours | 83.26 | 66.86 | 72.22 | 97.95 | 27.38 | 41.82 | 38.18 | 278 | 12.58× | 34.3 |
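The Speed column can be reproduced from the Time column as baseline time divided by method time. The snippet below is a minimal sketch of that arithmetic, assuming the times are in seconds and that Wan 2.1 is the 1× baseline; the names `TIMES_S`, `speedup`, and `speedups` are illustrative and not part of any released code.

```python
# Wall-clock times copied from the Time column of the table above
# (assumed to be seconds; 3497 s is roughly 58 minutes).
TIMES_S = {
    "Wan 2.1": 3497,
    "30% Step": 1049,
    "50% Step": 1748,
    "SVG": 2712,
    "DMD": 282,
    "Ours": 278,
}


def speedup(baseline_s: float, method_s: float) -> float:
    """Return how many times faster a method is than the baseline."""
    return baseline_s / method_s


def speedups(times: dict, baseline: str = "Wan 2.1") -> dict:
    """Compute the speedup of every method relative to the baseline, rounded to 2 decimals."""
    base = times[baseline]
    return {name: round(speedup(base, t), 2) for name, t in times.items()}


if __name__ == "__main__":
    for name, factor in speedups(TIMES_S).items():
        print(f"{name}: {factor:.2f}x")
```

Recomputed values can differ from the reported ones in the last digit, depending on how the underlying timings were rounded.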
720p_compress.mp4
1080p_compress.mp4
@article{ding2025surf,
title={Surf: Signature-Retained Fast Video Generation},
author={Kaixin Ding and Xi Chen and Sihui Ji and Yuan Gao and Liang Hou and Xin Tao and Pengfei Wan and Hengshuang Zhao},
journal={arXiv preprint arXiv:xxxx.xxxxx},
year={2025}
}