MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference

✨ Accepted by ICLR 2026 ✨


Huanlin Gao1,2 Ping Chen1,2 Fuyuan Shi1,2 Ruijia Wu2 Yantao Li1,2,3 Qiang Hui1,2 Yuren You2 Ting Lu1,2 Chao Tan1,2 Shaoan Zhao1,2 Zhaoxiang Liu1,2
Fang Zhao1,2* Kai Wang1,2 Shiguo Lian1,2*
1Data Science & Artificial Intelligence Research Institute, China Unicom,  2Unicom Data Intelligence, China Unicom,  3National Key Laboratory for Novel Software Technology, Nanjing University
(* Corresponding author.)

Abstract

We present MeanCache, a training-free caching framework for efficient Flow Matching inference. Existing caching methods reduce redundant computation but typically rely on instantaneous velocity information (e.g., feature caching), which often leads to severe trajectory deviations and error accumulation under high acceleration ratios. MeanCache introduces an average-velocity perspective: by leveraging cached Jacobian–vector products (JVP) to construct interval average velocities from instantaneous velocities, it effectively mitigates local error accumulation. To further improve cache timing and JVP reuse stability, we develop a trajectory-stability scheduling strategy as a practical tool, employing a Peak-Suppressed Shortest Path under budget constraints to determine the schedule. Experiments on FLUX.1, Qwen-Image, and HunyuanVideo demonstrate that MeanCache achieves 4.12×, 4.56×, and 3.59× acceleration, respectively, while consistently outperforming state-of-the-art caching baselines in generation quality. We believe this simple yet effective approach provides a new perspective for Flow Matching inference and will inspire further exploration of stability-driven acceleration in commercial-scale generative models.

MeanCache

Motivation: From Instantaneous to Average Velocity

In Flow Matching inference, existing caching methods primarily rely on reusing Instantaneous Velocity or its feature-level proxies. However, we observe that instantaneous velocity often exhibits sharp fluctuations across timesteps. This leads to severe trajectory deviations and cumulative errors, especially as the cache interval increases.

Inspired by MeanFlow, we propose MeanCache. Compared to unstable instantaneous velocity, Average Velocity is significantly smoother and more robust over time. By shifting the caching perspective from a single "point" to an "interval," MeanCache effectively mitigates trajectory drift under high acceleration ratios.

Instantaneous vs Average Velocity

Figure 1: Instantaneous vs. Average Velocity and JVP Caching. (Left) Along the original trajectory, instantaneous velocity shows sharp fluctuations, while average velocity is much smoother. (Middle) At timestep 927, JVP Caching reduces error accumulation, though its effectiveness depends on the cache interval and hyperparameter $K$. (Right) At timestep 551, it achieves stronger error mitigation, showing that effectiveness varies across timesteps. Both middle and right figures are under the single-cache setting on the original trajectory.

Implementation: Average-Velocity Perspective

1. JVP-Based Cache Construction

We leverage Jacobian-Vector Products (JVP) to construct estimated interval average velocities. By reusing JVP information calculated at prior timesteps, we transform the current instantaneous velocity $v$ into an estimated average velocity $\hat{u}$ for the target interval. This approach compensates for trajectory deviations without additional inference overhead:

MeanCache Principle

Figure 2: From Instantaneous to Average Velocity. Directly caching the instantaneous velocity $v(z_t,t)$ over $[t,s]$ easily leads to trajectory drift and error accumulation, whereas the average velocity $u(z_t,t,s)$ accurately reaches the target $s$. MeanCache introduces a prior timestep $r$ and reuses $\mathrm{JVP}_{r \to t}$ to estimate the average velocity $\hat{u}(z_t,t,s)$, thereby correcting the trajectory and effectively mitigating error accumulation.
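The correction in Figure 2 can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation: the function name `estimate_average_velocity` and the toy scalar values are ours, and we assume the cached JVP enters with weight $(s-t)$, matching the stability-deviation formula $\mathcal{L}_K(t,s)$ used in the scheduling section.

```python
import numpy as np

def estimate_average_velocity(v_t, jvp_cached, t, s):
    """First-order estimate of the interval average velocity u(z_t, t, s).

    Hypothetical sketch: the cached JVP (a directional time-derivative of
    the velocity field, reused from a prior timestep r) corrects the
    instantaneous velocity v(z_t, t) into an interval estimate:
        u_hat(z_t, t, s) = v(z_t, t) + (s - t) * JVP_K
    consistent with u - v - (s - t) * JVP_K being the residual that the
    stability deviation L_K(t, s) measures.
    """
    return v_t + (s - t) * jvp_cached

# Toy illustration with a scalar state and one cached JVP value.
v_t = np.array([1.0])   # instantaneous velocity at time t
jvp = np.array([0.5])   # cached directional derivative (illustrative)
u_hat = estimate_average_velocity(v_t, jvp, t=1.0, s=0.5)
```

Because the correction is a reuse of an already-computed JVP rather than a fresh model evaluation, the estimate adds essentially no inference cost on top of plain caching.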

2. Trajectory-Stability Scheduling

To balance speed and quality while accounting for the temporal heterogeneity of caching effectiveness, we propose a trajectory-stability scheduling algorithm:

  • Stability Map via Graph Representation: We model the inference process as a Multigraph, where the edge weight $\mathcal{L}_K(t,s)$ represents the Stability Deviation—the error between predicted and true average velocities under a cache span $K$: $$\mathcal{L}_K(t,s) = \frac{1}{N} \left\| u(z_t,t,s) - v(z_t,t) - (s-t)\widehat{\text{JVP}}_K \right\|_1$$
  • Peak-Suppressed Shortest Path: Given a computation budget $\mathcal{B}$, we solve for a "peak-suppressed" shortest path. By introducing a penalty coefficient $\gamma$ for high-error edges, this strategy ensures smooth and continuous trajectory generation: $$\pi^\star = \arg\min_{\pi \in \mathcal{P}(T,0)} \sum_{e \in \pi} \mathcal{C}(e)^\gamma \quad \text{s.t.} \quad |\pi| \leq \mathcal{B}$$
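The budget-constrained, peak-suppressed shortest path above can be solved with a small dynamic program over (node, edges-used) states. The sketch below is hypothetical: `cost[t][s]` stands in for a measured stability deviation $\mathcal{L}_K(t,s)$, the toy cost values are invented for illustration, and the function and variable names are ours, not the paper's.

```python
def peak_suppressed_shortest_path(cost, budget, gamma=2.0):
    """Cheapest schedule from node T down to node 0 using at most `budget`
    edges, with each edge cost raised to `gamma` to suppress high-error
    ("peak") jumps. cost[t][s] (s < t) is the deviation of jumping from
    timestep t to s; unusable edges are float('inf').
    """
    T = len(cost) - 1
    INF = float("inf")
    # best[b][t]: minimal peak-suppressed cost to reach t from T in exactly b edges
    best = [[INF] * (T + 1) for _ in range(budget + 1)]
    best[0][T] = 0.0
    parent = {}
    for b in range(1, budget + 1):
        for t in range(1, T + 1):            # jump origin
            base = best[b - 1][t]
            if base == INF:
                continue
            for s in range(t):               # jump target s < t
                c = base + cost[t][s] ** gamma
                if c < best[b][s]:
                    best[b][s] = c
                    parent[(b, s)] = t
    # cheapest way to reach the terminal node 0 within the budget
    b_star = min(range(1, budget + 1), key=lambda b: best[b][0])
    node, b, path = 0, b_star, [0]
    while node != T:                          # walk parents back to T
        node = parent[(b, node)]
        path.append(node)
        b -= 1
    return best[b_star][0], path[::-1]

# Toy 4-node graph (3 = start, 0 = end), invented costs.
INF = float("inf")
cost = [[INF] * 4 for _ in range(4)]
cost[3][2], cost[2][0] = 1.0, 1.0   # smooth two-step route
cost[3][1], cost[1][0] = 1.9, 0.1   # route containing one "peak" edge
cost[3][0] = 5.0                    # single large jump
total, path = peak_suppressed_shortest_path(cost, budget=2, gamma=2.0)
```

With $\gamma > 1$, the two routes' plain sums are nearly equal (2.0 vs. 2.0), but squaring the costs penalizes the 1.9 peak edge, so the smooth route `[3, 2, 0]` wins; this is exactly the suppression behavior the scheduling strategy relies on.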

🖼️ Visual Results

Z-Image

| Method | Z-Image-base | MeanCache (B=25) | MeanCache (B=20) | MeanCache (B=15) | MeanCache (B=13) |
|---------|--------------|------------------|------------------|------------------|------------------|
| Latency | 18.07 s | 9.15 s | 7.36 s | 5.58 s | 4.85 s |

T2I comparisons on a single H800 GPU

Content Consistency

Maintaining content consistency is a primary challenge for acceleration frameworks. Rare words, with their ambiguous semantics and low training frequency, often cause significant visual drift during the denoising process. MeanCache handles this failure mode markedly better than existing caching baselines.

Content Consistency Comparison

Figure 3: Content consistency under the rare-word prompt "Matutinal".


MeanCache vs. LeMiCa

This benchmark evaluates the performance of MeanCache against LeMiCa using the Qwen-Image-2512 model as the base.

🚀 Efficiency Comparison

Baseline Latency (Original Qwen-Image-2512): 32.8s

| Constraint | Method | Latency | Speedup | Time Reduction vs. LeMiCa |
|------------|-----------|---------|---------|---------------------------|
| $B=25$ | LeMiCa | 18.83 s | 1.74x | - |
| $B=25$ | MeanCache | 17.13 s | 1.91x | 9.0% |
| $B=17$ | LeMiCa | 14.35 s | 2.29x | - |
| $B=17$ | MeanCache | 11.67 s | 2.81x | 18.7% |
| $B=10$ | LeMiCa | 10.41 s | 3.15x | - |
| $B=10$ | MeanCache | 6.95 s | 4.72x | 33.2% |

🎨 Quality Comparison

| Constraint | Method | PSNR (↑) | SSIM (↑) | LPIPS (↓) |
|------------|-----------|----------|----------|-----------|
| $B=25$ | LeMiCa | 29.20 | 0.945 | 0.065 |
| $B=25$ | MeanCache | 29.46 | 0.944 | 0.057 |
| $B=17$ | LeMiCa | 24.31 | 0.835 | 0.176 |
| $B=17$ | MeanCache | 26.49 | 0.907 | 0.104 |
| $B=10$ | LeMiCa | 17.80 | 0.637 | 0.368 |
| $B=10$ | MeanCache | 19.44 | 0.767 | 0.237 |

BibTeX


      @inproceedings{gao2025meancache,
        title     = {MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference},
        author    = {Huanlin Gao and Ping Chen and Fuyuan Shi and Ruijia Wu and Yantao Li and Qiang Hui and Yuren You and Ting Lu and Chao Tan and Shaoan Zhao and Zhaoxiang Liu and Fang Zhao and Kai Wang and Shiguo Lian},
        booktitle = {International Conference on Learning Representations (ICLR)},
        year      = {2026},
        url       = {https://arxiv.org/abs/2601.19961}
      }