MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference

✨ Accepted by ICLR 2026 ✨


Huanlin Gao1,2 Ping Chen1,2 Fuyuan Shi1,2 Ruijia Wu2 Yantao Li1,2,3 Qiang Hui1,2 Yuren You2 Ting Lu1,2 Chao Tan1,2 Shaoan Zhao1,2 Zhaoxiang Liu1,2
Fang Zhao1,2* Kai Wang1,2 Shiguo Lian1,2*
1Data Science & Artificial Intelligence Research Institute, China Unicom,  2Unicom Data Intelligence, China Unicom,  3National Key Laboratory for Novel Software Technology, Nanjing University
(* Corresponding author.)

Abstract

We present MeanCache, a training-free caching framework for efficient Flow Matching inference. Existing caching methods reduce redundant computation but typically rely on instantaneous velocity information (e.g., feature caching), which often leads to severe trajectory deviations and error accumulation under high acceleration ratios. MeanCache introduces an average-velocity perspective: by leveraging cached Jacobian–vector products (JVP) to construct interval average velocities from instantaneous velocities, it effectively mitigates local error accumulation. To further improve cache timing and JVP reuse stability, we develop a trajectory-stability scheduling strategy as a practical tool, employing a Peak-Suppressed Shortest Path under budget constraints to determine the schedule. Experiments on FLUX.1, Qwen-Image, and HunyuanVideo demonstrate that MeanCache achieves 4.12×, 4.56×, and 3.59× acceleration, respectively, while consistently outperforming state-of-the-art caching baselines in generation quality. We believe this simple yet effective approach provides a new perspective for Flow Matching inference and will inspire further exploration of stability-driven acceleration in commercial-scale generative models.

MeanCache

Motivation: From Instantaneous to Average Velocity

In Flow Matching inference, existing caching methods primarily rely on reusing Instantaneous Velocity or its feature-level proxies. However, we observe that instantaneous velocity often exhibits sharp fluctuations across timesteps. This leads to severe trajectory deviations and cumulative errors, especially as the cache interval increases.

Inspired by MeanFlow, we propose MeanCache. Compared to unstable instantaneous velocity, Average Velocity is significantly smoother and more robust over time. By shifting the caching perspective from a single "point" to an "interval," MeanCache effectively mitigates trajectory drift under high acceleration ratios.

Instantaneous vs Average Velocity

Figure 1: Instantaneous vs. Average Velocity and JVP Caching. (Left) Along the original trajectory, instantaneous velocity shows sharp fluctuations, while average velocity is much smoother. (Middle) At timestep 927, JVP Caching reduces error accumulation, though its effectiveness depends on the cache interval and hyperparameter $K$. (Right) At timestep 551, it achieves stronger error mitigation, showing that effectiveness varies across timesteps. Both middle and right figures are under the single-cache setting on the original trajectory.

Implementation: Average-Velocity Perspective

1. JVP-Based Cache Construction

We leverage Jacobian-Vector Products (JVP) to construct estimated interval average velocities. By reusing JVP information calculated at prior timesteps, we transform the current instantaneous velocity $v$ into an estimated average velocity $\hat{u}$ for the target interval. This approach compensates for trajectory deviations without additional inference overhead:

MeanCache Principle

Figure 2: From Instantaneous to Average Velocity. Directly caching the instantaneous velocity $v(z_t,t)$ over $[t,s]$ easily leads to trajectory drift and error accumulation, whereas the average velocity $u(z_t,t,s)$ accurately reaches the target $s$. MeanCache introduces a prior timestep $r$ and reuses $\mathrm{JVP}_{r \to t}$ to estimate the average velocity $\hat{u}(z_t,t,s)$, thereby correcting the trajectory and effectively mitigating error accumulation.
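The correction in Figure 2 can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation: the function name `estimate_average_velocity` and the toy scalar values are ours, and we assume the cached JVP enters with weight $(s-t)$, matching the stability-deviation formula $\mathcal{L}_K(t,s)$ used in the scheduling section.

```python
import numpy as np

def estimate_average_velocity(v_t, jvp_cached, t, s):
    """First-order estimate of the interval average velocity u(z_t, t, s).

    Hypothetical sketch: the cached JVP (a directional time-derivative of
    the velocity field, reused from a prior timestep r) corrects the
    instantaneous velocity v(z_t, t) into an interval estimate:
        u_hat(z_t, t, s) = v(z_t, t) + (s - t) * JVP_K
    consistent with u - v - (s - t) * JVP_K being the residual that the
    stability deviation L_K(t, s) measures.
    """
    return v_t + (s - t) * jvp_cached

# Toy illustration with a scalar state and one cached JVP value.
v_t = np.array([1.0])   # instantaneous velocity at time t
jvp = np.array([0.5])   # cached directional derivative (illustrative)
u_hat = estimate_average_velocity(v_t, jvp, t=1.0, s=0.5)
```

Because the correction is a reuse of an already-computed JVP rather than a fresh model evaluation, the estimate adds essentially no inference cost on top of plain caching.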

2. Trajectory-Stability Scheduling

To balance speed and quality while accounting for the temporal heterogeneity of caching effectiveness, we propose a trajectory-stability scheduling algorithm:

  • Stability Map via Graph Representation: We model the inference process as a Multigraph, where the edge weight $\mathcal{L}_K(t,s)$ represents the Stability Deviation—the error between predicted and true average velocities under a cache span $K$: $$\mathcal{L}_K(t,s) = \frac{1}{N} \left\| u(z_t,t,s) - v(z_t,t) - (s-t)\widehat{\text{JVP}}_K \right\|_1$$
  • Peak-Suppressed Shortest Path: Given a computation budget $\mathcal{B}$, we solve for a "peak-suppressed" shortest path. By introducing a penalty coefficient $\gamma$ for high-error edges, this strategy ensures smooth and continuous trajectory generation: $$\pi^\star = \arg\min_{\pi \in \mathcal{P}(T,0)} \sum_{e \in \pi} \mathcal{C}(e)^\gamma \quad \text{s.t.} \quad |\pi| \leq \mathcal{B}$$
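The budget-constrained, peak-suppressed shortest path above can be solved with a small dynamic program over (node, edges-used) states. The sketch below is hypothetical: `cost[t][s]` stands in for a measured stability deviation $\mathcal{L}_K(t,s)$, the toy cost values are invented for illustration, and the function and variable names are ours, not the paper's.

```python
def peak_suppressed_shortest_path(cost, budget, gamma=2.0):
    """Cheapest schedule from node T down to node 0 using at most `budget`
    edges, with each edge cost raised to `gamma` to suppress high-error
    ("peak") jumps. cost[t][s] (s < t) is the deviation of jumping from
    timestep t to s; unusable edges are float('inf').
    """
    T = len(cost) - 1
    INF = float("inf")
    # best[b][t]: minimal peak-suppressed cost to reach t from T in exactly b edges
    best = [[INF] * (T + 1) for _ in range(budget + 1)]
    best[0][T] = 0.0
    parent = {}
    for b in range(1, budget + 1):
        for t in range(1, T + 1):            # jump origin
            base = best[b - 1][t]
            if base == INF:
                continue
            for s in range(t):               # jump target s < t
                c = base + cost[t][s] ** gamma
                if c < best[b][s]:
                    best[b][s] = c
                    parent[(b, s)] = t
    # cheapest way to reach the terminal node 0 within the budget
    b_star = min(range(1, budget + 1), key=lambda b: best[b][0])
    node, b, path = 0, b_star, [0]
    while node != T:                          # walk parents back to T
        node = parent[(b, node)]
        path.append(node)
        b -= 1
    return best[b_star][0], path[::-1]

# Toy 4-node graph (3 = start, 0 = end), invented costs.
INF = float("inf")
cost = [[INF] * 4 for _ in range(4)]
cost[3][2], cost[2][0] = 1.0, 1.0   # smooth two-step route
cost[3][1], cost[1][0] = 1.9, 0.1   # route containing one "peak" edge
cost[3][0] = 5.0                    # single large jump
total, path = peak_suppressed_shortest_path(cost, budget=2, gamma=2.0)
```

With $\gamma > 1$, the two routes' plain sums are nearly equal (2.0 vs. 2.0), but squaring the costs penalizes the 1.9 peak edge, so the smooth route `[3, 2, 0]` wins; this is exactly the suppression behavior the scheduling strategy relies on.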

🖼️ Visual Results

Z-Image

| Method | Z-Image-base | MeanCache (B=25) | MeanCache (B=20) | MeanCache (B=15) | MeanCache (B=13) |
|---------|--------------|------------------|------------------|------------------|------------------|
| Latency | 18.07 s | 9.15 s | 7.36 s | 5.58 s | 4.85 s |

T2I comparisons on a single H800 GPU

Content Consistency

Maintaining content consistency is a primary challenge for acceleration frameworks. Rare words, with their ambiguous semantics and low training frequency, often cause significant visual drift during the denoising process. MeanCache handles this failure mode markedly better than existing caching baselines.

Content Consistency Comparison

Figure 3: Content consistency under the rare-word prompt "Matutinal".


MeanCache vs. LeMiCa

This benchmark evaluates the performance of MeanCache against LeMiCa using the Qwen-Image-2512 model as the base.

🚀 Efficiency Comparison

Baseline Latency (Original Qwen-Image-2512): 32.8s

| Constraint | Method | Latency | Speedup | Time Reduction vs. LeMiCa |
|------------|-----------|---------|---------|---------------------------|
| $B=25$ | LeMiCa | 18.83 s | 1.74x | - |
| $B=25$ | MeanCache | 17.13 s | 1.91x | 9.0% |
| $B=17$ | LeMiCa | 14.35 s | 2.29x | - |
| $B=17$ | MeanCache | 11.67 s | 2.81x | 18.7% |
| $B=10$ | LeMiCa | 10.41 s | 3.15x | - |
| $B=10$ | MeanCache | 6.95 s | 4.72x | 33.2% |

🎨 Quality Comparison

| Constraint | Method | PSNR (↑) | SSIM (↑) | LPIPS (↓) |
|------------|-----------|----------|----------|-----------|
| $B=25$ | LeMiCa | 29.20 | 0.945 | 0.065 |
| $B=25$ | MeanCache | 29.46 | 0.944 | 0.057 |
| $B=17$ | LeMiCa | 24.31 | 0.835 | 0.176 |
| $B=17$ | MeanCache | 26.49 | 0.907 | 0.104 |
| $B=10$ | LeMiCa | 17.80 | 0.637 | 0.368 |
| $B=10$ | MeanCache | 19.44 | 0.767 | 0.237 |

BibTeX


      @inproceedings{gao2025meancache,
        title     = {MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference},
        author    = {Huanlin Gao and Ping Chen and Fuyuan Shi and Ruijia Wu and Yantao Li and Qiang Hui and Yuren You and Ting Lu and Chao Tan and Shaoan Zhao and Zhaoxiang Liu and Fang Zhao and Kai Wang and Shiguo Lian},
        booktitle = {International Conference on Learning Representations (ICLR)},
        year      = {2026},
        url       = {https://arxiv.org/abs/2601.19961}
      }