
[VectorDistribute] Add VectorTileSizeAnalysis#23668

Open
sommerlukas wants to merge 6 commits into iree-org:main from sommerlukas:tile-size-analysis

Conversation


@sommerlukas sommerlukas commented Mar 5, 2026

The motivation for this change is to better support masking, to make up for differences between the actual tensor shape and the tile sizes chosen to match the hardware. For example, lowering config selection would be able to choose a tile size of 64 even if the input has size 127. To ensure correct computation, proper masking needs to be introduced in this case during GenericVectorization. To that end, GenericVectorization needs reliable vector tile size information.

The goal of the VectorTileSizeAnalysis is to provide such information. The analysis is seeded from to_layout operations, which in turn have been inserted based on lowering_config. The VectorTileSizeAnalysis propagates the information through the graph. It uses the upstream MLIR dataflow analysis framework and combines a sparse forward and sparse backward analysis in a single solver with a shared lattice.

Conflicts are resolved by an overdefined/"top" state. Consumers such as GenericVectorization can then choose a candidate, e.g. based on heuristics.
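The per-dimension lattice join described above can be sketched as follows. This is a hypothetical standalone sketch, not the actual IREE implementation; the sentinel values and the name joinDim are assumptions for illustration.

```cpp
#include <cstdint>

// Hypothetical per-dimension lattice: a dimension is either uninitialized
// (bottom, no information yet), a concrete tile size, or overdefined (top,
// conflicting information from forward and backward propagation).
constexpr int64_t kUninitialized = 0;
constexpr int64_t kOverdefined = -1;

// Join two lattice values for one dimension.
int64_t joinDim(int64_t a, int64_t b) {
  if (a == kUninitialized)
    return b;               // no info on one side: take the other
  if (b == kUninitialized)
    return a;
  if (a == b)
    return a;               // agreement: keep the concrete size
  return kOverdefined;      // conflict: go to top
}
```

With this join, propagating tile size 64 into a dimension that later also receives 32 yields the overdefined state, and a consumer such as GenericVectorization must then fall back to a heuristic choice.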

To make the vector tile size readily available in GenericVectorization, the MaterializeVectorTileSizePass materializes the analysis results as discardable attribute on compute ops.

This is part of #23415.

Assisted-by: Claude Code


@hanhanW hanhanW left a comment


Thanks for the patch! I can see the value of this pass, which is an improved version of useConfiguredSizes in generic vectorization. I'm thinking of extending the pass (in a follow-up) to move the useConfiguredSizes logic into the analysis, which can work well with the getVectorSize lowering config interface method.

The main concern on my side is about multiple candidates. Isn't it a bug if it happens?


sommerlukas commented Mar 17, 2026

Thanks for the feedback!

> The main concern on my side is about multiple candidates. Isn't it a bug if it happens?

I don't think it's necessarily a bug in the IR, consider the scenario below:

%empty = tensor.empty ...
%a = linalg.generic ins(...) outs(%empty) ...
%al = to_layout %a, layout with tile size 64
%b = linalg.generic ins(...) outs(%empty)
%c = linalg.generic ins(%b) ...
%cl = to_layout %c, layout with tile size 32
...

Here, we would propagate tile size 64 from %al backward to %a and then %empty. Then we propagate that tile size from %empty forward through to %b, and %b also has tile size 64. If we then propagate the tile size 32 backward from %cl through %c to %b, we have two tile sizes for %b and need to resolve that somehow. If, instead of collecting a set of possible tile sizes for each candidate (and dimension), we moved to an overdefined top element in the lattice, we would lose the tile size information for %b.

CC @Groverkss who might have other scenarios in mind.


@Groverkss Groverkss left a comment


LGTM modulo hanhan's comments. I think it's okay to just bail out on duplicate sizes for now and we can improve if needed.

@sommerlukas
Contributor Author

I've updated the analysis to only track one size per dimension. In case of a conflict, we now go to a top/overdefined state. Propagation from duplicatable operations is now completely disabled, which avoids the conflict in the scenario I outlined above.

The tracking of one size per dimension also facilitates adding the scalable flag to the analysis for CPU later on.

@sommerlukas sommerlukas requested a review from hanhanW March 18, 2026 15:22

@hanhanW hanhanW left a comment


> Thanks for the feedback!
>
> > The main concern on my side is about multiple candidates. Isn't it a bug if it happens?
>
> I don't think it's necessarily a bug in the IR, consider the scenario below:
>
> %empty = tensor.empty ...
> %a = linalg.generic ins(...) outs(%empty) ...
> %al = to_layout %a, layout with tile size 64
> %b = linalg.generic ins(...) outs(%empty)
> %c = linalg.generic ins(%b) ...
> %cl = to_layout %c, layout with tile size 32
> ...
>
> Here, we would propagate tile size 64 from %al backward to %a and then %empty. Then we propagate that tile size from %empty forward through to %b, and %b also has tile size 64. If we then propagate the tile size 32 backward from %cl through %c to %b, we have two tile sizes for %b and need to resolve that somehow. If, instead of collecting a set of possible tile sizes for each candidate (and dimension), we moved to an overdefined top element in the lattice, we would lose the tile size information for %b.
>
> CC @Groverkss who might have other scenarios in mind.

I still think that it is a bug if it happens. It means that the config generator has a bug. The example you showed is valid, and you have a good fix for it. We should stop propagation at tensor.empty ops: such an op describes the shape of the result, but it is not a connection between two linalg.generic ops.

Anyway, the current revision mostly looks good. I left a few comments, please take a look. Thanks!

Comment on lines +492 to +494
if (!perDimSizes) {
return;
}
Contributor

I think we should either emit a message or signal a failure when it happens.

Contributor Author

I think there are legitimate cases where operations don't get a tile size assigned (e.g., the anchor operation is inside a tiled loop), so emitting a warning/error here, or even failing the pass, isn't appropriate IMHO.

Contributor

I think we should emit errors or at least warnings if there are overdefined tile sizes. It means that there is a bug in the analysis or in the input program.

It is okay if there are uninitialized tile sizes, because some dimensions may not be tiled for odd reasons.

The point is: overdefined is a bug to me.

@sommerlukas sommerlukas left a comment

Thanks for the feedback @hanhanW! I've addressed the comments and put some more replies inline.

Comment on lines +220 to +230
// A linalg op that doesn't read any tensor data (e.g., linalg.fill or a
// fill-like linalg.generic broadcasting a scalar) is a generator and
// duplicatable.
if (auto linalgOp = dyn_cast<linalg::LinalgOp>(defOp)) {
if (llvm::none_of(linalgOp->getOpOperands(), [&](OpOperand &operand) {
return isa<ShapedType>(operand.get().getType()) &&
linalgOp.payloadUsesValueFromOperand(&operand);
})) {
return true;
}
}
Contributor Author

Unfortunately, verifyFillInterface emits diagnostics, even if we discarded the failure itself, so I don't think it's the right tool for this job.

Comment on lines +141 to +173
/// Map from operand space to iteration space via an indexing map.
TileSizes mapToIterationSpace(AffineMap indexingMap,
unsigned numLoops) const {
TileSizes result(numLoops);
for (unsigned i = 0; i < indexingMap.getNumResults(); ++i) {
auto dimExpr = dyn_cast<AffineDimExpr>(indexingMap.getResult(i));
if (!dimExpr) {
continue;
}
unsigned iterDim = dimExpr.getPosition();
result.dims[iterDim] = mergeDim(result.dims[iterDim], dims[i]);
}
return result;
}

/// Map from iteration space to operand space via an indexing map.
/// Returns empty TileSizes if any operand dim can't be determined.
TileSizes mapFromIterationSpace(AffineMap indexingMap) const {
unsigned numResults = indexingMap.getNumResults();
TileSizes result(numResults);
for (unsigned i = 0; i < numResults; ++i) {
auto dimExpr = dyn_cast<AffineDimExpr>(indexingMap.getResult(i));
if (!dimExpr) {
return {};
}
unsigned iterDim = dimExpr.getPosition();
if (iterDim >= rank() || dims[iterDim] == kUninitialized) {
return {};
}
result.dims[i] = dims[iterDim];
}
return result;
}
Contributor Author

> Do we assert if the given indexing map is a projected permutation?

I don't think we need to assert that here. If a result is not a dim expression, that dimension is already skipped, and TileSizes that are not fully defined are never materialized.

> Looking at the users of mapToIterationSpace, do you really need to pass in numLoops? It can be indexingMap.getNumDims(), IIUC. We also need to check if the ranks match or not, IIUC.

I've removed numLoops and added an assert in all places using it to check the correct rank.

> I'm thinking if we rename them to scatterToIterationSpace and gatherFromIterationSpace. The original names are okay as well. I just feel scatter and gather sounds better, and it may be just my preference. What do you think?

For me personally, gather/scatter is always linked to data movement (gathering/scattering data elements into/from a bigger vector/matrix/...). What we are doing here is using an AffineMap to map information from operand to iteration space and vice versa. Therefore, IMHO, the current names are a good description.
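To make the two mapping directions concrete, here is a simplified standalone model of the helpers above. It is a sketch under assumptions: the AffineMap is replaced by a plain vector dimPos (operand dimension i reads iteration dimension dimPos[i]), merging on duplicate dims is ignored, and the names and sentinel are illustrative rather than the actual patch code.

```cpp
#include <cstdint>
#include <vector>

constexpr int64_t kUninit = 0; // stand-in for the uninitialized tile size

// Operand space -> iteration space: scatter operand tile sizes onto the
// iteration dimensions they index (real code merges on duplicate dims).
std::vector<int64_t> toIterationSpace(const std::vector<unsigned> &dimPos,
                                      const std::vector<int64_t> &operandSizes,
                                      unsigned numLoops) {
  std::vector<int64_t> iter(numLoops, kUninit);
  for (unsigned i = 0; i < dimPos.size(); ++i)
    iter[dimPos[i]] = operandSizes[i];
  return iter;
}

// Iteration space -> operand space: gather; return empty if any needed
// iteration dimension is out of range or still uninitialized.
std::vector<int64_t> fromIterationSpace(const std::vector<unsigned> &dimPos,
                                        const std::vector<int64_t> &iterSizes) {
  std::vector<int64_t> result;
  for (unsigned pos : dimPos) {
    if (pos >= iterSizes.size() || iterSizes[pos] == kUninit)
      return {};
    result.push_back(iterSizes[pos]);
  }
  return result;
}
```

For a matmul-like op with three loops (d0, d1, d2), an operand indexed by (d0, d2) with tile sizes [64, 16] scatters to iteration sizes [64, kUninit, 16]; gathering an operand indexed by a still-uninitialized d1 then fails, which matches the "returns empty TileSizes" contract in the patch.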

Comment on lines +492 to +494
if (!perDimSizes) {
return;
}
Contributor Author

I think there are legitimate cases where operations don't get a tile size assigned (e.g., the anchor operation is inside a tiled loop), so emitting a warning/error here, or even failing the pass, isn't appropriate IMHO.

@sommerlukas sommerlukas requested a review from hanhanW March 19, 2026 12:12

@hanhanW hanhanW left a comment


I'll take a look at the other code, but I think you forgot to delete the old file (or use git mv).


@hanhanW hanhanW left a comment


Two high-level comments:


/// Read the TileSizes from a lattice, returning empty tile sizes if the lattice
/// value is from a duplicatable operation.
static const TileSizes getTileSizesFor(Value val,
Contributor

Delete the const: https://clang.llvm.org/extra/clang-tidy/checks/readability/const-return-type.html

When you return by value, the caller gets a new object. The const on the return type is saying "the caller can't modify this temporary." But temporaries are already about to be either moved or copied into the caller's variable:

  // At the call site:
  auto ts = getTileSizesFor(val, lattice);  // const is irrelevant here

The caller's ts is a new independent copy. Whether the returned temporary was const or not doesn't affect what the caller can do with ts.

It is worse, because it prevents move semantics.
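The move-semantics pessimization can be shown with a small standalone example. Tracker is a hypothetical type (not from the patch) that counts its copies and moves; a const-qualified by-value return cannot bind to rvalue-reference overloads such as push_back(T&&), so the caller silently copies.

```cpp
#include <vector>

// Hypothetical type that records how it is constructed.
struct Tracker {
  static inline int copies = 0;
  static inline int moves = 0;
  Tracker() = default;
  Tracker(const Tracker &) { ++copies; }
  Tracker(Tracker &&) noexcept { ++moves; }
};

// Flagged by clang-tidy readability-const-return-type: the const prvalue
// cannot bind to Tracker&&, so consumers fall back to the copying overload.
const Tracker makeConst() { return Tracker{}; }

// Without const, the prvalue binds to Tracker&& and gets moved.
Tracker makePlain() { return Tracker{}; }
```

For example, `v.push_back(makeConst())` invokes the copy constructor, while `v.push_back(makePlain())` invokes the move constructor, which is the behavioral difference the check is about.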


unsigned rank() const { return dims.size(); }
bool empty() const { return dims.empty(); }
const llvm::SmallVector<int64_t> &getDims() const { return dims; }
Contributor

Shouldn't this return ArrayRef<int64_t>?


int64_t operator[](unsigned i) const { return dims[i]; }

/// Returns true if all dimensions have a defined (positive) tile size.
Contributor

I think it is better to call out the empty case. A caller might reasonably think: zero dimensions, all zero of them are defined, so it's defined.

Suggested change
/// Returns true if all dimensions have a defined (positive) tile size.
/// Returns true if the tile sizes are non-empty and every dimension has a concrete tile size (not uninitialized or overdefined).

Comment on lines +283 to +284
auto &ts = getTileSizesFor(operand.get(),
operands[operand.getOperandNumber()]);
Contributor

It should just be TileSizes, not auto &. Please also update other places.

for (unsigned i = 0; i < linalgOp.getNumDpsInits(); ++i) {
OpOperand *init = linalgOp.getDpsInitOperand(i);
AffineMap map = linalgOp.getMatchingIndexingMap(init);
auto resultTileSizes = iterTileSizes.mapFromIterationSpace(map);
Contributor

Spell out the type, same for other places.

#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/IR/SymbolTable.h"

#define DEBUG_TYPE "iree-codegen-vector-tile-size-analysis"
Contributor

It should match the pass name or the filename.

std::optional<SmallVector<int64_t>> perDimSizes =
getPerDimTileSizes(linalgOp, solver);
if (!perDimSizes) {
LDBG() << "Analysis did not determine tile size for" << *linalgOp;
Contributor

Suggested change
LDBG() << "Analysis did not determine tile size for" << *linalgOp;
LDBG() << "Analysis did not determine tile size for " << *linalgOp;

return;
}

LDBG() << "Materializing tile size on " << *linalgOp;
Contributor

This log is redundant to me, because it is easy to spot in the output IR. The one above is more useful, because you can grep for failed cases quickly with the debug-only flag.

Comment on lines +97 to +100
/// Returns true if any dimension is overdefined.
bool isOverdefined() const {
return llvm::any_of(dims, [](int64_t v) { return v == kOverdefined; });
}
Contributor

FYI, this is not used at all.

@@ -0,0 +1,281 @@
// RUN: iree-opt --pass-pipeline='builtin.module(any(iree-codegen-materialize-vector-tile-sizes))' --split-input-file %s | FileCheck %s
Contributor

This is my first time seeing any here, and I learned something new. I wonder if we should just use func.func, because we usually see func.func in codegen IRs.

Allowing any may introduce inconsistency when people add tests, e.g., they may use util.func.

IIRC, IREE only has one any use, and it is all new code (introduced by @Groverkss). I suspect Claude used it once and the rest just copied it over.
