
[VectorDistribute] Add VectorTileSizeAnalysis#23668

Open
sommerlukas wants to merge 6 commits into iree-org:main from sommerlukas:tile-size-analysis

Conversation


@sommerlukas sommerlukas commented Mar 5, 2026

The motivation for this change is to better support masking, to make up for differences between the actual tensor shape and the tile sizes chosen to match the hardware. For example, lowering config selection would be able to choose a tile size of 64 even if the input has size 127. To ensure correct computation, proper masking needs to be introduced in this case during GenericVectorization. To that end, GenericVectorization needs reliable vector tile size information.

The goal of the VectorTileSizeAnalysis is to provide such information. The analysis is seeded from to_layout operations, which in turn have been inserted based on lowering_config. The VectorTileSizeAnalysis propagates the information through the graph. It uses the upstream MLIR dataflow analysis framework and combines a sparse forward and sparse backward analysis in a single solver with a shared lattice.

Conflicts are resolved by an overdefined/"top" state. Consumers such as GenericVectorization can then choose a candidate, e.g. based on heuristics.
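The per-dimension lattice join described above can be sketched as follows. This is a hypothetical standalone sketch, not the actual IREE implementation; the sentinel values and the name joinDim are assumptions for illustration.

```cpp
#include <cstdint>

// Hypothetical per-dimension lattice: a dimension is either uninitialized
// (bottom, no information yet), a concrete tile size, or overdefined (top,
// conflicting information from forward and backward propagation).
constexpr int64_t kUninitialized = 0;
constexpr int64_t kOverdefined = -1;

// Join two lattice values for one dimension.
int64_t joinDim(int64_t a, int64_t b) {
  if (a == kUninitialized)
    return b;               // no info on one side: take the other
  if (b == kUninitialized)
    return a;
  if (a == b)
    return a;               // agreement: keep the concrete size
  return kOverdefined;      // conflict: go to top
}
```

With this join, propagating tile size 64 into a dimension that later also receives 32 yields the overdefined state, and a consumer such as GenericVectorization must then fall back to a heuristic choice.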

To make the vector tile size readily available in GenericVectorization, the MaterializeVectorTileSizePass materializes the analysis results as discardable attribute on compute ops.

This is part of #23415.

Assisted-by: Claude Code


@hanhanW hanhanW left a comment


Thanks for the patch! I can see the value of this pass, which is an improved version of useConfiguredSizes in generic vectorization. I'm thinking of extending the pass (in a follow-up) to move the useConfiguredSizes logic into the analysis, which can work well with the getVectorSize lowering config interface method.

The main concern on my side is about multiple candidates. Isn't it a bug if it happens?


sommerlukas commented Mar 17, 2026

Thanks for the feedback!

> The main concern on my side is about multiple candidates. Isn't it a bug if it happens?

I don't think it's necessarily a bug in the IR, consider the scenario below:

%empty = tensor.empty ...
%a = linalg.generic ins(...) outs(%empty) ...
%al = to_layout %a, layout with tile size 64
%b = linalg.generic ins(...) outs(%empty)
%c = linalg.generic ins(%b) ...
%cl = to_layout %c, layout with tile size 32
...

Here, we would propagate tile size 64 from %al backward to %a and then %empty. Then we propagate that tile size from %empty forward through to %b, and %b also has tile size 64. If we then propagate the tile size 32 backward from %cl through %c to %b, we have two tile sizes for %b and need to resolve that somehow. If, instead of collecting a set of possible tile sizes for each candidate (and dimension), we moved to an overdefined top element in the lattice, we would lose the tile size information for %b.

CC @Groverkss who might have other scenarios in mind.


@Groverkss Groverkss left a comment


LGTM modulo hanhan's comments. I think it's okay to just bail out on duplicate sizes for now and we can improve if needed.

@sommerlukas
Contributor Author

I've updated the analysis to only track one size per dimension. In case of a conflict, we now go to a top/overdefined state. Propagation from duplicatable operations is now completely disabled, which avoids the conflict in the scenario I outlined above.

The tracking of one size per dimension also facilitates adding the scalable flag to the analysis for CPU later on.

@sommerlukas sommerlukas requested a review from hanhanW March 18, 2026 15:22

@hanhanW hanhanW left a comment


> Thanks for the feedback!
>
> > The main concern on my side is about multiple candidates. Isn't it a bug if it happens?
>
> I don't think it's necessarily a bug in the IR, consider the scenario below:
>
> %empty = tensor.empty ...
> %a = linalg.generic ins(...) outs(%empty) ...
> %al = to_layout %a, layout with tile size 64
> %b = linalg.generic ins(...) outs(%empty)
> %c = linalg.generic ins(%b) ...
> %cl = to_layout %c, layout with tile size 32
> ...
>
> Here, we would propagate tile size 64 from %al backward to %a and then %empty. Then we propagate that tile size from %empty forward through to %b, and %b also has tile size 64. If we then propagate the tile size 32 backward from %cl through %c to %b, we have two tile sizes for %b and need to resolve that somehow. If, instead of collecting a set of possible tile sizes for each candidate (and dimension), we moved to an overdefined top element in the lattice, we would lose the tile size information for %b.
>
> CC @Groverkss who might have other scenarios in mind.

I still think that it is a bug if it happens. It means that the config generator has a bug. The example you showed is valid, and you have a good fix for it. We should stop propagation at tensor.empty ops: such an op describes the shape of the result, but it is not a connection between two linalg.generic ops.

Anyway, the current revision mostly looks good. I left a few comments, please take a look. Thanks!

Comment on lines +492 to +494
if (!perDimSizes) {
return;
}
Contributor

I think we should either emit a message or signal a failure when it happens.

Contributor Author

I think there are legitimate cases where operations don't get a tile size assigned (e.g., the anchor operation is inside a tiled loop), so emitting a warning/error here, or even failing the pass, isn't appropriate IMHO.

Contributor

I think we should emit errors or at least warnings if there are overdefined tile sizes. It means that there is a bug in the analysis or in the input program.

It is okay if there are uninitialized tile sizes, because some dimensions may not be tiled for odd reasons.

The point is: overdefined is a bug to me.

@sommerlukas sommerlukas left a comment

Thanks for the feedback @hanhanW! I've addressed the comments and put some more replies inline.

Comment on lines +220 to +230
// A linalg op that doesn't read any tensor data (e.g., linalg.fill or a
// fill-like linalg.generic broadcasting a scalar) is a generator and
// duplicatable.
if (auto linalgOp = dyn_cast<linalg::LinalgOp>(defOp)) {
if (llvm::none_of(linalgOp->getOpOperands(), [&](OpOperand &operand) {
return isa<ShapedType>(operand.get().getType()) &&
linalgOp.payloadUsesValueFromOperand(&operand);
})) {
return true;
}
}
Contributor Author

Unfortunately, verifyFillInterface emits diagnostics, even if we discarded the failure itself, so I don't think it's the right tool for this job.

Comment on lines +141 to +173
/// Map from operand space to iteration space via an indexing map.
TileSizes mapToIterationSpace(AffineMap indexingMap,
unsigned numLoops) const {
TileSizes result(numLoops);
for (unsigned i = 0; i < indexingMap.getNumResults(); ++i) {
auto dimExpr = dyn_cast<AffineDimExpr>(indexingMap.getResult(i));
if (!dimExpr) {
continue;
}
unsigned iterDim = dimExpr.getPosition();
result.dims[iterDim] = mergeDim(result.dims[iterDim], dims[i]);
}
return result;
}

/// Map from iteration space to operand space via an indexing map.
/// Returns empty TileSizes if any operand dim can't be determined.
TileSizes mapFromIterationSpace(AffineMap indexingMap) const {
unsigned numResults = indexingMap.getNumResults();
TileSizes result(numResults);
for (unsigned i = 0; i < numResults; ++i) {
auto dimExpr = dyn_cast<AffineDimExpr>(indexingMap.getResult(i));
if (!dimExpr) {
return {};
}
unsigned iterDim = dimExpr.getPosition();
if (iterDim >= rank() || dims[iterDim] == kUninitialized) {
return {};
}
result.dims[i] = dims[iterDim];
}
return result;
}
Contributor Author

> Do we assert if the given indexing map is a projected permutation?

I don't think we need to assert that here. If a result is not a dim expression, that dimension is already skipped, and TileSizes that are not fully defined are never materialized.

> Looking at the users of mapToIterationSpace, do you really need to pass in numLoops? It can be indexingMap.getNumDims(), IIUC. We also need to check if the ranks match or not, IIUC.

I've removed numLoops and added an assert in all places using it to check the correct rank.

> I'm thinking if we rename them to scatterToIterationSpace and gatherFromIterationSpace. The original names are okay as well. I just feel scatter and gather sounds better, and it may be just my preference. What do you think?

For me personally, gather/scatter is always linked to data movement (gathering/scattering data elements into/from a bigger vector/matrix/...). What we are doing here is using an AffineMap to map information from operand to iteration space and vice versa. Therefore, IMHO, the current names are a good description.
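To make the two mapping directions concrete, here is a simplified standalone model of the helpers above. It is a sketch under assumptions: the AffineMap is replaced by a plain vector dimPos (operand dimension i reads iteration dimension dimPos[i]), merging on duplicate dims is ignored, and the names and sentinel are illustrative rather than the actual patch code.

```cpp
#include <cstdint>
#include <vector>

constexpr int64_t kUninit = 0; // stand-in for the uninitialized tile size

// Operand space -> iteration space: scatter operand tile sizes onto the
// iteration dimensions they index (real code merges on duplicate dims).
std::vector<int64_t> toIterationSpace(const std::vector<unsigned> &dimPos,
                                      const std::vector<int64_t> &operandSizes,
                                      unsigned numLoops) {
  std::vector<int64_t> iter(numLoops, kUninit);
  for (unsigned i = 0; i < dimPos.size(); ++i)
    iter[dimPos[i]] = operandSizes[i];
  return iter;
}

// Iteration space -> operand space: gather; return empty if any needed
// iteration dimension is out of range or still uninitialized.
std::vector<int64_t> fromIterationSpace(const std::vector<unsigned> &dimPos,
                                        const std::vector<int64_t> &iterSizes) {
  std::vector<int64_t> result;
  for (unsigned pos : dimPos) {
    if (pos >= iterSizes.size() || iterSizes[pos] == kUninit)
      return {};
    result.push_back(iterSizes[pos]);
  }
  return result;
}
```

For a matmul-like op with three loops (d0, d1, d2), an operand indexed by (d0, d2) with tile sizes [64, 16] scatters to iteration sizes [64, kUninit, 16]; gathering an operand indexed by a still-uninitialized d1 then fails, which matches the "returns empty TileSizes" contract in the patch.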

Comment on lines +492 to +494
if (!perDimSizes) {
return;
}
Contributor Author

I think there are legitimate cases where operations don't get a tile size assigned (e.g., the anchor operation is inside a tiled loop), so emitting a warning/error here, or even failing the pass, isn't appropriate IMHO.

@sommerlukas sommerlukas requested a review from hanhanW March 19, 2026 12:12

@hanhanW hanhanW left a comment


I'll take a look at the other code, but I think you forgot to delete the old file (or use git mv).


@hanhanW hanhanW left a comment


Two high-level comments:


/// Read the TileSizes from a lattice, returning empty tile sizes if the lattice
/// value is from a duplicatable operation.
static const TileSizes getTileSizesFor(Value val,
Contributor

Delete the const: https://clang.llvm.org/extra/clang-tidy/checks/readability/const-return-type.html

When you return by value, the caller gets a new object. The const on the return type is saying "the caller can't modify this temporary." But temporaries are already about to be either moved or copied into the caller's variable:

  // At the call site:
  auto ts = getTileSizesFor(val, lattice);  // const is irrelevant here

The caller's ts is a new independent copy. Whether the returned temporary was const or not doesn't affect what the caller can do with ts.

It is worse, because it prevents move semantics.
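The move-semantics pessimization can be shown with a small standalone example. Tracker is a hypothetical type (not from the patch) that counts its copies and moves; a const-qualified by-value return cannot bind to rvalue-reference overloads such as push_back(T&&), so the caller silently copies.

```cpp
#include <vector>

// Hypothetical type that records how it is constructed.
struct Tracker {
  static inline int copies = 0;
  static inline int moves = 0;
  Tracker() = default;
  Tracker(const Tracker &) { ++copies; }
  Tracker(Tracker &&) noexcept { ++moves; }
};

// Flagged by clang-tidy readability-const-return-type: the const prvalue
// cannot bind to Tracker&&, so consumers fall back to the copying overload.
const Tracker makeConst() { return Tracker{}; }

// Without const, the prvalue binds to Tracker&& and gets moved.
Tracker makePlain() { return Tracker{}; }
```

For example, `v.push_back(makeConst())` invokes the copy constructor, while `v.push_back(makePlain())` invokes the move constructor, which is the behavioral difference the check is about.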


unsigned rank() const { return dims.size(); }
bool empty() const { return dims.empty(); }
const llvm::SmallVector<int64_t> &getDims() const { return dims; }
Contributor

Shouldn't this return ArrayRef<int64_t>?


int64_t operator[](unsigned i) const { return dims[i]; }

/// Returns true if all dimensions have a defined (positive) tile size.
Contributor

I think it is better to call out the empty case. A caller might reasonably think: zero dimensions, all zero of them are defined, so it's defined.

Suggested change
/// Returns true if all dimensions have a defined (positive) tile size.
/// Returns true if the tile sizes are non-empty and every dimension has a concrete tile size (not uninitialized or overdefined).

Comment on lines +283 to +284
auto &ts = getTileSizesFor(operand.get(),
operands[operand.getOperandNumber()]);
Contributor

It should just be TileSizes, not auto &. Please also update other places.

for (unsigned i = 0; i < linalgOp.getNumDpsInits(); ++i) {
OpOperand *init = linalgOp.getDpsInitOperand(i);
AffineMap map = linalgOp.getMatchingIndexingMap(init);
auto resultTileSizes = iterTileSizes.mapFromIterationSpace(map);
Contributor

Spell out the type, same for other places.

#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/IR/SymbolTable.h"

#define DEBUG_TYPE "iree-codegen-vector-tile-size-analysis"
Contributor

It should match the pass name or the filename.

std::optional<SmallVector<int64_t>> perDimSizes =
getPerDimTileSizes(linalgOp, solver);
if (!perDimSizes) {
LDBG() << "Analysis did not determine tile size for" << *linalgOp;
Contributor

Suggested change
LDBG() << "Analysis did not determine tile size for" << *linalgOp;
LDBG() << "Analysis did not determine tile size for " << *linalgOp;

return;
}

LDBG() << "Materializing tile size on " << *linalgOp;
Contributor

This log is redundant to me, because it is easy to spot in the output IR. The one above is more useful, because you can grep for failed cases quickly with the debug-only flag.

Comment on lines +97 to +100
/// Returns true if any dimension is overdefined.
bool isOverdefined() const {
return llvm::any_of(dims, [](int64_t v) { return v == kOverdefined; });
}
Contributor

FYI, this is not used at all.

@@ -0,0 +1,281 @@
// RUN: iree-opt --pass-pipeline='builtin.module(any(iree-codegen-materialize-vector-tile-sizes))' --split-input-file %s | FileCheck %s
Contributor

This is my first time seeing any here, and I learned something new. I wonder if we should just use func.func, because we usually see func.func in codegen IRs.

Allowing any may introduce inconsistency when people add tests, e.g., they may use util.func.

IIRC, IREE only has one any use, and it is all new code (introduced by @Groverkss). I suspect Claude used it once and the rest just copied it over.
