[VectorDistribute] Add VectorTileSizeAnalysis#23668
[VectorDistribute] Add VectorTileSizeAnalysis#23668sommerlukas wants to merge 6 commits intoiree-org:mainfrom
Conversation
6bae260 to
23f31ac
Compare
compiler/src/iree/compiler/Codegen/Common/test/materialize_vector_tile_sizes.mlir
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
37358a6 to
310c668
Compare
hanhanW
left a comment
There was a problem hiding this comment.
Thanks for the patch! I can see the value of this pass, which is an improved version of useConfiguredSizes in generic vectorization. I'm thinking to extend the pass (in a follow-up) which moves the useConfiguredSizes logic into the anslysis, which can work well with the getVectorSize lowering config interface method.
The main concern on my side is about multiple candidates. Isn't it a bug if it happens?
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
|
Thanks for the feedback!
I don't think it's necessarily a bug in the IR, consider the scenario below: %empty = tensor.empty ...
%a = linalg.generic ins(...) outs(%empty) ...
%al = to_layout %a, layout with tile size 64
%b = linalg.generic ins(...) outs(%empty)
%c = linalg.generic ins(%b) ...
%cl = to_layout %c, layout with tile size 32
...Here, we would propagate tile size 64 from %al backward to CC @Groverkss who might have other scenarios in mind. |
Groverkss
left a comment
There was a problem hiding this comment.
LGTM modulo hanhan's comments. I think it's okay to just bail out on duplicate sizes for now and we can improve if needed.
|
I've updated the analysis to only track one size per dimension. In case of conflict, we now go to a top/overdefined state. Propagation from duplicatable operations is now completely disabled, which avoids conflict in the scenario I outlined above. The tracking of one size per dimension also facilitates adding the scalable flag to the analysis for CPU later on. |
hanhanW
left a comment
There was a problem hiding this comment.
Thanks for the feedback!
The main concern on my side is about multiple candidates. Isn't it a bug if it happens?
I don't think it's necessarily a bug in the IR, consider the scenario below:
%empty = tensor.empty ... %a = linalg.generic ins(...) outs(%empty) ... %al = to_layout %a, layout with tile size 64 %b = linalg.generic ins(...) outs(%empty) %c = linalg.generic ins(%b) ... %cl = to_layout %c, layout with tile size 32 ...Here, we would propagate tile size 64 from %al backward to
%aand then%empty. Then we propagate that tile size from%emptyforward through to%b, and%balso has tile size 64. If we then propagate the tile size 32 backward from%clthrough%cto%b, we have two tile sizes for%band need to resolve that somehow. If, instead of collecting a set of possible tile sizes for each candidate (and dimension), we would move to an overdefined top element in the lattice, we would lose tile size information for%b.CC @Groverkss who might have other scenarios in mind.
I still think that it is a bug if it happens. It means that the config generator has a bug. The example you showed is valid, and you have a good fix for it. We should stop propagation if it is tensor.empty() op. It is describing the shape of the result, but it is not a connection between two linalg.generic ops.
Anyway, the current revision mostly looks good. I left few comments, please take a look. Thanks!
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
| if (!perDimSizes) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
I think we should either emit a message or signal a failure when it happens.
There was a problem hiding this comment.
I think there's legitimate cases where operations don't get a tile size assigned (e.g., the anchor operation is inside a tiled loop), so emitting a warning/error here or even failing the pass isn't appropriate IMHO.
There was a problem hiding this comment.
I think we should emit errors or at least warnings if there are overdefined tile sizes. It means that there is a bug in the analysis or input program.
It is okay if there are uninitialized tile sizes, because some dimension may not be tiled due to odd reasons.
The point is over-defined is a bug to me.
Signed-off-by: Lukas Sommer <[email protected]>
Signed-off-by: Lukas Sommer <[email protected]>
Signed-off-by: Lukas Sommer <[email protected]>
Signed-off-by: Lukas Sommer <[email protected]>
Signed-off-by: Lukas Sommer <[email protected]>
sommerlukas
left a comment
There was a problem hiding this comment.
Thanks for the feedback @hanhanW! I've addressed the comments and put some more replies inline.
| // A linalg op that doesn't read any tensor data (e.g., linalg.fill or a | ||
| // fill-like linalg.generic broadcasting a scalar) is a generator and | ||
| // duplicatable. | ||
| if (auto linalgOp = dyn_cast<linalg::LinalgOp>(defOp)) { | ||
| if (llvm::none_of(linalgOp->getOpOperands(), [&](OpOperand &operand) { | ||
| return isa<ShapedType>(operand.get().getType()) && | ||
| linalgOp.payloadUsesValueFromOperand(&operand); | ||
| })) { | ||
| return true; | ||
| } | ||
| } |
There was a problem hiding this comment.
Unfortunately, verifyFillInterface emits diagnostics, even if we discarded the failure itself, so I don't think it's the right tool for this job.
| /// Map from operand space to iteration space via an indexing map. | ||
| TileSizes mapToIterationSpace(AffineMap indexingMap, | ||
| unsigned numLoops) const { | ||
| TileSizes result(numLoops); | ||
| for (unsigned i = 0; i < indexingMap.getNumResults(); ++i) { | ||
| auto dimExpr = dyn_cast<AffineDimExpr>(indexingMap.getResult(i)); | ||
| if (!dimExpr) { | ||
| continue; | ||
| } | ||
| unsigned iterDim = dimExpr.getPosition(); | ||
| result.dims[iterDim] = mergeDim(result.dims[iterDim], dims[i]); | ||
| } | ||
| return result; | ||
| } | ||
|
|
||
| /// Map from iteration space to operand space via an indexing map. | ||
| /// Returns empty TileSizes if any operand dim can't be determined. | ||
| TileSizes mapFromIterationSpace(AffineMap indexingMap) const { | ||
| unsigned numResults = indexingMap.getNumResults(); | ||
| TileSizes result(numResults); | ||
| for (unsigned i = 0; i < numResults; ++i) { | ||
| auto dimExpr = dyn_cast<AffineDimExpr>(indexingMap.getResult(i)); | ||
| if (!dimExpr) { | ||
| return {}; | ||
| } | ||
| unsigned iterDim = dimExpr.getPosition(); | ||
| if (iterDim >= rank() || dims[iterDim] == kUninitialized) { | ||
| return {}; | ||
| } | ||
| result.dims[i] = dims[iterDim]; | ||
| } | ||
| return result; | ||
| } |
There was a problem hiding this comment.
Do we assert if the given indexing map is a projected permutation?
I don't think we need to assert that here. If a result is not a dim expression, that dimension is skipped already and TileSizes that are not fully defined are never materialized.
Looking at the users of mapToIterationSpace, do you really need to pass in numLoops? It can be indexingMap.getNumDims(), if IIUC. We also need to check if the ranks match or not, IIUC.
I've removed numLoops and added an assert in all places using it to assert the correct rank.
I'm thinking if we rename them to scatterToIterationSpace and gatherFromIterationSpace. The original names are okay as well. I just feel scatter and gather sounds better, and it may be just my preference. What do you think?
For me personally, gather/scatter is always linked to data movement (gathering/scattering data elements into/from a bigger vector/matrix/...). What we are doing here is using an AffineMap to map information from operand to iteration space and vice versa. Therefore, IMHO, the current name is a good description.
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/VectorTileSizeAnalysis.cpp
Outdated
Show resolved
Hide resolved
| if (!perDimSizes) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
I think there's legitimate cases where operations don't get a tile size assigned (e.g., the anchor operation is inside a tiled loop), so emitting a warning/error here or even failing the pass isn't appropriate IMHO.
e59654b to
89700ca
Compare
hanhanW
left a comment
There was a problem hiding this comment.
I'll take a look at other code, but I think you forgot to delete old file (or make git mv).
Signed-off-by: Lukas Sommer <[email protected]>
hanhanW
left a comment
There was a problem hiding this comment.
Two high-level comments:
- I think most of
emptychecks intileSizesare redundant, see https://github.com/iree-org/iree/pull/23668/changes#r2966973979 for the reason - We should be consistent in terms of naming. I'm +1 on replacing all the
tswithtileSizes.
|
|
||
| /// Read the TileSizes from a lattice, returning empty tile sizes if the lattice | ||
| /// value is from a duplicatable operation. | ||
| static const TileSizes getTileSizesFor(Value val, |
There was a problem hiding this comment.
Delete the const: https://clang.llvm.org/extra/clang-tidy/checks/readability/const-return-type.html
When you return by value, the caller gets a new object. The const on the return type is saying "the caller can't modify this temporary." But temporaries are already about to be either moved or copied into the caller's variable:
// At the call site:
auto ts = getTileSizesFor(val, lattice); // const is irrelevant here
The caller's ts is a new independent copy. Whether the returned temporary was const or not doesn't affect what the caller can do with ts.
It is worse, because it prevents move semantics.
|
|
||
| unsigned rank() const { return dims.size(); } | ||
| bool empty() const { return dims.empty(); } | ||
| const llvm::SmallVector<int64_t> &getDims() const { return dims; } |
There was a problem hiding this comment.
I thought that you should return ArrayRef<int64_t>?
|
|
||
| int64_t operator[](unsigned i) const { return dims[i]; } | ||
|
|
||
| /// Returns true if all dimensions have a defined (positive) tile size. |
There was a problem hiding this comment.
I think it is better to call out empty. A caller might reasonably think zero dimensions, all zero of them are defined, so it's defined.
| /// Returns true if all dimensions have a defined (positive) tile size. | |
| /// Returns true if the tile sizes are non-empty and every dimension has a concrete tile size (not uninitialized or overdefined). |
| auto &ts = getTileSizesFor(operand.get(), | ||
| operands[operand.getOperandNumber()]); |
There was a problem hiding this comment.
It should just be TileSizes, not auto &. Please also update other places.
| for (unsigned i = 0; i < linalgOp.getNumDpsInits(); ++i) { | ||
| OpOperand *init = linalgOp.getDpsInitOperand(i); | ||
| AffineMap map = linalgOp.getMatchingIndexingMap(init); | ||
| auto resultTileSizes = iterTileSizes.mapFromIterationSpace(map); |
There was a problem hiding this comment.
Spell out the type, same for other places.
| #include "mlir/Dialect/Tensor/IR/Tensor.h" | ||
| #include "mlir/IR/SymbolTable.h" | ||
|
|
||
| #define DEBUG_TYPE "iree-codegen-vector-tile-size-analysis" |
There was a problem hiding this comment.
It should match to the pass name or filename.
| std::optional<SmallVector<int64_t>> perDimSizes = | ||
| getPerDimTileSizes(linalgOp, solver); | ||
| if (!perDimSizes) { | ||
| LDBG() << "Analysis did not determine tile size for" << *linalgOp; |
There was a problem hiding this comment.
| LDBG() << "Analysis did not determine tile size for" << *linalgOp; | |
| LDBG() << "Analysis did not determine tile size for " << *linalgOp; |
| return; | ||
| } | ||
|
|
||
| LDBG() << "Materializing tile size on " << *linalgOp; |
There was a problem hiding this comment.
This log is redundant to me, because it is easy to spot in the output IR. The above one is more useful to me, because you can grab failed cases quickly with the debug-only flag.
| /// Returns true if any dimension is overdefined. | ||
| bool isOverdefined() const { | ||
| return llvm::any_of(dims, [](int64_t v) { return v == kOverdefined; }); | ||
| } |
There was a problem hiding this comment.
FYI, this is not used at all.
| @@ -0,0 +1,281 @@ | |||
| // RUN: iree-opt --pass-pipeline='builtin.module(any(iree-codegen-materialize-vector-tile-sizes))' --split-input-file %s | FileCheck %s | |||
There was a problem hiding this comment.
This is my first time seeing any, and I learned it. I wonder if we should just use func.func because we usually see func.func in codegen IRs.
Allowing any may introduce inconsistency when people add tests. E.g., they may use util.func.
IIRC, IREE only has one any use and they are all new code (introduced by @Groverkss ). I feel that Claude uses it once and the rest just copy-over this.
Motivation for this change is to better support masking to make up for differences between actual tensor shape and tile sizes chosen to match the HW. For example, lowering config selection would be able to choose a tile size of 64, even if the input has size 127. To ensure correct computation, proper masking needs to be introduced in this case during
GenericVectorization. To that end,GenericVectorizationneeds reliable vector tile size information.The goal of the
VectorTileSizeAnalysisis to provide such information. The analysis is seeded fromto_layoutoperations, which in turn have been inserted based onlowering_config. TheVectorTileSizeAnalysispropagates the information through the graph. It uses the upstream MLIR dataflow analysis framework and combines a sparse forward and sparse backward analysis in a single solver with a shared lattice.Conflicts are resolved by an overdefined/"top" state. Consumers such as
GenericVectorizationcan then choose a candidate, e.g. based on heuristics.To make the vector tile size readily available in
GenericVectorization, theMaterializeVectorTileSizePassmaterializes the analysis results as discardable attribute on compute ops.This is part of #23415.
Assisted-by: Claude Code