[Codegen] Migrate all ops to VectorizableOpInterface and remove TypeSwitch#23713
[Codegen] Migrate all ops to VectorizableOpInterface and remove TypeSwitch#23713
Conversation
| if (succeeded(result)) { | ||
| rewriter.replaceOp(op, *result); | ||
| } |
There was a problem hiding this comment.
Should we signal pass failure here if the result is failure? The interface documentation says returning failure after isVectorizable returned true is a bug, so maybe we should signal this here. Otherwise, we will have silent failures that we need to track back to here later on.
There was a problem hiding this comment.
Good point, let me give it a shot. It aims to match current behavior, but let's see if it triggers silent bugs or not.
There was a problem hiding this comment.
It triggers a bug in CUDA backend: #23750
I turned it back with additional debug message. I think it is better to bypass the upstream bug (if exists) so the compiler can be more stable.
| #include "iree/compiler/Dialect/LinalgExt/Utils/Utils.h" | ||
| #include "mlir/Dialect/Linalg/IR/Linalg.h" | ||
| #include "mlir/Dialect/Linalg/Transforms/Transforms.h" | ||
| #include "mlir/Dialect/Arith/IR/Arith.h" |
There was a problem hiding this comment.
Why do we need a new header if we only delete code?
There was a problem hiding this comment.
Good catch! I'll remove it and the below VectorOps.h.
There was a problem hiding this comment.
They were required because we have dialect deps in Passes.td. I remove all of them.
There was a problem hiding this comment.
(Previously, it relied on indirect include to solve the issue. That's why it's added by Claude.)
compiler/src/iree/compiler/Codegen/Dialect/VectorExt/Transforms/Transforms.h
Show resolved
Hide resolved
| } | ||
|
|
||
| for (Value val : newOutputs) { | ||
| Value out = tensor::EmptyOp::create(rewriter, fullOp.getLoc(), vectorSizes, |
There was a problem hiding this comment.
Nit: We could modernize this code with an ImplictLocOpBuilder.
There was a problem hiding this comment.
I'm not convinced that ImplictLocOpBuilder is the modern builder. It's out for a long time, and we don't use it. The past usage that I've seen is mostly in stablehlo project, FYI, but that seems to be someone's preference.
I agree that it may be a proper util for op creation that requires body though, like generic op.
compiler/src/iree/compiler/Codegen/Interfaces/VectorizableOpInterface.cpp
Show resolved
Hide resolved
| // Filter out PadOp/PackOp/UnPackOp when masking is disabled. | ||
| // TODO(hanchung): Enable the vectorization without masking. This is mostly | ||
| // legacy code because it used to not working without masking. |
There was a problem hiding this comment.
Probably not as relevant here, as the dependency on masking should go away, but when should such a check happen in the driver vs. when should we pass the option to isVectorizable and decide there?
There was a problem hiding this comment.
Passing InputVectorSize implies masking has been the default behavior for a long time. This option is used to triggering whether a vector size inference is required or not (see below). Reason: some targets do not support masking and it will require emulation in this context. It provides a path to not vectorize the op for some targets like old aarch64 targets.
Just for more context: the code is legacy code that the option was added because the non-masking implementation was done after masking support.
| FailureOr<linalg::VectorizationResult> result = linalg::vectorize( | ||
| rewriter, op, vectorSizes, scalableDims, vectorizeNDExtract, | ||
| flatten1DDepthwiseConv, /*assumeDynamicDimsMatchVecSizes=*/false, | ||
| createNamedContraction); |
There was a problem hiding this comment.
I would eventually like to move away from upstream vectorization, but nothing actionable for now.
There was a problem hiding this comment.
That would be tricky as we may need to dup the code for other ops like elementwise-op, contraction, reduction, etc.
ba02055 to
dcccae8
Compare
a3305bc to
d5dfe0f
Compare
| // Filter out PadOp/PackOp/UnPackOp when masking is disabled. | ||
| // TODO(hanchung): Enable the vectorization without masking. This is mostly | ||
| // legacy code because it used to not working without masking. |
There was a problem hiding this comment.
Passing InputVectorSize implies masking has been the default behavior for a long time. This option is used to triggering whether a vector size inference is required or not (see below). Reason: some targets do not support masking and it will require emulation in this context. It provides a path to not vectorize the op for some targets like old aarch64 targets.
Just for more context: the code is legacy code that the option was added because the non-masking implementation was done after masking support.
| if (succeeded(result)) { | ||
| rewriter.replaceOp(op, *result); | ||
| } |
There was a problem hiding this comment.
Good point, let me give it a shot. It aims to match current behavior, but let's see if it triggers silent bugs or not.
compiler/src/iree/compiler/Codegen/Dialect/VectorExt/Transforms/Transforms.h
Show resolved
Hide resolved
| #include "iree/compiler/Dialect/LinalgExt/Utils/Utils.h" | ||
| #include "mlir/Dialect/Linalg/IR/Linalg.h" | ||
| #include "mlir/Dialect/Linalg/Transforms/Transforms.h" | ||
| #include "mlir/Dialect/Arith/IR/Arith.h" |
There was a problem hiding this comment.
Good catch! I'll remove it and the below VectorOps.h.
| } | ||
|
|
||
| for (Value val : newOutputs) { | ||
| Value out = tensor::EmptyOp::create(rewriter, fullOp.getLoc(), vectorSizes, |
There was a problem hiding this comment.
I'm not convinced that ImplictLocOpBuilder is the modern builder. It's out for a long time, and we don't use it. The past usage that I've seen is mostly in stablehlo project, FYI, but that seems to be someone's preference.
I agree that it may be a proper util for op creation that requires body though, like generic op.
| FailureOr<linalg::VectorizationResult> result = linalg::vectorize( | ||
| rewriter, op, vectorSizes, scalableDims, vectorizeNDExtract, | ||
| flatten1DDepthwiseConv, /*assumeDynamicDimsMatchVecSizes=*/false, | ||
| createNamedContraction); |
There was a problem hiding this comment.
That would be tricky as we may need to dup the code for other ops like elementwise-op, contraction, reduction, etc.
compiler/src/iree/compiler/Codegen/Interfaces/VectorizableOpInterface.cpp
Outdated
Show resolved
Hide resolved
dcccae8 to
c8f23d0
Compare
d5dfe0f to
355a2cd
Compare
c8f23d0 to
54492ef
Compare
355a2cd to
d9fbaa3
Compare
| // Decompose vectorized+bufferized map_store ops before lowering to loops. | ||
| .addPass(IREE::LinalgExt::createDecomposeMapStorePass) |
There was a problem hiding this comment.
I need to drop this commit, it was an experimental change.
|
|
||
| // Build DictionaryAttr options from pass options. These are forwarded to | ||
| // upstream linalg::vectorize(). | ||
| SmallVector<NamedAttribute> linalgOptionsList; |
There was a problem hiding this comment.
nit: We know there are at most 2 elements
| SmallVector<NamedAttribute> linalgOptionsList; | |
| SmallVector<NamedAttribute, 2> linalgOptionsList; |
There was a problem hiding this comment.
Note that it is only true in this snapshot, but it is not true for the existing upstream API: https://github.com/llvm/llvm-project/blob/3f65a03e8abb3e6fb3372cf4c254d6c9f090e2e0/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp#L2728-L2732
FailureOr<VectorizationResult> mlir::linalg::vectorize(
RewriterBase &rewriter, Operation *op, ArrayRef<int64_t> inputVectorSizes,
ArrayRef<bool> inputScalableVecDims, bool vectorizeNDExtract,
bool flatten1DDepthwiseConv, bool assumeDynamicDimsMatchVecSizes,
bool createNamedContraction) {
I'm happy to make the change for this snapshot, but I'd like to point out that it's like asking future contributors to update the size when they add new options.
| SmallVector<NamedAttribute> linalgOptionsList; | ||
| linalgOptionsList.push_back( | ||
| rewriter.getNamedAttr("vectorizeNDExtract", rewriter.getBoolAttr(true))); | ||
| if (vectorizeToTransferGather) { | ||
| linalgOptionsList.push_back(rewriter.getNamedAttr( | ||
| "vectorizeToTransferGather", rewriter.getBoolAttr(true))); | ||
| } |
There was a problem hiding this comment.
nit: Don't use Builder::getNamedAttr, you can just pass in a StringRef to the NamedAttribute constructor:
| SmallVector<NamedAttribute> linalgOptionsList; | |
| linalgOptionsList.push_back( | |
| rewriter.getNamedAttr("vectorizeNDExtract", rewriter.getBoolAttr(true))); | |
| if (vectorizeToTransferGather) { | |
| linalgOptionsList.push_back(rewriter.getNamedAttr( | |
| "vectorizeToTransferGather", rewriter.getBoolAttr(true))); | |
| } | |
| SmallVector<NamedAttribute> linalgOptionsList = {{"vectorizeNDExtract", rewriter.getBoolAttr(true)}}; | |
| if (vectorizeToTransferGather) { | |
| linalgOptionsList.emplace_back( | |
| "vectorizeToTransferGather", rewriter.getBoolAttr(true)); | |
| } |
There was a problem hiding this comment.
Is it your coding preference? I don't see an issue of using Builder::getNamedAttr. If we already construct things from builder, I'd lean to being consistent.
In terms of LSP intergration, the function signature is provided in this case. Using brace initializers does not provide the signature easily to the tools (at least it does not work for my tools).
compiler/src/iree/compiler/Codegen/Common/GenericVectorization.cpp
Outdated
Show resolved
Hide resolved
| buildPartialGenericOp(RewriterBase &rewriter, linalg::GenericOp fullOp, | ||
| ArrayRef<int64_t> vectorSizes, | ||
| SmallVector<Operation *> partial, | ||
| DenseMap<Value, std::pair<Value, AffineMap>> &tmap) { |
There was a problem hiding this comment.
What does tmap stand for? I think this would be worth expanding or adding a docstring for.
There was a problem hiding this comment.
It is explained in the below comment. Let me add a function comment and move the documentation there.
compiler/src/iree/compiler/Codegen/Interfaces/VectorizableOpInterface.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Interfaces/VectorizableOpInterface.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Interfaces/VectorizableOpInterface.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Interfaces/VectorizableOpInterface.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Interfaces/VectorizableOpInterface.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Interfaces/VectorizableOpInterface.cpp
Outdated
Show resolved
Hide resolved
…witch Signed-off-by: hanhanW <[email protected]>
Signed-off-by: hanhanW <[email protected]>
Signed-off-by: hanhanW <[email protected]>
Signed-off-by: hanhanW <[email protected]>
d9fbaa3 to
63a8891
Compare
Signed-off-by: hanhanW <[email protected]>
Signed-off-by: hanhanW <[email protected]>
compiler/src/iree/compiler/Codegen/Dialect/VectorExt/Transforms/Passes.td
Show resolved
Hide resolved
| if (failed(result)) { | ||
| return signalPassFailure(); | ||
| } |
There was a problem hiding this comment.
I'm a bit confused between this code change and the reply to my previous comment. Does the bug #23750 currently mean we can't signal pass failure here or can we just keep this change?
There was a problem hiding this comment.
I'm sorry about the confusion. Apparently, I made changes without commiting. 🤦♂️
There was a problem hiding this comment.
What happens with operations that remain unvectorized because their vectorization failed? Will they be handled by some alternative path or will they just fail later in the pipeline?
There was a problem hiding this comment.
It depends on pipeline setup, but the basic lowering should handle it. They should be lowered to scalar codes. There may be bufferization (like large stack buffer or shared memory) though.
There was a problem hiding this comment.
I find it difficult to say what the best strategy is here. I agree with your point that failing the pass here could mean that upstream changes/bugs cause instability. On the other hand, silently falling back to scalar code could either be inefficient (slow, which we would only catch in benchmarks, potentially with some delay that requires bisecting) or mis-compilation later on in the pipeline (stack buffer too large etc.).
The right choice probably depends on how bad the fallback can become. As there is an existing bug, maybe we can stick with this approach for now and add a TODO to add the pass failure later.
There was a problem hiding this comment.
We can add an option to that escalates silent failure to an actual failure for such case, so it may be easier to discover when we see bnehcmark regressions. This will require more work because there may be other similar "bugs" when we look at all the pipelines.
iree/compiler/src/iree/compiler/Codegen/Utils/CodegenOptions.h
Lines 14 to 29 in ccebc03
Signed-off-by: hanhanW <[email protected]>
Signed-off-by: hanhanW <[email protected]>
| if (failed(result)) { | ||
| return signalPassFailure(); | ||
| } |
There was a problem hiding this comment.
I find it difficult to say what the best strategy is here. I agree with your point that failing the pass here could mean that upstream changes/bugs cause instability. On the other hand, silently falling back to scalar code could either be inefficient (slow, which we would only catch in benchmarks, potentially with some delay that requires bisecting) or mis-compilation later on in the pipeline (stack buffer too large etc.).
The right choice probably depends on how bad the fallback can become. As there is an existing bug, maybe we can stick with this approach for now and add a TODO to add the pass failure later.
The revision inlines the implementation from VectorExt transforms to avoid circular dependency. Now all the ops are vectorized through the interface, so TypeSwitch is removed from driver.
It also updates
vectorizeGatherLikeGenericToTransferGatherto not rewrite the result internally. It's expected to be replaced by drivers.It is the last step towards https://lists.lfaidata.foundation/g/iree-technical-discussion/message/15
Assisted-by: Claude