Skip to content

[Codegen] Migrate all ops to VectorizableOpInterface and remove TypeSwitch#23713

Merged
hanhanW merged 8 commits intomainfrom
users/hanhanW/vec-iface-c3b
Mar 12, 2026
Merged

[Codegen] Migrate all ops to VectorizableOpInterface and remove TypeSwitch#23713
hanhanW merged 8 commits intomainfrom
users/hanhanW/vec-iface-c3b

Conversation

@hanhanW
Copy link
Contributor

@hanhanW hanhanW commented Mar 9, 2026

The revision inlines the implementation from VectorExt transforms to avoid circular dependency. Now all the ops are vectorized through the interface, so TypeSwitch is removed from driver.

It also updates vectorizeGatherLikeGenericToTransferGather to not rewrite the result internally. It's expected to be replaced by drivers.

It is the last step towards https://lists.lfaidata.foundation/g/iree-technical-discussion/message/15

Assisted-by: Claude

Comment on lines +234 to +236
if (succeeded(result)) {
rewriter.replaceOp(op, *result);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we signal pass failure here if the result is failure? The interface documentation says returning failure after isVectorizable returned true is a bug, so maybe we should signal this here. Otherwise, we will have silent failures that we need to track back to here later on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, let me give it a shot. It aims to match current behavior, but let's see if it triggers silent bugs or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It triggers a bug in CUDA backend: #23750

I turned it back with additional debug message. I think it is better to bypass the upstream bug (if exists) so the compiler can be more stable.

#include "iree/compiler/Dialect/LinalgExt/Utils/Utils.h"
#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
#include "mlir/Dialect/Arith/IR/Arith.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a new header if we only delete code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I'll remove it and the below VectorOps.h.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They were required because we have dialect deps in Passes.td. I remove all of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Previously, it relied on indirect include to solve the issue. That's why it's added by Claude.)

}

for (Value val : newOutputs) {
Value out = tensor::EmptyOp::create(rewriter, fullOp.getLoc(), vectorSizes,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We could modernize this code with an ImplictLocOpBuilder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that ImplictLocOpBuilder is the modern builder. It's out for a long time, and we don't use it. The past usage that I've seen is mostly in stablehlo project, FYI, but that seems to be someone's preference.

I agree that it may be a proper util for op creation that requires body though, like generic op.

Comment on lines +182 to +184
// Filter out PadOp/PackOp/UnPackOp when masking is disabled.
// TODO(hanchung): Enable the vectorization without masking. This is mostly
// legacy code because it used to not working without masking.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not as relevant here, as the dependency on masking should go away, but when should such a check happen in the driver vs. when should we pass the option to isVectorizable and decide there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing InputVectorSize implies masking has been the default behavior for a long time. This option is used to triggering whether a vector size inference is required or not (see below). Reason: some targets do not support masking and it will require emulation in this context. It provides a path to not vectorize the op for some targets like old aarch64 targets.

Just for more context: the code is legacy code that the option was added because the non-masking implementation was done after masking support.

Comment on lines +778 to +781
FailureOr<linalg::VectorizationResult> result = linalg::vectorize(
rewriter, op, vectorSizes, scalableDims, vectorizeNDExtract,
flatten1DDepthwiseConv, /*assumeDynamicDimsMatchVecSizes=*/false,
createNamedContraction);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would eventually like to move away from upstream vectorization, but nothing actionable for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be tricky as we may need to dup the code for other ops like elementwise-op, contraction, reduction, etc.

@hanhanW hanhanW force-pushed the users/hanhanW/vec-iface-c3a branch from ba02055 to dcccae8 Compare March 10, 2026 17:46
@hanhanW hanhanW force-pushed the users/hanhanW/vec-iface-c3b branch from a3305bc to d5dfe0f Compare March 10, 2026 17:46
Comment on lines +182 to +184
// Filter out PadOp/PackOp/UnPackOp when masking is disabled.
// TODO(hanchung): Enable the vectorization without masking. This is mostly
// legacy code because it used to not working without masking.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing InputVectorSize implies masking has been the default behavior for a long time. This option is used to triggering whether a vector size inference is required or not (see below). Reason: some targets do not support masking and it will require emulation in this context. It provides a path to not vectorize the op for some targets like old aarch64 targets.

Just for more context: the code is legacy code that the option was added because the non-masking implementation was done after masking support.

Comment on lines +234 to +236
if (succeeded(result)) {
rewriter.replaceOp(op, *result);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, let me give it a shot. It aims to match current behavior, but let's see if it triggers silent bugs or not.

#include "iree/compiler/Dialect/LinalgExt/Utils/Utils.h"
#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
#include "mlir/Dialect/Arith/IR/Arith.h"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I'll remove it and the below VectorOps.h.

}

for (Value val : newOutputs) {
Value out = tensor::EmptyOp::create(rewriter, fullOp.getLoc(), vectorSizes,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that ImplictLocOpBuilder is the modern builder. It's out for a long time, and we don't use it. The past usage that I've seen is mostly in stablehlo project, FYI, but that seems to be someone's preference.

I agree that it may be a proper util for op creation that requires body though, like generic op.

Comment on lines +778 to +781
FailureOr<linalg::VectorizationResult> result = linalg::vectorize(
rewriter, op, vectorSizes, scalableDims, vectorizeNDExtract,
flatten1DDepthwiseConv, /*assumeDynamicDimsMatchVecSizes=*/false,
createNamedContraction);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be tricky as we may need to dup the code for other ops like elementwise-op, contraction, reduction, etc.

@hanhanW hanhanW force-pushed the users/hanhanW/vec-iface-c3a branch from dcccae8 to c8f23d0 Compare March 10, 2026 23:07
@hanhanW hanhanW force-pushed the users/hanhanW/vec-iface-c3b branch from d5dfe0f to 355a2cd Compare March 10, 2026 23:08
@hanhanW hanhanW force-pushed the users/hanhanW/vec-iface-c3a branch from c8f23d0 to 54492ef Compare March 11, 2026 01:16
@hanhanW hanhanW force-pushed the users/hanhanW/vec-iface-c3b branch from 355a2cd to d9fbaa3 Compare March 11, 2026 01:16
Comment on lines +485 to +486
// Decompose vectorized+bufferized map_store ops before lowering to loops.
.addPass(IREE::LinalgExt::createDecomposeMapStorePass)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to drop this commit, it was an experimental change.

Copy link
Member

@kuhar kuhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just drive-by nits


// Build DictionaryAttr options from pass options. These are forwarded to
// upstream linalg::vectorize().
SmallVector<NamedAttribute> linalgOptionsList;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We know there are at most 2 elements

Suggested change
SmallVector<NamedAttribute> linalgOptionsList;
SmallVector<NamedAttribute, 2> linalgOptionsList;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that it is only true in this snapshot, but it is not true for the existing upstream API: https://github.com/llvm/llvm-project/blob/3f65a03e8abb3e6fb3372cf4c254d6c9f090e2e0/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp#L2728-L2732

FailureOr<VectorizationResult> mlir::linalg::vectorize(
    RewriterBase &rewriter, Operation *op, ArrayRef<int64_t> inputVectorSizes,
    ArrayRef<bool> inputScalableVecDims, bool vectorizeNDExtract,
    bool flatten1DDepthwiseConv, bool assumeDynamicDimsMatchVecSizes,
    bool createNamedContraction) {

I'm happy to make the change for this snapshot, but I'd like to point out that it's like asking future contributors to update the size when they add new options.

Comment on lines +170 to +176
SmallVector<NamedAttribute> linalgOptionsList;
linalgOptionsList.push_back(
rewriter.getNamedAttr("vectorizeNDExtract", rewriter.getBoolAttr(true)));
if (vectorizeToTransferGather) {
linalgOptionsList.push_back(rewriter.getNamedAttr(
"vectorizeToTransferGather", rewriter.getBoolAttr(true)));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Don't use Builder::getNamedAttr, you can just pass in a StringRef to the NamedAttribute constructor:

Suggested change
SmallVector<NamedAttribute> linalgOptionsList;
linalgOptionsList.push_back(
rewriter.getNamedAttr("vectorizeNDExtract", rewriter.getBoolAttr(true)));
if (vectorizeToTransferGather) {
linalgOptionsList.push_back(rewriter.getNamedAttr(
"vectorizeToTransferGather", rewriter.getBoolAttr(true)));
}
SmallVector<NamedAttribute> linalgOptionsList = {{"vectorizeNDExtract", rewriter.getBoolAttr(true)}};
if (vectorizeToTransferGather) {
linalgOptionsList.emplace_back(
"vectorizeToTransferGather", rewriter.getBoolAttr(true));
}

Copy link
Contributor Author

@hanhanW hanhanW Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it your coding preference? I don't see an issue of using Builder::getNamedAttr. If we already construct things from builder, I'd lean to being consistent.

In terms of LSP intergration, the function signature is provided in this case. Using brace initializers does not provide the signature easily to the tools (at least it does not work for my tools).

buildPartialGenericOp(RewriterBase &rewriter, linalg::GenericOp fullOp,
ArrayRef<int64_t> vectorSizes,
SmallVector<Operation *> partial,
DenseMap<Value, std::pair<Value, AffineMap>> &tmap) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does tmap stand for? I think this would be worth expanding or adding a docstring for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is explained in the below comment. Let me add a function comment and move the documentation there.

Base automatically changed from users/hanhanW/vec-iface-c3a to main March 12, 2026 16:20
@hanhanW hanhanW force-pushed the users/hanhanW/vec-iface-c3b branch from d9fbaa3 to 63a8891 Compare March 12, 2026 16:38
hanhanW added 2 commits March 12, 2026 10:11
Signed-off-by: hanhanW <[email protected]>
Comment on lines +238 to +240
if (failed(result)) {
return signalPassFailure();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused between this code change and the reply to my previous comment. Does the bug #23750 currently mean we can't signal pass failure here or can we just keep this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry about the confusion. Apparently, I made changes without commiting. 🤦‍♂️

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with operations that remain unvectorized because their vectorization failed? Will they be handled by some alternative path or will they just fail later in the pipeline?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on pipeline setup, but the basic lowering should handle it. They should be lowered to scalar codes. There may be bufferization (like large stack buffer or shared memory) though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it difficult to say what the best strategy is here. I agree with your point that failing the pass here could mean that upstream changes/bugs cause instability. On the other hand, silently falling back to scalar code could either be inefficient (slow, which we would only catch in benchmarks, potentially with some delay that requires bisecting) or mis-compilation later on in the pipeline (stack buffer too large etc.).

The right choice probably depends on how bad the fallback can become. As there is an existing bug, maybe we can stick with this approach for now and add a TODO to add the pass failure later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add an option to that escalates silent failure to an actual failure for such case, so it may be easier to discover when we see bnehcmark regressions. This will require more work because there may be other similar "bugs" when we look at all the pipelines.

// A base class that defines common codegen options that are shared across
// different backends (e.g., CPU and GPU). Derived classes can add
// backend-specific options as needed.
//
// Note: We need static members because they are shared across all derived
// instances to bind LLVM cl::opt registration at the single storage when
// multiple backends inherit from this class.
struct CodegenOptions {
// Path to a module containing a tuning spec.
static std::string tuningSpecPath;
// Whether to add attributes for the tuner on root ops.
static bool setTunerAttributes;
void bindOptions(OptionsBinder &binder);
};

Signed-off-by: hanhanW <[email protected]>
@hanhanW hanhanW requested a review from sommerlukas March 12, 2026 18:29
Signed-off-by: hanhanW <[email protected]>
Comment on lines +238 to +240
if (failed(result)) {
return signalPassFailure();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it difficult to say what the best strategy is here. I agree with your point that failing the pass here could mean that upstream changes/bugs cause instability. On the other hand, silently falling back to scalar code could either be inefficient (slow, which we would only catch in benchmarks, potentially with some delay that requires bisecting) or mis-compilation later on in the pipeline (stack buffer too large etc.).

The right choice probably depends on how bad the fallback can become. As there is an existing bug, maybe we can stick with this approach for now and add a TODO to add the pass failure later.

@hanhanW hanhanW merged commit d1e7974 into main Mar 12, 2026
59 of 60 checks passed
@hanhanW hanhanW deleted the users/hanhanW/vec-iface-c3b branch March 12, 2026 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants