Skip to content

[Codegen] Introduce VectorizableOpInterface and migrate LinalgExt::GatherOp#23653

Merged
hanhanW merged 3 commits intomainfrom
users/hanhanW/vec-iface-c1
Mar 10, 2026
Merged

[Codegen] Introduce VectorizableOpInterface and migrate LinalgExt::GatherOp#23653
hanhanW merged 3 commits intomainfrom
users/hanhanW/vec-iface-c1

Conversation

@hanhanW
Copy link
Contributor

@hanhanW hanhanW commented Mar 5, 2026

Introduce a VectorizableOpInterface that provides an unified extension point for vectorization. Operations implement this interface to declare, whether it is vectorizable, how they should be vectorized, and the driver dispatches through it.

  • Implement external model for IREE::LinalgExt::GatherOp, inlining the vectorization logic from vectorizeLinalgExtGatherToTransferGather.
  • Update GenericVectorization.cpp to dispatch through the interface for ops that implement it, with TypeSwitch fallback for unmigrated ops.

No new tests because it is an NFC in terms of functionality. It just follows different mechanism for vectorization.

It is a step towards https://lists.lfaidata.foundation/g/iree-technical-discussion/message/15

Assisted-by: Claude

@hanhanW hanhanW force-pushed the users/hanhanW/vec-iface-c1 branch from bc0a2cb to 16bcb57 Compare March 5, 2026 00:48
…therOp

Introduce a VectorizableOpInterface that provides an unified extension
point for vectorization. Operations implement this interface to declare,
whether it is vectorizable, how they should be vectorized, and the
driver dispatches through it.

- Implement external model for IREE::LinalgExt::GatherOp, inlining the
  vectorization logic from vectorizeLinalgExtGatherToTransferGather.
- Update GenericVectorization.cpp to dispatch through the interface
  for ops that implement it, with TypeSwitch fallback for unmigrated ops.

It is a step towards https://lists.lfaidata.foundation/g/iree-technical-discussion/message/15

Assisted-by: Claude

Signed-off-by: hanhanW <[email protected]>
@hanhanW hanhanW force-pushed the users/hanhanW/vec-iface-c1 branch from 16bcb57 to c35263b Compare March 5, 2026 00:48
@hanhanW hanhanW marked this pull request as ready for review March 5, 2026 00:57
Signed-off-by: hanhanW <[email protected]>
Copy link
Contributor

@sommerlukas sommerlukas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small nit, but please also wait for @Groverkss to review.

DictionaryAttr options) const {
auto gatherOp = cast<IREE::LinalgExt::GatherOp>(op);
int64_t batchRank = gatherOp.getBatchRank();
Location loc = gatherOp.getLoc();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth creating an ImplicitLocOpBuilder here and use it in the rest of this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried ImplicitLocOpBuilder when it was created, but I did not like the idea much. It was useful if you create operations in a single line, though I still don't like the idea.

The other point is that it extends OpBuilder, not RewriterBase, so methods like modifyOpInPlace don't work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by

operations in a single line

If we create an ImplicitLocOpBuilder once in the beginning of vectorize, there is nine places where we pass rewriter, loc and could be passing that builder instead.

For things like modifyOpInPlace, you can keep using the rewriter, I think.

Comment on lines +87 to +90
auto indicesVecRead = vector::TransferReadOp::create(
rewriter, loc, indicesVecTy.clone(indicesTy.getElementType()),
gatherOp.getIndices(), SmallVector<Value>(indicesTy.getRank(), zero),
std::nullopt);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that eventually I would like to move this to an interface. Using transfer_read/transfer_write to vectorize something is not a very nice way of vectorization. We should be doing dialect conversion imo. We can do that after the VectorTile size analysis lands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a plan for VectorTile size analysis yet? Is it something similar to what we implemented:

struct VectorizationTileSizes {
SmallVector<int64_t> destShape;
SmallVector<int64_t> vectorSizes;
SmallVector<bool> vectorScalableFlags;
};
/// Returns a VectorizationTileSizes which contains the inferred bounded result
/// shape and vector input sizes. This is useful to infer the sizes from a
/// chain.
std::optional<VectorizationTileSizes> inferSizesFromIR(Value val);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a PR with a first attempt at the analysis here: #23668

The idea is that we take the tile sizes from to_layout operations (that come from lowering_config) and propagate that to the other compute ops. It's then annotated as a discardable attribute, so we can use it in GenericVectorization. There, we use three sources of information in this order:

  1. lowering_config
  2. Result of the VectorTileSizeAnalysis
  3. inferSizesFromIR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quick comment is about getVectorScalableFlags method. The lowering_config can implement the method to provide vector tile sizes. It's a little fragile because lowering config can be messed up during the transformation, so it is default off on x86 backends.

std::optional<SmallVector<int64_t>> vectorSizes =
loweringConfig.getVectorSizes();
SmallVector<bool> scalableFlags = loweringConfig.getVectorScalableFlags();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you mean we need to propagate the scalable flags in the analysis?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the design idea was using the interface method, if the lowering config has the information. Scalable flags is just something similar to vector size and only some CPU backends use it.

@hanhanW hanhanW merged commit 3b4bb3d into main Mar 10, 2026
59 of 61 checks passed
@hanhanW hanhanW deleted the users/hanhanW/vec-iface-c1 branch March 10, 2026 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants