[vectorization] Add vectorization of non-projected linalg.generic #23664
hanhanW merged 13 commits into iree-org:main from
Conversation
Signed-off-by: NoumanAmir657 <[email protected]>
hanhanW
left a comment
How is vector_ext.transfer_gather lowered in CPU backend?
```mlir
// RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-generic-vectorization{enable-vector-masking=true}))" --split-input-file %s | FileCheck %s -check-prefix=CHECK-MASK
// RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-generic-vectorization{fold-cast-into-contract=true}))" --split-input-file %s | FileCheck %s -check-prefix=CHECK-FOLD
// RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-generic-vectorization{vectorize-to-transfer-gather=true}))" --split-input-file %s | FileCheck %s --check-prefix=CHECK-GATHER
// RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-generic-vectorization{enable-vector-masking=true vectorize-to-transfer-gather=true}))" --split-input-file %s | FileCheck %s --check-prefix=CHECK-MASK-GATHER
```
This is okay for now. I will break this into a couple files in a follow-up. Mostly an FYI.
```mlir
// CHECK-GATHER-LABEL: func.func @implicit_gather_like_generic_stride_2
// CHECK-GATHER-SAME:    %[[IN:[a-zA-Z0-9]+]]: tensor<1x1x31xf32>
// CHECK-GATHER-SAME:    %[[OUT:[a-zA-Z0-9]+]]: tensor<1x1x1x1x16xf32>
// CHECK-GATHER-DAG:     %[[C0:.+]] = arith.constant 0 : index
// CHECK-GATHER-DAG:     %[[DENSE:.+]] = arith.constant dense<2> : vector<16xindex>
// CHECK-GATHER-DAG:     %[[STEP:.+]] = vector.step : vector<16xindex>
// CHECK-GATHER:         %[[INDICES:.+]] = arith.muli %[[STEP]], %[[DENSE]] : vector<16xindex>
// CHECK-GATHER:         %[[GATHER:.+]] = iree_vector_ext.transfer_gather %[[IN]][%[[C0]], %[[C0]], %[[C0]]]
// CHECK-GATHER:         %[[RESULT:.+]] = vector.transfer_write %[[GATHER]], %[[OUT]][%[[C0]], %[[C0]], %[[C0]], %[[C0]], %[[C0]]]
// CHECK-GATHER:         return %[[RESULT]]
```
Please make the checks aligned/formatted like the case above.
auto inType = llvm::cast<RankedTensorType>(inOperand->get().getType());
auto outType = llvm::cast<RankedTensorType>(outOperand->get().getType());
style nit: drop the `llvm::` prefix.
rewriter.replaceOp(op, transferWriteOp.getResult());
return success();
Can we move it to the driver? I know the other method has this pattern, but it should be fixed. I'm fixing it in my other vectorization interface implementation: #23713
Can we make it this way in the first place?
Currently, it lowers to a
Can we add a test case for the case that the fastest moving dimension is a gathered load? What is the plan for the lowering of that case? Is it unrolled? (Maybe @Groverkss knows the answers?)
This is the test case:

```mlir
func.func @implicit_gather_like_generic_stride_2(%arg0: tensor<1x1x31xf32>, %arg1: tensor<1x1x1x1x16xf32>) -> tensor<1x1x1x1x16xf32> {
  %0 = linalg.generic {
    indexing_maps = [
      affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d4 * 2)>,
      affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3, d4)>
    ],
    iterator_types = ["parallel", "parallel", "parallel", "parallel", "parallel"]
  } ins(%arg0 : tensor<1x1x31xf32>) outs(%arg1 : tensor<1x1x1x1x16xf32>) {
  ^bb0(%in: f32, %out: f32):
    linalg.yield %in : f32
  } -> tensor<1x1x1x1x16xf32>
  return %0 : tensor<1x1x1x1x16xf32>
}
// CHECK-GATHER-LABEL: func.func @implicit_gather_like_generic_stride_2
// CHECK-GATHER-SAME:    %[[IN:[a-zA-Z0-9]+]]: tensor<1x1x31xf32>
// CHECK-GATHER-SAME:    %[[OUT:[a-zA-Z0-9]+]]: tensor<1x1x1x1x16xf32>
// CHECK-GATHER-DAG:     %[[C0:.+]] = arith.constant 0 : index
// CHECK-GATHER-DAG:     %[[DENSE:.+]] = arith.constant dense<2> : vector<16xindex>
// CHECK-GATHER-DAG:     %[[STEP:.+]] = vector.step : vector<16xindex>
// CHECK-GATHER:         %[[INDICES:.+]] = arith.muli %[[STEP]], %[[DENSE]] : vector<16xindex>
// CHECK-GATHER:         %[[GATHER:.+]] = iree_vector_ext.transfer_gather %[[IN]][%[[C0]], %[[C0]], %[[C0]]]
// CHECK-GATHER:         %[[RESULT:.+]] = vector.transfer_write %[[GATHER]], %[[OUT]][%[[C0]], %[[C0]], %[[C0]], %[[C0]], %[[C0]]]
// CHECK-GATHER:         return %[[RESULT]]
```

Since the generic has stride 2 in the indexing map, after generic vectorization, when the canonicalizer runs, transfer_gather will remain as it is and won't be lowered to a

Edit: Sorry, this test case is already there.
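For reference, the gather-index computation this pattern materializes (a `vector.step` multiplied by a dense stride splat) can be sketched in Python. The function names here are illustrative, not IREE APIs:

```python
def gather_indices(vector_len, stride):
    """Indices produced by scaling a [0, vector_len) step vector by the
    stride, mirroring the arith.muli of vector.step with the dense splat."""
    return [i * stride for i in range(vector_len)]

def gather_read(source, vector_len, stride):
    """Scalar reference for the strided read from the fastest dimension."""
    return [source[i] for i in gather_indices(vector_len, stride)]

# A vector<16> read at stride 2 touches indices 0, 2, ..., 30, which is
# why tensor<1x1x31xf32> is exactly large enough for this gather.
src = list(range(31))
print(gather_read(src, 16, 2))  # [0, 2, 4, ..., 30]
```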
Sorry, it was not obvious to me. It's good if we already have it. Can you check the indexing maps as well? They are important to vectorization, right? (I should run it myself, but I don't have a fresh build yet.)
Signed-off-by: NoumanAmir657 <[email protected]>
@hanhanW I don't think the CI failure is related to this PR. Let me know if further changes are required. Thanks!
SmallVector<AffineExpr> sourceExprs;
for (int i = 0; i < inRank - 1; ++i) {
  sourceExprs.push_back(rewriter.getAffineDimExpr(i));
}
sourceExprs.push_back(rewriter.getAffineSymbolExpr(0));
Source map construction is incorrect for general indexing maps.

VectorizeIREEVectorExtOps.cpp lines 458-462:

```cpp
for (int i = 0; i < inRank - 1; ++i) {
  sourceExprs.push_back(rewriter.getAffineDimExpr(i));
}
sourceExprs.push_back(rewriter.getAffineSymbolExpr(0));
```

This hardcodes the contiguous source dimensions to d0, d1, ..., d_{inRank-2}, but the actual input map's contiguous dims may use different loop dimensions. For example, if the input map is `(d0, d1, d2, d3, d4) -> (d2, d1, d4 * 2)`, this code generates the source map `(d0, d1, d2, d3, d4)[s0] -> (d0, d1, s0)` when it should be `(d0, d1, d2, d3, d4)[s0] -> (d2, d1, s0)`.

The fix: extract the actual dim expressions from the input map's contiguous results rather than assuming they are sequential from d0.
Addressed in the comment below.
AffineMap indexMap = AffineMap::get(
    numLoops, 1, {rewriter.getAffineDimExpr(numLoops - 1)}, ctx);
This assumes the gathered (non-affine) expression always involves the last loop dimension (d4 in the test). But isImplicitGather doesn't enforce this - it only checks that the input map is "not a projected permutation." A map like (d0, d1, d2, d3, d4) -> (d0, d2 * 2, d1) would pass the detection but be incorrectly vectorized because the gathered dim is d2, not d4.
I can constrain it like this:

- Constrain the output to be an identity indexing map.
- Constrain the last dim of the input indexing map to be an AffineBinaryOpExpr or an AffineDimExpr. If `dk` is the last dim of the output indexing map, then the last index of the input indexing map should be `dk * stride + dm` where `dm != dk`, or just `dk`.
- For the above to work, another constraint is needed which checks that all leading dims of input and output are 1, except the last dim. In that case, the gathered expression basically becomes `dk * stride`, and it ensures that we are only gathering/loading from the last dimension.

These checks seem to work on the use case that I am targeting. They will also work on the case where we have multiple gather dim expressions, because of the unit-dim check mentioned in the third point above.

Do you think these constraints are okay to move forward with?
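A toy model of these proposed constraints (this is not the actual C++ `isImplicitGather`; the tuple encoding of affine-map results is a made-up simplification for illustration):

```python
# Toy encoding of affine-map results: ("dim", k) means dk, and
# ("mul_add", k, stride, m) means dk * stride + dm.
def is_implicit_gather(input_results, output_results, in_shape, out_shape):
    num_loops = len(output_results)
    # Constraint 1: output map is the identity (d0, ..., d_{n-1}).
    if output_results != [("dim", i) for i in range(num_loops)]:
        return False
    # Constraint 3: all leading dims of input and output are unit dims.
    if any(d != 1 for d in in_shape[:-1]) or any(d != 1 for d in out_shape[:-1]):
        return False
    # Constraint 2: last input result is dk, or dk * stride + dm with m != k,
    # where k is the last loop dim.
    k = num_loops - 1
    last = input_results[-1]
    if last == ("dim", k):
        return True
    return last[0] == "mul_add" and last[1] == k and last[3] != k

identity = [("dim", i) for i in range(5)]
# (d0, d1, d4 * 2 + d0) with unit leading dims: accepted.
print(is_implicit_gather([("dim", 0), ("dim", 1), ("mul_add", 4, 2, 0)],
                         identity, (1, 1, 31), (1, 1, 1, 1, 16)))  # True
```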
For a case like the one below:

```mlir
// input indexing map
#map1 = affine_map<(d0, d1, d2, d3, d4) -> (d0, d3 * 2 + d1, d4 * 2 + d2)>
// output indexing map
#map2 = affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3, d4)>
```

Since we constrained that the load/gather will have the form `d4 * stride + d_whatever_except_d4` and that all leading dims up to the last dim are 1 (input `tensor<1x1x31>` and output `tensor<1x1x1x1x16>`), the source map will be `affine_map<(d0, d1, d2, d3, d4)[s0] -> (0, 0, s0)>`. All the leading dims except the last dim of the input will be 0.
```mlir
func.func @implicit_gather_like_generic_stride_2(%arg0: tensor<1x1x31xf32>, %arg1: tensor<1x1x1x1x16xf32>) -> tensor<1x1x1x1x16xf32> {
  %0 = linalg.generic {
    indexing_maps = [
      affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d4 * 2 + d0)>,
      affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3, d4)>
    ],
    iterator_types = ["parallel", "parallel", "parallel", "parallel", "parallel"]
  } ins(%arg0 : tensor<1x1x31xf32>) outs(%arg1 : tensor<1x1x1x1x16xf32>) {
  ^bb0(%in: f32, %out: f32):
    linalg.yield %in : f32
  } -> tensor<1x1x1x1x16xf32>
  return %0 : tensor<1x1x1x1x16xf32>
}
// CHECK-GATHER:       #[[$MAP0:.+]] = affine_map<(d0, d1, d2, d3, d4)[s0] -> (0, 0, s0)>
// CHECK-GATHER:       #[[$MAP1:.+]] = affine_map<(d0, d1, d2, d3, d4)[s0] -> (d4)>
// CHECK-GATHER-LABEL: func.func @implicit_gather_like_generic_stride_2
// CHECK-GATHER-SAME:    %[[IN:[a-zA-Z0-9]+]]: tensor<1x1x31xf32>
// CHECK-GATHER-SAME:    %[[OUT:[a-zA-Z0-9]+]]: tensor<1x1x1x1x16xf32>
// CHECK-GATHER-DAG:     %[[C0:.+]] = arith.constant 0 : index
// CHECK-GATHER-DAG:     %[[DENSE:.+]] = arith.constant dense<2> : vector<16xindex>
// CHECK-GATHER-DAG:     %[[PASSTHRU:.+]] = arith.constant 0.000000e+00 : f32
// CHECK-GATHER-DAG:     %[[STEP:.+]] = vector.step : vector<16xindex>
// CHECK-GATHER:         %[[INDICES:.+]] = arith.muli %[[STEP]], %[[DENSE]] : vector<16xindex>
// CHECK-GATHER:         %[[GATHER:.+]] = iree_vector_ext.transfer_gather %[[IN]][%[[C0]], %[[C0]], %[[C0]]]
// CHECK-GATHER-SAME:      [%[[INDICES]] : vector<16xindex>], %[[PASSTHRU]]
// CHECK-GATHER-SAME:      {indexing_maps = [#[[$MAP0]], #[[$MAP1]]]}
// CHECK-GATHER:         %[[RESULT:.+]] = vector.transfer_write %[[GATHER]], %[[OUT]][%[[C0]], %[[C0]], %[[C0]], %[[C0]], %[[C0]]]
// CHECK-GATHER:         return %[[RESULT]]
```

Future PRs can make isImplicitGather less conservative.
return success(maxFlatVecSize < maxVectorSize);
}

static bool isImplicitGather(linalg::GenericOp genericOp) {
We should be conservative in codegen lowering, i.e., we should prevent incorrect lowering in the first place.
The check and the implementation are not aligned to me, so I asked claude to provide the comment:
isImplicitGather is too permissive for the transformation's assumptions
The detection function (GenericVectorization.cpp lines 148-177) checks:
- All parallel loops
- Exactly 1 input, 1 output
- Input map is NOT a projected permutation
- Output map IS a projected permutation
- Body has only the yield op
But vectorizeImplicitGatherToTransferGather additionally assumes:
- Only the last result expression of the input map is non-trivial
- The contiguous dims are d0..d_{n-2} in sequential order
- The gathered expression involves only d_{numLoops-1}
- The output map is identity (not just projected permutation)
If isImplicitGather returns true but these structural assumptions don't hold, the generated IR is silently incorrect. The detection function must validate the exact structural requirements, or the transformation must analyze the actual map expressions.
auto transferWriteOp = vector::TransferWriteOp::create(
    rewriter, loc, transferGatherOp.getResult(), outOperand->get(),
    writeOffsets);
Similar comment for the mismatch:
The transfer_write uses identity offsets and no permutation map. But isImplicitGather allows any projected permutation for the output map (e.g., (d0,d1,d2,d3,d4) -> (d4,d3,d2,d1,d0)). For non-identity output maps, the write would place elements in incorrect positions. Either restrict isImplicitGather to identity output maps, or generate the correct permutation map for the write.
isImplicitGather restricted to identity output maps.
ArrayRef<int64_t> vectorSizes) {
  Location loc = op.getLoc();
  MLIRContext *ctx = rewriter.getContext();
  rewriter.setInsertionPoint(op);
We should have the insertion point guard when you move the insertion point. IMO, it's callee's responsibility to restore the insertion point. Otherwise, you spread the logic to callers.
rewriter.modifyOpInPlace(
    transferWriteOp, [&] { transferWriteOp.getMaskMutable().assign(mask); });
Why not pass mask to the builder?
I decided to consider only static cases for this PR so masking won't be needed. Fixed in recent commit.
}

Block *body = genericOp.getBlock();
return std::distance(body->begin(), body->end()) == 1;
nit: use llvm::hasSingleElement(*body)
int64_t stride = 1;
inputMap.getResult(inputMap.getNumResults() - 1).walk([&](AffineExpr sub) {
  if (auto mul = dyn_cast<AffineBinaryOpExpr>(sub)) {
    if (mul.getKind() == AffineExprKind::Mul) {
      if (auto rhs = dyn_cast<AffineConstantExpr>(mul.getRHS())) {
        stride = rhs.getValue();
      }
    }
  }
});
This is fragile, and it is wrong for some cases: it treats `d4 * 2 + 1` the same as `d4 * 2`.
Made the isImplicitGather more conservative so the case d4 * 2 + 1 won't be considered for vectorization.
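To illustrate why dropping the additive term matters: a walk that only extracts the stride computes different indices than `d4 * 2 + 1` actually addresses. A small Python sketch (function names are illustrative):

```python
def indices_stride_only(n, stride):
    # What a walk that only extracts the multiplier computes: the +offset
    # of an expression like d4 * 2 + 1 is silently dropped.
    return [i * stride for i in range(n)]

def indices_full_expr(n, stride, offset):
    # What d4 * stride + offset actually indexes.
    return [i * stride + offset for i in range(n)]

print(indices_stride_only(4, 2))   # [0, 2, 4, 6]
print(indices_full_expr(4, 2, 1))  # [1, 3, 5, 7]
```

Every generated index would be off by the dropped offset, which is why rejecting such maps in isImplicitGather is the safe fix.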
Signed-off-by: NoumanAmir657 <[email protected]>
hanhanW
left a comment
Thanks for addressing the comments, here is a new round of review. Overall looks okay to me, just some nits.
I did not expect that d1e7974 would land before this. Can you move the implementation to VectorizableOpInterface.cpp after rebasing? Thanks!
if (dim.getPosition() == lastLoopDim) {
  return cast<AffineConstantExpr>(mul.getLHS()).getValue();
}
}
MLIR's canonical form guarantees the RHS of a Mul is always the constant/symbolic expression, so this is dead code.
/// RHS of mul is always a constant or a symbolic expression.
if (auto dim = dyn_cast<AffineDimExpr>(mul.getRHS())) {
  return dim.getPosition() == lastLoopDim &&
         isa<AffineConstantExpr>(mul.getLHS());
}
same here, this is dead code.
auto inputMap =
    genericOp.getMatchingIndexingMap(genericOp.getDpsInputOperand(0));
auto outputMap =
    genericOp.getMatchingIndexingMap(genericOp.getDpsInitOperand(0));

auto gatherResult =
    IREE::VectorExt::vectorizeImplicitGatherToTransferGather(
        rewriter, genericOp, vectorSizes);

auto *inOperand = op.getDpsInputOperand(0);
auto *outOperand = op.getDpsInitOperand(0);
AffineMap sourceMap =
    AffineMap::get(numLoops, /*symbolCount=*/1, sourceExprs, ctx);

AffineMap indexMap =
    AffineMap::get(numLoops, /*symbolCount=*/1,
                   {rewriter.getAffineDimExpr(numLoops - 1)}, ctx);
nit: these can use auto because ::get spells the type.
llvm_unreachable("isImplicitGather should have rejected this expression");
}

FailureOr<Value>
I think we can always return Value. It should not fail.
Signed-off-by: NoumanAmir657 <[email protected]>
if (isImplicitGather(genericOp)) {
  return true;
}
You should check the vectorizeToTransferGather flag as well.
Signed-off-by: NoumanAmir657 <[email protected]>
https://github.com/iree-org/iree/actions/runs/23066888975/attempts/2?pr=23664 seems to be flaky; it passes in a re-run. Can you fix the bazel issue?
Signed-off-by: NoumanAmir657 <[email protected]>
Fixed in the latest commit.
@hanhanW The CI is passing now. Can you merge this for me? Thanks!
This PR adds support for what was discussed in this RFC.
The RFC proposes the `tensor.extract` approach, but after further discussion with @Groverkss on Discord, it was decided to lower the generic to `iree_vector_ext.transfer_gather`. The e2e lowering won't work for now in the case of a strided load from the fastest moving dimension, because `transfer_gather` lacks the folding pattern for gathering from the fastest moving dimension.
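As a scalar reference for what the vectorized gather must compute, the generic in the tests reduces to `out[d0][d1][d2][d3][d4] = in[d0][d1][d4 * stride]`. A sketch of these semantics in Python (nested lists stand in for tensors; this is not IREE code):

```python
def strided_copy_reference(src, out_shape, stride=2):
    """out[a][b][c][d][e] = src[a][b][e * stride]: the scalar semantics of
    the gather-like generic with input map (d0, d1, d4 * stride)."""
    D0, D1, D2, D3, D4 = out_shape
    return [[[[[src[a][b][e * stride] for e in range(D4)]
               for _ in range(D3)]
              for _ in range(D2)]
             for b in range(D1)]
            for a in range(D0)]

src = [[list(range(31))]]                      # stands in for tensor<1x1x31xf32>
out = strided_copy_reference(src, (1, 1, 1, 1, 16))
print(out[0][0][0][0][:4])                     # [0, 2, 4, 6]
```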