
[vectorization] Add vectorization of non-projected linalg.generic #23664

Merged
hanhanW merged 13 commits into iree-org:main from 10x-Engineers:nouman/gather
Mar 13, 2026

Conversation

@NoumanAmir657 (Contributor)

This PR adds support for what was discussed in this RFC.
The RFC proposes the tensor.extract approach, but after further discussion with @Groverkss on Discord, it was decided to lower the generic to iree_vector_ext.transfer_gather. For now, the end-to-end lowering does not work for strided loads from the fastest-moving dimension, because transfer_gather lacks the folding pattern for gathers along that dimension.

Signed-off-by: NoumanAmir657 <[email protected]>
@hanhanW (Contributor) left a comment:

How is vector_ext.transfer_gather lowered in CPU backend?

// RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-generic-vectorization{enable-vector-masking=true}))" --split-input-file %s | FileCheck %s -check-prefix=CHECK-MASK
// RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-generic-vectorization{fold-cast-into-contract=true}))" --split-input-file %s | FileCheck %s -check-prefix=CHECK-FOLD
// RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-generic-vectorization{vectorize-to-transfer-gather=true}))" --split-input-file %s | FileCheck %s --check-prefix=CHECK-GATHER
// RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-generic-vectorization{enable-vector-masking=true vectorize-to-transfer-gather=true}))" --split-input-file %s | FileCheck %s --check-prefix=CHECK-MASK-GATHER
Contributor:

This is okay for now. I will break this into a couple files in a follow-up. Mostly an FYI.

Comment on lines +1021 to +1030
// CHECK-GATHER-LABEL: func.func @implicit_gather_like_generic_stride_2
// CHECK-GATHER-SAME: %[[IN:[a-zA-Z0-9]+]]: tensor<1x1x31xf32>
// CHECK-GATHER-SAME: %[[OUT:[a-zA-Z0-9]+]]: tensor<1x1x1x1x16xf32>
// CHECK-GATHER-DAG: %[[C0:.+]] = arith.constant 0 : index
// CHECK-GATHER-DAG: %[[DENSE:.+]] = arith.constant dense<2> : vector<16xindex>
// CHECK-GATHER-DAG: %[[STEP:.+]] = vector.step : vector<16xindex>
// CHECK-GATHER: %[[INDICES:.+]] = arith.muli %[[STEP]], %[[DENSE]] : vector<16xindex>
// CHECK-GATHER: %[[GATHER:.+]] = iree_vector_ext.transfer_gather %[[IN]][%[[C0]], %[[C0]], %[[C0]]]
// CHECK-GATHER: %[[RESULT:.+]] = vector.transfer_write %[[GATHER]], %[[OUT]][%[[C0]], %[[C0]], %[[C0]], %[[C0]], %[[C0]]]
// CHECK-GATHER: return %[[RESULT]]
Contributor:

Please make the checks aligned/formatted like the above case.

Contributor Author:

Fixed.

Comment on lines +411 to +412
auto inType = llvm::cast<RankedTensorType>(inOperand->get().getType());
auto outType = llvm::cast<RankedTensorType>(outOperand->get().getType());
Contributor:

style nit: drop llvm:: prefix.

Contributor Author:

Fixed.

Comment on lines +483 to +484
rewriter.replaceOp(op, transferWriteOp.getResult());
return success();
Contributor:

Can we move it to the driver? I know the other method has this pattern, but it should be fixed. I'm fixing it in my other vectorization interface implementation: #23713

Can we make it this way in the first place?

Contributor Author:

Fixed.


NoumanAmir657 commented Mar 9, 2026

How is vector_ext.transfer_gather lowered in CPU backend?

Currently, it lowers to a transfer_read if the load is contiguous in the fastest moving dimension. The canonicalization patterns do that: https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/Codegen/Dialect/VectorExt/IR/test/canonicalize.mlir#L180-L192.
If the fastest moving dimension is a gathered load (having indices), then that canonicalization is currently not implemented for transfer_gather. That can be dealt with by lowering the transfer_gather to a vector.gather + vector.transfer_write.
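The contiguity condition described above can be sketched with a small Python model. This is a hypothetical illustration of the decision, not the actual IREE canonicalization pattern; the function name `can_fold_to_transfer_read` is invented for this sketch.

```python
# Hypothetical model of when a gather along the fastest-moving dimension can
# fold into a plain contiguous read: only when the index vector is a unit-
# stride ramp starting at 0 (this mirrors the idea, not the IREE code).

def can_fold_to_transfer_read(indices):
    """True if the index vector is a contiguous 0-based ramp."""
    return indices == list(range(len(indices)))

step = list(range(16))                # like vector.step : vector<16xindex>
unit_stride = [i * 1 for i in step]   # contiguous load
stride_two = [i * 2 for i in step]    # strided (gathered) load

print(can_fold_to_transfer_read(unit_stride))  # True  -> becomes transfer_read
print(can_fold_to_transfer_read(stride_two))   # False -> stays a gather
```

With a stride of 2 the ramp test fails, which matches the description above: the gather remains and would need a vector.gather-based lowering instead.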


hanhanW commented Mar 9, 2026

How is vector_ext.transfer_gather lowered in CPU backend?

Currently, it lowers to a transfer_read if the load is contiguous in the fastest moving dimension. The canonicalization patterns do that: https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/Codegen/Dialect/VectorExt/IR/test/canonicalize.mlir#L180-L192. If the fastest moving dimension is a gathered load (having indices), then that canonicalization is currently not implemented for transfer_gather.

Can we add a test case for the case that vector.transfer_gather can't be lowered to vector.transfer_read?

What is the plan for the lowering of the case that fastest moving dimension is a gathered load? Is it unrolled? (Maybe @Groverkss knows the answers?)


NoumanAmir657 commented Mar 9, 2026

How is vector_ext.transfer_gather lowered in CPU backend?

Currently, it lowers to a transfer_read if the load is contiguous in the fastest moving dimension. The canonicalization patterns do that: https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/Codegen/Dialect/VectorExt/IR/test/canonicalize.mlir#L180-L192. If the fastest moving dimension is a gathered load (having indices), then that canonicalization is currently not implemented for transfer_gather.

Can we add a test case for the case that vector.transfer_gather can't be lowered to vector.transfer_read?

What is the plan for the lowering of the case that fastest moving dimension is a gathered load? Is it unrolled? (Maybe @Groverkss knows the answers?)

This is the test case

func.func @implicit_gather_like_generic_stride_2(%arg0: tensor<1x1x31xf32>, %arg1: tensor<1x1x1x1x16xf32>) -> tensor<1x1x1x1x16xf32> {
  %0 = linalg.generic {
    indexing_maps = [
      affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d4 * 2)>,
      affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3, d4)>
    ],
    iterator_types = ["parallel", "parallel", "parallel", "parallel", "parallel"]
  } ins(%arg0 : tensor<1x1x31xf32>) outs(%arg1 : tensor<1x1x1x1x16xf32>) {
  ^bb0(%in: f32, %out: f32):
    linalg.yield %in : f32
  } -> tensor<1x1x1x1x16xf32>
  return %0 : tensor<1x1x1x1x16xf32>
}
// CHECK-GATHER-LABEL: func.func @implicit_gather_like_generic_stride_2
// CHECK-GATHER-SAME:   %[[IN:[a-zA-Z0-9]+]]: tensor<1x1x31xf32>
// CHECK-GATHER-SAME:   %[[OUT:[a-zA-Z0-9]+]]: tensor<1x1x1x1x16xf32>
// CHECK-GATHER-DAG:    %[[C0:.+]] = arith.constant 0 : index
// CHECK-GATHER-DAG:    %[[DENSE:.+]] = arith.constant dense<2> : vector<16xindex>
// CHECK-GATHER-DAG:    %[[STEP:.+]] = vector.step : vector<16xindex>
// CHECK-GATHER:        %[[INDICES:.+]] = arith.muli %[[STEP]], %[[DENSE]] : vector<16xindex>
// CHECK-GATHER:        %[[GATHER:.+]] = iree_vector_ext.transfer_gather %[[IN]][%[[C0]], %[[C0]], %[[C0]]]
// CHECK-GATHER:        %[[RESULT:.+]] = vector.transfer_write %[[GATHER]], %[[OUT]][%[[C0]], %[[C0]], %[[C0]], %[[C0]], %[[C0]]]
// CHECK-GATHER:        return %[[RESULT]]
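The index arithmetic these checks capture can be spelled out in a short sketch. This is an assumption-laden model of the vector.step / arith.muli pair in the checks above, not code from the PR.

```python
# Sketch of the gather indices for the stride-2 case in the test above:
# vector.step produces [0..15], arith.muli with dense<2> doubles each entry.

step = list(range(16))             # vector.step : vector<16xindex>
indices = [i * 2 for i in step]    # arith.muli %step, dense<2>

print(indices[0], indices[-1])     # 0 30: the last gathered element is index
                                   # 30, which is why the input extent is 31.
```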

Since the generic has a stride of 2 in the indexing map, when the canonicalizer runs after generic vectorization, the transfer_gather will remain as it is and won't be lowered to a vector.transfer_read.
Are you asking me to add this test case in the PR somewhere? Should it go in this PR, since it's not really the main thing being addressed here?

Edit: Sorry, this test case is already there.


hanhanW commented Mar 9, 2026

How is vector_ext.transfer_gather lowered in CPU backend?

Can we add a test case for the case that vector.transfer_gather can't be lowered to vector.transfer_read? What is the plan for the lowering of the case that fastest moving dimension is a gathered load? Is it unrolled? (Maybe @Groverkss knows the answers?)

Edit: Sorry, this test case is already there.

Sorry, it was not obvious to me. It's good if we already have it. Can you check the indexing maps as well? They are important to vectorization, right? (I should run it myself, but I don't have a fresh build yet.)

Signed-off-by: NoumanAmir657 <[email protected]>
@NoumanAmir657 (Contributor Author):

@hanhanW I don't think the CI failure is related to this PR. Let me know if further changes are required. Thanks!

Comment on lines +454 to +458
SmallVector<AffineExpr> sourceExprs;
for (int i = 0; i < inRank - 1; ++i) {
  sourceExprs.push_back(rewriter.getAffineDimExpr(i));
}
sourceExprs.push_back(rewriter.getAffineSymbolExpr(0));
Contributor:

Source map construction is incorrect for general indexing maps

  VectorizeIREEVectorExtOps.cpp lines 458-462:
  for (int i = 0; i < inRank - 1; ++i) {
    sourceExprs.push_back(rewriter.getAffineDimExpr(i));
  }
  sourceExprs.push_back(rewriter.getAffineSymbolExpr(0));

This hardcodes the contiguous source dimensions to d0, d1, ..., d_{inRank-2}, but the actual input map's contiguous dims may use different loop dimensions. For example, if the input map is (d0, d1, d2, d3, d4) -> (d2, d1, d4 * 2),
this code generates source map (d0,d1,d2,d3,d4)[s0] -> (d0, d1, s0) when it should be (d0,d1,d2,d3,d4)[s0] -> (d2, d1, s0).

The fix: extract the actual dim expressions from the input map's contiguous results rather than assuming they are sequential from d0
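The difference between the buggy and the fixed construction can be modeled in a few lines. This is a hypothetical sketch with dims modeled as integers and `"s0"` standing for the gather symbol; it is not the PR's C++ code.

```python
# Model of the source-map bug: the leading results should be copied from the
# input map, not regenerated as sequential dims d0, d1, ....

input_map_results = [2, 1, "gather"]   # e.g. (d0..d4) -> (d2, d1, d4 * 2)
in_rank = len(input_map_results)

# Buggy construction: hardcodes d0..d_{inRank-2}.
buggy = [i for i in range(in_rank - 1)] + ["s0"]
# Correct construction: reuse the input map's actual contiguous results.
fixed = input_map_results[:-1] + ["s0"]

print(buggy)   # [0, 1, 's0'] -> wrong: corresponds to (d0, d1, s0)
print(fixed)   # [2, 1, 's0'] -> right: corresponds to (d2, d1, s0)
```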

Contributor Author:

Addressed in the below comment.

Comment on lines +461 to +462
AffineMap indexMap = AffineMap::get(
    numLoops, 1, {rewriter.getAffineDimExpr(numLoops - 1)}, ctx);
Contributor:

This assumes the gathered (non-affine) expression always involves the last loop dimension (d4 in the test). But isImplicitGather doesn't enforce this - it only checks that the input map is "not a projected permutation." A map like (d0, d1, d2, d3, d4) -> (d0, d2 * 2, d1) would pass the detection but be incorrectly vectorized because the gathered dim is d2, not d4.

@NoumanAmir657 (Contributor Author) commented Mar 11, 2026:

I can constrain it like this:

  • Constrain the output to be an identity indexing map.
  • Constrain the last dim of the input indexing map to be an AffineBinaryOpExpr or an AffineDimExpr. If dk is the last dim of the output indexing map, then the last index of the input indexing map should be dk * stride + dm where dm != dk, or just dk.
  • For the above to work, another constraint is needed that checks whether all leading dims of the input and output are 1, except the last dim. In that case, the gathered expression effectively becomes dk * stride, which ensures that we only gather/load from the last dimension.

These checks seem to work on the use case I am targeting. They will also work on the case with multiple gathered dim expressions, because of the unit-dim check mentioned in the third point above.

Do you think these constraints are okay to move forward with?
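The constraints listed above can be approximated by a small checker. This is a hypothetical model written for illustration; the function `matches_constraints` and its expression encoding (a tuple `("d_last", stride)` for dk * stride) are invented here and do not correspond to the PR's actual isImplicitGather implementation.

```python
# Hypothetical checker mirroring the listed constraints: the input map's last
# result must be d_last (the string) or d_last * stride (a tuple), and all
# leading dims of both shapes must be 1.

def matches_constraints(in_shape, out_shape, last_expr):
    unit_leading = all(d == 1 for d in in_shape[:-1] + out_shape[:-1])
    ok_expr = last_expr == "d_last" or (
        isinstance(last_expr, tuple) and last_expr[0] == "d_last")
    return unit_leading and ok_expr

# Matches the test case: tensor<1x1x31> in, tensor<1x1x1x1x16> out, d4 * 2.
print(matches_constraints([1, 1, 31], [1, 1, 1, 1, 16], ("d_last", 2)))  # True
# A non-unit leading dim is rejected.
print(matches_constraints([1, 2, 31], [1, 1, 1, 1, 16], ("d_last", 2)))  # False
```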

Contributor Author:

For a case like the below:

// input indexing map
#map1 = affine_map<(d0, d1, d2, d3, d4) -> (d0, d3 * 2 + d1, d4 * 2 + d2)>
// output indexing map
#map2 = affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3, d4)>

Since we constrained the load/gather to have the form d4 * stride + d_whatever_except_d4, and all leading dims up to the last dim are 1 (input tensor<1x1x31> and output tensor<1x1x1x1x16>), the source map will be affine_map<(d0, d1, d2, d3, d4)[s0] -> (0, 0, s0)>: all the leading dims of the input except the last will be 0.

@NoumanAmir657 (Contributor Author) commented Mar 11, 2026:

func.func @implicit_gather_like_generic_stride_2(%arg0: tensor<1x1x31xf32>, %arg1: tensor<1x1x1x1x16xf32>) -> tensor<1x1x1x1x16xf32> {
  %0 = linalg.generic {
    indexing_maps = [
      affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d4 * 2 + d0)>,
      affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3, d4)>
    ],
    iterator_types = ["parallel", "parallel", "parallel", "parallel", "parallel"]
  } ins(%arg0 : tensor<1x1x31xf32>) outs(%arg1 : tensor<1x1x1x1x16xf32>) {
  ^bb0(%in: f32, %out: f32):
    linalg.yield %in : f32
  } -> tensor<1x1x1x1x16xf32>
  return %0 : tensor<1x1x1x1x16xf32>
}
// CHECK-GATHER:       #[[$MAP0:.+]] = affine_map<(d0, d1, d2, d3, d4)[s0] -> (0, 0, s0)>
// CHECK-GATHER:       #[[$MAP1:.+]] = affine_map<(d0, d1, d2, d3, d4)[s0] -> (d4)>
// CHECK-GATHER-LABEL: func.func @implicit_gather_like_generic_stride_2
// CHECK-GATHER-SAME:    %[[IN:[a-zA-Z0-9]+]]: tensor<1x1x31xf32>
// CHECK-GATHER-SAME:    %[[OUT:[a-zA-Z0-9]+]]: tensor<1x1x1x1x16xf32>
// CHECK-GATHER-DAG:     %[[C0:.+]] = arith.constant 0 : index
// CHECK-GATHER-DAG:     %[[DENSE:.+]] = arith.constant dense<2> : vector<16xindex>
// CHECK-GATHER-DAG:     %[[PASSTHRU:.+]] = arith.constant 0.000000e+00 : f32
// CHECK-GATHER-DAG:     %[[STEP:.+]] = vector.step : vector<16xindex>
// CHECK-GATHER:         %[[INDICES:.+]] = arith.muli %[[STEP]], %[[DENSE]] : vector<16xindex>
// CHECK-GATHER:         %[[GATHER:.+]] = iree_vector_ext.transfer_gather %[[IN]][%[[C0]], %[[C0]], %[[C0]]]
// CHECK-GATHER-SAME:      [%[[INDICES]] : vector<16xindex>], %[[PASSTHRU]]
// CHECK-GATHER-SAME:      {indexing_maps = [#[[$MAP0]], #[[$MAP1]]]}
// CHECK-GATHER:         %[[RESULT:.+]] = vector.transfer_write %[[GATHER]], %[[OUT]][%[[C0]], %[[C0]], %[[C0]], %[[C0]], %[[C0]]]
// CHECK-GATHER:         return %[[RESULT]]

Future PRs can make isImplicitGather less conservative.

return success(maxFlatVecSize < maxVectorSize);
}

static bool isImplicitGather(linalg::GenericOp genericOp) {
Contributor:

We should be conservative in codegen lowering, i.e., we should prevent incorrect lowering in the first place.

The check and the implementation do not look aligned to me, so I asked Claude to provide the comment:


isImplicitGather is too permissive for the transformation's assumptions

The detection function (GenericVectorization.cpp lines 148-177) checks:

  • All parallel loops
  • Exactly 1 input, 1 output
  • Input map is NOT a projected permutation
  • Output map IS a projected permutation
  • Body has only the yield op

But vectorizeImplicitGatherToTransferGather additionally assumes:

  • Only the last result expression of the input map is non-trivial
  • The contiguous dims are d0..d_{n-2} in sequential order
  • The gathered expression involves only d_{numLoops-1}
  • The output map is identity (not just projected permutation)

If isImplicitGather returns true but these structural assumptions don't hold, the generated IR is silently incorrect. The detection function must validate the exact structural requirements, or the transformation must analyze the actual map expressions.

Comment on lines +476 to +478
auto transferWriteOp = vector::TransferWriteOp::create(
    rewriter, loc, transferGatherOp.getResult(), outOperand->get(),
    writeOffsets);
Contributor:

Similar comment for the mismatch:

The transfer_write uses identity offsets and no permutation map. But isImplicitGather allows any projected permutation for the output map (e.g., (d0,d1,d2,d3,d4) -> (d4,d3,d2,d1,d0)). For non-identity output maps, the write would place elements in incorrect positions. Either restrict isImplicitGather to identity output maps, or generate the correct permutation map for the write.

Contributor Author:

isImplicitGather restricted to output identity maps.

    ArrayRef<int64_t> vectorSizes) {
  Location loc = op.getLoc();
  MLIRContext *ctx = rewriter.getContext();
  rewriter.setInsertionPoint(op);
Contributor:

We should have an insertion point guard when you move the insertion point. IMO, it's the callee's responsibility to restore the insertion point; otherwise, you spread the logic to the callers.

Contributor Author:

Fixed.

Comment on lines +479 to +480
rewriter.modifyOpInPlace(
    transferWriteOp, [&] { transferWriteOp.getMaskMutable().assign(mask); });
Contributor:

Why not pass mask to the builder?

Contributor Author:

I decided to consider only static cases for this PR, so masking won't be needed. Fixed in a recent commit.

}

Block *body = genericOp.getBlock();
return std::distance(body->begin(), body->end()) == 1;
Contributor:

nit: use llvm::hasSingleElement(*body)

Contributor Author:

Fixed.

Comment on lines +434 to +443
int64_t stride = 1;
inputMap.getResult(inputMap.getNumResults() - 1).walk([&](AffineExpr sub) {
  if (auto mul = dyn_cast<AffineBinaryOpExpr>(sub)) {
    if (mul.getKind() == AffineExprKind::Mul) {
      if (auto rhs = dyn_cast<AffineConstantExpr>(mul.getRHS())) {
        stride = rhs.getValue();
      }
    }
  }
});
Contributor:

This is fragile and it is wrong for some cases: It treats d4 * 2 + 1 as the same as d4 * 2.
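The failure mode can be shown with a two-line model. This is a hypothetical sketch of the index arithmetic, not the PR's code: extracting only the multiplier from an expression like d4 * 2 + 1 silently drops the offset.

```python
# Why extracting only the multiplier is wrong: d4 * 2 + 1 and d4 * 2 share
# the same "stride" but touch different elements of the source.

def gathered(stride, offset, n=4):
    return [i * stride + offset for i in range(n)]

print(gathered(2, 0))  # [0, 2, 4, 6]
print(gathered(2, 1))  # [1, 3, 5, 7] -> dropping the +1 reads the wrong data
```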

Contributor Author:

Made isImplicitGather more conservative so that the d4 * 2 + 1 case won't be considered for vectorization.

@hanhanW (Contributor) left a comment:

Thanks for addressing the comments, here is a new round of review. Overall looks okay to me, just some nits.

I did not expect that d1e7974 would land before this. Can you move the implementation to VectorizableOpInterface.cpp after rebasing? Thanks!

if (dim.getPosition() == lastLoopDim) {
  return cast<AffineConstantExpr>(mul.getLHS()).getValue();
}
}
Contributor:

MLIR's canonical form guarantees that the RHS of a Mul is always the constant/symbolic expression, so this is dead code.

https://github.com/llvm/llvm-project/blob/2d70dbdb357d2b4080b30d35a105442337bc1fdd/mlir/include/mlir/IR/AffineExpr.h#L42-L43

/// RHS of mul is always a constant or a symbolic expression.

Contributor Author:

Fixed.

Comment on lines +175 to +178
if (auto dim = dyn_cast<AffineDimExpr>(mul.getRHS())) {
  return dim.getPosition() == lastLoopDim &&
         isa<AffineConstantExpr>(mul.getLHS());
}
Contributor:

Same here, this is dead code.

Contributor Author:

Fixed.

Comment on lines +206 to +209
auto inputMap =
    genericOp.getMatchingIndexingMap(genericOp.getDpsInputOperand(0));
auto outputMap =
    genericOp.getMatchingIndexingMap(genericOp.getDpsInitOperand(0));
Contributor:

spell out the type

Contributor Author:

Fixed.

Comment on lines +335 to +337
auto gatherResult =
    IREE::VectorExt::vectorizeImplicitGatherToTransferGather(
        rewriter, genericOp, vectorSizes);
Contributor:

Spell out the type

Contributor Author:

Fixed.

Comment on lines +383 to +384
auto *inOperand = op.getDpsInputOperand(0);
auto *outOperand = op.getDpsInitOperand(0);
Contributor:

spell out the type

Contributor Author:

Fixed.

Comment on lines +421 to +426
AffineMap sourceMap =
    AffineMap::get(numLoops, /*symbolCount=*/1, sourceExprs, ctx);

AffineMap indexMap =
    AffineMap::get(numLoops, /*symbolCount=*/1,
                   {rewriter.getAffineDimExpr(numLoops - 1)}, ctx);
Contributor:

nit: these can use auto because ::get spells the type.

Contributor Author:

Fixed.

llvm_unreachable("isImplicitGather should have rejected this expression");
}

FailureOr<Value>
Contributor:

I think we can always return Value. It should not fail.

Contributor Author:

Fixed.

Comment on lines +951 to +953
if (isImplicitGather(genericOp)) {
  return true;
}
Contributor:

You should also check the vectorizeToTransferGather flag.

Contributor Author:

Fixed.

Signed-off-by: NoumanAmir657 <[email protected]>

hanhanW commented Mar 13, 2026

https://github.com/iree-org/iree/actions/runs/23066888975/attempts/2?pr=23664 seems to be flaky; it passes on a re-run. Can you fix the Bazel issue?

Signed-off-by: NoumanAmir657 <[email protected]>
@NoumanAmir657 (Contributor Author):

https://github.com/iree-org/iree/actions/runs/23066888975/attempts/2?pr=23664 seems to be flaky; it passes on a re-run. Can you fix the Bazel issue?

Fixed in the latest commit.

@NoumanAmir657 (Contributor Author):

@hanhanW The CI is passing now. Can you merge this for me? Thanks!

@hanhanW hanhanW merged commit cb98730 into iree-org:main Mar 13, 2026
57 checks passed