[Codegen][VectorExt] Add VectorExt::TransferScatter #23610

Closed
keshavvinayak01 wants to merge 6 commits into iree-org:main from keshavvinayak01:users/keshavvinayak01/transfer-scatter
@keshavvinayak01 keshavvinayak01 commented Mar 2, 2026

Adds iree_vector_ext.transfer_scatter, the write counterpart to transfer_gather, using the same unified indexing_maps attribute. For tensor operands, the operation returns the modified tensor; for memref operands, it has no result.

Example: a scatter write that stores each row of a vector into a scattered row of a 2D destination:

// dest[indices[i], j] = vector[i, j]
%result = iree_vector_ext.transfer_scatter %vector into %dest[%c0, %c0]
  [%indices : vector<16xindex>] {
    indexing_maps = [
      affine_map<(d0, d1)[s0] -> (s0, d1)>,
      affine_map<(d0, d1)[s0] -> (d0)>
    ]
  } : vector<16x8xf16>, tensor<4096x8xf16> -> tensor<4096x8xf16>
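
To make the indexing in the example concrete, here is a minimal scalar reference of the same access pattern in plain C++ (not IREE code). The names `Matrix`, `scatterRowsInPlace`, and `scatterRows` are hypothetical; the two functions mirror the memref form (no result) and the tensor form (returns the modified destination) described above. The sketch uses `float` in place of `f16`, asserts on out-of-bounds indices, and lets the last write win for duplicate indices. Those last two behaviors are assumptions of this sketch, since the PR text does not say how the op treats them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major 2D buffer standing in for vector<RxCxf16> / tensor<RxCxf16>.
using Matrix = std::vector<std::vector<float>>;

// Memref-style scatter: mutates `dest` in place and produces no result.
// Implements dest[indices[i]][j] = values[i][j] for every (i, j).
void scatterRowsInPlace(const Matrix &values,
                        const std::vector<std::size_t> &indices,
                        Matrix &dest) {
  assert(values.size() == indices.size() && "one index per vector row");
  for (std::size_t i = 0; i < values.size(); ++i) {
    assert(indices[i] < dest.size() && "row index out of bounds");
    assert(values[i].size() == dest[indices[i]].size() && "row width mismatch");
    for (std::size_t j = 0; j < values[i].size(); ++j)
      dest[indices[i]][j] = values[i][j];
  }
}

// Tensor-style scatter: takes `dest` by value and returns the modified copy,
// leaving the caller's original untouched.
Matrix scatterRows(const Matrix &values,
                   const std::vector<std::size_t> &indices, Matrix dest) {
  scatterRowsInPlace(values, indices, dest);
  return dest;
}
```

For instance, scattering a 2x2 `values` with `indices = {3, 1}` into a zeroed 4x2 destination writes row 0 of `values` into row 3 and row 1 into row 1, leaving rows 0 and 2 at zero.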

Also templatizes the code shared between TransferGather and TransferScatter, and adds tests that were missing from vector_lowering.mlir.

Extract shared verifier, templatize fold patterns, merge lowering
files into LowerTransferGatherScatterOps.cpp, and extract GPU
distribution and layout analysis helpers to reduce code duplication
between the two ops.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Keshav Vinayak Jha <[email protected]>
@keshavvinayak01 keshavvinayak01 changed the title from "Users/keshavvinayak01/transfer scatter" to "[Codegen][VectorExt] Add VectorExt::TransferScatter" Mar 2, 2026
@keshavvinayak01 keshavvinayak01 force-pushed the users/keshavvinayak01/transfer-scatter branch from 0600626 to 792746d, March 2, 2026 15:23
keshavvinayak01 and others added 4 commits March 3, 2026 10:25
Replace custom parse/print with declarative assemblyFormat using
`(-> type($result)^)?` for the optional tensor result. Unify
TransferOpAdaptor into a single template using `if constexpr`.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Keshav Vinayak Jha <[email protected]>
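
As a rough illustration of the `if constexpr` unification mentioned in this commit, the sketch below shows one adaptor template serving two op types. It is not the actual `TransferOpAdaptor`; `FakeGatherOp`, `FakeScatterOp`, `TransferAdaptorSketch`, and `getIndexedOperand` are hypothetical stand-ins, and operands are modeled as plain strings.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for the two ops; operands are just names here.
struct FakeGatherOp {
  std::string source;
};
struct FakeScatterOp {
  std::string dest;
  std::string valueToStore;
};

// One adaptor template for both ops. Each `if constexpr` branch is only
// instantiated for the op type that actually has the member it touches.
template <typename OpTy>
class TransferAdaptorSketch {
  static_assert(std::is_same_v<OpTy, FakeGatherOp> ||
                    std::is_same_v<OpTy, FakeScatterOp>,
                "adaptor only supports the gather and scatter stand-ins");

public:
  static constexpr bool kIsScatter = std::is_same_v<OpTy, FakeScatterOp>;

  explicit TransferAdaptorSketch(OpTy op) : op(std::move(op)) {}

  // The operand being indexed: the source of a gather, the dest of a scatter.
  const std::string &getIndexedOperand() const {
    if constexpr (kIsScatter) {
      return op.dest;
    } else {
      return op.source;
    }
  }

private:
  OpTy op;
};
```

Shared helpers can then be written once against `getIndexedOperand()` instead of being duplicated per op.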
Extract foldAllTrueSplatMask and foldTransferIndexVecOps to
deduplicate the gather and scatter fold functions.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Keshav Vinayak Jha <[email protected]>
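
The idea behind folding an all-true mask can be sketched outside MLIR as follows. This is an assumption about the intent suggested by the name `foldAllTrueSplatMask`, not its implementation: `dropAllTrueMask` is a hypothetical helper, and the mask is modeled as an optional `std::vector<bool>` rather than a splat constant.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// If a mask is present and every lane is enabled, it has no effect on the
// transfer, so it can be removed. Returns true when the mask was dropped.
// An empty mask vector counts as all-true in this sketch.
bool dropAllTrueMask(std::optional<std::vector<bool>> &mask) {
  if (!mask)
    return false; // Already unmasked: nothing to fold.
  bool allTrue = std::all_of(mask->begin(), mask->end(),
                             [](bool lane) { return lane; });
  if (!allTrue)
    return false; // Some lane is disabled: the mask must stay.
  mask.reset();
  return true;
}
```

Because the check does not depend on the direction of the transfer, a single helper of this shape can serve both the gather and the scatter fold.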
Merge removeDim0FromMap/removeDim0FromIndexVecMap into a single
function with a droppedAxes pointer parameter. Remove UnrollDim0Info
struct in favor of output parameters. Templatize UnrollTransferDim
for both gather and scatter.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Keshav Vinayak Jha <[email protected]>
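
The pattern of merging two near-identical functions through an optional output pointer can be sketched as below. The body is a guess at the general shape only, since the commit names the functions but does not show them: `DimMap` and `removeDim0` are hypothetical, and a map is simplified to a list recording which dimension each result reads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified map: result k reads dimension map[k], e.g. (d0, d1) -> (d1, d0)
// is stored as {1, 0}. Symbols and general affine expressions are ignored.
using DimMap = std::vector<int>;

// Removes dimension 0 from the map: results that read d0 are dropped and all
// remaining dimensions shift down by one. Callers that need to know which
// result positions disappeared pass `droppedAxes`; others pass nothing.
DimMap removeDim0(const DimMap &map,
                  std::vector<std::size_t> *droppedAxes = nullptr) {
  DimMap result;
  for (std::size_t pos = 0; pos < map.size(); ++pos) {
    assert(map[pos] >= 0 && "results must be plain dims in this sketch");
    if (map[pos] == 0) {
      if (droppedAxes)
        droppedAxes->push_back(pos);
      continue;
    }
    result.push_back(map[pos] - 1);
  }
  return result;
}
```

With the map `{1, 0, 2}`, the call returns `{0, 1}` and, when a pointer is supplied, records position 1 as dropped.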
Add scatter equivalents of the existing gather unrolling tests:
memref, masked, tensor semantics, and 3D transposed index vector.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Keshav Vinayak Jha <[email protected]>
@hanhanW hanhanW left a comment

It's good to have a single PR that demonstrates the impact, but you should really break it into a few smaller PRs. I did not review any details, but I think we can at least break it into something like:

  • Introduce the operation including roundtrip/invalid tests.
  • A PR per interface implementation.
  • Unrolling support.

Please read the links below for guidance:

@Groverkss

Nice! I think splitting the PR as Hanhan suggested is a good idea.

keshavvinayak01 added a commit that referenced this pull request Mar 11, 2026
)

Add bufferization interface for `TransferScatterOp`
Split from #23703 per review feedback.

Part 3a/4 from #23610

Signed-off-by: Keshav Vinayak Jha <[email protected]>
Co-authored-by: Claude Opus 4.6 <[email protected]>