[GPU] Make GPUVectorAlloc allocate shared memory based on layout analysis #23631

Merged

Groverkss merged 2 commits into iree-org:main from Groverkss:layout_analysis_improvements on Mar 4, 2026
Conversation

Contributor

@Groverkss Groverkss commented Mar 3, 2026

Before this patch, we had a hack in layout configuration: we checked whether there would be a layout conflict and tried to promote the operand for attention. This hack was fragile and doesn't really scale beyond simple cases.

This patch runs vector layout analysis to check whether there are any conflicts, and allocates shared memory for the conflicting values.

Also removes some dead code from the alloc pass.

Contributor

@krzysz00 krzysz00 left a comment

I've got some high-level questions, but don't see any problems here as it stands.


// Synchronize after the write to shared memory before we read from it.
auto synced =
IREE::GPU::ValueBarrierOp::create(builder, op->getLoc(), *ret);
Contributor

@qedawkins Is this the sort of thing that would justify memory spaces on value barrier, or will that get handled by the tensor getting the workgroup address space during bufferization?

Contributor

@sommerlukas sommerlukas left a comment

LGTM, just minor stuff.

@Groverkss Groverkss enabled auto-merge (squash) March 4, 2026 13:43
@Groverkss Groverkss merged commit 2284319 into iree-org:main Mar 4, 2026
56 checks passed
Groverkss added a commit that referenced this pull request Mar 4, 2026
After #23631 we will be using the
analysis to determine conflicts, which allows us to undo a bunch of
hacks we did earlier to avoid conflicts between the chained MMAs in
attention.

This patch simply sets the intrinsic to be col-major in KernelConfig, and the
rest of the pipeline automatically picks up whether an intermediate conflict
needs to be resolved through shared memory or not.