[GPU] Make GPUVectorAlloc allocate shared memory based on layout analysis #23631

Merged

Groverkss merged 2 commits into iree-org:main from Groverkss:layout_analysis_improvements on Mar 4, 2026
Conversation

Contributor

@Groverkss Groverkss commented Mar 3, 2026

Before this patch, we had a hack in layout configuration: we checked whether there would be a layout conflict and tried to promote the operand for attention. This hack was fragile and doesn't really scale beyond simple cases.

This patch runs vector layout analysis to check whether there are any conflicts, and allocates shared memory for the conflicting values.

Also removes some dead code from the alloc pass.

Contributor

@krzysz00 krzysz00 left a comment

I've got some high-level questions, but don't see any problems here as it stands.


// Synchronize after the write to shared memory before we read from it.
auto synced =
IREE::GPU::ValueBarrierOp::create(builder, op->getLoc(), *ret);
Contributor

@qedawkins Is this the sort of thing that would justify memory spaces on value barrier, or will that get handled by the tensor getting the workgroup address space during bufferization?

Contributor

@sommerlukas sommerlukas left a comment

LGTM, just minor stuff.

@Groverkss Groverkss enabled auto-merge (squash) March 4, 2026 13:43
@Groverkss Groverkss merged commit 2284319 into iree-org:main Mar 4, 2026
56 checks passed
Groverkss added a commit that referenced this pull request Mar 4, 2026
After #23631 we will be using the
analysis to determine conflicts, which allows us to undo a bunch of
hacks we did earlier to avoid conflicts between the chained MMAs in
attention.

This patch simply sets the intrinsic to be col-major in KernelConfig, and the
rest of the pipeline automatically picks up whether an intermediate conflict
needs to be resolved through shared memory or not.