[GPU] Add coalescing to reduction tiling #23673

Merged
nirvedhmeshram merged 2 commits into iree-org:main from nirvedhmeshram:coalesce_pr
Mar 11, 2026
Conversation

@nirvedhmeshram
Contributor

@nirvedhmeshram nirvedhmeshram commented Mar 5, 2026

This is useful for further optimizations like prefetching; see details in #23557.

Regarding the implementation: we check for a parent loop because we may perform reduction tiling multiple times. For example, in direct convolution we tile the filter reduction dims, apply pack-to-intrinsics and reshape patterns, and then later tile the channel dimension, so we want to be able to coalesce all of these loops into one loop. It is assumed that all loops (parent or tiling) come from tiling of the same root op; see the discussion in the PR for why we didn't end up adding additional checks to verify this.
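For readers unfamiliar with the transform: loop coalescing collapses a nest of tiling loops into a single loop whose flat induction variable is de-linearized back into the original indices. A conceptual Python sketch of that idea (this is not IREE's MLIR implementation; the function name and structure are illustrative only):

```python
def coalesced(trip_counts, body):
    """Run `body(i0, ..., ik)` over a loop nest as ONE flat loop.

    Instead of k nested loops, iterate a single flat index and
    recover each original induction variable with div/mod
    (de-linearization), innermost trip count peeled off first.
    """
    total = 1
    for tc in trip_counts:
        total *= tc
    for flat in range(total):
        idxs = []
        rem = flat
        for tc in reversed(trip_counts):
            idxs.append(rem % tc)
            rem //= tc
        body(*reversed(idxs))

# Example: a parent loop from an earlier reduction tiling (trip count 2)
# around a later tiling loop (trip count 3) becomes one loop of 6 steps.
visited = []
coalesced([2, 3], lambda i, j: visited.append((i, j)))
# visited == [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

Having one loop instead of a nest is what makes transforms like prefetching, which pipeline across a single loop's iterations, applicable to the whole reduction iteration space.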

I checked the effect of this on direct convolution across 336 convolution shapes (excluding any filter 1 shapes, as those go down the GEMM path) and found no performance difference. I believe the true impact of this can only be seen with tuning.

Fixes : #23557

@nirvedhmeshram nirvedhmeshram marked this pull request as draft March 6, 2026 22:44
Signed-off-by: Nirvedh Meshram <[email protected]>
Co-Authored-By: Claude Sonnet 4 <[email protected]>
Signed-off-by: Nirvedh Meshram <[email protected]>
@nirvedhmeshram nirvedhmeshram marked this pull request as ready for review March 9, 2026 20:34
@Max191
Contributor

Max191 commented Mar 10, 2026

I check the effect of this on direct convolution and found no difference in the performance on 336 convolution shapes

There was no difference at all? Doesn't this cause prefetching to happen where it otherwise wouldn't? Were you testing the default heuristic or using the tuned configurations that were generated from when we had non-coalesced loops?

@nirvedhmeshram
Contributor Author

I check the effect of this on direct convolution and found no difference in the performance on 336 convolution shapes

There was no difference at all? Doesn't this cause prefetching to happen where it otherwise wouldn't? Were you testing the default heuristic or using the tuned configurations that were generated from when we had non-coalesced loops?

By default, prefetching is off for direct conv, and I didn't use a tuning spec when doing this comparison; I just wanted to show, in the context of this PR, that this transform is not harmful to performance. With tuning, I am able to see a perf improvement from prefetching when coalescing is done that was not possible without it.

Signed-off-by: Nirvedh Meshram <[email protected]>
@nirvedhmeshram nirvedhmeshram merged commit bd361f7 into iree-org:main Mar 11, 2026
55 of 57 checks passed

Development

Successfully merging this pull request may close these issues.

Need pipelining support for direct convolutions