[GPUHeuristics] Refactor MMA heuristic seeds to be architecture-specific #23717

Merged

yzhang93 merged 5 commits into iree-org:main from yzhang93:arch-specific-heuristic-seeds on Mar 11, 2026

Contributor

yzhang93 commented Mar 10, 2026

Each GPU architecture now defines a constexpr ArchSeedSet containing GEMM, scaled GEMM, and convolution seed arrays indexed by GemmSize. The architecture is determined from the target, and the matching seed set is returned (e.g., RDNA4 uses seeds tuned from RX 9070 XT benchmarking; other architectures use the default set). GemmSize enum values now start at 0 so they can index these arrays directly.
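A minimal C++ sketch of the lookup described above, assuming a three-bucket GemmSize (only LargeGemm is named elsewhere in this thread), a hypothetical three-field Seeds struct, and placeholder seed values that are not the tuned RDNA4 numbers. It shows why a 0-based enum matters: the enum value is used directly as the array index. The names ArchSeedSet and getArchSeedSet come from this PR, but their real signatures and contents in IREE may differ.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical size buckets. Starting at 0 lets a GemmSize value index a seed
// array directly; the enumerator names here are assumed for illustration.
enum class GemmSize : std::size_t { SmallGemm = 0, MediumGemm, LargeGemm, Count };
constexpr std::size_t kNumGemmSizes = static_cast<std::size_t>(GemmSize::Count);

// Hypothetical seed fields; the real IREE seed structs may differ.
struct Seeds {
  int64_t subgroupsPerWorkgroup;
  int64_t mnTilesPerSubgroup;
  int64_t kTilesPerSubgroup;
};

using SeedArray = std::array<Seeds, kNumGemmSizes>;

// One seed array per op kind, each indexed by GemmSize.
struct ArchSeedSet {
  SeedArray gemm;
  SeedArray scaledGemm;
  SeedArray conv;
};

// Placeholder numbers only; these are NOT the tuned values from the PR.
constexpr ArchSeedSet kDefaultSeeds = {
    /*gemm=*/{{{2, 4, 4}, {4, 8, 4}, {4, 16, 2}}},
    /*scaledGemm=*/{{{2, 4, 4}, {4, 8, 4}, {4, 16, 2}}},
    /*conv=*/{{{2, 4, 4}, {4, 8, 4}, {4, 16, 2}}},
};
constexpr ArchSeedSet kRDNA4Seeds = {
    /*gemm=*/{{{2, 2, 8}, {4, 4, 8}, {8, 8, 4}}},
    /*scaledGemm=*/{{{2, 4, 4}, {4, 8, 4}, {4, 16, 2}}},
    /*conv=*/{{{2, 2, 8}, {4, 4, 4}, {4, 8, 4}}},
};

// Select a seed set from the target's arch string. Treating exactly
// gfx1200/gfx1201 as RDNA4 is an assumption made for this sketch.
inline const ArchSeedSet &getArchSeedSet(std::string_view arch) {
  if (arch == "gfx1200" || arch == "gfx1201")
    return kRDNA4Seeds;
  return kDefaultSeeds;
}

// Because GemmSize starts at 0, the enum value is the array index.
inline const Seeds &getGemmSeeds(const ArchSeedSet &set, GemmSize size) {
  return set.gemm[static_cast<std::size_t>(size)];
}
```

Keeping each table as a constexpr aggregate means adding an architecture amounts to one new table and one new branch in the selector, with no change to the code that consumes the seeds.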

Add RDNA4 (gfx1201) config tests covering small, medium, and large matmul and convolution shapes.

RDNA4 benchmark results (RX 9070 XT, vs default seeds):

- GEMM shapes: ~40% of shapes improved, 15.9% geometric mean speedup.
- Prod Conv shapes: ~25% of shapes improved, 6% geometric mean speedup.
- Proxy Conv shapes: ~30% of shapes improved, 14.5% geometric mean speedup.
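The benchmark summary reports geometric mean speedups. A minimal sketch of how such figures are commonly computed, assuming speedup is defined per shape as baseline runtime divided by new runtime; under that definition a 15.9% geometric mean speedup would correspond to a ratio of about 1.159. The function names are hypothetical, and the inputs in any checks are made-up numbers, not data from this PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of per-shape speedups, where speedup = baseline / candidate.
// Summing logs avoids overflow or underflow on long lists of ratios.
double geomeanSpeedup(const std::vector<double> &baselineMs,
                      const std::vector<double> &candidateMs) {
  assert(!baselineMs.empty() && baselineMs.size() == candidateMs.size());
  double logSum = 0.0;
  for (std::size_t i = 0; i < baselineMs.size(); ++i)
    logSum += std::log(baselineMs[i] / candidateMs[i]);
  return std::exp(logSum / static_cast<double>(baselineMs.size()));
}

// Fraction of shapes whose candidate runtime is strictly lower than baseline.
double fractionImproved(const std::vector<double> &baselineMs,
                        const std::vector<double> &candidateMs) {
  assert(baselineMs.size() == candidateMs.size());
  if (baselineMs.empty())
    return 0.0;
  std::size_t improved = 0;
  for (std::size_t i = 0; i < baselineMs.size(); ++i)
    if (candidateMs[i] < baselineMs[i])
      ++improved;
  return static_cast<double>(improved) / static_cast<double>(baselineMs.size());
}
```

The geometric mean treats a 2x win and a 2x loss as cancelling, which an arithmetic mean of ratios does not. A real comparison would also likely treat differences within measurement noise as unchanged when counting improved shapes.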

yzhang93 requested review from Groverkss, Max191, krzysz00, kuhar, nirvedhmeshram and qedawkins as code owners

March 10, 2026 01:10


[GPUHeuristics] Refactor MMA heuristic seeds to be architecture-specific

792645b

Co-authored-by: Claude <[email protected]>
Signed-off-by: yzhang93 <[email protected]>

yzhang93 force-pushed the arch-specific-heuristic-seeds branch from 3b5aca1 to 792645b

March 10, 2026 02:30

Yu-Zhewen approved these changes

Contributor

Yu-Zhewen left a comment

LGTM, thanks!

compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/ConfigUtils.cpp (outdated)

compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/ConfigUtils.cpp (outdated)

yzhang93 requested review from MaheshRavishankar and lialan

March 10, 2026 18:57

lialan reviewed

compiler/src/iree/compiler/Codegen/Common/GPU/GPUHeuristics.h (outdated)

kuhar reviewed

Member

kuhar left a comment

Encoding target-specific features in generic compiler code is an anti-pattern... Is there any way to move it to the rocm target?

compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/ConfigUtils.cpp (outdated)

compiler/src/iree/compiler/Codegen/Common/GPU/GPUHeuristics.h (outdated)

compiler/src/iree/compiler/Codegen/Common/GPU/GPUHeuristics.h (outdated)

compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/ConfigUtils.cpp (outdated)

compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/ConfigUtils.cpp (outdated)

+                  },
+              };
+              /// RDNA4 seeds (tuned based on RX 9070 XT benchmarking data).

Member

kuhar Mar 10, 2026

I guess the question is whether this should be arch- or SKU-specific. With CDNA3, we saw that different SKUs like MI300X vs. MI308 needed different heuristics. We have that supported for default tuning specs.

Collaborator

MaheshRavishankar Mar 11, 2026

Maybe, but it also depends on what we have tried. I think this is something that we will have to evolve and keep in mind to manage the different SKUs. But without measuring we won't know what these should be.
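One possible way to accommodate the SKU question raised in this thread, sketched under assumptions: consult a SKU-specific override first, then the per-architecture table, then the default. The SKU keys, labels, and the selectSeedSetLabel function are hypothetical, and string labels stand in for real seed sets; the merged PR keys only on architecture.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical three-level lookup: SKU override, then architecture, then
// default. The string labels stand in for real ArchSeedSet tables.
std::string selectSeedSetLabel(const std::string &arch, const std::string &sku) {
  // Hypothetical SKU-specific overrides (e.g. a SKU that measured differently).
  static const std::unordered_map<std::string, std::string> bySku = {
      {"mi308", "mi308-override"}};
  // Per-architecture tables; only RDNA4 has a non-default set in this sketch.
  static const std::unordered_map<std::string, std::string> byArch = {
      {"gfx1201", "rdna4"}};
  if (auto it = bySku.find(sku); it != bySku.end())
    return it->second;
  if (auto it = byArch.find(arch); it != byArch.end())
    return it->second;
  return "default";
}
```

Checking the most specific key first keeps the per-architecture tables as the common case, so an override only needs to exist for a SKU whose measurements actually diverge.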


Address comments

0246a29

Signed-off-by: yzhang93 <[email protected]>

MaheshRavishankar approved these changes


Move seeds table to KnownTargets.cpp

613bcfc

Signed-off-by: yzhang93 <[email protected]>

Contributor Author

yzhang93 commented Mar 11, 2026 (edited)

> Encoding target-specific features in generic compiler code is an anti-pattern... Is there any way to move it to the rocm target?

Good point. Moved ArchSeedSet, the seed tables, and getArchSeedSet() to KnownTargets.cpp/.h. Could you review it again? @kuhar

kuhar reviewed

View reviewed changes

compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/KnownTargets.cpp (outdated)

kuhar approved these changes

Member

kuhar left a comment

Thanks for the changes. I think this is a pragmatic compromise -- we can land in the current form and decide how to extend later when we learn what it takes to generalize it to other architectures / SKUs.


Minor: flip condition

42794ef

Signed-off-by: yzhang93 <[email protected]>

yzhang93 mentioned this pull request

[GPUHeuristics] Further tune LargeGemm perf #23652

Merged


Fix bazel build

9d9c613

Signed-off-by: yzhang93 <[email protected]>

yzhang93 merged commit 07e7697 into iree-org:main

83 of 84 checks passed

Reviewers

lialan left review comments

MaheshRavishankar approved these changes

kuhar approved these changes

Yu-Zhewen approved these changes

qedawkins: awaiting requested review (code owner)

Groverkss: awaiting requested review (code owner)

nirvedhmeshram: awaiting requested review (code owner)

krzysz00: awaiting requested review (code owner)

Max191: awaiting requested review (code owner)

Labels

None yet