[GPUHeuristics] Refactor MMA heuristic seeds to be architecture-specific #23717
yzhang93 merged 5 commits into iree-org:main
Each GPU architecture now defines a constexpr ArchSeedSet containing gemm, scaled gemm, and convolution seed arrays indexed by GemmSize. The architecture is determined from the target and the corresponding seed set is returned (e.g. RDNA4 uses tuned seeds from RX 9070 XT benchmarking, others use the default). GemmSize enum values are changed to start at 0 for direct array indexing.

Add RDNA4 (gfx1201) config tests covering small, medium, and large matmul and convolution shapes.

RDNA4 benchmark results (RX 9070 XT, vs default seeds):
- GEMM shapes: ~40% of shapes improved, 15.9% geometric mean speedup.
- Prod Conv shapes: ~25% of shapes improved, 6% geometric mean speedup.
- Proxy Conv shapes: ~30% of shapes improved, 14.5% geometric mean speedup.

Co-authored-by: Claude <[email protected]>
Signed-off-by: yzhang93 <[email protected]>
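For readers who have not opened the diff, the layout described above could look roughly like the sketch below. This is an illustration only, not the code from this PR: the `Seeds` fields, all numeric values, and the helpers `getSeedSet` and `getGemmSeeds` are hypothetical, and only the names `ArchSeedSet` and `GemmSize` and the gfx1201/RDNA4 association come from the description.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Enum values start at 0 so they can index the seed arrays directly.
enum class GemmSize : std::size_t { Small = 0, Medium = 1, Large = 2 };
constexpr std::size_t kNumGemmSizes = 3;

// Hypothetical seed fields; the real seed struct has its own members.
struct Seeds {
  int64_t subgroupCount;
  int64_t mnTileCount;
  int64_t kTileCount;
};

using SeedArray = std::array<Seeds, kNumGemmSizes>;

// One set of per-size seed arrays for each kind of operation.
struct ArchSeedSet {
  SeedArray gemm;
  SeedArray scaledGemm;
  SeedArray conv;
};

// Placeholder numbers, NOT the tuned values from the PR.
constexpr ArchSeedSet kDefaultSeeds = {
    SeedArray{{{2, 2, 4}, {4, 4, 4}, {4, 8, 4}}},
    SeedArray{{{2, 2, 4}, {4, 4, 4}, {4, 8, 4}}},
    SeedArray{{{2, 2, 4}, {4, 4, 2}, {4, 8, 2}}},
};

constexpr ArchSeedSet kRDNA4Seeds = {
    SeedArray{{{2, 4, 4}, {4, 4, 2}, {4, 8, 2}}},
    SeedArray{{{2, 4, 4}, {4, 4, 2}, {4, 8, 2}}},
    SeedArray{{{2, 2, 2}, {4, 4, 2}, {4, 4, 2}}},
};

// Pick the seed set for an architecture string; unknown architectures
// fall back to the default set.
inline const ArchSeedSet &getSeedSet(std::string_view arch) {
  if (arch == "gfx1201") {
    return kRDNA4Seeds;
  }
  return kDefaultSeeds;
}

// Direct array indexing works because GemmSize starts at 0.
inline const Seeds &getGemmSeeds(std::string_view arch, GemmSize size) {
  return getSeedSet(arch).gemm[static_cast<std::size_t>(size)];
}
```

With this shape, a call such as `getGemmSeeds("gfx1201", GemmSize::Large)` resolves to a plain array lookup, and supporting another architecture means adding one more `constexpr` table plus one branch in the selection function.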
3b5aca1 to 792645b
kuhar left a comment
Encoding target-specific features in generic compiler code is an anti-pattern... Is there any way to move it to the rocm target?
compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/ConfigUtils.cpp
  },
};

/// RDNA4 seeds (tuned based on RX 9070 XT benchmarking data).
I guess the question is whether this should be arch or SKU-specific. With CDNA3, we saw that different SKUs like MI300X vs. MI308 needed different heuristics. We have that supported for default tuning specs.
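To make the arch-versus-SKU question concrete, here is one way a lookup could prefer a SKU-specific entry and otherwise fall back to the architecture-wide one. It is a hypothetical sketch, not something this PR implements: the table, the `SeedEntry` struct, the `selectSeedSet` function, the SKU key strings, and the labels standing in for seed tables are all made up for illustration.

```cpp
#include <array>
#include <string_view>

// Hypothetical table row. An empty `sku` means "any SKU of this arch".
struct SeedEntry {
  std::string_view arch;
  std::string_view sku;
  std::string_view seedSet;  // Label standing in for an actual seed table.
};

// Illustrative entries only; no such table exists in the PR.
constexpr std::array<SeedEntry, 3> kSeedTable = {{
    {"gfx942", "mi308", "cdna3-mi308"},
    {"gfx942", "", "cdna3"},
    {"gfx1201", "", "rdna4"},
}};

// Prefer an exact (arch, sku) match, then an arch-wide entry, then the
// default seeds.
inline std::string_view selectSeedSet(std::string_view arch,
                                      std::string_view sku) {
  std::string_view archMatch = "default";
  for (const SeedEntry &entry : kSeedTable) {
    if (entry.arch != arch) {
      continue;
    }
    if (!entry.sku.empty() && entry.sku == sku) {
      return entry.seedSet;  // SKU-specific seeds win.
    }
    if (entry.sku.empty()) {
      archMatch = entry.seedSet;  // Remember the arch-wide fallback.
    }
  }
  return archMatch;
}
```

Under these assumptions, an MI308-style SKU would get its own seeds while every other gfx942 part shares the arch-wide entry, so SKU-specific tuning can be added one measured entry at a time.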
Maybe, but it also depends on what we have tried. I think this is something that we will have to evolve, keeping in mind that we need to manage the different SKUs. But without measuring, we won't know what these should be.
Signed-off-by: yzhang93 <[email protected]>
Good point. Moved |
compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/KnownTargets.cpp
kuhar left a comment
Thanks for the changes. I think this is a pragmatic compromise -- we can land in the current form and decide how to extend later when we learn what it takes to generalize it to other architectures / SKUs.
Signed-off-by: yzhang93 <[email protected]>