[Codegen] Apply bounds to subgroup_id#23768
Conversation
2a4b225 to
ad04bc3
Compare
amd-eochoalo
left a comment
There was a problem hiding this comment.
I think this looks good, but you may want to wait for other reviewers. Just one comment about the for-loop.
| for (std::optional<int64_t> size : workgroupSizes) { | ||
| if (size) { | ||
| maxTotalThreads *= *size; | ||
| // Cap at the hardware thread-per-workgroup limit inside the loop | ||
| // to avoid overflow from multiplying per-dimension maximums. | ||
| if (target) { | ||
| maxTotalThreads = | ||
| std::min(maxTotalThreads, | ||
| static_cast<int64_t>( | ||
| target.getWgp().getMaxThreadCountPerWorkgroup())); | ||
| } | ||
| } else { | ||
| allSizesKnown = false; | ||
| break; | ||
| } |
There was a problem hiding this comment.
Nit: I think this is a little bit more difficult to parse than it should be. Could we remove one level of nesting by moving the break earlier and then the else should not be necessary, right? Maybe changing the logic slightly too for calculating the maxTotalThreads when target is known and avoid overflow somehow or adding variables?
// pseudocode
| for (std::optional<int64_t> size : workgroupSizes) { | |
| if (size) { | |
| maxTotalThreads *= *size; | |
| // Cap at the hardware thread-per-workgroup limit inside the loop | |
| // to avoid overflow from multiplying per-dimension maximums. | |
| if (target) { | |
| maxTotalThreads = | |
| std::min(maxTotalThreads, | |
| static_cast<int64_t>( | |
| target.getWgp().getMaxThreadCountPerWorkgroup())); | |
| } | |
| } else { | |
| allSizesKnown = false; | |
| break; | |
| } | |
| // if target is available | |
| std::optional<int64_t> maxThreadCountPerWg = static_cast<int64_t> (target.getWgp().getMaxThreadCountPerWorkgroup())); | |
| for (std::optional<int64_t> size : workgroupSizes) { | |
| if (!size) { | |
| allSizesKnown = false; | |
| break; | |
| } | |
| maxTotalThreads *= *size; | |
| // Cap at the hardware thread-per-workgroup limit inside the loop | |
| // to avoid overflow from multiplying per-dimension maximums. | |
| if (maxThreadCountPerWg) { | |
| maxTotalThreads = std::min(maxTotalThreads, maxThreadCountPerWg) | |
| } | |
| } |
There was a problem hiding this comment.
I refactored all that code - do you like the new version?
Since we're fixing to start using subgroup_id more often with PCF, apply bounds to it based on the subgroup size (or sizes) and the number of threads in the workgroup. Also extend subgroup_size handling to account for known subgroup sizes instead of giving up completely when there isn't a fixed choice made yet. Co-Authored-By: Claude Opus 4.6 <[email protected]>
ad04bc3 to
8adfb9f
Compare
| if (std::optional<uint64_t> exportSubgroupSize = | ||
| exportOp->getSubgroupSizeAsUInt()) { | ||
| subgroupSize = exportSubgroupSize; | ||
| staticSubgroupSize = *exportSubgroupSize; |
| std::optional<int64_t> constantSubgroupSize; | ||
| if (minSubgroupSize && maxSubgroupSize && | ||
| *minSubgroupSize == *maxSubgroupSize) { | ||
| constantSubgroupSize = *minSubgroupSize; | ||
| } |
There was a problem hiding this comment.
nit: is there a test for this path?
There was a problem hiding this comment.
Yeah, I'm pretty sure existing tests cover this
There was a problem hiding this comment.
... Oh, no, you're right, there should be a test for this, I'll go add one (and move this to the right spot)
Since we're fixing to start using subgroup_id more often with PCF, apply bounds to it based on the subgroup size (or sizes) and the number of threads in the workgroup. Also extend subgroup_size handling to account for known subgroup sizes instead of giving up completely when there isn't a fixed choice made yet.
Also fix up some double-spaces in test attributes.