Ensure various scalar cross platform helper APIs are handled directly as intrinsic #80789
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
src/coreclr/jit/importercalls.cpp
Outdated
We could handle this specially with a check for negative values, but it is a more complex change and so I decided to push it out to a later PR.
src/coreclr/jit/importercalls.cpp
Outdated
Since there isn't an "always available" instruction for x86/x64, we should probably import this as a GenTreeIntrinsic, much as happens for various Math APIs such as Sin and Cos.
Doing so would allow us to still perform post import constant folding and then transform this back into a GT_CALL during rationalization on older hardware.
However, given it is a more complex change I opted to push it out to a later PR.
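The deferred approach described above could be sketched roughly as follows. This is a toy model only, not the JIT's code: `Node`, `Kind`, `Import`, `Fold`, and `Rationalize` are hypothetical names, and a population count stands in for whichever API the comment refers to.

```cpp
#include <cstdint>

// Hypothetical, heavily simplified IR; none of these names come from the JIT.
enum class Kind { Constant, Local, Intrinsic, Call };

struct Node {
    Kind     kind;
    uint32_t value;    // meaningful only for Kind::Constant
    Node*    operand;  // the single argument of an Intrinsic or Call node
};

// Reference semantics used when folding (Kernighan's bit-clearing loop).
uint32_t PopCount32(uint32_t v) {
    uint32_t count = 0;
    while (v != 0) {
        v &= v - 1;  // clear the lowest set bit
        ++count;
    }
    return count;
}

// Phase 1: always import as a target-independent intrinsic node,
// whether or not the hardware has a matching instruction.
Node Import(Node* arg) {
    return Node{Kind::Intrinsic, 0, arg};
}

// Phase 2: fold after import, once the operand is known to be constant.
void Fold(Node& n) {
    if (n.kind == Kind::Intrinsic && n.operand->kind == Kind::Constant) {
        n = Node{Kind::Constant, PopCount32(n.operand->value), nullptr};
    }
}

// Phase 3: on hardware without the instruction, turn any intrinsic
// that survived folding back into an ordinary call.
void Rationalize(Node& n, bool hardwareSupported) {
    if (n.kind == Kind::Intrinsic && !hardwareSupported) {
        n.kind = Kind::Call;
    }
}
```

The point of the ordering is that folding sees the same node shape on every target, and only the last phase needs to know what the hardware supports.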
For Arm64, we could do the same or we could add basic SIMD constant folding support for PopCount, AddAcross, and ToScalar. CreateScalarNode will already generate a GT_CNS_VEC where applicable, including post import.
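To illustrate what folding the Arm64 sequence on a constant would compute, here is a minimal sketch that models the three operations on a 64-bit vector viewed as eight bytes. The type `Vec8B` and all function names are assumptions made for illustration; they are not JIT or runtime APIs.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of a 64-bit SIMD register viewed as eight byte lanes.
using Vec8B = std::array<uint8_t, 8>;

// Models creating a scalar vector and reinterpreting it as bytes.
// Little-endian lane order is assumed; it does not affect the final count.
Vec8B CreateScalarAsBytes(uint64_t value) {
    Vec8B v{};
    for (int i = 0; i < 8; ++i) {
        v[i] = static_cast<uint8_t>(value >> (8 * i));
    }
    return v;
}

// Models a per-byte population count: each lane holds its own bit count.
Vec8B FoldPopCount(const Vec8B& v) {
    Vec8B r{};
    for (int i = 0; i < 8; ++i) {
        uint8_t b = v[i];
        uint8_t n = 0;
        while (b != 0) {
            b = static_cast<uint8_t>(b & (b - 1));  // clear lowest set bit
            ++n;
        }
        r[i] = n;
    }
    return r;
}

// Models an add-across: the sum of all lanes lands in lane 0, the rest are zero.
Vec8B FoldAddAcross(const Vec8B& v) {
    Vec8B r{};
    unsigned sum = 0;
    for (uint8_t lane : v) {
        sum += lane;
    }
    r[0] = static_cast<uint8_t>(sum);  // at most 64, so it fits in a byte
    return r;
}

// Models extracting lane 0 as a scalar.
uint8_t FoldToScalar(const Vec8B& v) {
    return v[0];
}

// Composes the three folds, as a constant folder could for a constant input.
uint32_t FoldScalarPopCount(uint64_t value) {
    return FoldToScalar(FoldAddAcross(FoldPopCount(CreateScalarAsBytes(value))));
}
```

With folding rules like these for each node, a constant input would collapse to a single scalar constant without needing a dedicated scalar intrinsic.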
Not a perfect diff due to 40 missed contexts, but still good overall and showing some TP improvements + positive diffs. There is notably a very small
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop
Azure Pipelines successfully started running 3 pipeline(s).
CC. @dotnet/jit-contrib, this is ready for review.
Much like with APIs directly exposed on Vector64/128/256/512, several of the APIs exposed on BitOperations are "cross platform helper APIs" and are used in various perf critical code paths.
However, unlike the vector APIs, the bit operation APIs were not directly handled as intrinsic and only ever executed a software fallback, which included manual dispatch to the relevant underlying hardware intrinsics. This works well most of the time, but it has the side effects of reducing JIT throughput, being subject to inlining heuristics, and not being able to participate in constant folding.
This PR updates those APIs to be directly imported as the relevant hardware intrinsic, when supported, and to more generally support constant folding.
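As a rough C++ analogy (not the code in this PR), the difference between a software fallback with manual dispatch and a form the compiler can fold might look like the sketch below. `PopCountSoftware` and `PopCount` are hypothetical names; `__builtin_popcount` is the GCC/Clang builtin, used here only where those compilers are detected.

```cpp
#include <cstdint>

// Portable software fallback (SWAR bit counting). Being constexpr, the
// compiler can evaluate it entirely at compile time for constant inputs.
constexpr uint32_t PopCountSoftware(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);                  // 2-bit partial sums
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);  // 4-bit partial sums
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                  // 8-bit partial sums
    return (v * 0x01010101u) >> 24;                    // add the four bytes
}

// Constant folding: the result is computed during compilation.
static_assert(PopCountSoftware(0xFFu) == 8, "folded at compile time");

// Manual dispatch: use the compiler builtin where available, otherwise
// fall back to the portable implementation above.
inline uint32_t PopCount(uint32_t v) {
#if defined(__GNUC__) || defined(__clang__)
    return static_cast<uint32_t>(__builtin_popcount(v));
#else
    return PopCountSoftware(v);
#endif
}
```

The analogy is loose: in the runtime the dispatch lived in managed code that the JIT had to inline and optimize, whereas here the C++ compiler already recognizes the builtin directly.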