SimdAsHWIntrinsic improvements and cleanup #80134
…lpers where one path was already
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
Issue Details
This does some general cleanup to the As part of that, this finishes the cleanup to use
|
d4d8dac to e8516fa
|
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop |
|
Azure Pipelines successfully started running 3 pipeline(s). |
|
CC. @dotnet/jit-contrib. Some pretty substantial wins here, reducing the size by 40k bytes on Arm64 and 90-102k bytes on x64. There is a small gain for Arm64 minopts and a small regression for x64 minopts, but also some nice throughput gains (-0.14% for libraries.pmi on x64) for full opts and no TP change for minopts. |
|
The few regressions are cases where we are spilling an operand ( The vast majority of these are simply the For |
|
The mono changes look ok. |
    case NI_VectorT256_op_Division:
#endif // TARGET_XARCH
    {
        return gtNewSimdBinOpNode(GT_DIV, retType, op1, op2, simdBaseJitType, simdSize,
I assume eventually we might need to implement the magic division optimization if op2 is CNS_VEC? 🙂 (or pdivp is fast enough as is?)
Right, there are some optimization opportunities here that we can more easily enable/centralize in the future with this change.
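For readers unfamiliar with the "magic division" mentioned above, here is a rough sketch of the idea for unsigned 32-bit integer lanes: when the divisor is a known constant, each lane can be divided with a widening multiply and a shift instead of a division instruction. This is not the JIT's implementation. The names `div3_magic`, `div5_magic`, `Vec4` and `div_lanes_by_3` are hypothetical, and a plain array stands in for a SIMD register. The trick is exact for integers only; for floating-point lanes, multiplying by a reciprocal is generally not bit-identical to dividing.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: divide one unsigned 32-bit lane by 3.
// 0xAAAAAAAB == ceil(2^33 / 3), so (x * magic) >> 33 == x / 3
// for every uint32_t x.
inline uint32_t div3_magic(uint32_t x)
{
    return static_cast<uint32_t>((static_cast<uint64_t>(x) * 0xAAAAAAABull) >> 33);
}

// Hypothetical helper: divide one unsigned 32-bit lane by 5.
// 0xCCCCCCCD == ceil(2^34 / 5).
inline uint32_t div5_magic(uint32_t x)
{
    return static_cast<uint32_t>((static_cast<uint64_t>(x) * 0xCCCCCCCDull) >> 34);
}

// A plain array standing in for a 128-bit vector of four uint32 lanes.
using Vec4 = std::array<uint32_t, 4>;

// Applies the constant-divisor rewrite to every lane, the way a compiler
// could when the second operand is a constant vector of all 3s.
inline Vec4 div_lanes_by_3(const Vec4& v)
{
    Vec4 result{};
    for (std::size_t i = 0; i < v.size(); ++i)
    {
        result[i] = div3_magic(v[i]);
        // Sanity check against ordinary division (compiled out with NDEBUG).
        assert(result[i] == v[i] / 3u);
    }
    return result;
}
```

Divisors such as 7 need an extra add-and-shift fixup step because their magic constant does not fit in 32 bits, which is part of why centralizing this logic in one helper is attractive.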
|
Nice diffs |
…t on downlevel hardware
|
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop |
|
Azure Pipelines successfully started running 3 pipeline(s). |
|
Merged slightly early (only mono llvmfullaot legs left). CI had already been passing except for two |
This does some general cleanup to the simdashwintrinsic code by sharing code paths where possible and using the gtNewSimd*Node helpers where they exist.
As part of that, this resolves an issue with the operand order used for a gtNewSimdWithElementNode call. I additionally attempted to finish the cleanup to use fgMakeMultiUse rather than impCloneExpr, however it hit various asserts and needs more followup (I plan on doing this in a separate PR: #80242).