[Arm64] Replace pairs of str with stp#85032
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak |
|
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
gcstress is green with few known failures. |
|
@dotnet/jit-contrib , @BruceForstall |
|
This diffs look great. Interesting there are some zero-size diffs where we convert str/stp to stp/str. fwiw, I noticed a case maybe we could do better in libraries.crossgen2 on win-arm64: => ? |
BruceForstall
left a comment
There was a problem hiding this comment.
Can you show any diffs (including GC diffs) where both addresses are GC lcl vars?
src/coreclr/jit/emitarm64.cpp
Outdated
| (immediate). | ||
| */ | ||
|
|
||
| void emitter::emitIns_SS_R_R_R_I(instruction ins, |
There was a problem hiding this comment.
Why is this not a variant or extension of emitIns_S_S_R_R?
In that case, no reg3 is passed because it is determined based on the lclvar. In this case, there are 2 lclvar. Presumably they would both have the same FPbased value, thus determining the same address base register. In fact, you could probably determine / assert the offsets are correct.
There was a problem hiding this comment.
I have referred the code from emitIns_R_R_R_I() instead which operates on vector registers as well. For example, it operates on scale of 2, 3 and 4, whereas emitIns_S_S_R_R() operates just on scale == 3. Also, it has special handling for useRegForAdr which I assume is just for the stack stores. And yes, the reg3 is always FP or SP.
|
I presume the MinOpts TP regression is due to extra checks related to instrDesc size, and possibly due to fewer constants fitting in a small instrDesc. |
correct. |
That is an interesting pattern coming from accessing 4~5 Unfortunately, the popular peephole optimization that we do today just check the last instruction. We could use the infrastructure that we have recently developed to see last N instructions and combine them, but honestly, I don't know if removing last N instructions is well tested and works. I would expect this optimization to happen higher up in morph or something, but should be tracked as a separate issue, although I am not sure how many of such patterns exist. |
src/coreclr/jit/emitarm64.cpp
Outdated
| regNumber reg3, | ||
| ssize_t imm, | ||
| int varx1, | ||
| int offs, |
There was a problem hiding this comment.
offs here is the offset within a local var. Since there are 2 lclvars, there should be two offsets: offs1, offs2.
There was a problem hiding this comment.
The 2nd offset would be just offs + TARGET_POINTER_SIZE so we don't need to pass it.
There was a problem hiding this comment.
That's not correct. The offset is within the local, not from the base register.
|
@BruceForstall - this should be ready to review again. |




Optimize pair of
strwithstpalong with tracking the gc information.Contributes to #55365 and #77010
Fixes: #35134, #35133 and #81278