Skip to content

JIT: PGO-based block reordering interferes with loop recognition #67318

@AndyAyersMS

Description

@AndyAyersMS

We currently rearrange the flow graph based on profile weights before we perform loop recognition. Since this recognition is lexical, rearrangement can impair loop recognition.

Here's a simple example:

;; COMPlus_TieredPGO=1
;; COMPlus_TC_QuickJitForLoops=0
;; (optionally use release runtime, as checked runtimes now change tiering parameters to cause tiering up sooner)

using System;
using System.Runtime.CompilerServices;
using System.Threading;

interface I
{
    public int F(int x, int y);
}

class Add : I 
{
    int I.F(int x, int y) => x + y;
}

class Mul : I
{
    int I.F(int x, int y) => x * y;
}

class X
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int MapF(I m, int[] xs, int[] ys, int from, int to)
    {
        int r = 0;
        for (int i = from; i < to; i++)
        {
            r += m.F(xs[i], ys[i]);
        }
        return r;
    }

    public static int Main()
    {
        int[] xs = new int[] { 1, 2, 3, 4 };
        int[] ys = new int[] { 4, 3, 2, 1 };
        I m = new Add();
        I mm = new Mul();

        int r = 0;

        // comment these two out to create a heavily biased GDV
        r += MapF(mm, xs, ys, 0, 3);
        r -= MapF(mm, xs, ys, 0, 3);

        for (int i = 0; i < 30; i++)
        {
            r += MapF(m, xs, ys, 0, 3);
            Thread.Sleep(15);
        }

        Thread.Sleep(50);

        for (int i = 0; i < 70; i++)
        {
            r += MapF(m, xs, ys, 0, 3);
        }

        return r / 20;
    }
}

As is we produce the flow graph on the left; with the two MapF(mm, ...) calls commented out the graph on the right:

Because of this reordering, loop recognition arrives at very different results, even though the flow graphs are isomorphic:

;;;;;;;;;;;;;;;;; left example

*************** Starting PHASE Find loops
*************** In optFindLoops()
*************** In optMarkLoopHeads()
1 loop heads marked
*************** In optFindNaturalLoops()
New loop epoch 1
L00: setting LPFLG_ITER
Recorded loop L00, from BB02 to BB05 (Head=BB01, Entry=BB02, Exit=BB05) [over V06 (ADD 1) LT V04]

***************  Natural loop table
L00, from BB02 to BB05 (Head=BB01, Entry=BB02, Exit=BB05) [over V06 (ADD 1) LT V04]

*************** In optFindAndScaleGeneralLoopBlocks()

Marking a loop from BB02 to BB05
    BB02(wt=300); unchanged: has profile weight
    BB03(wt=150); unchanged: has profile weight
    BB04(wt=150); unchanged: has profile weight
    BB05(wt=300); unchanged: has profile weight
Found a total of 1 general loops.
Skip alignment for L00 that starts at BB02 weight=300.
*************** In fgDebugCheckLoopTable

*************** Finishing PHASE Find loops

;;;;;;;;;;;;;;;;; right example

*************** Starting PHASE Find loops
*************** In optFindLoops()
*************** In optMarkLoopHeads()
2 loop heads marked
*************** In optFindNaturalLoops()
*************** In optFindAndScaleGeneralLoopBlocks()

Marking a loop from BB02 to BB04
    BB02(wt=300); unchanged: has profile weight
    BB03(wt=300); unchanged: has profile weight
    BB04(wt=300); unchanged: has profile weight
Marking a loop from BB04 to BB06
    BB04(wt=300); unchanged: has profile weight
    BB05(wt=100); unchanged: has profile weight
    BB06(wt=0); unchanged: has profile weight
Found a total of 2 general loops.
*************** In fgDebugCheckLoopTable

*************** Finishing PHASE Find loops

We actually see a high volume of monomorphic call sites with PGO, meaning we will quite often be seeing profiles like the one at right where the GDV test is highly biased and the residual indirect call site is cold.

Seems like we should look into deferring block reordering until after optimization.

cc @BruceForstall @EgorBo

Metadata

Metadata

Assignees

Labels

area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions