Skip to content

gh-148171: Convert CALL_BUILTIN_FAST to leave inputs on the stack for refcount elimination in JIT#148172

Merged
Fidget-Spinner merged 13 commits intopython:mainfrom
Fidget-Spinner:pop_top_calls
Apr 10, 2026
Merged

gh-148171: Convert CALL_BUILTIN_FAST to leave inputs on the stack for refcount elimination in JIT#148172
Fidget-Spinner merged 13 commits intopython:mainfrom
Fidget-Spinner:pop_top_calls

Conversation

@Fidget-Spinner
Copy link
Copy Markdown
Member

@Fidget-Spinner Fidget-Spinner commented Apr 6, 2026

@Fidget-Spinner
Copy link
Copy Markdown
Member Author

Results are pretty good. 7% faster on call microbenchmark:

from _testcapi import meth_fastcall

def testfunc(n):
    x = 0
    for _ in range(n):
        meth_fastcall(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    return x

testfunc(20000000)

Summary
  PYTHON_JIT_STRESS=1 ./python ../add_int.py ran
    1.07 ± 0.10 times faster than PYTHON_JIT_STRESS=1 ../cpython_jit_pgo_tc_0/python ../add_int.py
  

Co-authored-by: Kumar Aditya <[email protected]>
@markshannon
Copy link
Copy Markdown
Member

Rather than adding an extra stack item for all calls everywhere, I think it would make more sense to keep within stack bounds for this one uop.

The callable is almost always going to be immortal, and self_or_null will either by borrowed (for self) or NULL, so there are options to optimize away these values, leaving space for the result and the arguments.

For example:

op(_CALL_BUILTIN_FAST, (callable, self_or_null, args[oparg] -- res, s, a[oparg])) {
    ...
    PyStackRef_CLOSE(callable);
    ...

means we can optimize away the decrefs on self_or_null and args.

or, for function calls where we know self_or_null is NULL, we can just declare it dead:

op(_CALL_BUILTIN_FAST, (callable, self_or_null, args[oparg] -- res, c, a[oparg])) {
    ...
    DEAD(self_of_null);
    ...

assert(next->op.code == STORE_FAST);
operand = next->op.arg;
}
else if (uop == _POP_TOP_OPARG) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be special casing as few instructions as possible in the front-end.

This can probably be better handled in the optimizer, as we might want to leave a single _POP_TOP_OPARG rather than a long series of _POP_TOPs.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok done,

@Fidget-Spinner
Copy link
Copy Markdown
Member Author

@markshannon I chose to swap out callable, as it's the most convenient location, and I suspect self_or_null has a higher hit rate of being a NULL or borrowed than callable.

// To maintain a consistent view of the stack in the GC, callable's stack location
// must be overwritten before we close it.
_PyStackRef *callable_ptr = args - 2;
PyStackRef_XSETREF(*callable_ptr, PyStackRef_NULL);
Copy link
Copy Markdown
Member

@markshannon markshannon Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit strange. Does the code generator force you to do this?

Would this work?

    PyObject *res_o = _Py_BuiltinCallFast_StackRef(...)
    if (res_o == NULL) {
         ERROR_NO_POP();
    }
    s = self_or_null;
    DEAD(self_or_null);
    DEAD(args);
    res = PyStackRef_FromPyObjectSteal(res_o);
    PyStackRef_CLOSE(callable);

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I do that, the code generator shrinks the "real" stack to nothing, sets the stack pointer to before callable, and then calls the close.

This is incorrect, as we need the self_or_null and args to still be visible from the stack when closing callable, as on FT, the stack is a GC root.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the generated code look like for what input?
I'm trying to establish if the code generator is buggy or not.
I think the problem might be that you're renaming args to a so the code generator thinks it is a new array.
IIRC though, if you change callable to res it will complain if you leave args as a.
So this might need a code generator fix.

This works, by telling the code generator that the stack layout is unchanged.

        op(_CALL_BUILTIN_FAST_EXPERIMENT, (callable, self_or_null, args[oparg] -- callable, self_or_null, args[oparg])) {
            /* Builtin METH_FASTCALL functions, without keywords */
            int total_args = oparg;
            _PyStackRef *arguments = args;
            if (!PyStackRef_IsNull(self_or_null)) {
                arguments--;
                total_args++;
            }
            STAT_INC(CALL, hit);
            PyObject *res_o = _Py_BuiltinCallFast_StackRef(
                callable,
                arguments,
                total_args
            );
            if (res_o == NULL) {
                ERROR_NO_POP();
            }
            _PyStackRef temp = callable;
            callable = PyStackRef_FromPyObjectSteal(res_o);
            PyStackRef_CLOSE(temp);
        }

Can you file an issue for this code generator fault?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I applied your fix and I'm impressed it works. I think the code generator isn't faulty. I did mark the top two items as DEAD so it shrinking the stack is right.

Copy link
Copy Markdown
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

One request for clarification, feel free to merge once that's addressed.

uops = get_opnames(ex)
self.assertIn("_CALL_BUILTIN_FAST", uops)
self.assertNotIn("_GUARD_CALLABLE_BUILTIN_FAST", uops)
self.assertGreaterEqual(count_ops(ex, "_POP_TOP_NOP"), 6)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 6?
Can you either derive this number or explain it in a comment?

@Fidget-Spinner Fidget-Spinner merged commit 266247c into python:main Apr 10, 2026
74 checks passed
@Fidget-Spinner Fidget-Spinner deleted the pop_top_calls branch April 10, 2026 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants