gh-148171: Convert CALL_BUILTIN_FAST to leave inputs on the stack for refcount elimination in JIT#148172
Conversation
|
Results are pretty good. 7% faster on call microbenchmark: |
Co-authored-by: Kumar Aditya <[email protected]>
|
Rather than adding an extra stack item for all calls everywhere, I think it would make more sense to keep within stack bounds for this one uop. The callable is almost always going to be immortal, and For example: means we can optimize away the decrefs on or, for function calls where we know |
Python/optimizer.c
Outdated
| assert(next->op.code == STORE_FAST); | ||
| operand = next->op.arg; | ||
| } | ||
| else if (uop == _POP_TOP_OPARG) { |
There was a problem hiding this comment.
We should be special casing as few instructions as possible in the front-end.
This can probably be better handled in the optimizer, as we might want to leave a single _POP_TOP_OPARG rather than a long series of _POP_TOPs.
|
@markshannon I chose to swap out callable, as it's the most convenient location, and I suspect |
Python/bytecodes.c
Outdated
| // To maintain a consistent view of the stack in the GC, callable's stack location | ||
| // must be overwritten before we close it. | ||
| _PyStackRef *callable_ptr = args - 2; | ||
| PyStackRef_XSETREF(*callable_ptr, PyStackRef_NULL); |
There was a problem hiding this comment.
This is a bit strange. Does the code generator force you to do this?
Would this work?
PyObject *res_o = _Py_BuiltinCallFast_StackRef(...)
if (res_o == NULL) {
ERROR_NO_POP();
}
s = self_or_null;
DEAD(self_or_null);
DEAD(args);
res = PyStackRef_FromPyObjectSteal(res_o);
PyStackRef_CLOSE(callable);
There was a problem hiding this comment.
If I do that, the code generator shrinks the "real" stack to nothing, sets the stack pointer to before callable, and then calls the close.
This is incorrect, as we need the self_or_null and args to still be visible from the stack when closing callable, as on FT, the stack is a GC root.
There was a problem hiding this comment.
What does the generated code look like for what input?
I'm trying to establish if the code generator is buggy or not.
I think the problem might be that you're renaming args to a so the code generator thinks it is a new array.
IIRC though, if you change callable to res it will complain if you leave args as a.
So this might need a code generator fix.
This works, by telling the code generator that the stack layout is unchanged.
op(_CALL_BUILTIN_FAST_EXPERIMENT, (callable, self_or_null, args[oparg] -- callable, self_or_null, args[oparg])) {
/* Builtin METH_FASTCALL functions, without keywords */
int total_args = oparg;
_PyStackRef *arguments = args;
if (!PyStackRef_IsNull(self_or_null)) {
arguments--;
total_args++;
}
STAT_INC(CALL, hit);
PyObject *res_o = _Py_BuiltinCallFast_StackRef(
callable,
arguments,
total_args
);
if (res_o == NULL) {
ERROR_NO_POP();
}
_PyStackRef temp = callable;
callable = PyStackRef_FromPyObjectSteal(res_o);
PyStackRef_CLOSE(temp);
}
Can you file an issue for this code generator fault?
There was a problem hiding this comment.
I applied your fix and I'm impressed it works. I think the code generator isn't faulty. I did mark the top two items as DEAD so it shrinking the stack is right.
markshannon
left a comment
There was a problem hiding this comment.
Looks good.
One request for clarification, feel free to merge once that's addressed.
| uops = get_opnames(ex) | ||
| self.assertIn("_CALL_BUILTIN_FAST", uops) | ||
| self.assertNotIn("_GUARD_CALLABLE_BUILTIN_FAST", uops) | ||
| self.assertGreaterEqual(count_ops(ex, "_POP_TOP_NOP"), 6) |
There was a problem hiding this comment.
Why 6?
Can you either derive this number or explain it in a comment?
Uh oh!
There was an error while loading. Please reload this page.