Skip to content

bpo-47067: Add vectorcall for gaobject#31996

Merged
sweeneyde merged 7 commits intopython:mainfrom
penguin-wwy:ga_vectorcall
Mar 21, 2022
Merged

bpo-47067: Add vectorcall for gaobject#31996
sweeneyde merged 7 commits intopython:mainfrom
penguin-wwy:ga_vectorcall

Conversation

@penguin-wwy
Copy link
Contributor

@penguin-wwy penguin-wwy commented Mar 19, 2022

@corona10
Copy link
Member

FYI, we decided to add caching mechanism instead of applying vector call. cc @sweeneyde
see #19677 (comment)

@corona10
Copy link
Member

corona10 commented Mar 20, 2022

@sweeneyde
To follow @gvanrossum 's decision, I will assign the PR to you :)
I am +1 about improving CPython performance if possible, so I have the same opinion with Guido too.

ref: https://bugs.python.org/issue47067#msg415616

@sweeneyde
Copy link
Member

sweeneyde commented Mar 20, 2022

BTW, I replicated some speedup with pyperf commmands like

./python -m pyperf timeit -s "D = dict[str, int]" "D(a=1, b=2)" --duplicate 10 -o "dict[str, int](a=1, b=2).json"

Some results:

dict[str, int](a=1, b=2)      # 359 ns +- 6 ns -> 280 ns +- 3 ns: 1.29x faster
list[int](())                 # 259 ns +- 3 ns -> 246 ns +- 5 ns: 1.05x faster
MappingProxyType[str, int](d) # 273 ns +- 13 ns -> 261 ns +- 4 ns: 1.05x faster

class A:
    def __init__(self, a, b):
        pass
G = GenericAlias(A, int)
G(1, 2)                       # 198 ns +- 6 ns -> 190 ns +- 4 ns: 1.04x faster

Copy link
Member

@sweeneyde sweeneyde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small code style / PEP 7 nits

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@gvanrossum
Copy link
Member

BTW, I replicated some speedup with pyperf commmands like

./python -m pyperf timeit -s "D = dict[str, int]" "D(a=1, b=2)" --duplicate 10 -o "dict[str, int](a=1, b=2).json"

Some results:

dict[str, int](a=1, b=2)      # 359 ns +- 6 ns -> 280 ns +- 3 ns: 1.29x faster
list[int](())                 # 259 ns +- 3 ns -> 246 ns +- 5 ns: 1.05x faster
MappingProxyType[str, int](d) # 273 ns +- 13 ns -> 261 ns +- 4 ns: 1.05x faster

class A:
    def __init__(self, a, b):
        pass
G = GenericAlias(A, int)
G(1, 2)                       # 198 ns +- 6 ns -> 190 ns +- 4 ns: 1.04x faster

So what does this speed up -- the dict[str, int] part, or the ...(a=1, b=2) part? (Also, the latter is probably not the fastest way to create a dict. The 5% faster results are more in line with my expectations.

@sweeneyde
Copy link
Member

...(a=1, b=2)

The D(a=1, b=2) part is what I benchmarked, for all of those types.

@gvanrossum
Copy link
Member

If we’re now changing Makefile this PR seems to have strayed from the topic?

@sweeneyde
Copy link
Member

If we’re now changing Makefile this PR seems to have strayed from the topic?

I agree, we should probably keep the vectorcall changes for now, then explore global strings in another PR.

@kumaraditya303
Copy link
Contributor

Just FYI the order of execution of deep freeze and global string generator matter so don't change it in this PR.

@penguin-wwy
Copy link
Contributor Author

I have made the requested changes; please review again :)

@bedevere-bot
Copy link

Thanks for making the requested changes!

@sweeneyde: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from sweeneyde March 21, 2022 05:09
@sweeneyde
Copy link
Member

Thanks for the changes, this is looking close.

I am still not convinced that adding ga_make_tp_call is worth it.

  • It eliminates a few branches/C-calls, but doesn't eliminate an allocation or much data movement like ga_vectorcall does.
  • Most other places in the code base, there is at most one vectorcall function per type, which is either present or NULL, so there's an argument to maintain consistency.
  • This is not a particularly hot function: list[int], dict[str, int], set[int], etc, all use the ga_vectorcall codepath, and GenericAlias(MyPythonClass, some_type)() will generally have lots of other overhead already.
  • It requires using the private function _PyObject_MakeTpCall and grabbing the ThreadState, making the implementation less obvious and more tightly coupled.
  • Three code paths is more complex than two.

@penguin-wwy
Copy link
Contributor Author

In my mind, ga_make_tp_call will have an essential effect on tp_call, which still only needs to be packed once.

It requires using the private function _PyObject_MakeTpCall and grabbing the ThreadState, making the implementation less obvious and more tightly coupled.

For the purpose of simplicity and decoupling, it is appropriate to delete it :)

Copy link
Member

@sweeneyde sweeneyde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very close! One last thing.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Copy link
Member

@sweeneyde sweeneyde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@sweeneyde sweeneyde merged commit 1ea055b into python:main Mar 21, 2022
@sweeneyde
Copy link
Member

The string change seems to have really improved the microbenchmarks. Before/after this PR:

./python -m pyperf timeit -s "D = dict[str, int]" "D(a=1, b=2)" --duplicate 10 -o "dictbench.json" --rigorous
Mean +- std dev: [./dictbench.json] 366 ns +- 6 ns -> [../ga_vectorcall/dictbench.json] 202 ns +- 2 ns: 1.81x faster

./python -m pyperf timeit -s "L = list[int]" "L(())" --duplicate 10 -o "listbench.json" --rigorous
Mean +- std dev: [./listbench.json] 258 ns +- 5 ns -> [../ga_vectorcall/listbench.json] 168 ns +- 4 ns: 1.54x faster

./python -m pyperf timeit -s "from types import GenericAlias; PyClass = GenericAlias(type('A', (), {}), int)" "PyClass()" --duplicate 10 -o "pyclassbench.json" --rigorous
Mean +- std dev: [./pyclassbench.json] 153 ns +- 3 ns -> [../ga_vectorcall/pyclassbench.json] 75.9 ns +- 0.9 ns: 2.01x faster

I also checked for refleaks just for fun, and ./python -m test test_types test_typing test_genericalias -R3:3 passed.

Thanks for the contribution!

@gvanrossum
Copy link
Member

Great results! In the end, how was the "make regen-all" issue resolved?

@sweeneyde
Copy link
Member

Great results! In the end, how was the "make regen-all" issue resolved?

I don't think it was solved, but ignored for the sake of this particular PR. I added a comment to https://bugs.python.org/issue46712 inquiring about it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants