Conversation
8feee4e to
77ab747
Compare
|
https://github.com/pypa/packaging/actions/runs/22794380340?pr=1106 I think you’re comparing an old run branch somehow. And improving specifiers won’t make everything, like versions, faster. :) |
|
Okay, that's much more in line with what I expected, maybe even a bit worse. Marginal, maybe no improvement. I think I should close this, but I won't rush that in case you think it might be worth salvaging. |
|
Have you tried deleting html and results and rerunning? |
|
(And yes, I think a readability focus for this PR might be salvageable) |
|
I just cleared results and did a fresh run, and my results are 10% worse than Looking at this for readability, I like the regex changes -- I think those lookbehinds are harder to read than this version -- but I don't love the string slicing. I'll try to think if there's a way to make that tidier. |
77ab747 to
7e9aafb
Compare
|
I think this may actually be a small perf gain, but not big enough to show up in the CI benchmarking. I rebased to grab the latest changes, shut down most non-essential applications, and ran There are no results with a >10% change (the ASV default), but with a more sensitive threshold of 5%, we see some info: I'm starting to actually believe these numbers. The results for the Specifier constructor benchmark, across all Pythons, show a small improvement on the latest Pythons and an even smaller slowdown on 3.8 and 3.9. That's consistent with some of the behavior I've seen with string operations which have fast-path implementations in newer Pythons. (I think this is part of Faster CPython, but it might just be the ordinary improvement process for CPython.) results for just the Specifier constructorall benchmarks from this runThis has been great for me in terms of playing with |
|
Looking at the latest merge with master (which did affect the benchmarks) and rearranging in order of Python version so it's easier to make sense of: I could convince myself that it's a hair slower on Python 3.10 and below and a hair faster on Python 3.13+, but it's tough. But I'm personally terrified of regexes, so I'm going to deffer to @henryiii on if the regex itself is better (for some definition of the sense better, such as more maintainable). |
|
I ran it locally:
I'd say that's an improvement, even before 3.13 (but mostly after). |
henryiii
left a comment
There was a problem hiding this comment.
IMO, this seems easier to reason about.
|
I think I want to move that I'm double checking the benchmarks now (I realized moving it might have surprising impact). EDIT: Actually, I went back and decided against it, although my testing shows it has no perf impact. Here's the diff I had in mind: if spec.startswith("==="):
- operator, version = spec[:3], spec[3:].strip()
+ operator, version = spec[:3], spec[3:]
elif spec.startswith(("~=", "==", "!=", "<=", ">=")):
- operator, version = spec[:2], spec[2:].strip()
+ operator, version = spec[:2], spec[2:]
else:
- operator, version = spec[:1], spec[1:].strip()
+ operator, version = spec[:1], spec[1:]
- self._spec: tuple[str, str] = (operator, version)
+ self._spec: tuple[str, str] = (operator, version.strip())This seemed like "simple cleanup", but I didn't really like that it assigns to I felt like this is a neutral change, not really a big improvement to readability but also not harmful. If someone else feels strongly that the version with only one call to |
|
I like the individual strips a little better. But pretty neutral. |
Examining the specifier regex, two potential optimizations seem worthwhile: 1. Negative lookbehinds can be eliminated if we combine the operator regex with the version regex -- this avoids crawling the same characters of a string forwards and backwards. 2. Do more string operations, like `strip()` rather than having the regex engine do this work. These ideas are somewhat combined, in that getting rid of the lookbehinds only really works by eliminating the group selectors in use. And removing the group selectors in the regex requires that we do more string operations. This also interestingly results in a regex stored in the `Specifier` which matches its secondary usage in `_tokenizer`, which was previously very slightly misaligned (in that the group captures in the regex were unused).
8c886c3 to
6a2a044
Compare
Examining the specifier regex, two potential optimizations seem worthwhile: 1. Negative lookbehinds can be eliminated if we combine the operator regex with the version regex -- this avoids crawling the same characters of a string forwards and backwards. 2. Do more string operations, like `strip()` rather than having the regex engine do this work. These ideas are somewhat combined, in that getting rid of the lookbehinds only really works by eliminating the group selectors in use. And removing the group selectors in the regex requires that we do more string operations. This also interestingly results in a regex stored in the `Specifier` which matches its secondary usage in `_tokenizer`, which was previously very slightly misaligned (in that the group captures in the regex were unused).
I'm not totally sure I trust the results I'm getting from benchmarking.
These results seem too good for the modest changes included here, so I'm concerned that I've somehow run the benchmarks incorrectly.
However, it's showing a consistent improvement, so I think this is worth submitting for further evaluation.
Examining the specifier regex, two potential optimizations seem
worthwhile:
Negative lookbehinds can be eliminated if we combine the operator
regex with the version regex -- this avoids crawling the same
characters of a string forwards and backwards.
Do more string operations, like
strip()rather than having theregex engine do this work.
These ideas are somewhat combined, in that getting rid of the lookbehinds
only really works by eliminating the group selectors in use. And removing
the group selectors in the regex requires that we do more string
operations.
This also interestingly results in a regex stored in the
Specifierwhich matches its secondary usage in
_tokenizer, which was previouslyvery slightly misaligned (in that the group captures in the regex were
unused).
Running the benchmarks from #1059 on this, I get some surprisingly good results.
We expect this to be mostly noticeable with the
Specifierconstructor benchmark, and that does show results:So this makes this branch look much better, especially on newer Pythons, and no regression (but no real improvement) in 3.8 and 3.9.
Many of the higher level tests show similar/related improvement. I think the largest ratio in my runs is this one:
full asv results