Add efficient primitives for str.strip().#18742
Conversation
JukkaL
left a comment
There was a problem hiding this comment.
Thanks for the PR! Looks good overall, just a few things I noticed. Can you run some microbenchmarks to make sure the performance impact is as expected?
mypyc/test-data/run-strings.test
Outdated
| assert repr("foobar\x00\xab\ud912\U00012345") == r"'foobar\x00«\ud912𒍅'" | ||
|
|
||
| [case testStrip] | ||
| # This is a negative test. strip variants without args does not use efficient primitives. |
There was a problem hiding this comment.
Is this comment out of date / incorrect?
There was a problem hiding this comment.
It was out of date. Removed it.
| s = "xxb2yy" | ||
| assert s.lstrip("xy") == "b2yy" | ||
| assert s.strip("xy") == "b2" | ||
| assert s.rstrip("xy") == "xxb2" |
There was a problem hiding this comment.
Can you test all string kinds and different character code ranges, such as these (and mixing these):
- Character codes between 128 and 255 (0x80 to 0xff)
- Character codes between 256 and 65535 (0x100 to 0xffff)
- Character codes 65536+ (0x10000+)
mypyc/lib-rt/str_ops.c
Outdated
| #include <Python.h> | ||
| #include "CPy.h" | ||
|
|
||
| // Copied from cpython.git:Objects/unicodeobject.c. |
There was a problem hiding this comment.
Please mention the Python version or commit date from where this is from.
| BLOOM_MASK sepmask; | ||
| Py_ssize_t seplen; | ||
|
|
||
| kind = PyUnicode_KIND(self); |
There was a problem hiding this comment.
I think you need to call PyUnicode_READY on self and sepobj (and check the return values) on 3.9 at least, before you can call PyUnicode_KIND etc. You can check the relevant function in Python 3.9 to see how it's used.
| // Copied from do_strip function in cpython.git/Objects/unicodeobject.c. | ||
| PyObject *_CPyStr_Strip(PyObject *self, int strip_type, PyObject *sep) { | ||
| if (sep == NULL || sep == Py_None) { | ||
| Py_ssize_t len, i, j; |
There was a problem hiding this comment.
Similar to above, I think you'll need to call PyUnicode_READY.
* Fixing code comments. * Adding tests with more unicode chars. * Adding commit ID for code copied from cpython.git.
I have pushed a new commit that addresses review comments. |
JukkaL
left a comment
There was a problem hiding this comment.
Thanks for updates! The performance improvement is great.
Fixes mypyc/mypyc#1090.
Copying cpython implementation for strip, lstrip and rstrip to
str_ops.c.