Made ASCII string encoding faster #101777
It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat (don't just cc him here, he won't see it! He's on Discord!).
If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix?
Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.
I thought Dart's UTF-8 encoding already had a fast path for this?
Yep, but here are the reasons I believe we are getting faster results:
- We are removing an extra copy of the data: https://github.com/dart-lang/sdk/blob/56035a7df0bb26a6babce53fae21d46263f3bf26/sdk/lib/convert/utf.dart#L115
- It inlines the logic
- We are removing bounds checks: https://github.com/dart-lang/sdk/blob/56035a7df0bb26a6babce53fae21d46263f3bf26/sdk/lib/convert/utf.dart#L98
Here's another bounds check we were able to remove: https://github.com/dart-lang/sdk/blob/56035a7df0bb26a6babce53fae21d46263f3bf26/sdk/lib/convert/utf.dart#L201
Dart's UTF8 encoder main loop is: https://github.com/dart-lang/sdk/blob/main/sdk/lib/convert/utf.dart#L197
For strings containing only ASCII, Dart should just take each code unit and copy it into the output byte buffer, which is similar to what the loop in this PR is doing. I'm not sure why there would be a noticeable performance difference between Dart's encoder and this PR.
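For readers following along, here is a minimal sketch of the kind of ASCII copy loop being discussed. It is written in Python for illustration only and is not the code from this PR or from the Dart SDK; the name `copy_ascii_prefix` is hypothetical. It works on Python code points, whereas Dart iterates UTF-16 code units, but for the ASCII range the two agree.

```python
def copy_ascii_prefix(text: str, out: bytearray) -> int:
    """Append the leading ASCII run of `text` to `out`.

    Returns the index of the first non-ASCII character, or len(text)
    if the whole string is ASCII. (Hypothetical helper for illustration.)
    """
    for index, ch in enumerate(text):
        code = ord(ch)
        if code >= 0x80:
            # First non-ASCII character: stop, the caller handles the rest.
            return index
        # An ASCII character encodes to exactly one identical byte in UTF-8.
        out.append(code)
    return len(text)
```

For a pure-ASCII input the loop never leaves the fast path, and the bytes land directly in the destination buffer.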
Ahh, I see. Dart's encoder guarantees that the Uint8List it gives you is not just a view of some larger buffer somewhere else, but we don't need to worry about that since we're immediately writing this into another one.
Makes sense!
@jason-simmons that sublist at the end is copying the data, though. Might that be the biggest difference?
That makes sense - the typed data sublist is doing a memcpy. But this encoder can avoid that by writing the ASCII part and the non-ASCII part to the WriteBuffer as two separate chunks.
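A rough sketch of the two-chunk idea, in Python for illustration only. The names `encode_two_chunks` and `write_string` are hypothetical, the `bytearray` stands in for the WriteBuffer, and the size prefix is left out to keep the focus on the chunking. Nothing here is taken from the PR's actual diff.

```python
from typing import Tuple


def encode_two_chunks(text: str) -> Tuple[bytes, bytes]:
    """Split `text` at the first non-ASCII character and encode each part.

    The ASCII prefix needs no real transcoding; only the remainder goes
    through the general UTF-8 encoder. (Hypothetical helper.)
    """
    split = next(
        (i for i, ch in enumerate(text) if ord(ch) >= 0x80),
        len(text),
    )
    ascii_chunk = text[:split].encode("ascii")
    rest_chunk = text[split:].encode("utf-8")
    return ascii_chunk, rest_chunk


def write_string(buffer: bytearray, text: str) -> None:
    """Append both chunks to `buffer` without first joining them."""
    ascii_chunk, rest_chunk = encode_two_chunks(text)
    buffer += ascii_chunk
    buffer += rest_chunk
```

Because UTF-8 encodes each code point independently, the two chunks written back to back are byte-for-byte the same as encoding the whole string at once, so no trimmed intermediate copy is needed.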
Dart also provides However, that probably won't work for
In theory you could arrange things so that we reserve a spot for the length, accumulate the UTF-8 bytes while measuring the length, then go back and write the length in.
If we're going to do this ourselves, we should have a wide variety of unit tests covering both ASCII and UTF-8 input, to ensure that the data is not corrupted.
We already have those tests written here: https://github.com/flutter/flutter/blob/e6f302289014371326e480b293779827da0c81d5/packages/flutter/test/services/message_codecs_test.dart#L213:L213
Do you feel like those are sufficient?
The new code has complete test coverage. Every line is exercised by a test. I can't imagine another test or input that would exercise it differently.
If you feel that is sufficient, then that is fine. I also don't think you need a test exemption since you updated the benchmark, right?
We are using variable-width sizes, so you can't know in advance how much space to reserve, except you could probably just always choose the max size (5 bytes). You'd have to double-check that the decoders would support that.
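To make the "reserve the max size and backfill" idea concrete, here is a hedged Python sketch. It assumes a size scheme of one byte for values below 254, marker 254 plus a 16-bit value, and marker 255 plus a 32-bit value, which is how I understand the standard codec's sizes to work; treat that layout, the little-endian byte order, and all function names here as assumptions rather than the real implementation.

```python
import struct
from typing import Tuple


def write_size(out: bytearray, value: int) -> None:
    """Variable-width size: 1, 3, or 5 bytes (assumed layout)."""
    if value < 254:
        out.append(value)
    elif value <= 0xFFFF:
        out.append(254)
        out += struct.pack("<H", value)
    else:
        out.append(255)
        out += struct.pack("<I", value)


def read_size(data: bytes, offset: int) -> Tuple[int, int]:
    """Return (size, next_offset), dispatching only on the marker byte."""
    marker = data[offset]
    if marker < 254:
        return marker, offset + 1
    if marker == 254:
        return struct.unpack_from("<H", data, offset + 1)[0], offset + 3
    return struct.unpack_from("<I", data, offset + 1)[0], offset + 5


def write_string_backfilled(out: bytearray, text: str) -> None:
    """Reserve a 5-byte size slot, write the bytes, then fill in the size."""
    slot = len(out)
    out += b"\xff\x00\x00\x00\x00"  # marker 255 plus a placeholder uint32
    start = len(out)
    out += text.encode("utf-8")
    # Go back and write the measured byte length into the reserved slot.
    struct.pack_into("<I", out, slot + 1, len(out) - start)
```

A decoder like `read_size` above accepts the padded form because it looks only at the marker byte. Whether every real decoder (including the platform-side ones) behaves that way is exactly what would need checking, and the padded form costs up to 4 extra bytes per short string.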
Co-authored-by: Jonah Williams <[email protected]>
In local testing this made the StandardMessageCodec_string benchmark go from 0.51338 µs to 0.34857 µs (about a 32% decrease).
Don't land before #101767
Test coverage already exists; this is just a performance change.