Optimize integer zset scores in listpack (converting to string and back) by oranagra · Pull Request #10486 · redis/redis

oranagra · 2022-03-28T12:17:53Z

When the score doesn't have a fractional part and can be stored as an integer,
we use the integer capabilities of listpack to store it, rather than converting it to a string.
This already existed before this PR (lpInsert does that conversion implicitly).

But to do that, we would first convert the score from double to string (calling d2string),
then pass the string to lpAppend, which identified it as an integer and converted it back to an int.
Now, instead of converting it to a string, we store it using lpAppendInteger.
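A minimal sketch of the idea, not the actual Redis code: try the integer path first and only fall back to formatting a string when the score has a fractional part or is out of range. The `demo_entry` type and the `demo_*` functions are hypothetical stand-ins for a listpack entry, a double-to-integer check, and the insert call; the +/- (2^52)-1 bound is only one of the ranges discussed in this PR.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a listpack entry: an integer or a string. */
typedef struct {
    int is_int;
    long long ival;
    char sval[32];
} demo_entry;

/* Assumed bound for this sketch: (2^52)-1. */
#define DEMO_MAX_EXACT 4503599627370495.0

/* Returns 1 and sets *out when d is integral and inside the bound. */
static int demo_double_to_ll(double d, long long *out) {
    /* The negated form also rejects NaN, for which every comparison is false. */
    if (!(d >= -DEMO_MAX_EXACT && d <= DEMO_MAX_EXACT)) return 0;
    long long ll = (long long)d;
    /* The cast check is what actually decides: any fractional part fails here. */
    if ((double)ll != d) return 0;
    *out = ll;
    return 1;
}

/* Store a score: integer fast path first, string formatting as the fallback. */
static demo_entry demo_store_score(double score) {
    demo_entry e;
    memset(&e, 0, sizeof(e));
    if (demo_double_to_ll(score, &e.ival)) {
        e.is_int = 1;   /* no double -> string -> integer round trip */
    } else {
        snprintf(e.sval, sizeof(e.sval), "%.17g", score);
    }
    return e;
}
```

With this shape, an integral score such as 42.0 never touches the string formatter, which is the round trip the PR removes.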

Unrelated:

* Fix the double2ll range check (the negative and positive ranges, and also the comparison operands,
  were slightly off; the range could also be made much larger, see comment).
* Unify the double to string conversion code in rdb.c with the one in util.c
* Small optimization in lpStringToInt64: don't attempt to convert strings that are obviously too long.
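The length shortcut in the last item can be sketched as follows. This is an illustrative parser, not the listpack implementation: `demo_string_to_int64` and `DEMO_MAX_INT64_CHARS` are hypothetical names, and the rule that rejects leading zeros is an assumption about what a canonical integer encoding needs.

```c
#include <stddef.h>
#include <stdint.h>

/* The longest decimal int64 is "-9223372036854775808": 20 characters. */
#define DEMO_MAX_INT64_CHARS 20

/* Returns 1 and sets *out if s[0..len) is a canonical decimal int64. */
static int demo_string_to_int64(const char *s, size_t len, int64_t *out) {
    /* Cheap reject: anything longer cannot fit, so skip the digit loop. */
    if (len == 0 || len > DEMO_MAX_INT64_CHARS) return 0;

    size_t i = 0;
    int negative = 0;
    if (s[0] == '-') {
        negative = 1;
        i = 1;
        if (len == 1) return 0;              /* a lone "-" is not a number */
    }
    /* Reject "-0" and leading zeros so the text form stays canonical. */
    if (s[i] == '0' && (negative || len - i > 1)) return 0;

    uint64_t v = 0;
    for (; i < len; i++) {
        if (s[i] < '0' || s[i] > '9') return 0;
        uint64_t digit = (uint64_t)(s[i] - '0');
        if (v > (UINT64_MAX - digit) / 10) return 0;   /* uint64 overflow */
        v = v * 10 + digit;
    }

    if (negative) {
        if (v > (uint64_t)INT64_MAX + 1) return 0;
        *out = (v == (uint64_t)INT64_MAX + 1) ? INT64_MIN : -(int64_t)v;
    } else {
        if (v > (uint64_t)INT64_MAX) return 0;
        *out = (int64_t)v;
    }
    return 1;
}
```

The first check costs one comparison, so long member strings are rejected without scanning their digits.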

Benchmark:

Up to 20% improvement in certain tight loops doing zzlInsert with large integers
(if the listpack is pre-allocated to avoid realloc, and insertion is sorted from largest to smallest).

sundb

LGTM

zuiderkwast

Good idea.

zuiderkwast · 2022-03-30T16:56:18Z

+        double min = -4503599627370495; /* (2^52)-1 */
+        double max = 4503599627370496; /* -(2^52) */


I think the comments for min and max are swapped.

the comments are wrong, but i'm also uncertain the values are correct.
in integer's 2's complement, the negative side is the larger one (unlike this code).
however, doubles have a distinct sign bit, so i think their range to the positive and negative is the same.
i think the numbers should be 4503599627370495 and -4503599627370495

but i also think this check doesn't necessarily have to be here, and can't cause much harm.
i.e. i suppose we can also safely keep a value of 4503599627370495*4 as long without precision loss (but not *3).
and also, even if the range check would have been too big (allowing some values that can't be stored in long), the later conversion check is what actually decides, so this range check is just an optimization to avoid casting overheads?
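The precision claims above can be checked with a small round-trip test. This is only a sketch, assuming IEEE 754 binary64 doubles and a 64-bit integer type; `demo_exact_as_double` is a hypothetical helper, not part of Redis.

```c
#include <stdint.h>

/* Returns 1 if the integer v survives a round trip through double unchanged. */
static int demo_exact_as_double(int64_t v) {
    /* volatile forces a real 64-bit double, avoiding excess precision. */
    volatile double d = (double)v;
    /* Guard the cast back: converting a double >= 2^63 to int64_t is undefined. */
    if (d >= 9223372036854775808.0 || d < -9223372036854775808.0) return 0;
    return (int64_t)d == v;
}
```

Under those assumptions, `4503599627370495 * 4` (an even value below 2^54) round-trips, while `4503599627370495 * 3` (an odd value above 2^53) does not, which matches the "but not *3" remark.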

see my last commit and the updated top comment.

The same changes need to be made for https://github.com/redis/redis/blob/b8eb2a73408fa3b8845760857dd6fcccb62107fe/src/rdb.c#L601:L602

Since doubles have a distinct sign bit, the range should be +/- ((2^52)-1) inclusive. Note that this is just a speedup to avoid the casting check, which is what actually matters.

Co-authored-by: Viktor Söderqvist <[email protected]>

sundb · 2022-04-02T08:59:09Z

Using the following test code shows that performance is indeed improved.

The test applies two optimizations so that unrelated overhead does not dominate the measurement:

Pre-size the listpack, to avoid listpack realloc.
Insert scores from largest to smallest, so that entries are always inserted at the head of the listpack, avoiding a search through the whole listpack.

| max score | min score | time (unstable) | time (this PR) | speedup |
| --- | --- | --- | --- | --- |
| 4503599627370494 | 4503599627370494 - 128 | 37.98 s | 32.73 s | 1.16x |
| -4503599627370494 + 128 | -4503599627370494 | 38.39 s | 33.61 s | 1.14x |

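Test code

long long start = ustime();
for (int j = 0; j < 1000000; j++) {
    /* Pre-size the listpack to avoid realloc during the inserts below. */
    unsigned char *lp = lpNew(2183);
    /* Descending scores: each new entry is inserted at the head of the listpack. */
    for (long long i = 4503599627370494; i > 4503599627370494 - 128; i--) {
        sds ele = sdscatfmt(sdsempty(), "key%I", i);
        lp = zzlInsert(lp, ele, i);
        sdsfree(ele);
    }
    zfree(lp);
}
/* ustime() returns microseconds, so the printed value is in microseconds. */
printf("consume: %lld \n", ustime() - start);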

When we have a double score and we know we cannot convert it to an integer, we convert it to a string, but then listpack was again trying to convert that string to an integer. Now we have an lpInsertDouble API that avoids these overheads.

Note: i'm not happy with the result, specifically the interface of lpInsert (i.e. the NOT_INT part)

Unrelated: improve lpStringToInt64 not to try converting long strings to integers.

oranagra · 2022-04-05T19:22:24Z

i pushed a commit that avoids the extra string to integer conversion attempt, didn't try to benchmark it yet.
also, i'm not happy with the result (specifically the interface of lpInsert), maybe someone has an idea...

p.s. the right solution to all of that is probably to add native double encoding for listpack (i.e. save the actual double bits), but that's complicated, and i'm not certain we have spare bits in the header to denote that type.
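For illustration only, "saving the actual double bits" could look like the sketch below. It assumes IEEE 754 binary64 and a fixed little-endian 8-byte payload; the `demo_*` names are hypothetical, and listpack has no such encoding today. It leaves out the hard part mentioned above, namely finding a header marker for the new type.

```c
#include <stdint.h>
#include <string.h>

/* Write the raw bits of d into out[] in little-endian byte order. */
static void demo_encode_double(double d, unsigned char out[8]) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof(bits));       /* well-defined type punning */
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (8 * i));
}

/* Rebuild the double from the 8 little-endian bytes written above. */
static double demo_decode_double(const unsigned char in[8]) {
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits |= (uint64_t)in[i] << (8 * i);
    double d;
    memcpy(&d, &bits, sizeof(d));
    return d;
}
```

Encoding and decoding this way needs no text formatting, but every reader of the entry has to understand the extra type.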

oranagra · 2022-04-06T09:15:27Z

benchmark using the code from #10486 (comment)

| test | lpInsertDouble | lpInsertString / Integer | unstable |
| --- | --- | --- | --- |
| time | 27999859 | 28029440 | 33364418 |
| improvement vs unstable | 1.1915 (~20%) | 1.1903 (~20%) | 1.0 |

this is expected since the last commit attempts to resolve issues from when fractional parts are used, but for some reason when i try to measure the same with fractional numbers, i don't see any difference between the 3 options.

sundb · 2022-04-06T11:20:28Z

> this is expected since the last commit attempts to resolve issues from when fractional parts are used, but for some reason when i try to measure the same with fractional numbers, i don't see any difference between the 3 options.

https://github.com/redis/redis/blob/3e09a8c09770dfbe5c8a1c3d2ebc4599448ba7c4/src/util.c#L591
When using fractional numbers, this line of code consumes most of the CPU time,
which makes the effect of the other optimizations much less noticeable.
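A rough sketch of the fractional-score path, assuming the linked line is an `snprintf` call with a `"%.17g"` format (an assumption here, based on the later mention of snprintf and strtod in this thread). The `demo_*` wrappers are hypothetical and only show the text round trip a fractional score goes through.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Format a double with 17 significant digits, enough to round-trip binary64. */
static int demo_double_to_string(char *buf, size_t len, double value) {
    return snprintf(buf, len, "%.17g", value);
}

/* Parse the text back into a double. */
static double demo_string_to_double(const char *s) {
    return strtod(s, NULL);
}
```

Both directions do general-purpose decimal conversion, which is why this path can dominate the profile for fractional scores.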

oranagra · 2022-04-06T11:46:37Z

so maybe i should revert my last commit.
the optimization it brings of avoiding an attempt to convert a string to integer, isn't relevant in that code flow, and it does mess up the code / interfaces a bit.

sundb · 2022-04-07T02:02:50Z

@oranagra Perhaps we can reserve lp*double(), we can store the bits of the double directly into the listpack in the future.

oranagra · 2022-04-07T13:04:50Z

i don't see a point in reserving the API if we don't have the code to back it. we can add that later again if needed.
currently it looks ugly. so unless i can find that it helps with performance, i'd rather drop it.

sundb · 2022-04-14T02:39:33Z

I made an attempt to store doubles in listpack. It's really complicated, not in the storage itself but in the interface: the listpack API (lpGet, lpGetValue, lpFind) would change dramatically, and eventually every caller (hash, zset, list, stream) would need to handle the double encoding (a lot of changes), even though only zset benefits.
It also has the side effect that when replying with a zset score, the double would need to be converted to a string again using snprintf.

zuiderkwast · 2022-04-14T09:21:34Z

Good job @sundb. So I guess the conclusion is that it's not worth it, storing IEEE doubles in listpack.

Maybe we can optimize double <---> string conversion instead? There are some libraries that are several times faster than snprintf and strtod.

For snprintf, there are the Grisu2 and Grisu3 algorithms. Paper: https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf, implementation (C++, 3-clause BSD): https://github.com/google/double-conversion. I haven't found a plain C implementation yet.

For strtod, there's https://github.com/fastfloat/fast_float and https://github.com/lemire/fast_double_parser and blog posts like https://lemire.me/blog/2020/03/10/fast-float-parsing-in-practice/. Daniel Lemire has a lot of papers and blog posts on these topics.

sundb · 2022-04-14T10:00:08Z

@zuiderkwast You remind me of #8825

filipecosta90 · 2022-04-14T16:59:56Z

@zuiderkwast WRT #8825, I've opened a PR that uses a plain C implementation of grisu2 (tests are still failing and I need to address that, but it would be of interest to have your opinion on it). PR link: #10587

oranagra · 2022-04-17T13:00:44Z

@yossigo please review (see updated top comment). i'd like to merge this.

enjoy-binbin · 2022-04-18T02:05:33Z

+     * i.e. all double values in that range are representable as a long without precision loss,
+     * but not all long values in that range can be represented as a double.
+     * we only care about the first part here. */
+    if (d < -LLONG_MAX/2 || d > LLONG_MAX/2)


there is a warning, https://github.com/redis/redis/runs/6057010293?check_suite_focus=true#step:4:139
PR link: #10595

util.c:574:42: error: implicit conversion from 'long long' to 'double' changes value from 4611686018427387903 to 4611686018427387904 [-Werror,-Wimplicit-const-int-float-conversion] if (d < -LLONG_MAX/2 || d > LLONG_MAX/2)
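The warning reflects that `LLONG_MAX/2` (2^62 - 1) is not exactly representable as a double, so the comparison silently uses 2^62. One common way to make that rounding explicit is to cast the bounds, as sketched below. This assumes a 64-bit `long long`; `demo_in_half_ll_range` is a hypothetical name, and the snippet is not claimed to be the exact change made in #10595.

```c
#include <limits.h>

/* Range check with explicit casts, so the int -> double rounding is visible. */
static int demo_in_half_ll_range(double d) {
    /* LLONG_MAX/2 is 2^62 - 1; as a double it rounds to exactly 2^62. */
    const double lo = (double)(-LLONG_MAX / 2);
    const double hi = (double)(LLONG_MAX / 2);
    return d >= lo && d <= hi;
}
```

Because the bounds round outward to +/- 2^62, the check stays a coarse pre-filter; the exact decision still has to come from the later cast comparison.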

There is an implicit conversion warning in clang:
```
util.c:574:23: error: implicit conversion from 'long long' to 'double' changes value from -4611686018427387903 to -4611686018427387904 [-Werror,-Wimplicit-const-int-float-conversion]
    if (d < -LLONG_MAX/2 || d > LLONG_MAX/2)
```
introduced in redis#10486

Co-authored-by: sundb <[email protected]>

zset store score as integer in listpack when possible

ea8d8f8

oranagra requested review from sundb and yossigo March 28, 2022 12:18

sundb reviewed Mar 29, 2022

filipecosta90 added the action:run-benchmark Triggers the benchmark suite for this Pull Request label Mar 29, 2022

zuiderkwast reviewed Mar 30, 2022

oranagra added 2 commits March 31, 2022 09:24

Merge remote-tracking branch 'origin/unstable' into zset_listpack_int

cb3a882

Change the double2ll range check.

9e2b57e

Since doubles have a distinct sign bit, the range should be +/- ((2^52)-1) inclusive. Note that this is just a speedup to avoid the casting check, which is what actually matters.

zuiderkwast approved these changes Mar 31, 2022

yoav-steinberg reviewed Mar 31, 2022

Comment thread src/util.c Outdated

Comment thread src/util.c Outdated

zuiderkwast reviewed Mar 31, 2022

Comment thread src/util.c Outdated

avoid the mem write for fractional numbers

adc1150

Co-authored-by: Viktor Söderqvist <[email protected]>

sundb reviewed Apr 1, 2022

Comment thread src/util.c

typo

16c7613

sundb reviewed Apr 6, 2022

Comment thread src/listpack.c Outdated

revert lpInsertDouble work, didn't provide measurable impact

15d36ee

rdb.c to use double2ll. double2ll now handles bigger range as integers

3117d74

sundb reviewed Apr 14, 2022

Comment thread src/util.c Outdated

reduce range

b02c267

yossigo approved these changes Apr 17, 2022

oranagra changed the title ~~zset store score as integer in listpack when possible~~ Optimize integer zset scores in listpack (converting to string and back) Apr 17, 2022

oranagra merged commit 0c4733c into redis:unstable Apr 17, 2022

oranagra deleted the zset_listpack_int branch April 17, 2022 14:16

enjoy-binbin reviewed Apr 18, 2022

enjoy-binbin mentioned this pull request Apr 18, 2022

Fix long long to double implicit conversion warning #10595

Merged

oranagra mentioned this pull request Apr 27, 2022

Redis 7.0.0 #10652

Merged
