Skip to content

Allocate memory more efficiently when packing objects#5170

Merged
ethomson merged 1 commit intolibgit2:masterfrom
bk2204:packbuilder-efficient-realloc
Jul 17, 2019
Merged

Allocate memory more efficiently when packing objects#5170
ethomson merged 1 commit intolibgit2:masterfrom
bk2204:packbuilder-efficient-realloc

Conversation

@bk2204
Copy link
Contributor

@bk2204 bk2204 commented Jul 17, 2019

The packbuilder code allocates memory in chunks. When it needs to allocate, it tries to add 1024 to the number of objects and multiply by 3/2. However, it actually multiplies by 1 instead, since it performs an integral division in the expression "3 / 2" and only then multiplies by the increased number of objects.

The current behavior causes the code to waste massive amounts of time copying memory when it reallocates, causing inserting all non-blob objects in the Linux repository into a new pack to take some indeterminate time greater than 5 minutes instead of 52 seconds. (I was impatient and didn't want to wait much longer than 5 minutes for my job to run.)

Correct this error by first dividing by two, and only then multiplying by 3. We still check for overflow for the multiplication, which is the only part that can overflow. This appears to be the only place in the code base which has this problem.

There are no tests for this because the only consequence of this change is a performance improvement: the old code would have been equally functional, just much slower.

The packbuilder code allocates memory in chunks. When it needs to
allocate, it tries to add 1024 to the number of objects and multiply by
3/2. However, it actually multiplies by 1 instead, since it performs an
integral division in the expression "3 / 2" and only then multiplies by
the increased number of objects.

The current behavior causes the code to waste massive amounts of time
copying memory when it reallocates, causing inserting all non-blob
objects in the Linux repository into a new pack to take some
indeterminate time greater than 5 minutes instead of 52 seconds.

Correct this error by first dividing by two, and only then multiplying
by 3. We still check for overflow for the multiplication, which is the
only part that can overflow. This appears to be the only place in the
code base which has this problem.
@ethomson
Copy link
Member

Ouch. Nice catch. That's almost certainly my bad, from when we added the overflow bits, my rather mechanical transformations screwed this up. :/

@ethomson
Copy link
Member

The current behavior causes the code to waste massive amounts of time copying memory when it reallocates, causing inserting all non-blob objects in the Linux repository into a new pack to take some indeterminate time greater than 5 minutes instead of 52 seconds.

Oh also, to clarify: with this PR it takes 52 seconds to repack? Or 52 seconds is the ideal git.git implementation time?

@bk2204
Copy link
Contributor Author

bk2204 commented Jul 17, 2019

Oh also, to clarify: with this PR it takes 52 seconds to repack? Or 52 seconds is the ideal git.git implementation time?

With this PR, just inserting the non-blob objects (i.e., calling git_packbuilder_insert on approximately 4710400 objects) takes 52 seconds. The actual pack process takes much longer, but I'll be working on some other potentially expensive areas as well to reduce the cost. Part of this is due to the fact that libgit2 doesn't reuse deltas, but there's other low-hanging fruit to get as well.

@ethomson
Copy link
Member

With this PR, just inserting the non-blob objects (i.e., calling git_packbuilder_insert on approximately 4710400 objects) takes 52 seconds.

👍 Just wanted to make sure that I understood correctly.

Thanks for the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants