Checksums: use blocks to read the files #836
When calculating the checksum of a large file, we need to read it in blocks (currently 16MB) to keep memory usage acceptable.
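A minimal sketch of block-wise checksumming, assuming a helper shaped like the `calc_block_checksum(path, checksum_obj)` used in the diff below. The body and the `block_size` parameter are illustrative assumptions, not the PR's actual code. Only one block is held in memory at a time, so peak memory is bounded by the block size rather than the file size.

```python
import hashlib

BLOCK_SIZE = 16 * 1024 * 1024  # 16MB, the block size mentioned in the PR description


def calc_block_checksum(path, checksum_obj, block_size=BLOCK_SIZE):
    """Feed the file at `path` into `checksum_obj` one block at a time.

    `checksum_obj` is any hashlib-style object with update()/hexdigest().
    (Sketch: the real EasyBuild helper may differ.)
    """
    with open(path, "rb") as fh:
        while True:
            block = fh.read(block_size)  # at most block_size bytes in memory
            if not block:  # empty bytes means end of file
                break
            checksum_obj.update(block)
    return checksum_obj.hexdigest()
```

Because hash objects accumulate state across `update()` calls, hashing the blocks in order yields the same digest as hashing the whole file at once.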
This would be nicer if it were simply on the next line (starting at the opening "(").
"parents", while you are cleaning up stuff.
Looks OK to me.
Split off the hash functions from the dicts/lambdas; there is too much duplicated code.
```python
try:
    import hashlib
    md5_func = hashlib.md5
    sha1_func = hashlib.sha1
except ImportError:
    # fallback for Python versions without hashlib (< 2.5)
    import md5, sha
    md5_func = md5.md5
    sha1_func = sha.sha
CHECKSUM_FUNCTIONS['md5'] = lambda p: calc_block_checksum(p, md5_func())
CHECKSUM_FUNCTIONS['sha1'] = lambda p: calc_block_checksum(p, sha1_func())
```
Do the try/except a bit higher up, and put all of this in the dict above; this looks odd now.
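One possible shape of the restructuring suggested above, sketched under assumptions: the try/except sits at the top, the hash constructors are named `md5_class`/`sha1_class` (the naming proposed later in this thread), and both entries live directly in the `CHECKSUM_FUNCTIONS` dict. The compact `calc_block_checksum` here is a stand-in so the sketch runs on its own, not the PR's implementation.

```python
try:
    import hashlib
    md5_class = hashlib.md5
    sha1_class = hashlib.sha1
except ImportError:
    # fallback for Python versions without hashlib (< 2.5)
    import md5
    import sha
    md5_class = md5.md5
    sha1_class = sha.sha


def calc_block_checksum(path, checksum_obj, block_size=16 * 1024 * 1024):
    # stand-in helper: read the file block by block into the hash object
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            checksum_obj.update(block)
    return checksum_obj.hexdigest()


# all checksum types defined in one place; each lambda creates a fresh hash object per call
CHECKSUM_FUNCTIONS = {
    'md5': lambda p: calc_block_checksum(p, md5_class()),
    'sha1': lambda p: calc_block_checksum(p, sha1_class()),
}
```

Usage: `CHECKSUM_FUNCTIONS['sha1']('/path/to/file')` returns the hex digest of that file.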
Are you sure you need to call md5_func and sha1_func here? It seems like you just want to pass them along.
(Did this pass the unit tests?)
We call them to get an md5 or sha1 object to work with. It has passed the unit tests (the last run failed because the disk was full; @boegel was going to fix that).
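A small illustration of why the constructor has to be called: `hashlib.md5` is a constructor, and each call returns a new, empty hash object. The helper name `checksum_of` is hypothetical and used only for this example. Reusing one object for two files would feed both inputs into the same digest.

```python
import hashlib

md5_class = hashlib.md5  # a constructor: calling it returns a new, empty hash object


def checksum_of(data, hash_class):
    # hypothetical helper: a fresh object per call, so no state leaks between inputs
    obj = hash_class()
    obj.update(data)
    return obj.hexdigest()


first = checksum_of(b"file one", md5_class)
second = checksum_of(b"file two", md5_class)

# Reusing a single object instead mixes both inputs into one digest
shared = md5_class()
shared.update(b"file one")
shared.update(b"file two")
mixed = shared.hexdigest()
```

Here `mixed` equals the md5 of the concatenation `b"file onefile two"`, not the md5 of either file on its own.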
So it would be better to name them md5_class and sha1_class then? (And I wasn't going to make any more remarks...)
Fair point, that would be more correct.
This more accurately represents what the variable is/does.
Thanks @wpoely86, and thanks for reviewing, @itkovian, @stdweird, @JensTimmerman!
Checksums: use blocks to read the files
We need to read a file in blocks to calculate its checksum; otherwise memory usage blows up, e.g. when you try to install icc.
This PR fixes it for sha1 and md5, but not yet for adler32 or crc32. I will try to fix those when I find the time.
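For the remaining adler32 and crc32 cases, a block-wise approach could look like the sketch below. It relies on `zlib.adler32(data, value)` and `zlib.crc32(data, value)` accepting a running checksum as their second argument. The function name `calc_block_zlib_checksum` and returning a masked integer are assumptions for illustration, not what EasyBuild does.

```python
import zlib

BLOCK_SIZE = 16 * 1024 * 1024  # 16MB, same block size as for md5/sha1


def calc_block_zlib_checksum(path, func, block_size=BLOCK_SIZE):
    """Compute adler32 or crc32 over a file, one block at a time.

    `func` is zlib.adler32 or zlib.crc32; both take the previous checksum
    as a second argument, so the result can be built incrementally.
    (Hypothetical helper name.)
    """
    value = func(b"")  # starting value: 1 for adler32, 0 for crc32
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            value = func(block, value)
    # mask to an unsigned 32-bit value (Python 2 may return negative ints)
    return value & 0xffffffff
```

The design mirrors the md5/sha1 case: only the running 32-bit value is carried between blocks, so memory use does not depend on the file size.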