concise summary of the changes from the original repository to the fork:
-
Improved type hinting and code structure in the
HuffmanCodingclass. -
Enhanced file handling with
withstatements for better resource management. -
Implemented more efficient bit-level encoding and decoding processes (this is the resolve for this issue on the original repo).
- Bit-level encoding:
- The new version processes bits directly instead of creating a long binary string.
- It accumulates bits into bytes, writing them to the file as soon as 8 bits are collected.
- This approach is more memory-efficient, especially for large files, as it doesn't need to hold the entire encoded data in memory.
- Optimized decoding:
-
The decoder now reads one byte at a time and processes it bit by bit.
-
It builds up the Huffman code incrementally, checking against the character table after each bit.
-
This method is more efficient as it doesn't require storing the entire binary string in memory.
-
Reduced intermediate data structures:
-
The new implementation eliminates the need for intermediate string representations of binary data.
-
This reduction in data conversion and manipulation leads to faster processing and lower memory usage.
- Direct byte manipulation:
- The code now works directly with bytes and bit operations, which is generally faster than string manipulations.
- It uses bitwise operations (shifting and masking) for efficient bit handling.
- Streamlined file writing and reading:
- The encoding process writes bytes as soon as they're ready, rather than accumulating all the data before writing.
- The decoding process reads and processes the input file in smaller chunks, improving memory efficiency for large files.
-
Added a minimum file size check (100 bytes) before applying compression.
-
Refactored
encode_file()anddecode_file()methods for better performance. -
Improved main script with clearer usage instructions and more informative output.
-
Added checks for compression effectiveness, reporting size increase if compression is unsuccessful.