Tags · iii-i/zlib

dfltcc-25-12-24

Add support for IBM Z hardware-accelerated deflate

IBM Z mainframes starting from version z15 provide the DFLTCC instruction,
which implements the deflate algorithm in hardware with estimated
compression and decompression performance orders of magnitude faster
than the current zlib and a ratio comparable with that of level 1.

This patch adds DFLTCC support to zlib. It can be enabled using the
following build commands:

    $ ./configure --dfltcc
    $ make

When built like this, zlib would compress in hardware on level 1, and
in software on all other levels. Decompression will always happen in
hardware. In order to enable DFLTCC compression for levels 1-6 (i.e.,
to make it used by default) one could either configure with
`--dfltcc-level-mask=0x7e` or `export DFLTCC_LEVEL_MASK=0x7e` at run
time.

Two DFLTCC compression calls produce the same results only when they
both are made on machines of the same generation, and when the
respective buffers have the same offset relative to the start of the
page. Therefore care should be taken when using hardware compression
when reproducible results are desired. One such use case - reproducible
software builds - is handled explicitly: when the `SOURCE_DATE_EPOCH`
environment variable is set, the hardware compression is disabled.

DFLTCC does not support every single zlib feature, in particular:

    * `inflate(Z_BLOCK)` and `inflate(Z_TREES)`
    * `inflateMark()`
    * `inflatePrime()`
    * `inflateSyncPoint()`

When used, these functions will either switch to software, or, in case
this is not possible, gracefully fail.

This patch tries to add DFLTCC support in the least intrusive way.
All SystemZ-specific code is placed into a separate file, but
unfortunately there is still a noticeable amount of changes in the
main zlib code. Below is the summary of these changes.

DFLTCC takes as arguments a parameter block, an input buffer, an output
buffer and a window. Since DFLTCC requires the parameter block to be
doubleword-aligned, and it's reasonable to allocate it alongside the
deflate and inflate states, the `ZALLOC_STATE()`, `ZFREE_STATE()` and
`ZCOPY_STATE()` macros are introduced in order to encapsulate the
allocation details. The same is true for the window, for which
the `ZALLOC_WINDOW()` and `TRY_FREE_WINDOW()` macros are introduced.

Software and hardware window formats do not match, therefore,
`deflateSetDictionary()`, `deflateGetDictionary()`,
`inflateSetDictionary()` and `inflateGetDictionary()` need special
handling, which is triggered using the new
`DEFLATE_SET_DICTIONARY_HOOK()`, `DEFLATE_GET_DICTIONARY_HOOK()`,
`INFLATE_SET_DICTIONARY_HOOK()` and `INFLATE_GET_DICTIONARY_HOOK()`
macros.

`deflateResetKeep()` and `inflateResetKeep()` now update the DFLTCC
parameter block, which is allocated alongside zlib state, using
the new `DEFLATE_RESET_KEEP_HOOK()` and `INFLATE_RESET_KEEP_HOOK()`
macros.

The new `DEFLATE_PARAMS_HOOK()` macro switches between the hardware
and the software deflate implementations when the `deflateParams()`
arguments demand this.

The new `INFLATE_PRIME_HOOK()`, `INFLATE_MARK_HOOK()` and
`INFLATE_SYNC_POINT_HOOK()` macros make the respective unsupported
calls gracefully fail.

The algorithm implemented in the hardware has a different compression
ratio than the one implemented in software. In order for
`deflateBound()` to return the correct results for the hardware
implementation, the new `DEFLATE_BOUND_ADJUST_COMPLEN()` and
`DEFLATE_NEED_CONSERVATIVE_BOUND()` macros are introduced.

Actual compression and decompression are handled by the new
`DEFLATE_HOOK()` and `INFLATE_TYPEDO_HOOK()` macros. Since inflation
with DFLTCC manages the window on its own, calling `updatewindow()` is
suppressed using the new `INFLATE_NEED_UPDATEWINDOW()` macro.

In addition to the compression, DFLTCC computes the CRC-32 and Adler-32
checksums, therefore, whenever it's used, the software checksumming is
suppressed using the new `DEFLATE_NEED_CHECKSUM()` and
`INFLATE_NEED_CHECKSUM()` macros.

DFLTCC will refuse to write an End-of-block Symbol if there is no input
data, thus in some cases it is necessary to do this manually. In order
to achieve this, `send_bits()`, `bi_reverse()`, `bi_windup()` and
`flush_pending()` are promoted from `local` to `ZLIB_INTERNAL`.
Furthermore, since the block and the stream termination must be handled
in software as well, `enum block_state` is moved to `deflate.h`.

Since the first call to `dfltcc_inflate()` already needs the window,
and it might not be allocated yet, `inflate_ensure_window()` is
factored out of `updatewindow()` and made `ZLIB_INTERNAL`.

Signed-off-by: Ilya Leoshkevich <[email protected]>

Nov 20, 2024
b3fb52a
zip
tar.gz

dfltcc-20230925

Add support for IBM Z hardware-accelerated deflate

IBM Z mainframes starting from version z15 provide the DFLTCC instruction,
which implements the deflate algorithm in hardware with estimated
compression and decompression performance orders of magnitude faster
than the current zlib and a ratio comparable with that of level 1.

This patch adds DFLTCC support to zlib. It can be enabled using the
following build commands:

    $ ./configure --dfltcc
    $ make

When built like this, zlib would compress in hardware on level 1, and
in software on all other levels. Decompression will always happen in
hardware. In order to enable DFLTCC compression for levels 1-6 (i.e.,
to make it used by default) one could either configure with
`--dfltcc-level-mask=0x7e` or `export DFLTCC_LEVEL_MASK=0x7e` at run
time.

Two DFLTCC compression calls produce the same results only when they
both are made on machines of the same generation, and when the
respective buffers have the same offset relative to the start of the
page. Therefore care should be taken when using hardware compression
when reproducible results are desired. One such use case - reproducible
software builds - is handled explicitly: when the `SOURCE_DATE_EPOCH`
environment variable is set, the hardware compression is disabled.

DFLTCC does not support every single zlib feature, in particular:

    * `inflate(Z_BLOCK)` and `inflate(Z_TREES)`
    * `inflateMark()`
    * `inflatePrime()`
    * `inflateSyncPoint()`

When used, these functions will either switch to software, or, in case
this is not possible, gracefully fail.

This patch tries to add DFLTCC support in the least intrusive way.
All SystemZ-specific code is placed into a separate file, but
unfortunately there is still a noticeable amount of changes in the
main zlib code. Below is the summary of these changes.

DFLTCC takes as arguments a parameter block, an input buffer, an output
buffer and a window. Since DFLTCC requires the parameter block to be
doubleword-aligned, and it's reasonable to allocate it alongside the
deflate and inflate states, the `ZALLOC_STATE()`, `ZFREE_STATE()` and
`ZCOPY_STATE()` macros are introduced in order to encapsulate the
allocation details. The same is true for the window, for which
the `ZALLOC_WINDOW()` and `TRY_FREE_WINDOW()` macros are introduced.

Software and hardware window formats do not match, therefore,
`deflateSetDictionary()`, `deflateGetDictionary()`,
`inflateSetDictionary()` and `inflateGetDictionary()` need special
handling, which is triggered using the new
`DEFLATE_SET_DICTIONARY_HOOK()`, `DEFLATE_GET_DICTIONARY_HOOK()`,
`INFLATE_SET_DICTIONARY_HOOK()` and `INFLATE_GET_DICTIONARY_HOOK()`
macros.

`deflateResetKeep()` and `inflateResetKeep()` now update the DFLTCC
parameter block, which is allocated alongside zlib state, using
the new `DEFLATE_RESET_KEEP_HOOK()` and `INFLATE_RESET_KEEP_HOOK()`
macros.

The new `DEFLATE_PARAMS_HOOK()` macro switches between the hardware
and the software deflate implementations when the `deflateParams()`
arguments demand this.

The new `INFLATE_PRIME_HOOK()`, `INFLATE_MARK_HOOK()` and
`INFLATE_SYNC_POINT_HOOK()` macros make the respective unsupported
calls gracefully fail.

The algorithm implemented in the hardware has a different compression
ratio than the one implemented in software. In order for
`deflateBound()` to return the correct results for the hardware
implementation, the new `DEFLATE_BOUND_ADJUST_COMPLEN()` and
`DEFLATE_NEED_CONSERVATIVE_BOUND()` macros are introduced.

Actual compression and decompression are handled by the new
`DEFLATE_HOOK()` and `INFLATE_TYPEDO_HOOK()` macros. Since inflation
with DFLTCC manages the window on its own, calling `updatewindow()` is
suppressed using the new `INFLATE_NEED_UPDATEWINDOW()` macro.

In addition to the compression, DFLTCC computes the CRC-32 and Adler-32
checksums, therefore, whenever it's used, the software checksumming is
suppressed using the new `DEFLATE_NEED_CHECKSUM()` and
`INFLATE_NEED_CHECKSUM()` macros.

DFLTCC will refuse to write an End-of-block Symbol if there is no input
data, thus in some cases it is necessary to do this manually. In order
to achieve this, `send_bits()`, `bi_reverse()`, `bi_windup()` and
`flush_pending()` are promoted from `local` to `ZLIB_INTERNAL`.
Furthermore, since the block and the stream termination must be handled
in software as well, `enum block_state` is moved to `deflate.h`.

Since the first call to `dfltcc_inflate()` already needs the window,
and it might not be allocated yet, `inflate_ensure_window()` is
factored out of `updatewindow()` and made `ZLIB_INTERNAL`.

Signed-off-by: Ilya Leoshkevich <[email protected]>

Sep 25, 2023
481ee63
zip
tar.gz

crc32vx-v6

s390x: vectorize crc32

Use vector extensions when compiling for s390x and binutils knows
about them. At runtime, check whether the kernel supports vector
extensions (not just the CPU, but also the kernel has to support them)
and choose between the regular and the vectorized implementations.

Sep 18, 2023
559c8ee
zip
tar.gz

dfltcc-20230428

Add support for IBM Z hardware-accelerated deflate

IBM Z mainframes starting from version z15 provide the DFLTCC instruction,
which implements the deflate algorithm in hardware with estimated
compression and decompression performance orders of magnitude faster
than the current zlib and a ratio comparable with that of level 1.

This patch adds DFLTCC support to zlib. In order to enable it, the
following build commands should be used:

    $ ./configure --dfltcc
    $ make

When built like this, zlib would compress in hardware on level 1, and in
software on all other levels. Decompression will always happen in
hardware. In order to enable DFLTCC compression for levels 1-6 (i.e. to
make it used by default) one could either configure with
--dfltcc-level-mask=0x7e or set the environment variable
DFLTCC_LEVEL_MASK to 0x7e at run time.

Two DFLTCC compression calls produce the same results only when they
both are made on machines of the same generation, and when the
respective buffers have the same offset relative to the start of the
page. Therefore care should be taken when using hardware compression
when reproducible results are desired. One such use case - reproducible
software builds - is handled explicitly: when SOURCE_DATE_EPOCH
environment variable is set, the hardware compression is disabled.

DFLTCC does not support every single zlib feature, in particular:

    * inflate(Z_BLOCK) and inflate(Z_TREES)
    * inflateMark()
    * inflatePrime()
    * inflateSyncPoint()

When used, these functions will either switch to software, or, in case
this is not possible, gracefully fail.

This patch tries to add DFLTCC support in the least intrusive way.
All SystemZ-specific code is placed into a separate file, but
unfortunately there is still a noticeable amount of changes in the
main zlib code. Below is the summary of these changes.

DFLTCC takes as arguments a parameter block, an input buffer, an output
buffer and a window. Since DFLTCC requires the parameter block to be
doubleword-aligned, and it's reasonable to allocate it alongside the
deflate and inflate states, the ZALLOC_STATE, ZFREE_STATE and ZCOPY_STATE
macros were introduced in order to encapsulate the allocation details.
The same is true for the window, for which the ZALLOC_WINDOW and
TRY_FREE_WINDOW macros were introduced.

Software and hardware window formats do not match, therefore,
deflateSetDictionary(), deflateGetDictionary(), inflateSetDictionary()
and inflateGetDictionary() need special handling, which is triggered
using DEFLATE_SET_DICTIONARY_HOOK, DEFLATE_GET_DICTIONARY_HOOK,
INFLATE_SET_DICTIONARY_HOOK and INFLATE_GET_DICTIONARY_HOOK macros.

deflateResetKeep() and inflateResetKeep() now update the DFLTCC
parameter block, which is allocated alongside zlib state, using
the new DEFLATE_RESET_KEEP_HOOK and INFLATE_RESET_KEEP_HOOK macros.

The new DEFLATE_PARAMS_HOOK switches between hardware and software
deflate implementations when deflateParams() arguments demand this.

The new INFLATE_PRIME_HOOK, INFLATE_MARK_HOOK and
INFLATE_SYNC_POINT_HOOK macros make the respective unsupported calls
gracefully fail.

The algorithm implemented in hardware has a different compression ratio
than the one implemented in software. In order for deflateBound() to
return the correct results for the hardware implementation, the new
DEFLATE_BOUND_ADJUST_COMPLEN and DEFLATE_NEED_CONSERVATIVE_BOUND macros
were introduced.

Actual compression and decompression are handled by the new DEFLATE_HOOK
and INFLATE_TYPEDO_HOOK macros. Since inflation with DFLTCC manages the
window on its own, calling updatewindow() is suppressed using the new
INFLATE_NEED_UPDATEWINDOW() macro.

In addition to compression, DFLTCC computes CRC-32 and Adler-32
checksums, therefore, whenever it's used, software checksumming needs to
be suppressed using the new DEFLATE_NEED_CHECKSUM and
INFLATE_NEED_CHECKSUM macros.

DFLTCC will refuse to write an End-of-block Symbol if there is no input
data, thus in some cases it is necessary to do this manually. In order
to achieve this, send_bits, bi_reverse, bi_windup and flush_pending
were promoted from local to ZLIB_INTERNAL. Furthermore, since block and
stream termination must be handled in software as well, block_state enum
was moved to deflate.h.

Since the first call to dfltcc_inflate already needs the window, and it
might not be allocated yet, inflate_ensure_window was factored out of
updatewindow and made ZLIB_INTERNAL.

Apr 28, 2023
f6d382a
zip
tar.gz

crc32vx-v5-1.2.11

Feb 2, 2023
5eaae2a
zip
tar.gz

dfltcc-20230109

Add support for IBM Z hardware-accelerated deflate

IBM Z mainframes starting from version z15 provide the DFLTCC instruction,
which implements the deflate algorithm in hardware with estimated
compression and decompression performance orders of magnitude faster
than the current zlib and a ratio comparable with that of level 1.

This patch adds DFLTCC support to zlib. In order to enable it, the
following build commands should be used:

    $ ./configure --dfltcc
    $ make

When built like this, zlib would compress in hardware on level 1, and in
software on all other levels. Decompression will always happen in
hardware. In order to enable DFLTCC compression for levels 1-6 (i.e. to
make it used by default) one could either configure with
--dfltcc-level-mask=0x7e or set the environment variable
DFLTCC_LEVEL_MASK to 0x7e at run time.

Two DFLTCC compression calls produce the same results only when they
both are made on machines of the same generation, and when the
respective buffers have the same offset relative to the start of the
page. Therefore care should be taken when using hardware compression
when reproducible results are desired. One such use case - reproducible
software builds - is handled explicitly: when SOURCE_DATE_EPOCH
environment variable is set, the hardware compression is disabled.

DFLTCC does not support every single zlib feature, in particular:

    * inflate(Z_BLOCK) and inflate(Z_TREES)
    * inflateMark()
    * inflatePrime()
    * inflateSyncPoint()

When used, these functions will either switch to software, or, in case
this is not possible, gracefully fail.

This patch tries to add DFLTCC support in the least intrusive way.
All SystemZ-specific code is placed into a separate file, but
unfortunately there is still a noticeable amount of changes in the
main zlib code. Below is the summary of these changes.

DFLTCC takes as arguments a parameter block, an input buffer, an output
buffer and a window. Since DFLTCC requires the parameter block to be
doubleword-aligned, and it's reasonable to allocate it alongside the
deflate and inflate states, the ZALLOC_STATE, ZFREE_STATE and ZCOPY_STATE
macros were introduced in order to encapsulate the allocation details.
The same is true for the window, for which the ZALLOC_WINDOW and
TRY_FREE_WINDOW macros were introduced.

Software and hardware window formats do not match, therefore,
deflateSetDictionary(), deflateGetDictionary(), inflateSetDictionary()
and inflateGetDictionary() need special handling, which is triggered
using DEFLATE_SET_DICTIONARY_HOOK, DEFLATE_GET_DICTIONARY_HOOK,
INFLATE_SET_DICTIONARY_HOOK and INFLATE_GET_DICTIONARY_HOOK macros.

deflateResetKeep() and inflateResetKeep() now update the DFLTCC
parameter block, which is allocated alongside zlib state, using
the new DEFLATE_RESET_KEEP_HOOK and INFLATE_RESET_KEEP_HOOK macros.

The new DEFLATE_PARAMS_HOOK switches between hardware and software
deflate implementations when deflateParams() arguments demand this.

The new INFLATE_PRIME_HOOK, INFLATE_MARK_HOOK and
INFLATE_SYNC_POINT_HOOK macros make the respective unsupported calls
gracefully fail.

The algorithm implemented in hardware has a different compression ratio
than the one implemented in software. In order for deflateBound() to
return the correct results for the hardware implementation, the new
DEFLATE_BOUND_ADJUST_COMPLEN and DEFLATE_NEED_CONSERVATIVE_BOUND macros
were introduced.

Actual compression and decompression are handled by the new DEFLATE_HOOK
and INFLATE_TYPEDO_HOOK macros. Since inflation with DFLTCC manages the
window on its own, calling updatewindow() is suppressed using the new
INFLATE_NEED_UPDATEWINDOW() macro.

In addition to compression, DFLTCC computes CRC-32 and Adler-32
checksums, therefore, whenever it's used, software checksumming needs to
be suppressed using the new DEFLATE_NEED_CHECKSUM and
INFLATE_NEED_CHECKSUM macros.

DFLTCC will refuse to write an End-of-block Symbol if there is no input
data, thus in some cases it is necessary to do this manually. In order
to achieve this, send_bits, bi_reverse, bi_windup and flush_pending
were promoted from local to ZLIB_INTERNAL. Furthermore, since block and
stream termination must be handled in software as well, block_state enum
was moved to deflate.h.

Since the first call to dfltcc_inflate already needs the window, and it
might not be allocated yet, inflate_ensure_window was factored out of
updatewindow and made ZLIB_INTERNAL.

Jan 9, 2023
1132034
zip
tar.gz

dfltcc-20221222

Add support for IBM Z hardware-accelerated deflate

IBM Z mainframes starting from version z15 provide the DFLTCC instruction,
which implements the deflate algorithm in hardware with estimated
compression and decompression performance orders of magnitude faster
than the current zlib and a ratio comparable with that of level 1.

This patch adds DFLTCC support to zlib. In order to enable it, the
following build commands should be used:

    $ ./configure --dfltcc
    $ make

When built like this, zlib would compress in hardware on level 1, and in
software on all other levels. Decompression will always happen in
hardware. In order to enable DFLTCC compression for levels 1-6 (i.e. to
make it used by default) one could either configure with
--dfltcc-level-mask=0x7e or set the environment variable
DFLTCC_LEVEL_MASK to 0x7e at run time.

Two DFLTCC compression calls produce the same results only when they
both are made on machines of the same generation, and when the
respective buffers have the same offset relative to the start of the
page. Therefore care should be taken when using hardware compression
when reproducible results are desired. One such use case - reproducible
software builds - is handled explicitly: when SOURCE_DATE_EPOCH
environment variable is set, the hardware compression is disabled.

DFLTCC does not support every single zlib feature, in particular:

    * inflate(Z_BLOCK) and inflate(Z_TREES)
    * inflateMark()
    * inflatePrime()
    * inflateSyncPoint()

When used, these functions will either switch to software, or, in case
this is not possible, gracefully fail.

This patch tries to add DFLTCC support in the least intrusive way.
All SystemZ-specific code is placed into a separate file, but
unfortunately there is still a noticeable amount of changes in the
main zlib code. Below is the summary of these changes.

DFLTCC takes as arguments a parameter block, an input buffer, an output
buffer and a window. Since DFLTCC requires the parameter block to be
doubleword-aligned, and it's reasonable to allocate it alongside the
deflate and inflate states, the ZALLOC_STATE, ZFREE_STATE and ZCOPY_STATE
macros were introduced in order to encapsulate the allocation details.
The same is true for the window, for which the ZALLOC_WINDOW and
TRY_FREE_WINDOW macros were introduced.

Software and hardware window formats do not match, therefore,
deflateSetDictionary(), deflateGetDictionary(), inflateSetDictionary()
and inflateGetDictionary() need special handling, which is triggered
using DEFLATE_SET_DICTIONARY_HOOK, DEFLATE_GET_DICTIONARY_HOOK,
INFLATE_SET_DICTIONARY_HOOK and INFLATE_GET_DICTIONARY_HOOK macros.

deflateResetKeep() and inflateResetKeep() now update the DFLTCC
parameter block, which is allocated alongside zlib state, using
the new DEFLATE_RESET_KEEP_HOOK and INFLATE_RESET_KEEP_HOOK macros.

The new DEFLATE_PARAMS_HOOK switches between hardware and software
deflate implementations when deflateParams() arguments demand this.

The new INFLATE_PRIME_HOOK, INFLATE_MARK_HOOK and
INFLATE_SYNC_POINT_HOOK macros make the respective unsupported calls
gracefully fail.

The algorithm implemented in hardware has a different compression ratio
than the one implemented in software. In order for deflateBound() to
return the correct results for the hardware implementation, the new
DEFLATE_BOUND_ADJUST_COMPLEN and DEFLATE_NEED_CONSERVATIVE_BOUND macros
were introduced.

Actual compression and decompression are handled by the new DEFLATE_HOOK
and INFLATE_TYPEDO_HOOK macros. Since inflation with DFLTCC manages the
window on its own, calling updatewindow() is suppressed using the new
INFLATE_NEED_UPDATEWINDOW() macro.

In addition to compression, DFLTCC computes CRC-32 and Adler-32
checksums, therefore, whenever it's used, software checksumming needs to
be suppressed using the new DEFLATE_NEED_CHECKSUM and
INFLATE_NEED_CHECKSUM macros.

DFLTCC will refuse to write an End-of-block Symbol if there is no input
data, thus in some cases it is necessary to do this manually. In order
to achieve this, send_bits, bi_reverse, bi_windup and flush_pending
were promoted from local to ZLIB_INTERNAL. Furthermore, since block and
stream termination must be handled in software as well, block_state enum
was moved to deflate.h.

Since the first call to dfltcc_inflate already needs the window, and it
might not be allocated yet, inflate_ensure_window was factored out of
updatewindow and made ZLIB_INTERNAL.

Dec 22, 2022
007aae3
zip
tar.gz

dfltcc-20221215

Add support for IBM Z hardware-accelerated deflate

IBM Z mainframes starting from version z15 provide the DFLTCC instruction,
which implements the deflate algorithm in hardware with estimated
compression and decompression performance orders of magnitude faster
than the current zlib and a ratio comparable with that of level 1.

This patch adds DFLTCC support to zlib. In order to enable it, the
following build commands should be used:

    $ ./configure --dfltcc
    $ make

When built like this, zlib would compress in hardware on level 1, and in
software on all other levels. Decompression will always happen in
hardware. In order to enable DFLTCC compression for levels 1-6 (i.e. to
make it used by default) one could either configure with
--dfltcc-level-mask=0x7e or set the environment variable
DFLTCC_LEVEL_MASK to 0x7e at run time.

Two DFLTCC compression calls produce the same results only when they
both are made on machines of the same generation, and when the
respective buffers have the same offset relative to the start of the
page. Therefore care should be taken when using hardware compression
when reproducible results are desired. One such use case - reproducible
software builds - is handled explicitly: when SOURCE_DATE_EPOCH
environment variable is set, the hardware compression is disabled.

DFLTCC does not support every single zlib feature, in particular:

    * inflate(Z_BLOCK) and inflate(Z_TREES)
    * inflateMark()
    * inflatePrime()
    * inflateSyncPoint()

When used, these functions will either switch to software, or, in case
this is not possible, gracefully fail.

This patch tries to add DFLTCC support in the least intrusive way.
All SystemZ-specific code is placed into a separate file, but
unfortunately there is still a noticeable amount of changes in the
main zlib code. Below is the summary of these changes.

DFLTCC takes as arguments a parameter block, an input buffer, an output
buffer and a window. Since DFLTCC requires the parameter block to be
doubleword-aligned, and it's reasonable to allocate it alongside the
deflate and inflate states, the ZALLOC_STATE, ZFREE_STATE and ZCOPY_STATE
macros were introduced in order to encapsulate the allocation details.
The same is true for the window, for which the ZALLOC_WINDOW and
TRY_FREE_WINDOW macros were introduced.

Software and hardware window formats do not match, therefore,
deflateSetDictionary(), deflateGetDictionary(), inflateSetDictionary()
and inflateGetDictionary() need special handling, which is triggered
using DEFLATE_SET_DICTIONARY_HOOK, DEFLATE_GET_DICTIONARY_HOOK,
INFLATE_SET_DICTIONARY_HOOK and INFLATE_GET_DICTIONARY_HOOK macros.

deflateResetKeep() and inflateResetKeep() now update the DFLTCC
parameter block, which is allocated alongside zlib state, using
the new DEFLATE_RESET_KEEP_HOOK and INFLATE_RESET_KEEP_HOOK macros.

The new DEFLATE_PARAMS_HOOK switches between hardware and software
deflate implementations when deflateParams() arguments demand this.

The new INFLATE_PRIME_HOOK, INFLATE_MARK_HOOK and
INFLATE_SYNC_POINT_HOOK macros make the respective unsupported calls
gracefully fail.

The algorithm implemented in hardware has a different compression ratio
than the one implemented in software. In order for deflateBound() to
return the correct results for the hardware implementation, the new
DEFLATE_BOUND_ADJUST_COMPLEN and DEFLATE_NEED_CONSERVATIVE_BOUND macros
were introduced.

Actual compression and decompression are handled by the new DEFLATE_HOOK
and INFLATE_TYPEDO_HOOK macros. Since inflation with DFLTCC manages the
window on its own, calling updatewindow() is suppressed using the new
INFLATE_NEED_UPDATEWINDOW() macro.

In addition to compression, DFLTCC computes CRC-32 and Adler-32
checksums, therefore, whenever it's used, software checksumming needs to
be suppressed using the new DEFLATE_NEED_CHECKSUM and
INFLATE_NEED_CHECKSUM macros.

DFLTCC will refuse to write an End-of-block Symbol if there is no input
data, thus in some cases it is necessary to do this manually. In order
to achieve this, send_bits, bi_reverse, bi_windup and flush_pending
were promoted from local to ZLIB_INTERNAL. Furthermore, since block and
stream termination must be handled in software as well, block_state enum
was moved to deflate.h.

Since the first call to dfltcc_inflate already needs the window, and it
might not be allocated yet, inflate_ensure_window was factored out of
updatewindow and made ZLIB_INTERNAL.

Dec 15, 2022
28a49d8
zip
tar.gz

dfltcc-20221122

Add support for IBM Z hardware-accelerated deflate

IBM Z mainframes starting from version z15 provide the DFLTCC instruction,
which implements the deflate algorithm in hardware with estimated
compression and decompression performance orders of magnitude faster
than the current zlib and a ratio comparable with that of level 1.

This patch adds DFLTCC support to zlib. In order to enable it, the
following build commands should be used:

    $ ./configure --dfltcc
    $ make

When built like this, zlib would compress in hardware on level 1, and in
software on all other levels. Decompression will always happen in
hardware. In order to enable DFLTCC compression for levels 1-6 (i.e. to
make it used by default) one could either configure with
--dfltcc-level-mask=0x7e or set the environment variable
DFLTCC_LEVEL_MASK to 0x7e at run time.

Two DFLTCC compression calls produce the same results only when they
both are made on machines of the same generation, and when the
respective buffers have the same offset relative to the start of the
page. Therefore care should be taken when using hardware compression
when reproducible results are desired. One such use case - reproducible
software builds - is handled explicitly: when SOURCE_DATE_EPOCH
environment variable is set, the hardware compression is disabled.

DFLTCC does not support every single zlib feature, in particular:

    * inflate(Z_BLOCK) and inflate(Z_TREES)
    * inflateMark()
    * inflatePrime()
    * inflateSyncPoint()

When used, these functions will either switch to software, or, in case
this is not possible, gracefully fail.

This patch tries to add DFLTCC support in the least intrusive way.
All SystemZ-specific code is placed into a separate file, but
unfortunately there is still a noticeable amount of changes in the
main zlib code. Below is the summary of these changes.

DFLTCC takes as arguments a parameter block, an input buffer, an output
buffer and a window. Since DFLTCC requires the parameter block to be
doubleword-aligned, and it's reasonable to allocate it alongside the
deflate and inflate states, the ZALLOC_STATE, ZFREE_STATE and ZCOPY_STATE
macros were introduced in order to encapsulate the allocation details.
The same is true for the window, for which the ZALLOC_WINDOW and
TRY_FREE_WINDOW macros were introduced.

While for inflate software and hardware window formats match, this is
not the case for deflate. Therefore, deflateSetDictionary and
deflateGetDictionary need special handling, which is triggered using the
new DEFLATE_SET_DICTIONARY_HOOK and DEFLATE_GET_DICTIONARY_HOOK macros.

deflateResetKeep() and inflateResetKeep() now update the DFLTCC
parameter block, which is allocated alongside zlib state, using
the new DEFLATE_RESET_KEEP_HOOK and INFLATE_RESET_KEEP_HOOK macros.

The new DEFLATE_PARAMS_HOOK switches between hardware and software
deflate implementations when deflateParams() arguments demand this.

The new INFLATE_PRIME_HOOK, INFLATE_MARK_HOOK and
INFLATE_SYNC_POINT_HOOK macros make the respective unsupported calls
gracefully fail.

The algorithm implemented in hardware has a different compression ratio
than the one implemented in software. In order for deflateBound() to
return the correct results for the hardware implementation, the new
DEFLATE_BOUND_ADJUST_COMPLEN and DEFLATE_NEED_CONSERVATIVE_BOUND macros
were introduced.

Actual compression and decompression are handled by the new DEFLATE_HOOK
and INFLATE_TYPEDO_HOOK macros. Since inflation with DFLTCC manages the
window on its own, calling updatewindow() is suppressed using the new
INFLATE_NEED_UPDATEWINDOW() macro.

In addition to compression, DFLTCC computes CRC-32 and Adler-32
checksums, therefore, whenever it's used, software checksumming needs to
be suppressed using the new DEFLATE_NEED_CHECKSUM and
INFLATE_NEED_CHECKSUM macros.

DFLTCC will refuse to write an End-of-block Symbol if there is no input
data, thus in some cases it is necessary to do this manually. In order
to achieve this, send_bits, bi_reverse, bi_windup and flush_pending
were promoted from local to ZLIB_INTERNAL. Furthermore, since block and
stream termination must be handled in software as well, block_state enum
was moved to deflate.h.

Since the first call to dfltcc_inflate already needs the window, and it
might not be allocated yet, inflate_ensure_window was factored out of
updatewindow and made ZLIB_INTERNAL.

Nov 22, 2022
26f2c0a
zip
tar.gz

crc32vx-v5

s390x: vectorize crc32

Use vector extensions when compiling for s390x and binutils knows
about them. At runtime, check whether the kernel supports vector
extensions (not just the CPU, but also the kernel has to support them)
and choose between the regular and the vectorized implementations.

Nov 22, 2022
6ae5490
zip
tar.gz
