[Packaging] Shrink wheel ~35% via nvcc --compress-mode=size #1704
trmanish wants to merge 1 commit into bitsandbytes-foundation:main
Conversation
Signed-off-by: manish <[email protected]>
Thanks, appreciate the suggestion! I have the same concern mentioned over on PyTorch regarding support for users with older drivers: pytorch/pytorch#157791 (comment). Mainly, it seems this would require cu124+ users to have a 550+ driver, while we currently still maintain compatibility with driver version 525+. So we will have to weigh that as a consideration.
I believe an earlier comment on the original PR said it won't have that as a requirement: pytorch/pytorch#157791 (comment). But I believe the latest from PyTorch is as below: pytorch/pytorch#157791 (comment)

However, my understanding is (please correct me if wrong) that the only variant built with --compress-mode=size would be the cu124 wheel, and that wheel already implies a 550-series driver. Users on 525/535 stay on the cu122/cu121 wheels, which this PR leaves untouched.

Options:
- Opt-in flag – guard it behind ENABLE_BNB_CUDA_COMPRESSION=1; default off.
- Dual wheels – publish both bitsandbytes-cu124.whl and -cu124-slim.whl.
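The opt-in flag option above could be sketched in CMake roughly as follows. This is only a sketch, not the PR's actual diff; the option name mirrors the ENABLE_BNB_CUDA_COMPRESSION suggestion from the comment, and the default-off behavior is an assumption.

```cmake
# Sketch: opt-in fatbin compression, default OFF (hypothetical option name
# taken from the discussion, not from the actual PR diff).
option(BNB_CUDA_COMPRESSION
       "Compress CUDA fatbins with nvcc --compress-mode=size (requires newer driver)"
       OFF)

# Only nvcc >= 12.4 understands --compress-mode=size.
if(BNB_CUDA_COMPRESSION AND
   CMAKE_CUDA_COMPILER_VERSION VERSION_GREATER_EQUAL "12.4")
  string(APPEND CMAKE_CUDA_FLAGS " --compress-mode=size")
endif()
```

With this shape, CI could keep publishing the default wheel unchanged and produce a slim variant by configuring with -DBNB_CUDA_COMPRESSION=ON.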
Hi, we appreciate the effort and explanation, but unfortunately we cannot merge this.

The assumption that applying this only to cu124+ builds limits the scope is flawed, since cu124+ builds can still be used with older driver versions thanks to CUDA's Minor Version Compatibility. This means the 12.4, 12.6, 12.8, and 12.9 builds can currently run on systems with driver v525. We have sufficient evidence that this is a valid usage scenario. For example, vLLM received a similar PR and had to revert it for this reason: vllm-project/vllm#20853. See also: pytorch/pytorch#157791 (comment). The three most recent minor PyTorch releases use cu124+ builds by default, and they are supported on systems with drivers v525+.

Publishing additional wheels adds extra complexity that we do not wish to take on.

That said, we will use this option when we start producing builds for CUDA 13, which by default will use the "balanced" compression mode, and at that point we can guarantee that all users support the "size" mode as well. Additionally, we will explore further ways to limit our binary sizes:
What this PR does
Adds --compress-mode=size to CMAKE_CUDA_FLAGS for nvcc ≥ 12.4.
Impact
Compatibility
import bitsandbytes.
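The change summarized under "What this PR does" might look roughly like the following CMake fragment. This is a hedged reconstruction from the description, not the PR's verbatim diff; the exact guard and variable handling in the real patch may differ.

```cmake
# Sketch of the described change: append --compress-mode=size to the CUDA
# flags, but only when the nvcc in use is 12.4 or newer, since older
# toolkits do not recognize the flag.
if(CMAKE_CUDA_COMPILER_VERSION VERSION_GREATER_EQUAL "12.4")
  string(APPEND CMAKE_CUDA_FLAGS " --compress-mode=size")
endif()
```

Note that, per the maintainers' comment above, binaries built this way require a newer (550-series) driver to decompress the fatbin, which is why the change was declined for current CUDA 12.x wheels.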