Fix bf16 GGUF deleted by cleanup when exported alongside quantized types #4937
Shivamjohri247 wants to merge 8 commits into unslothai:main
Conversation
When bf16 is both the intermediate conversion format and a requested output, the cleanup step was unconditionally deleting the base GGUF file. Guard the deletion so the intermediate is only removed when it is not part of the user's requested quantization methods. Closes unslothai#4932
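The guard can be sketched as a standalone helper (a simplification: the identifier names `base_gguf`, `first_conversion`, `quantization_method`, and `want_full_precision` follow this PR's description, but the real logic lives inline in `unsloth/save.py` and may differ in detail):

```python
from pathlib import Path

def cleanup_intermediate_gguf(base_gguf, all_saved_locations,
                              first_conversion, quantization_method):
    # Keep the intermediate file when the user explicitly requested its
    # format (e.g. first_conversion == "bf16" and "bf16" in the methods).
    want_full_precision = first_conversion in frozenset(quantization_method)
    if not want_full_precision:
        # bf16 was only a conversion intermediate: safe to delete.
        all_saved_locations.remove(base_gguf)
        Path(base_gguf).unlink(missing_ok = True)
    return all_saved_locations
```

With `quantization_method=["q4_k_m", "bf16"]` the base file now survives; with `["q4_k_m"]` alone it is still cleaned up as before.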
Code Review
This pull request modifies unsloth/save.py to prevent the deletion of the base GGUF file if it's not part of the quantization method. However, this change introduces a regression in the file ordering for Vision-Language Models (VLMs), causing the mmproj file to be incorrectly positioned after list reversal, which could break downstream logic. A suggestion is provided to correctly reorder the files for VLMs when the base model is kept.
When bf16 is a requested output alongside other quantized types for VLMs, the base GGUF (text model) stays in the list. The subsequent reverse() would place text_model at the end instead of mmproj, breaking downstream code that expects mmproj as the last element. Swap base_gguf past mmproj before reversal so mmproj ends up at the correct position after the list is flipped. Credit: Gemini Code Assist review on PR unslothai#4937
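The reordering can be illustrated on plain lists (a sketch: the filenames and the `[text_model, mmproj, quants...]` layout are assumptions drawn from the description above, not the exact code):

```python
# Hypothetical file list for a VLM export where the bf16 base GGUF is kept:
# index 0 is the text model, index 1 is mmproj, then the quantized outputs.
files = ["text_model-BF16.gguf", "mmproj.gguf", "model-Q4_K_M.gguf"]

# Buggy: reversing directly puts the text model last instead of mmproj.
assert list(reversed(files))[-1] == "text_model-BF16.gguf"

# Fix: swap base_gguf past mmproj first, so mmproj lands last after the flip.
files[0], files[1] = files[1], files[0]
files.reverse()
assert files[-1] == "mmproj.gguf"
```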
Thanks for the catch! You're right: for VLMs where `bf16` is a requested output, the base GGUF stays in the list and the reversal misplaces `mmproj`. Fixed in the latest push by swapping `base_gguf` past `mmproj` before the reverse.
Thanks for the fix! I'm going to push a couple of commits to address a few items.
The previous `len(initial_files) > 1` guard would incorrectly fire for sharded text-only models (`convert_to_gguf` extends shard files into `initial_files` when `max_shard_size` splits a large model), reordering the shard list as if it were a VLM. `is_vlm` is already maintained from `convert_to_gguf`'s return value and is the semantically correct flag for "has mmproj that must stay last".
When `convert_to_gguf` shards the text model output, `initial_files` has more than two entries (`[text_shard1, text_shard2, ..., mmproj]`). The previous `insert(1, pop(0))` only repositioned one text part, leaving `mmproj` mid-list after the trailing `reverse()` and breaking the "mmproj is last" invariant consumed at `save.py:2145`, where `all_file_locations[-1]` is used for `--mmproj`. Rebuild the list explicitly so that after `reverse()` the final order is `[quants_desc..., text_parts_in_order..., mmproj]`, correct for any number of text shards including the common single-file case.
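A sketch of the explicit rebuild for the sharded case (the list contents are illustrative; the real code operates on paths returned by `convert_to_gguf`, and the helper name is hypothetical):

```python
def reorder_for_vlm(initial_files, quant_files):
    # initial_files = [text_shard1, ..., text_shardN, mmproj]
    # Build the pre-reverse list as [mmproj, text_shards_reversed, quants]
    # so that after .reverse() the order is [quants_desc, text_shards, mmproj].
    text_parts, mmproj = initial_files[:-1], initial_files[-1]
    combined = [mmproj] + list(reversed(text_parts)) + list(quant_files)
    combined.reverse()
    return combined
```

This handles any shard count, including the single-file case where it degenerates to the simple swap.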
Nice catch on the root cause. The preservation logic is correct for the common case, but two edges gave me pause so I pushed follow-ups (feel free to revert if you disagree):
For a sharded VLM, `initial_files` contains multiple text shards before `mmproj`, so a single swap is not enough; the explicit rebuild handles any shard count.
Thanks for the thorough review and the follow-up commits! Both fixes make total sense.
Appreciate you catching these edge cases. Happy with the changes, let's merge whenever you're ready.
The delete branch of the GGUF cleanup only removed `initial_files[0]` from the returned list and from disk. For sharded outputs (large text models split by `max_shard_size`), the remaining shards were left:

- on disk as orphans
- in the returned list, pushing `mmproj` out of its `[-1]` position

Exposed by sharded VLM exports (gemma-3-27b-it, 2 text shards + mmproj) with `quantization_method=["q4_k_m"]`: shard 1 was deleted, shard 2 survived as a stray, and `all_file_locations[-1]` pointed at shard 2 instead of `mmproj` when constructing the `--mmproj` CLI hint. Iterate over `initial_files[:n_text]` so all text parts are cleaned up; `mmproj` is excluded by the slice and keeps its `[-1]` position.
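The shard-aware cleanup can be sketched as a standalone helper (a simplification: the identifier names follow this commit message, and the helper name is hypothetical):

```python
from pathlib import Path

def delete_text_parts(initial_files, all_saved_locations, n_text):
    # Remove every text shard from disk and from the returned list.
    # The [:n_text] slice excludes mmproj (initial_files[-1]),
    # which therefore keeps its [-1] position in the result.
    for part in initial_files[:n_text]:
        if part in all_saved_locations:
            all_saved_locations.remove(part)
        Path(part).unlink(missing_ok = True)
    return all_saved_locations
```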
Follow-up from E2E testing on the sharded gemma export.

When `bf16` is not requested, the delete branch ran

```python
all_saved_locations.remove(base_gguf)
Path(base_gguf).unlink(missing_ok = True)
```

against `base_gguf` only, i.e. the first text shard. Observed with the gemma-3-27b-it run above: the second shard survived on disk and in the returned list.

The fix iterates over `initial_files[:n_text]` so every text part is removed. This is pre-existing; it was broken before this PR opened. Scoping it in because the PR is already the cleanup-logic PR and the gemma sharded test made it loud.
Summary
Fixes #4932
When `bf16` is passed alongside other quantization types (e.g. `["q4_k_m", "q6_k", "q8_0", "bf16"]`), Unsloth's internal cleanup step deletes the `bf16` GGUF file after using it as an intermediate for quantizing the other formats — even though `bf16` was explicitly requested as an output.

Root Cause
In `unsloth/save.py:1350-1353`, the GGUF export logic:

1. Creates the `base_gguf` file in `first_conversion` format (e.g. `bf16`) — this IS the bf16 output
2. Quantizes to each requested `quantization_method`, skipping any method that equals `first_conversion` (line 1300) — so `bf16` is never re-created
3. Unconditionally deletes `base_gguf`

This means when `bf16` is both the conversion intermediate AND a requested output, the file gets deleted. Only the quantized outputs survive.

Fix
Guard the deletion so the intermediate file is only removed when it is NOT part of the user's requested `quantization_method`. This also aligns with line 1363, which already correctly computes `want_full_precision = first_conversion in frozenset(quantization_method)`; the cleanup was just not respecting that.

Impact
- Before: `bf16` GGUF is silently deleted when exported alongside other quantized types
- After: `bf16` GGUF is preserved when explicitly requested; intermediate-only files are still cleaned up as before

Test plan
- `model.save_pretrained_gguf("./output", tokenizer, quantization_method=["q4_k_m", "q6_k", "q8_0", "bf16"])`: verify `BF16.gguf` is present in the output
- `model.save_pretrained_gguf("./output", tokenizer, quantization_method=["q4_k_m", "q8_0"])`: verify the intermediate `bf16` GGUF IS cleaned up
- `model.save_pretrained_gguf("./output", tokenizer, quantization_method=["bf16"])`: verify `BF16.gguf` is present (existing behavior unchanged)
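The "verify the expected files exist" step can be scripted roughly as follows (a sketch: the helper name and the `-<METHOD>.gguf` filename-suffix convention are assumptions; adjust to the actual filenames `save_pretrained_gguf` produces for your model):

```python
import os

def missing_gguf_outputs(output_dir, quantization_method):
    # Map each requested method to the uppercase filename suffix that
    # llama.cpp-style exports typically use (an assumption, see lead-in).
    expected = [f"-{m.upper()}.gguf" for m in quantization_method]
    present = os.listdir(output_dir)
    # Return the suffixes with no matching file, so an empty list means pass.
    return [suf for suf in expected
            if not any(name.endswith(suf) for name in present)]
```

Run it after each export in the test plan; for the first case `missing_gguf_outputs("./output", ["q4_k_m", "q6_k", "q8_0", "bf16"])` should come back empty.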