fix: preserve bf16 GGUF file when explicitly requested in quantization list #4991
Ricardo-M-L wants to merge 4 commits into unslothai:main
Conversation
…n list

When users request multiple quantization methods including the base format (e.g., ["q4_k_m", "bf16"]), the bf16 GGUF serves as both the intermediate conversion and a user-requested output. The cleanup step unconditionally deleted this file, losing the explicitly requested bf16 output. Only delete the intermediate base GGUF when the user did not request it.

Fixes unslothai#4932

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Code Review
This pull request modifies the model file cleanup logic in save_to_gguf to prevent the deletion of the base GGUF file when it is explicitly requested in the quantization methods. However, this change introduces a regression in the ordering of all_saved_locations, which is critical for Vision-Language Models (VLM) where specific file positions are expected for generated commands. A suggestion was provided to correctly handle the conditional deletion while ensuring proper list ordering for VLMs.
```python
if quants_created and first_conversion not in frozenset(quantization_method):
    all_saved_locations.remove(base_gguf)
    Path(base_gguf).unlink(missing_ok = True)
```
The current implementation makes the list reversal (line 1356) conditional on the deletion of the base GGUF file. This introduces a regression in the ordering of the all_saved_locations list when the base format is explicitly requested.
For Vision-Language Models (VLM), the example commands generated later (e.g., at line 2145) rely on the projector being the last element in the list. If the base file is preserved and the list is not reversed, the order remains [base_model, mmproj, quantized_model], causing the example command to incorrectly use a quantized model as the projector. For text models, skipping the reversal means the base model remains at index 0 instead of the newly created quantized model, which is usually the intended primary output.
Moving the reversal logic out of the conditional deletion block and adding a specific adjustment for VLMs ensures consistent and correct ordering.
```python
# Before:
if quants_created and first_conversion not in frozenset(quantization_method):
    all_saved_locations.remove(base_gguf)
    Path(base_gguf).unlink(missing_ok = True)

# Suggested change:
if quants_created:
    if first_conversion not in quantization_method:
        all_saved_locations.remove(base_gguf)
        Path(base_gguf).unlink(missing_ok = True)
    elif is_vlm and len(initial_files) > 1:
        # Move projector to start so it ends up at the end after reverse()
        all_saved_locations.remove(initial_files[1])
        all_saved_locations.insert(0, initial_files[1])
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 062af064df
```diff
 )
 print("Unsloth: Model files cleanup...")
-if quants_created:
+if quants_created and first_conversion not in frozenset(quantization_method):
```
Preserve VLM file ordering when keeping base GGUF
This new condition now skips the entire cleanup block whenever first_conversion is explicitly requested, but that block also contains the all_saved_locations.reverse() reorder used to normalize multimodal output order ([text_model, mmproj]). In the affected case (e.g. VLM with quantization_method including both first_conversion and another quant), the list can remain in an order where callers using positional assumptions (like all_file_locations[0] as model and all_file_locations[-1] as --mmproj at save.py:2145) receive incorrect paths.
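A small runnable sketch makes the positional assumption concrete (the helper and file names here are illustrative, not the actual save.py code): for a VLM, callers expect the projector to be the last element after cleanup, but skipping the whole block also skips the reverse().

```python
# Illustrative sketch of the ordering issue (simplified; file names made up).
# For a VLM the initial files are [text_model, mmproj]; quantization appends
# new files, and downstream example commands assume locations[-1] is the
# projector after cleanup.

def cleanup(locations, base_gguf, first_conversion, quantization_method):
    # Guard as written in the PR under review: skipping the whole block
    # also skips the reverse(), so the projector is no longer last when
    # the base file is kept.
    if first_conversion not in frozenset(quantization_method):
        locations.remove(base_gguf)
        locations.reverse()
    return locations

# Base format explicitly requested: block skipped, ordering left as-is.
kept = cleanup(["model-bf16.gguf", "mmproj.gguf", "model-q4_k_m.gguf"],
               "model-bf16.gguf", "bf16", ["q4_k_m", "bf16"])
# kept[-1] is the quantized model, not the projector

# Base format not requested: reverse() runs, projector ends up last.
removed = cleanup(["model-bf16.gguf", "mmproj.gguf", "model-q4_k_m.gguf"],
                  "model-bf16.gguf", "bf16", ["q4_k_m"])
# removed[-1] is "mmproj.gguf"
```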
Lines 2142 to 2152 in 80c12ff: not removing the said item from the list makes the llama.cpp example we show point to the bf16 variant, I guess? That is not ideal.
…ring

Address review feedback: the reverse() call must always execute when quants_created is True to maintain correct [text_model, mmproj] ordering for VLMs. Only the file deletion should be conditional on whether the user requested the base format.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Ricardo-M-L <[email protected]>
…xample commands

Address review from @Datta0: when the base format (e.g. bf16) is kept in all_saved_locations, it could end up at [-1], causing the VLM example command to use bf16 as --mmproj instead of the actual projector file. Move the preserved base file to index 1 (after the primary quantized model, before mmproj) so [0] and [-1] remain correct for example commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Ricardo-M-L <[email protected]>
@Datta0 Good catch! You're right: for VLMs, keeping the base file at [-1] would make the example command use it as the projector. Fixed in the latest commit: when the base format is preserved, it's moved to index 1 (after the primary quantized model, before mmproj), so [0] and [-1] remain correct. Example ordering for a VLM with ["q4_k_m", "bf16"]: primary quantized model at [0], preserved bf16 at [1], mmproj at [-1].
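The reordering described above can be sketched as follows (helper and file names are illustrative, not the actual save.py code):

```python
# Sketch of the fix: keep the preserved base file at index 1 so that [0]
# remains the primary quantized model and [-1] remains the projector.

def reorder_preserved_base(locations, base_gguf):
    locations.remove(base_gguf)
    locations.insert(1, base_gguf)
    return locations

locs = ["model-q4_k_m.gguf", "mmproj.gguf", "model-bf16.gguf"]
reorder_preserved_base(locs, "model-bf16.gguf")
# locs == ["model-q4_k_m.gguf", "model-bf16.gguf", "mmproj.gguf"]
```

With this ordering, a command that uses `locs[0]` as the model and `locs[-1]` as `--mmproj` still picks the right files.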
for more information, see https://pre-commit.ci
@Ricardo-M-L really appreciate the effort, however this is a duplicate of #4937, which I'm actively reviewing and which will probably be merged after pushing some additional GGUF-related fixes separately. I have to close this one, as we can't process duplicate issue fixes. @Datta0 note. Thanks!
What does this PR do?
When users request multiple quantization methods including the base format (e.g., ["q4_k_m", "bf16"]), the bf16 GGUF file serves dual roles: it is both the intermediate conversion (first_conversion) and a user-requested output.

The cleanup step at save.py:1351-1353 unconditionally deletes the base GGUF file after quantization, which loses the explicitly requested bf16 output.

Fix

Add a guard to only delete the intermediate base GGUF when the user did not request it as one of their quantization formats:
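A minimal sketch of that guard, with illustrative names (the real code lives in save_to_gguf in save.py):

```python
# Delete the intermediate base GGUF only when the user did not request it
# as one of their quantization formats.
from pathlib import Path

def maybe_delete_base(all_saved_locations, base_gguf,
                      first_conversion, quantization_method):
    if first_conversion not in frozenset(quantization_method):
        all_saved_locations.remove(base_gguf)
        Path(base_gguf).unlink(missing_ok = True)
    return all_saved_locations

# Base format explicitly requested -> the bf16 file is kept.
kept = maybe_delete_base(["model-bf16.gguf", "model-q4_k_m.gguf"],
                         "model-bf16.gguf", "bf16", ["q4_k_m", "bf16"])

# Base format not requested -> the intermediate file is removed.
cleaned = maybe_delete_base(["model-bf16.gguf", "model-q4_k_m.gguf"],
                            "model-bf16.gguf", "bf16", ["q4_k_m"])
```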
Reproduction
Fixes #4932