
fix: preserve bf16 GGUF file when explicitly requested in quantization list #4991

Closed
Ricardo-M-L wants to merge 4 commits into unslothai:main from Ricardo-M-L:fix/preserve-bf16-gguf-v2

Conversation

@Ricardo-M-L
Contributor

What does this PR do?

When users request multiple quantization methods including the base format
(e.g., ["q4_k_m", "bf16"]), the bf16 GGUF file serves dual roles: it is
both the intermediate conversion (first_conversion) and a user-requested
output.

The cleanup step at save.py:1351-1353 unconditionally deletes the base GGUF
file after quantization, which loses the explicitly requested bf16 output.

Fix

Add a guard to only delete the intermediate base GGUF when the user did not
request it as one of their quantization formats:

    # Before
    if quants_created:

    # After
    if quants_created and first_conversion not in frozenset(quantization_method):
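As a sanity check, the guard can be exercised on a plain Python list. This is a minimal sketch, not the actual save.py code: the function name and file names are hypothetical, and the real implementation also deletes the file on disk via `Path(base_gguf).unlink`.

```python
# Minimal sketch of the guarded cleanup (hypothetical helper; the real
# save.py code also unlinks the file on disk).
def cleanup(all_saved_locations, base_gguf, first_conversion,
            quantization_method, quants_created):
    # Keep the intermediate base GGUF if the user explicitly requested
    # it as one of the quantization formats.
    if quants_created and first_conversion not in frozenset(quantization_method):
        all_saved_locations.remove(base_gguf)
    return all_saved_locations

# bf16 is both the intermediate and a requested output -> it survives
kept = cleanup(["model-BF16.gguf", "model-Q4_K_M.gguf"],
               base_gguf="model-BF16.gguf",
               first_conversion="bf16",
               quantization_method=["q4_k_m", "bf16"],
               quants_created=True)
print(kept)  # ['model-BF16.gguf', 'model-Q4_K_M.gguf']
```

When bf16 is not in the requested list, the same guard removes the intermediate file as before, so the pre-existing behavior is unchanged.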

Reproduction

model.save_pretrained_gguf(
    "model",
    tokenizer,
    quantization_method=["q4_k_m", "bf16"],  # bf16 is deleted!
)

Fixes #4932

…n list

When users request multiple quantization methods including the base format
(e.g., ["q4_k_m", "bf16"]), the bf16 GGUF serves as both the intermediate
conversion and a user-requested output. The cleanup step unconditionally
deleted this file, losing the explicitly requested bf16 output.

Only delete the intermediate base GGUF when the user did not request it.

Fixes unslothai#4932

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request modifies the model file cleanup logic in save_to_gguf to prevent the deletion of the base GGUF file when it is explicitly requested in the quantization methods. However, this change introduces a regression in the ordering of all_saved_locations, which is critical for Vision-Language Models (VLM) where specific file positions are expected for generated commands. A suggestion was provided to correctly handle the conditional deletion while ensuring proper list ordering for VLMs.

Comment thread on unsloth/save.py (outdated), lines 1351 to 1353:

    if quants_created and first_conversion not in frozenset(quantization_method):
        all_saved_locations.remove(base_gguf)
        Path(base_gguf).unlink(missing_ok = True)


Severity: high

The current implementation makes the list reversal (line 1356) conditional on the deletion of the base GGUF file. This introduces a regression in the ordering of the all_saved_locations list when the base format is explicitly requested.

For Vision-Language Models (VLM), the example commands generated later (e.g., at line 2145) rely on the projector being the last element in the list. If the base file is preserved and the list is not reversed, the order remains [base_model, mmproj, quantized_model], causing the example command to incorrectly use a quantized model as the projector. For text models, skipping the reversal means the base model remains at index 0 instead of the newly created quantized model, which is usually the intended primary output.

Moving the reversal logic out of the conditional deletion block and adding a specific adjustment for VLMs ensures consistent and correct ordering.

Suggested change

    # Before
    if quants_created and first_conversion not in frozenset(quantization_method):
        all_saved_locations.remove(base_gguf)
        Path(base_gguf).unlink(missing_ok = True)

    # After
    if quants_created:
        if first_conversion not in quantization_method:
            all_saved_locations.remove(base_gguf)
            Path(base_gguf).unlink(missing_ok = True)
        elif is_vlm and len(initial_files) > 1:
            # Move projector to start so it ends up at the end after reverse()
            all_saved_locations.remove(initial_files[1])
            all_saved_locations.insert(0, initial_files[1])
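The ordering concern behind this suggestion can be illustrated with a toy list. File names here are hypothetical, and the snippet only models the list manipulation, not the on-disk files:

```python
# Toy illustration of the VLM ordering concern (file names hypothetical).
# After quantization the list is [base_model, mmproj, quantized_model];
# the cleanup block's reverse() is what puts the quantized model at [0]
# and the projector at [-1], which the example commands rely on.
locations = ["model-BF16.gguf", "mmproj-F16.gguf", "model-Q4_K_M.gguf"]
initial_files = ["model-BF16.gguf", "mmproj-F16.gguf"]

# Base file preserved: move the projector to the front so that the
# unconditional reverse() still leaves it at the end.
locations.remove(initial_files[1])
locations.insert(0, initial_files[1])
locations.reverse()
print(locations)  # ['model-Q4_K_M.gguf', 'model-BF16.gguf', 'mmproj-F16.gguf']
```

After the adjustment plus the reversal, the quantized model sits at index 0 and the projector at index -1, matching the positional assumptions of the generated commands.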


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 062af064df


Comment thread on unsloth/save.py (outdated), diff context:

        )
        print("Unsloth: Model files cleanup...")
    -   if quants_created:
    +   if quants_created and first_conversion not in frozenset(quantization_method):


P2: Preserve VLM file ordering when keeping base GGUF

This new condition now skips the entire cleanup block whenever first_conversion is explicitly requested, but that block also contains the all_saved_locations.reverse() reorder used to normalize multimodal output order ([text_model, mmproj]). In the affected case (e.g. VLM with quantization_method including both first_conversion and another quant), the list can remain in an order where callers using positional assumptions (like all_file_locations[0] as model and all_file_locations[-1] as --mmproj at save.py:2145) receive incorrect paths.


@Datta0
Collaborator

Datta0 commented Apr 13, 2026

unsloth/unsloth/save.py, lines 2142 to 2152 at 80c12ff:

    if is_vlm_update:
        print("\n")
        print(
            f"Unsloth: example usage for Multimodal LLMs: {os.path.join(_bin_dir, 'llama-mtmd-cli' + _exe)} -m {all_file_locations[0]} --mmproj {all_file_locations[-1]}"
        )
        print("Unsloth: load image inside llama.cpp runner: /image test_image.jpg")
        print("Unsloth: Prompt model to describe the image")
    else:
        print(
            f'Unsloth: example usage for text only LLMs: {os.path.join(_bin_dir, "llama-cli" + _exe)} --model {all_file_locations[0]} -p "why is the sky blue?"'
        )

Not removing that item from the list makes this llama.cpp example we show point to the bf16 variant, I guess? That is not ideal.

Ricardo-M-L and others added 2 commits April 14, 2026 00:37
…ring

Address review feedback: the reverse() call must always execute when
quants_created is True to maintain correct [text_model, mmproj] ordering
for VLMs. Only the file deletion should be conditional on whether the
user requested the base format.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Ricardo-M-L <[email protected]>
…xample commands

Address review from @Datta0: when the base format (e.g. bf16) is kept
in all_saved_locations, it could end up at [-1], causing the VLM example
command to use bf16 as --mmproj instead of the actual projector file.

Move the preserved base file to index 1 (after the primary quantized
model, before mmproj) so [0] and [-1] remain correct for example
commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Ricardo-M-L <[email protected]>
@Ricardo-M-L
Contributor Author

@Datta0 Good catch! You're right — for VLMs, keeping the base file at [-1] would make the example command use bf16 as --mmproj instead of the actual projector.

Fixed in the latest commit: when the base format is preserved, it's moved to index 1 (after the primary quantized model, before mmproj), so [0] and [-1] remain correct for example commands.

Example ordering for VLM with ["q4_k_m", "bf16"]:

  • [0] = q4_k_m (model) ✓
  • [1] = bf16 (preserved base)
  • [-1] = mmproj ✓
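In list form, the reordering described above amounts to the following sketch (file names hypothetical, list manipulation only):

```python
# Hypothetical sketch: move a preserved base GGUF to index 1 so the primary
# quantized model stays at [0] and the projector stays at [-1].
all_saved_locations = ["model-Q4_K_M.gguf", "mmproj-F16.gguf", "model-BF16.gguf"]
base_gguf = "model-BF16.gguf"

all_saved_locations.remove(base_gguf)
all_saved_locations.insert(1, base_gguf)
print(all_saved_locations)
# ['model-Q4_K_M.gguf', 'model-BF16.gguf', 'mmproj-F16.gguf']
```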

@rolandtannous
Collaborator

@Ricardo-M-L I really appreciate the effort; however, this is a duplicate of #4937, which I'm actively reviewing and which will probably be merged after pushing some additional GGUF-related fixes separately. I have to close this one, as we can't process duplicate issue fixes. @Datta0 note. Thanks!



Development

Successfully merging this pull request may close these issues.

[Bug] bf16 GGUF deleted by internal cleanup when exported alongside quantized types

3 participants