
fix: preserve bf16 GGUF file when explicitly requested in quantization list #4991

Closed
Ricardo-M-L wants to merge 4 commits into unslothai:main from Ricardo-M-L:fix/preserve-bf16-gguf-v2

Conversation

@Ricardo-M-L
Contributor

What does this PR do?

When users request multiple quantization methods including the base format
(e.g., ["q4_k_m", "bf16"]), the bf16 GGUF file serves dual roles: it is
both the intermediate conversion (first_conversion) and a user-requested
output.

The cleanup step at save.py:1351-1353 unconditionally deletes the base GGUF
file after quantization, which loses the explicitly requested bf16 output.

Fix

Add a guard to only delete the intermediate base GGUF when the user did not
request it as one of their quantization formats:

    # Before
    if quants_created:

    # After
    if quants_created and first_conversion not in frozenset(quantization_method):
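As a sanity check, the guard can be exercised on a plain Python list. This is a minimal sketch, not the actual save.py code: the function name and file names are hypothetical, and the real implementation also deletes the file on disk via `Path(base_gguf).unlink`.

```python
# Minimal sketch of the guarded cleanup (hypothetical helper; the real
# save.py code also unlinks the file on disk).
def cleanup(all_saved_locations, base_gguf, first_conversion,
            quantization_method, quants_created):
    # Keep the intermediate base GGUF if the user explicitly requested
    # it as one of the quantization formats.
    if quants_created and first_conversion not in frozenset(quantization_method):
        all_saved_locations.remove(base_gguf)
    return all_saved_locations

# bf16 is both the intermediate and a requested output -> it survives
kept = cleanup(["model-BF16.gguf", "model-Q4_K_M.gguf"],
               base_gguf="model-BF16.gguf",
               first_conversion="bf16",
               quantization_method=["q4_k_m", "bf16"],
               quants_created=True)
print(kept)  # ['model-BF16.gguf', 'model-Q4_K_M.gguf']
```

When bf16 is not in the requested list, the same guard removes the intermediate file as before, so the pre-existing behavior is unchanged.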

Reproduction

model.save_pretrained_gguf(
    "model",
    tokenizer,
    quantization_method=["q4_k_m", "bf16"],  # bf16 is deleted!
)

Fixes #4932

…n list

When users request multiple quantization methods including the base format
(e.g., ["q4_k_m", "bf16"]), the bf16 GGUF serves as both the intermediate
conversion and a user-requested output. The cleanup step unconditionally
deleted this file, losing the explicitly requested bf16 output.

Only delete the intermediate base GGUF when the user did not request it.

Fixes unslothai#4932

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request modifies the model file cleanup logic in save_to_gguf to prevent the deletion of the base GGUF file when it is explicitly requested in the quantization methods. However, this change introduces a regression in the ordering of all_saved_locations, which is critical for Vision-Language Models (VLM) where specific file positions are expected for generated commands. A suggestion was provided to correctly handle the conditional deletion while ensuring proper list ordering for VLMs.

Comment thread on unsloth/save.py (outdated), lines 1351 to 1353:

    if quants_created and first_conversion not in frozenset(quantization_method):
        all_saved_locations.remove(base_gguf)
        Path(base_gguf).unlink(missing_ok = True)


Severity: high

The current implementation makes the list reversal (line 1356) conditional on the deletion of the base GGUF file. This introduces a regression in the ordering of the all_saved_locations list when the base format is explicitly requested.

For Vision-Language Models (VLM), the example commands generated later (e.g., at line 2145) rely on the projector being the last element in the list. If the base file is preserved and the list is not reversed, the order remains [base_model, mmproj, quantized_model], causing the example command to incorrectly use a quantized model as the projector. For text models, skipping the reversal means the base model remains at index 0 instead of the newly created quantized model, which is usually the intended primary output.

Moving the reversal logic out of the conditional deletion block and adding a specific adjustment for VLMs ensures consistent and correct ordering.

Suggested change

    # Before
    if quants_created and first_conversion not in frozenset(quantization_method):
        all_saved_locations.remove(base_gguf)
        Path(base_gguf).unlink(missing_ok = True)

    # After
    if quants_created:
        if first_conversion not in quantization_method:
            all_saved_locations.remove(base_gguf)
            Path(base_gguf).unlink(missing_ok = True)
        elif is_vlm and len(initial_files) > 1:
            # Move projector to start so it ends up at the end after reverse()
            all_saved_locations.remove(initial_files[1])
            all_saved_locations.insert(0, initial_files[1])
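The ordering concern behind this suggestion can be illustrated with a toy list. File names here are hypothetical, and the snippet only models the list manipulation, not the on-disk files:

```python
# Toy illustration of the VLM ordering concern (file names hypothetical).
# After quantization the list is [base_model, mmproj, quantized_model];
# the cleanup block's reverse() is what puts the quantized model at [0]
# and the projector at [-1], which the example commands rely on.
locations = ["model-BF16.gguf", "mmproj-F16.gguf", "model-Q4_K_M.gguf"]
initial_files = ["model-BF16.gguf", "mmproj-F16.gguf"]

# Base file preserved: move the projector to the front so that the
# unconditional reverse() still leaves it at the end.
locations.remove(initial_files[1])
locations.insert(0, initial_files[1])
locations.reverse()
print(locations)  # ['model-Q4_K_M.gguf', 'model-BF16.gguf', 'mmproj-F16.gguf']
```

After the adjustment plus the reversal, the quantized model sits at index 0 and the projector at index -1, matching the positional assumptions of the generated commands.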


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 062af064df


Comment thread on unsloth/save.py (outdated), diff context:

        )
        print("Unsloth: Model files cleanup...")
    -   if quants_created:
    +   if quants_created and first_conversion not in frozenset(quantization_method):


P2: Preserve VLM file ordering when keeping base GGUF

This new condition now skips the entire cleanup block whenever first_conversion is explicitly requested, but that block also contains the all_saved_locations.reverse() reorder used to normalize multimodal output order ([text_model, mmproj]). In the affected case (e.g. VLM with quantization_method including both first_conversion and another quant), the list can remain in an order where callers using positional assumptions (like all_file_locations[0] as model and all_file_locations[-1] as --mmproj at save.py:2145) receive incorrect paths.


@Datta0
Collaborator

Datta0 commented Apr 13, 2026

unsloth/unsloth/save.py, lines 2142 to 2152 at 80c12ff:

    if is_vlm_update:
        print("\n")
        print(
            f"Unsloth: example usage for Multimodal LLMs: {os.path.join(_bin_dir, 'llama-mtmd-cli' + _exe)} -m {all_file_locations[0]} --mmproj {all_file_locations[-1]}"
        )
        print("Unsloth: load image inside llama.cpp runner: /image test_image.jpg")
        print("Unsloth: Prompt model to describe the image")
    else:
        print(
            f'Unsloth: example usage for text only LLMs: {os.path.join(_bin_dir, "llama-cli" + _exe)} --model {all_file_locations[0]} -p "why is the sky blue?"'
        )

Not removing that item from the list makes this llama.cpp example we show point to the bf16 variant, I guess? That is not ideal.

Ricardo-M-L and others added 2 commits April 14, 2026 00:37
…ring

Address review feedback: the reverse() call must always execute when
quants_created is True to maintain correct [text_model, mmproj] ordering
for VLMs. Only the file deletion should be conditional on whether the
user requested the base format.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Ricardo-M-L <[email protected]>
…xample commands

Address review from @Datta0: when the base format (e.g. bf16) is kept
in all_saved_locations, it could end up at [-1], causing the VLM example
command to use bf16 as --mmproj instead of the actual projector file.

Move the preserved base file to index 1 (after the primary quantized
model, before mmproj) so [0] and [-1] remain correct for example
commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Ricardo-M-L <[email protected]>
@Ricardo-M-L
Contributor Author

@Datta0 Good catch! You're right — for VLMs, keeping the base file at [-1] would make the example command use bf16 as --mmproj instead of the actual projector.

Fixed in the latest commit: when the base format is preserved, it's moved to index 1 (after the primary quantized model, before mmproj), so [0] and [-1] remain correct for example commands.

Example ordering for VLM with ["q4_k_m", "bf16"]:

  • [0] = q4_k_m (model) ✓
  • [1] = bf16 (preserved base)
  • [-1] = mmproj ✓
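In list form, the reordering described above amounts to the following sketch (file names hypothetical, list manipulation only):

```python
# Hypothetical sketch: move a preserved base GGUF to index 1 so the primary
# quantized model stays at [0] and the projector stays at [-1].
all_saved_locations = ["model-Q4_K_M.gguf", "mmproj-F16.gguf", "model-BF16.gguf"]
base_gguf = "model-BF16.gguf"

all_saved_locations.remove(base_gguf)
all_saved_locations.insert(1, base_gguf)
print(all_saved_locations)
# ['model-Q4_K_M.gguf', 'model-BF16.gguf', 'mmproj-F16.gguf']
```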

@rolandtannous
Collaborator

@Ricardo-M-L I really appreciate the effort; however, this is a duplicate of #4937, which I'm actively reviewing and which will probably be merged after pushing some additional GGUF-related fixes separately. I have to close this one, as we can't process duplicate issue fixes. @Datta0 note. Thanks!



Development

Successfully merging this pull request may close these issues.

[Bug] bf16 GGUF deleted by internal cleanup when exported alongside quantized types

3 participants