Fix bf16 GGUF deleted by cleanup when exported alongside quantized types#4937

Open
Shivamjohri247 wants to merge 8 commits into unslothai:main from Shivamjohri247:fix/bf16-gguf-cleanup-deletion

Conversation

@Shivamjohri247

Summary

Fixes #4932

When bf16 is passed alongside other quantization types (e.g. ["q4_k_m", "q6_k", "q8_0", "bf16"]), Unsloth's internal cleanup step deletes the bf16 GGUF file after using it as an intermediate for quantizing the other formats — even though bf16 was explicitly requested as an output.

Root Cause

In unsloth/save.py:1350-1353, the GGUF export logic:

  1. Creates a base_gguf file in first_conversion format (e.g. bf16) — this IS the bf16 output
  2. Loops through quantization_method, skipping any method that equals first_conversion (line 1300) — so bf16 is never re-created
  3. After quantization, unconditionally deletes base_gguf:
# BEFORE (broken)
if quants_created:
    all_saved_locations.remove(base_gguf)
    Path(base_gguf).unlink(missing_ok = True)

This means when bf16 is both the conversion intermediate AND a requested output, the file gets deleted. Only the quantized outputs survive. The log shows:

Unsloth: [1] Converting model into bf16 GGUF format.
Unsloth: [2] Converting GGUF bf16 into q4_k_m.
Unsloth: [2] Converting GGUF bf16 into q6_k.
Unsloth: [2] Converting GGUF bf16 into q8_0.
Unsloth: Model files cleanup...         ← BF16.gguf deleted here
Generated files: [Q8_0.gguf, Q6_K.gguf, Q4_K_M.gguf, BF16-mmproj.gguf]
                 ^ BF16.gguf missing

Fix

Guard the deletion so the intermediate file is only removed when it is NOT part of the user's requested quantization_method:

# AFTER (fixed)
if quants_created:
    if first_conversion not in quantization_method:
        all_saved_locations.remove(base_gguf)
        Path(base_gguf).unlink(missing_ok = True)

This also aligns with line 1363, which already correctly computes want_full_precision = first_conversion in frozenset(quantization_method); the cleanup step was simply not respecting that.
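The guard can be exercised in isolation. The sketch below is hypothetical (the cleanup function, temp files, and call sites are stand-ins, not the real save.py code path); only the guarded deletion mirrors the fix above.

```python
# Minimal sketch of the guarded cleanup; function and setup are hypothetical.
import os
import tempfile
from pathlib import Path

def cleanup(all_saved_locations, base_gguf, first_conversion,
            quantization_method, quants_created):
    # Delete the intermediate only when the user did not request it as output.
    if quants_created and first_conversion not in quantization_method:
        all_saved_locations.remove(base_gguf)
        Path(base_gguf).unlink(missing_ok = True)
    return all_saved_locations

tmp = tempfile.mkdtemp()
base = os.path.join(tmp, "model-BF16.gguf")
quant = os.path.join(tmp, "model-Q4_K_M.gguf")
for f in (base, quant):
    Path(f).touch()

# bf16 explicitly requested -> BF16.gguf survives
kept = cleanup([base, quant], base, "bf16", ["q4_k_m", "bf16"], True)
assert base in kept and Path(base).exists()

# bf16 only an intermediate -> BF16.gguf removed
dropped = cleanup([base, quant], base, "bf16", ["q4_k_m"], True)
assert base not in dropped and not Path(base).exists()
```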

Impact

  • Before: bf16 GGUF is silently deleted when exported alongside other quantized types
  • After: bf16 GGUF is preserved when explicitly requested; intermediate-only files are still cleaned up as before

Test plan

  • Export with mixed quant types: model.save_pretrained_gguf("./output", tokenizer, quantization_method=["q4_k_m", "q6_k", "q8_0", "bf16"]) — verify BF16.gguf is present in output
  • Export with only quant types (no bf16): model.save_pretrained_gguf("./output", tokenizer, quantization_method=["q4_k_m", "q8_0"]) — verify intermediate bf16 GGUF IS cleaned up
  • Export with bf16 only: model.save_pretrained_gguf("./output", tokenizer, quantization_method=["bf16"]) — verify BF16.gguf is present (existing behavior unchanged)
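A small helper can make the three checks above mechanical. Everything here is an assumption layered on the log output (the helper name, and the heuristic that each requested type's uppercased tag appears in exactly the matching filename, e.g. "q4_k_m" in Q4_K_M.gguf):

```python
# Hypothetical verifier for the test plan: returns the requested quant
# types with no matching GGUF file in the output directory.
from pathlib import Path

def check_outputs(output_dir, quantization_method):
    present = {p.name for p in Path(output_dir).glob("*.gguf")}
    # "q4_k_m" -> "Q4_K_M", "bf16" -> "BF16"; skip mmproj files so a
    # BF16-mmproj.gguf does not count as the bf16 text model.
    return [q for q in quantization_method
            if not any(q.upper() in name and "mmproj" not in name
                       for name in present)]

# Usage after an export: an empty list means every requested type exists.
# missing = check_outputs("./output", ["q4_k_m", "q6_k", "q8_0", "bf16"])
```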

When bf16 is both the intermediate conversion format and a requested
output, the cleanup step was unconditionally deleting the base GGUF
file. Guard the deletion so the intermediate is only removed when it
is not part of the user's requested quantization methods.

Closes unslothai#4932
Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request modifies unsloth/save.py to prevent the deletion of the base GGUF file if it's not part of the quantization method. However, this change introduces a regression in the file ordering for Vision-Language Models (VLMs), causing the mmproj file to be incorrectly positioned after list reversal, which could break downstream logic. A suggestion is provided to correctly reorder the files for VLMs when the base model is kept.

Comment thread: unsloth/save.py
When bf16 is a requested output alongside other quantized types for
VLMs, the base GGUF (text model) stays in the list. The subsequent
reverse() would place text_model at the end instead of mmproj,
breaking downstream code that expects mmproj as the last element.

Swap base_gguf past mmproj before reversal so mmproj ends up at
the correct position after the list is flipped.

Credit: Gemini Code Assist review on PR unslothai#4937
@Shivamjohri247
Author

Thanks for the catch! You're right — for VLMs where initial_files = [text_model, mmproj], keeping the base GGUF and then calling reverse() would place text_model at the end instead of mmproj, breaking downstream code at line 2146 that expects mmproj as the last element.

Fixed in the latest push by swapping base_gguf past mmproj before reversal when it's a VLM (len(initial_files) > 1). Verified all 4 scenarios:

Scenario           | VLM | bf16 requested | Result
Non-VLM, no bf16   | No  | No             | Base deleted, reversed quantized files
Non-VLM, with bf16 | No  | Yes            | bf16 + quantized, correct order
VLM, no bf16       | Yes | No             | Base deleted, mmproj at end after reverse
VLM, with bf16     | Yes | Yes            | mmproj swapped to front before reverse, ends up last
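The ordering problem and the swap fix can be shown with plain lists (filenames below are stand-ins, not the actual variables in save.py):

```python
# Sketch of the VLM ordering issue: downstream code expects mmproj at
# index -1 after the trailing reverse().
initial_files = ["text_model.gguf", "mmproj.gguf"]   # VLM export
all_saved = initial_files + ["Q4_K_M.gguf"]          # base kept (bf16 requested)

naive = list(reversed(all_saved))
# naive == ["Q4_K_M.gguf", "mmproj.gguf", "text_model.gguf"]  # text last: wrong

fixed = list(all_saved)
fixed.insert(1, fixed.pop(0))    # swap base past mmproj before reversal
fixed.reverse()
# fixed == ["Q4_K_M.gguf", "text_model.gguf", "mmproj.gguf"]  # mmproj last
```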

@rolandtannous
Collaborator

Thanks for the fix! I'm going to push a couple of commits to address a few items.

rolandtannous and others added 3 commits April 13, 2026 17:45
The previous len(initial_files) > 1 guard would incorrectly fire for
sharded text-only models (convert_to_gguf extends shard files into
initial_files when max_shard_size splits a large model), reordering the
shard list as if it were a VLM.

is_vlm is already maintained from convert_to_gguf's return value and is
the semantically correct flag for "has mmproj that must stay last".
When convert_to_gguf shards the text model output, initial_files has
more than two entries ([text_shard1, text_shard2, ..., mmproj]). The
previous insert(1, pop(0)) only repositioned one text part, leaving
mmproj mid-list after the trailing reverse() and breaking the
"mmproj is last" invariant consumed at save.py:2145 where
all_file_locations[-1] is used for --mmproj.

Rebuild the list explicitly so that after reverse() the final order is
[quants_desc..., text_parts_in_order..., mmproj], correct for any
number of text shards including the common single-file case.
@rolandtannous
Collaborator

Nice catch on the root cause. The preservation logic is correct for the common case, but two edge cases gave me pause, so I pushed follow-ups (feel free to revert if you disagree):

316af41a — use is_vlm instead of len(initial_files) > 1

convert_to_gguf extends initial_files with shard files when max_shard_size splits a large non-VLM model. The length check would then fire the VLM-only reorder for a sharded text-only model and scramble the shard list. is_vlm (already set from convert_to_gguf's return) is the semantic flag we actually want.

c9666917 — handle sharded VLMs in the reorder

For a sharded VLM, initial_files = [text_shard1, text_shard2, …, mmproj]. insert(1, pop(0)) only moves one text element, leaving mmproj mid-list after reverse() — which breaks the all_file_locations[-1] mmproj assumption at save.py:2145. Rebuilt the list explicitly so any number of text shards is handled.
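The explicit rebuild can be sketched with stand-in filenames (this is a simplification under the assumption that mmproj is always the last entry of initial_files, as the commit message describes):

```python
# Sketch: rebuild so that after reverse() the order is
# [quants_desc..., text_parts_in_order..., mmproj].
initial_files = ["text-00001.gguf", "text-00002.gguf", "mmproj.gguf"]
quants = ["Q4_K_M.gguf", "Q8_0.gguf"]

n_text = len(initial_files) - 1              # mmproj excluded
# The pre-reverse list must be the mirror of the target order:
pre = [initial_files[-1]] + initial_files[:n_text][::-1] + quants
pre.reverse()
# pre == ["Q8_0.gguf", "Q4_K_M.gguf", "text-00001.gguf",
#         "text-00002.gguf", "mmproj.gguf"]
```

This handles any number of text shards, including the common single-file case (n_text = 1).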

@Shivamjohri247
Author

Thanks for the thorough review and the follow-up commits! Both fixes make total sense:

  • Using is_vlm is clearly the right call over the length check — I didn't account for sharded non-VLM models.
  • The explicit list rebuild for sharded VLMs is the correct approach too; pop(0)/insert(1) was only moving one shard.

Appreciate you catching these edge cases. Happy with the changes, let's merge whenever you're ready.

rolandtannous and others added 2 commits April 14, 2026 13:10
The delete branch of the GGUF cleanup only removed initial_files[0]
from the returned list and from disk. For sharded outputs (large text
models split by max_shard_size), the remaining shards were left:
  - on disk as orphans
  - in the returned list, pushing mmproj out of its [-1] position

Exposed by sharded VLM exports (gemma-3-27b-it, 2 text shards + mmproj)
with quantization_method=["q4_k_m"]: shard 1 was deleted, shard 2
survived as a stray, and all_file_locations[-1] pointed at shard 2
instead of mmproj when constructing the --mmproj CLI hint.

Iterate over initial_files[:n_text] so all text parts are cleaned up;
mmproj is excluded by the slice and keeps its place at the end.
@rolandtannous
Collaborator

Follow-up from E2E testing on unsloth/gemma-3-27b-it (sharded VLM, 2 text shards + mmproj). Testing exposed a pre-existing bug in the cleanup delete branch that the two earlier fixes adjacent to it did not address.

6bcb02f5 — delete all text intermediates, not just base_gguf

When first_conversion is not in quantization_method, the old code did:

all_saved_locations.remove(base_gguf)
Path(base_gguf).unlink(missing_ok = True)

base_gguf = initial_files[0]. For a sharded output, that's only shard 1 — shards 2..N are left orphaned on disk AND still in the returned list, which breaks the all_file_locations[-1] == mmproj invariant consumed at save.py:2145.

Observed with quantization_method=["q4_k_m"] on gemma-3-27b-it:

  • initial_files = [BF16-00001-of-00002, BF16-00002-of-00002, BF16-mmproj]
  • Returned list: [Q4_K_M, BF16-mmproj, BF16-00002-of-00002] ← shard 2 stray, mmproj not at [-1]
  • On disk: shard 1 gone, shard 2 (3.9G) still there as garbage

The fix iterates initial_files[:n_text] so every text part is both removed from the list and unlinked from disk. n_text excludes mmproj for VLMs so the existing mmproj preservation is untouched. Backwards compatible with the non-sharded single-file case (n_text=1, loop runs once, same behavior as before).
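The loop's effect can be reproduced with dummy files (a sketch: the temp-dir setup is hypothetical, but the variable names mirror the description above):

```python
# Sketch of the sharded delete branch: unlink every text shard, not just
# the first; the [:n_text] slice keeps mmproj out of the deletion loop.
import os
import tempfile
from pathlib import Path

tmp = tempfile.mkdtemp()
names = ("BF16-00001-of-00002.gguf", "BF16-00002-of-00002.gguf",
         "BF16-mmproj.gguf", "Q4_K_M.gguf")
paths = [os.path.join(tmp, n) for n in names]
for p in paths:
    Path(p).touch()

initial_files = paths[:3]              # two text shards + mmproj
all_saved_locations = list(paths)

is_vlm = True
n_text = len(initial_files) - 1 if is_vlm else len(initial_files)
for shard in initial_files[:n_text]:   # both shards; mmproj excluded
    all_saved_locations.remove(shard)
    Path(shard).unlink(missing_ok = True)

# mmproj stays listed and on disk; both text shards are gone
assert all_saved_locations == [paths[2], paths[3]]
```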

This is pre-existing — it was broken before this PR opened. Scoping it in because the PR is already the cleanup-logic PR and the gemma sharded test made it loud.

Successfully merging this pull request may close these issues.

[Bug] bf16 GGUF deleted by internal cleanup when exported alongside quantized types