feat: Apple Silicon (MPS) device support #4977
Conversation
- device_type.py: MPS detection via torch.backends.mps
- models/_utils.py: MPS branches for amp, bfloat16, memory
- kernels/utils.py: Guard Triton/bitsandbytes with MPS stubs
- __init__.py: MPS branch in device setup

All 4 files pass py_compile syntax check. Closes unslothai#4
Code Review
This pull request introduces support for Apple Silicon (MPS) devices, adding device detection, conditional imports for Triton and bitsandbytes, and stubs for stream management. Review feedback highlights that the memory calculation for MPS should utilize torch.mps.recommended_max_working_set_size() instead of driver_allocated_memory(), and that AMP decorators should target the mps device type rather than cpu to ensure proper autocast functionality during 16-bit finetuning.
```python
elif DEVICE_TYPE == "mps":
    # MPS shares unified memory; report recommended allocator limit
    total_memory = torch.mps.driver_allocated_memory() or (
        int(os.popen("sysctl -n hw.memsize").read().strip())
    )
```
The logic for calculating total_memory on MPS is flawed. torch.mps.driver_allocated_memory() returns the amount of memory currently in use by the MPS driver, not the total available capacity. Using it with an or fallback means that as soon as any memory is allocated (even a few MBs), the reported "total memory" will be that small allocated amount instead of the actual system limit.
On Apple Silicon, the appropriate API to determine the maximum memory the GPU can safely use is torch.mps.recommended_max_working_set_size(). This provides a much more accurate representation of the available "VRAM" in a unified memory architecture.
```python
elif DEVICE_TYPE == "mps":
    # MPS shares unified memory; report recommended allocator limit
    total_memory = torch.mps.recommended_max_working_set_size()
```
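The failure mode the reviewer describes can be shown without torch at all. A minimal torch-free sketch of the `or` fallback, with made-up byte counts:

```python
# Torch-free sketch of the reviewer's point: with an `or` fallback,
# the sysctl path only fires while the driver has allocated nothing.
def total_memory(driver_allocated: int, system_memsize: int) -> int:
    # Mirrors: torch.mps.driver_allocated_memory() or sysctl hw.memsize
    return driver_allocated or system_memsize

SIXTEEN_GB = 16 * 1024**3

# Before any allocation the fallback reports the real capacity...
assert total_memory(0, SIXTEEN_GB) == SIXTEEN_GB
# ...but once even 1 MiB is allocated, "total memory" collapses to 1 MiB.
assert total_memory(1024**2, SIXTEEN_GB) == 1024**2
```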
Fixed in commit f8a4152 — updated to use recommended_max_working_set_size() for memory, device_type="mps" for AMP decorators, and guarded fp8 Triton import.
```python
if Version(torch.__version__) < Version("2.4.0"):
    if DEVICE_TYPE == "mps":
        # MPS does not support autocast custom_fwd/bwd; use CPU fallback
        torch_amp_custom_fwd = torch.amp.custom_fwd(device_type = "cpu")
```
Using device_type="cpu" for the AMP decorators on MPS will effectively disable autocast for any operations performed on MPS tensors within the decorated functions, as the CPU autocast context does not track MPS device operations. Since PyTorch (2.0+) supports MPS autocast, you should use device_type="mps" to ensure that 16-bit finetuning works correctly with these decorators.
Suggested change:

```diff
 if Version(torch.__version__) < Version("2.4.0"):
     if DEVICE_TYPE == "mps":
-        # MPS does not support autocast custom_fwd/bwd; use CPU fallback
-        torch_amp_custom_fwd = torch.amp.custom_fwd(device_type = "cpu")
+        # Use mps device type for autocast decorators
+        torch_amp_custom_fwd = torch.amp.custom_fwd(device_type = "mps")
+        torch_amp_custom_bwd = torch.amp.custom_bwd(device_type = "mps")
```
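Why a cpu-targeted decorator is a no-op for MPS tensors can be sketched without torch. The `Autocast` class below is a toy stand-in for the real context manager, not the PyTorch API:

```python
# Toy model of autocast: a context only downcasts ops whose tensors
# live on ITS device type. Illustrative only; not the torch API.
class Autocast:
    active = set()
    def __init__(self, device_type):
        self.device_type = device_type
    def __enter__(self):
        Autocast.active.add(self.device_type)
    def __exit__(self, *exc):
        Autocast.active.discard(self.device_type)

def op_dtype(tensor_device: str) -> str:
    # An op is downcast to float16 only if autocast covers its device.
    return "float16" if tensor_device in Autocast.active else "float32"

with Autocast("cpu"):          # decorator targeting "cpu"...
    assert op_dtype("mps") == "float32"   # ...misses every mps op
with Autocast("mps"):          # targeting "mps"
    assert op_dtype("mps") == "float16"   # downcasts as intended
```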
Fixed in commit f8a4152 — updated to use recommended_max_working_set_size() for memory, device_type="mps" for AMP decorators, and guarded fp8 Triton import.
```python
elif DEVICE_TYPE == "mps":
    # MPS does not support autocast custom_fwd/bwd; use CPU fallback for amp decorators
    torch_amp_custom_fwd = torch.amp.custom_fwd(device_type = "cpu")
    torch_amp_custom_bwd = torch.amp.custom_bwd(device_type = "cpu")
```
Using device_type="cpu" for the AMP decorators on MPS will effectively disable autocast for any operations performed on MPS tensors within the decorated functions. Since PyTorch supports MPS autocast, you should use device_type="mps" to ensure that 16-bit finetuning works correctly with these decorators.
Suggested change:

```diff
 elif DEVICE_TYPE == "mps":
-    # MPS does not support autocast custom_fwd/bwd; use CPU fallback for amp decorators
-    torch_amp_custom_fwd = torch.amp.custom_fwd(device_type = "cpu")
-    torch_amp_custom_bwd = torch.amp.custom_bwd(device_type = "cpu")
+    # Use mps device type for autocast decorators
+    torch_amp_custom_fwd = torch.amp.custom_fwd(device_type = "mps")
+    torch_amp_custom_bwd = torch.amp.custom_bwd(device_type = "mps")
```
Fixed in commit f8a4152 — updated to use recommended_max_working_set_size() for memory, device_type="mps" for AMP decorators, and guarded fp8 Triton import.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d1832d3e27
```python
elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
    return "mps"
```
Guard MPS detection until optional deps are import-safe
Returning "mps" here lets initialization proceed into model imports, but those imports still include hard dependencies on CUDA-only libraries (for example unsloth/models/llama.py imports triton and unsloth/models/_utils.py imports bitsandbytes). On Apple Silicon setups where those packages are not installed, import unsloth now fails with ModuleNotFoundError, so the new MPS path does not actually initialize. Please either make those imports conditional first or gate MPS selection until dependency checks pass.
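A guard along the lines this comment asks for could probe the optional packages before committing to the mps path. The `_has` helper and flag names below are hypothetical, not the PR's actual identifiers:

```python
import importlib.util

def _has(mod: str) -> bool:
    # True iff the package is importable, without importing it yet.
    return importlib.util.find_spec(mod) is not None

HAS_TRITON = _has("triton")          # CUDA-only; absent on most Macs
HAS_BNB = _has("bitsandbytes")

if HAS_TRITON:
    import triton
else:
    triton = None  # stub so downstream None-checks can no-op gracefully
```

Device selection could then require either CUDA or a fully import-safe MPS path before returning "mps".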
Fixed in commit f8a4152 — updated to use recommended_max_working_set_size() for memory, device_type="mps" for AMP decorators, and guarded fp8 Triton import.
```python
MAX_FUSED_SIZE: int = 65536
import functools

from .fp8 import weight_dequant, fp8_linear
```
Skip fp8 Triton import when initializing on MPS
This import is unconditional, but unsloth/kernels/fp8.py immediately imports triton; in the same change, the MPS branch explicitly treats Triton as unavailable. On Macs without Triton installed, importing unsloth.kernels.utils crashes before fallback logic can run, which breaks MPS startup even for non-FP8 use cases. Make FP8 loading conditional or lazy on non-MPS paths.
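One way to make the import conditional, sketched with a stub fallback. The hardcoded `DEVICE_TYPE` value and the stub body are illustrative, not the PR's code:

```python
import importlib.util

DEVICE_TYPE = "mps"  # assume device detection already ran

if DEVICE_TYPE != "mps" and importlib.util.find_spec("triton") is not None:
    # Real path: fp8.py imports triton at module import time.
    from unsloth.kernels.fp8 import weight_dequant, fp8_linear
else:
    # Stub path: fail loudly only when fp8 is actually used,
    # so plain (non-FP8) MPS startup is unaffected.
    def weight_dequant(*args, **kwargs):
        raise NotImplementedError("fp8 kernels require Triton (CUDA)")
    fp8_linear = weight_dequant
```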
Fixed in commit f8a4152 — updated to use recommended_max_working_set_size() for memory, device_type="mps" for AMP decorators, and guarded fp8 Triton import.
- Use torch.mps.recommended_max_working_set_size() for memory calculation instead of driver_allocated_memory() (which reports current allocation, not total available)
- Use device_type="mps" for AMP decorators instead of "cpu" (MPS supports autocast since PyTorch 2.3+)
- Guard fp8 Triton import with MPS check (fp8.py imports triton unconditionally)

All 4 files pass py_compile syntax check.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f8a41529d8
```python
    SUPPORTS_BFLOAT16 = True
elif DEVICE_TYPE == "mps":
    # Apple Silicon (M1+) natively supports bfloat16
    SUPPORTS_BFLOAT16 = True
```
Gate MPS bfloat16 on real runtime support
Setting SUPPORTS_BFLOAT16 = True unconditionally for MPS makes is_bfloat16_supported() report support even on Apple environments where PyTorch MPS bf16 is unavailable (for example older macOS/runtime combinations), so model loading can default to torch.bfloat16 and then fail at runtime. This should be derived from an actual MPS bf16 capability check instead of hardcoding True.
Fixed in commit f8a4152 — updated to use recommended_max_working_set_size() for memory, device_type="mps" for AMP decorators, and guarded fp8 Triton import.
Fixed in 5e1f2e7 — now probes MPS bfloat16 support at runtime by creating a test tensor, instead of hardcoding True. Falls back to False on older macOS/PyTorch combinations.
Instead of hardcoding SUPPORTS_BFLOAT16=True for MPS, probe at runtime by creating a bfloat16 tensor on MPS. This handles older macOS/PyTorch combinations where bf16 isn't available. Addresses Codex review feedback.
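A probe in that spirit, assuming torch is importable; the function name is illustrative, not necessarily what the commit uses:

```python
import torch

def mps_supports_bfloat16() -> bool:
    # Probe instead of hardcoding True: older macOS/PyTorch builds
    # can expose MPS yet fail to materialize bfloat16 tensors on it.
    if not (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()):
        return False
    try:
        torch.zeros(1, dtype=torch.bfloat16, device="mps")
        return True
    except Exception:
        return False

SUPPORTS_BFLOAT16 = mps_supports_bfloat16()
```

On non-Mac machines (or builds without MPS) this simply returns False, so the flag degrades safely instead of crashing at model-load time.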
Since we're in the final phases of our own in-house MLX integration, I have to close this for the benefit of the work that's already been done and will be released soon. Thank you for the effort though.
Adds MPS (Metal Performance Shaders) support for Apple Silicon Macs.

`device_type.py` now detects MPS via `torch.backends.mps`. The rest of the changes make the import chain survive on a machine with no CUDA/Triton/bitsandbytes — Triton kernels get `None` stubs, bitsandbytes is skipped, and AMP decorators target the `mps` device type. Memory reporting uses `recommended_max_working_set_size()` and bfloat16 support is probed at runtime rather than hardcoded.

16-bit finetuning should work on MPS. 4-bit QLoRA still needs bitsandbytes to ship MPS support upstream.

Closes #4