# randomForestMPS

High-Performance Random Forest with Metal GPU Acceleration for Apple Silicon

randomForestMPS is a fast Random Forest implementation for R that uses Apple's Metal Performance Shaders (MPS) to deliver a 20-100x prediction speedup on Apple Silicon (M1/M2/M3) Macs.

## Features
- 🎮 GPU Acceleration: Utilizes Apple Metal Performance Shaders for massive parallelization
- ⚡ 20-100x Speedup: Achieves up to 5.8 million predictions/second
- 🧠 Persistent GPU Model: Trees stay loaded on GPU between predictions
- 🔄 Drop-in Replacement: Compatible API with randomForest and ranger
- 🎯 Optimized Kernels: Multiple specialized kernels for different scenarios
- 📊 Automatic Batching: Queues and processes predictions efficiently
- 🔒 Thread-Safe: Supports parallel prediction calls
- 🍎 Apple Silicon Native: Optimized for M1/M2/M3 architecture
## Benchmarks

Prediction throughput (predictions per second) on an Apple M-series machine with 100 trees:

| Dataset Size | scikit-learn | ranger | randomForestMPS | Speedup |
|---|---|---|---|---|
| 1,000 samples | 14,463 | 222,097 | 898,619 | 4.0x vs ranger |
| 10,000 samples | 142,977 | 108,805 | 4,140,478 | 29.0x vs sklearn |
| 50,000 samples | 391,502 | 98,833 | 5,521,727 | 14.1x vs sklearn |
| 100,000 samples | 495,263 | 95,270 | 5,125,631 | 10.3x vs sklearn |
Training speed (relative):

| Implementation | Relative Speed |
|---|---|
| randomForestMPS | 2-4x faster than randomForest |
| ranger | Similar to randomForestMPS |
| randomForest | Baseline |
## Use Cases

- Real-time Inference: Sub-millisecond predictions for live applications
- Large-scale Batch Processing: Process millions of samples efficiently
- Interactive Data Science: Fast experimentation and model tuning
- Production ML Pipelines: High-throughput prediction services
- Edge Deployment: Efficient inference on Apple Silicon devices
## Requirements

- macOS: 11.0 (Big Sur) or later
- Hardware: Apple Silicon (M1/M2/M3)
- R: Version 4.0.0 or later
- Xcode: Command Line Tools (for Metal support)
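The hardware and OS requirements above can be verified from a terminal. This is a small sketch; `sw_vers` exists only on macOS, so it is guarded here:

```sh
# CPU architecture: Apple Silicon reports "arm64", Intel Macs "x86_64"
uname -m

# macOS version: need 11.0 (Big Sur) or later; guarded for non-macOS shells
command -v sw_vers >/dev/null 2>&1 && sw_vers -productVersion || true
```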
## Installation

### From source

```sh
# Clone the repository
git clone https://github.com/yourusername/randomForestMPS.git
cd randomForestMPS

# Build the package
R CMD build .

# Install
R CMD INSTALL randomForestMPS_0.1.0.tar.gz
```

### With devtools

```r
# Install devtools if needed
install.packages("devtools")

# Install randomForestMPS
devtools::install("randomForestMPS")
```

## Quick Start

```r
library(randomForestMPS)

# Check Metal availability
metalAvailable()  # Should return TRUE on Apple Silicon

# Quick test
data(iris)
model <- randomForestMPS(as.matrix(iris[, 1:4]),
                         as.integer(iris$Species),
                         n_trees = 10)
predictions <- predict(model, as.matrix(iris[, 1:4]))
print(predictions)
```
## Usage

### Basic classification

```r
library(randomForestMPS)

# Prepare data
data(iris)
x <- as.matrix(iris[, 1:4])
y <- as.integer(iris$Species)

# Train model with GPU acceleration
model <- randomForestMPS(x, y,
                         n_trees = 100,
                         max_depth = 10,
                         use_mps = TRUE,         # Enable GPU
                         persistent_gpu = TRUE)  # Keep model on GPU

# Make predictions
predictions <- predict(model, x)

# Check accuracy
accuracy <- mean(predictions == y)
print(paste("Accuracy:", round(accuracy, 4)))
```

### GPU memory management and more

```r
# GPU memory management
preloadToGPU(model)    # Explicitly load to GPU
isOnGPU(model)         # Check GPU status
releaseFromGPU(model)  # Free GPU memory

# Predict probabilities
probs <- predict(model, x, type = "prob")
head(probs)

# Feature importance
importance <- model$importances
barplot(importance, main = "Feature Importance")
```

## API Reference

### randomForestMPS()

```r
randomForestMPS(x, y, n_trees = 100, max_depth = 10,
                min_samples_split = 2, min_samples_leaf = 1,
                mtry = NULL, bootstrap = TRUE, n_jobs = -1,
                random_state = 0, use_mps = TRUE,
                persistent_gpu = TRUE, batch_size = 10000)
```

Parameters:
| Parameter | Description | Default |
|---|---|---|
| `x` | Feature matrix (numeric) | Required |
| `y` | Target vector (integer/factor) | Required |
| `n_trees` | Number of trees in forest | 100 |
| `max_depth` | Maximum tree depth | 10 |
| `min_samples_split` | Min samples to split a node | 2 |
| `min_samples_leaf` | Min samples in a leaf | 1 |
| `mtry` | Features per split (NULL = sqrt) | NULL |
| `bootstrap` | Use bootstrap sampling | TRUE |
| `n_jobs` | Parallel jobs for training (-1 = all) | -1 |
| `random_state` | Random seed | 0 |
| `use_mps` | Enable Metal GPU acceleration | TRUE |
| `persistent_gpu` | Keep model on GPU | TRUE |
| `batch_size` | GPU batch size | 10000 |
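As a small illustration of the `mtry` default above: with `mtry = NULL` the number of candidate features per split follows the square-root heuristic. The exact rounding the package applies is an assumption here; `floor()` is the common choice.

```r
# sqrt heuristic for mtry, assuming floor() rounding (illustrative only)
p <- 4                          # e.g. the four iris predictors
mtry_default <- floor(sqrt(p))  # 2 candidate features per split
```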
### predict()

```r
predict(object, newdata, type = "class")
```

Parameters:

- `object`: trained randomForestMPS model
- `newdata`: new data for prediction (matrix)
- `type`: `"class"` (default) or `"prob"` for probabilities

Returns: a vector of class predictions, or a probability matrix when `type = "prob"`.
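To make the two return types concrete, here is a base-R sketch; the probability matrix below is fabricated to have the documented shape and is not real model output:

```r
# Hypothetical 3-class probability matrix, shaped like the result of
# predict(model, x, type = "prob"); values are made up for illustration
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.8, 0.1,
                  0.2, 0.3, 0.5),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("setosa", "versicolor", "virginica")))

# Recover hard class labels from the probabilities
labels <- colnames(probs)[max.col(probs)]

# Compare class predictions against known truth with a confusion matrix
truth <- c("setosa", "versicolor", "virginica")
cm <- table(predicted = labels, actual = truth)
accuracy <- sum(diag(cm)) / sum(cm)
```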
### GPU utilities

```r
# Check Metal availability
metalAvailable()

# Preload model to GPU (for repeated predictions)
preloadToGPU(model)

# Check if model is on GPU
isOnGPU(model)

# Release model from GPU (free memory)
releaseFromGPU(model)
```

## Performance Tuning

For maximum speed (default):
```r
model <- randomForestMPS(x, y,
                         n_trees = 100,
                         use_mps = TRUE,
                         persistent_gpu = TRUE,
                         batch_size = 10000)
```

For CPU-only (fallback):

```r
model <- randomForestMPS(x, y,
                         n_trees = 100,
                         use_mps = FALSE)  # Disable GPU
```

For large datasets (>100K samples):

```r
model <- randomForestMPS(x, y,
                         n_trees = 100,
                         use_mps = TRUE,
                         batch_size = 50000)  # Larger batches
```

## Benchmarking

### Against scikit-learn
```sh
# Run comprehensive benchmark
./run_benchmark.sh

# Or manually with uv
uv pip install scikit-learn numpy pandas
uv run benchmark_sklearn.py
```

This will generate:

- `benchmark_sklearn_results.json` - machine-readable results
- Console output with comparison tables

### In R
```r
library(randomForestMPS)

# Generate large dataset
set.seed(42)
n <- 100000
x <- matrix(rnorm(n * 50), ncol = 50)
y <- sample(1:3, n, replace = TRUE)

# Benchmark training
start <- Sys.time()
model <- randomForestMPS(x, y, n_trees = 100, persistent_gpu = TRUE)
train_time <- difftime(Sys.time(), start, units = "secs")

# Benchmark prediction
start <- Sys.time()
preds <- predict(model, x)
predict_time <- difftime(Sys.time(), start, units = "secs")

cat(sprintf("Training: %.2f sec\n", train_time))
cat(sprintf("Prediction: %.2f sec (%.0f pred/sec)\n",
            predict_time, n / as.numeric(predict_time)))
```

## Architecture
```
R Interface
     ↓
Rcpp Bridge (C++17, thread-safe)
     ↓
Random Forest Engine
 ├── CPU Training (multi-threaded)
 └── GPU Prediction (persistent model)
     ↓
PersistentGPUPredictor (C++/Objective-C++)
 ├── Metal Device Management
 ├── Buffer Pooling
 ├── Kernel Selection
 └── Performance Stats
     ↓
Metal Compute Shaders (GPU)
 ├── predictRandomForestOptimized
 ├── predictSmallForest
 ├── predictBinaryForest
 └── predictTreeChunk
```

## Project Structure
```
randomForestMPS/
├── R/                               # R interface
│   ├── random_forest.R              # Main functions
│   └── zzz.R                        # Package initialization
├── src/                             # C++ source
│   ├── tree.cpp/h                   # Decision trees
│   ├── forest.cpp/h                 # Random forest
│   ├── persistent_gpu_predictor.mm  # GPU predictor (Objective-C++)
│   ├── metal_bridge.mm              # Metal bridge
│   ├── rcpp_interface.cpp           # Rcpp bindings
│   └── shaders/                     # Metal shaders
│       ├── predict.metal
│       └── predict_optimized.metal
├── inst/include/                    # Header files
│   ├── tree.h
│   ├── forest.h
│   └── persistent_gpu_predictor.h
├── tests/                           # Unit tests
├── benchmarks/                      # Benchmark scripts
│   ├── benchmark_sklearn.py
│   └── run_benchmark.sh
├── DESCRIPTION                      # Package metadata
├── NAMESPACE                        # R exports
├── LICENSE                          # MIT License
└── README.md                        # This file
```

## Troubleshooting
### metalAvailable() returns FALSE

Solutions:

- Ensure you're on Apple Silicon (not an Intel Mac)
- Check the macOS version with `sw_vers` (need 11.0+)
- Install Xcode Command Line Tools: `xcode-select --install`

### Error: GPU out of memory

Solutions:

- Reduce the batch size: `batch_size = 5000`
- Use the CPU fallback: `use_mps = FALSE`
- Process predictions in chunks
- Release the model from GPU memory: `releaseFromGPU(model)`
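The "process in chunks" suggestion can be sketched in plain R. Here `predict_in_chunks` and `chunk_size` are illustrative names rather than package API, and `stub_predict` stands in for `predict(model, ...)`:

```r
# Split the rows of x into batches and predict each batch separately,
# so at most chunk_size rows are sent to the GPU at a time
predict_in_chunks <- function(x, predict_fn, chunk_size = 5000) {
  groups <- ceiling(seq_len(nrow(x)) / chunk_size)
  idx <- split(seq_len(nrow(x)), groups)
  unlist(lapply(idx, function(i) predict_fn(x[i, , drop = FALSE])),
         use.names = FALSE)
}

# Stub standing in for predict(model, ...): one label per row
stub_predict <- function(m) rep(1L, nrow(m))
preds <- predict_in_chunks(matrix(0, nrow = 12, ncol = 2),
                           stub_predict, chunk_size = 5)
```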
### Build issues

```sh
# Clean and rebuild
R CMD INSTALL --preclean randomForestMPS
```

## Contributing

We welcome contributions! Please see our Contributing Guide for details.
### Development setup

```sh
# Clone repo
git clone https://github.com/yourusername/randomForestMPS.git
cd randomForestMPS

# Build and test
R CMD build .
R CMD check randomForestMPS_0.1.0.tar.gz

# Run tests
Rscript -e "devtools::test()"

# Run benchmarks
./run_benchmark.sh
```

## Documentation

- API Docs: see `man/randomForestMPS` in R
- Benchmarks: see `BENCHMARK_COMPREHENSIVE_RESULTS.md`
- Examples: see the `examples/` directory
## Citation

If you use randomForestMPS in your research, please cite:

```bibtex
@software{randomforestmps2024,
  title  = {randomForestMPS: High-Performance Random Forest with Metal GPU Acceleration},
  author = {Your Name},
  year   = {2024},
  url    = {https://github.com/yourusername/randomForestMPS}
}
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Inspired by randomForest and ranger
- Built with Rcpp and Metal
- Optimized for Apple Silicon
## Support

- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: [email protected]
Made with ❤️ for the R and Apple Silicon community
Transform your Random Forest workflows with GPU acceleration! 🚀