# randomForestMPS

High-Performance Random Forest with Metal GPU Acceleration for Apple Silicon

randomForestMPS is a fast Random Forest implementation for R that uses Apple's Metal Performance Shaders (MPS) to deliver a 20-100x prediction speedup on Apple Silicon (M1/M2/M3) Macs.

## Features
- 🎮 GPU Acceleration: Utilizes Apple Metal Performance Shaders for massive parallelization
- ⚡ 20-100x Speedup: Achieves up to 5.8 million predictions/second
- 🧠 Persistent GPU Model: Trees stay loaded on GPU between predictions
- 🔄 Drop-in Replacement: Compatible API with randomForest and ranger
- 🎯 Optimized Kernels: Multiple specialized kernels for different scenarios
- 📊 Automatic Batching: Queues and processes predictions efficiently
- 🔒 Thread-Safe: Supports parallel prediction calls
- 🍎 Apple Silicon Native: Optimized for M1/M2/M3 architecture
## Benchmarks

Prediction throughput (predictions per second) on an Apple M-series machine with 100 trees:

| Dataset Size | scikit-learn | ranger | randomForestMPS | Speedup |
|---|---|---|---|---|
| 1,000 samples | 14,463 | 222,097 | 898,619 | 4.0x vs ranger |
| 10,000 samples | 142,977 | 108,805 | 4,140,478 | 29.0x vs sklearn |
| 50,000 samples | 391,502 | 98,833 | 5,521,727 | 14.1x vs sklearn |
| 100,000 samples | 495,263 | 95,270 | 5,125,631 | 10.3x vs sklearn |
Training speed (relative):

| Implementation | Relative Speed |
|---|---|
| randomForestMPS | 2-4x faster than randomForest |
| ranger | Similar to randomForestMPS |
| randomForest | Baseline |
## Use Cases

- Real-time Inference: Sub-millisecond predictions for live applications
- Large-scale Batch Processing: Process millions of samples efficiently
- Interactive Data Science: Fast experimentation and model tuning
- Production ML Pipelines: High-throughput prediction services
- Edge Deployment: Efficient inference on Apple Silicon devices
## Requirements

- macOS: 11.0 (Big Sur) or later
- Hardware: Apple Silicon (M1/M2/M3)
- R: Version 4.0.0 or later
- Xcode: Command Line Tools (for Metal support)
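The hardware and OS requirements above can be verified from a terminal. This is a small sketch; `sw_vers` exists only on macOS, so it is guarded here:

```sh
# CPU architecture: Apple Silicon reports "arm64", Intel Macs "x86_64"
uname -m

# macOS version: need 11.0 (Big Sur) or later; guarded for non-macOS shells
command -v sw_vers >/dev/null 2>&1 && sw_vers -productVersion || true
```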
## Installation

### From source

```sh
# Clone the repository
git clone https://github.com/yourusername/randomForestMPS.git
cd randomForestMPS

# Build the package
R CMD build .

# Install
R CMD INSTALL randomForestMPS_0.1.0.tar.gz
```

### With devtools

```r
# Install devtools if needed
install.packages("devtools")

# Install randomForestMPS
devtools::install("randomForestMPS")
```

## Quick Start

```r
library(randomForestMPS)

# Check Metal availability
metalAvailable()  # Should return TRUE on Apple Silicon

# Quick test
data(iris)
model <- randomForestMPS(as.matrix(iris[, 1:4]),
                         as.integer(iris$Species),
                         n_trees = 10)
predictions <- predict(model, as.matrix(iris[, 1:4]))
print(predictions)
```
## Usage

### Basic classification

```r
library(randomForestMPS)

# Prepare data
data(iris)
x <- as.matrix(iris[, 1:4])
y <- as.integer(iris$Species)

# Train model with GPU acceleration
model <- randomForestMPS(x, y,
                         n_trees = 100,
                         max_depth = 10,
                         use_mps = TRUE,         # Enable GPU
                         persistent_gpu = TRUE)  # Keep model on GPU

# Make predictions
predictions <- predict(model, x)

# Check accuracy
accuracy <- mean(predictions == y)
print(paste("Accuracy:", round(accuracy, 4)))
```

### GPU memory management and more

```r
# GPU memory management
preloadToGPU(model)    # Explicitly load to GPU
isOnGPU(model)         # Check GPU status
releaseFromGPU(model)  # Free GPU memory

# Predict probabilities
probs <- predict(model, x, type = "prob")
head(probs)

# Feature importance
importance <- model$importances
barplot(importance, main = "Feature Importance")
```

## API Reference

### randomForestMPS()

```r
randomForestMPS(x, y, n_trees = 100, max_depth = 10,
                min_samples_split = 2, min_samples_leaf = 1,
                mtry = NULL, bootstrap = TRUE, n_jobs = -1,
                random_state = 0, use_mps = TRUE,
                persistent_gpu = TRUE, batch_size = 10000)
```

Parameters:
| Parameter | Description | Default |
|---|---|---|
| `x` | Feature matrix (numeric) | Required |
| `y` | Target vector (integer/factor) | Required |
| `n_trees` | Number of trees in forest | 100 |
| `max_depth` | Maximum tree depth | 10 |
| `min_samples_split` | Min samples to split a node | 2 |
| `min_samples_leaf` | Min samples in a leaf | 1 |
| `mtry` | Features per split (NULL = sqrt) | NULL |
| `bootstrap` | Use bootstrap sampling | TRUE |
| `n_jobs` | Parallel jobs for training (-1 = all) | -1 |
| `random_state` | Random seed | 0 |
| `use_mps` | Enable Metal GPU acceleration | TRUE |
| `persistent_gpu` | Keep model on GPU | TRUE |
| `batch_size` | GPU batch size | 10000 |
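As a small illustration of the `mtry` default above: with `mtry = NULL` the number of candidate features per split follows the square-root heuristic. The exact rounding the package applies is an assumption here; `floor()` is the common choice.

```r
# sqrt heuristic for mtry, assuming floor() rounding (illustrative only)
p <- 4                          # e.g. the four iris predictors
mtry_default <- floor(sqrt(p))  # 2 candidate features per split
```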
### predict()

```r
predict(object, newdata, type = "class")
```

Parameters:

- `object`: trained randomForestMPS model
- `newdata`: new data for prediction (matrix)
- `type`: `"class"` (default) or `"prob"` for probabilities

Returns: a vector of class predictions, or a probability matrix when `type = "prob"`.
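To make the two return types concrete, here is a base-R sketch; the probability matrix below is fabricated to have the documented shape and is not real model output:

```r
# Hypothetical 3-class probability matrix, shaped like the result of
# predict(model, x, type = "prob"); values are made up for illustration
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.8, 0.1,
                  0.2, 0.3, 0.5),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("setosa", "versicolor", "virginica")))

# Recover hard class labels from the probabilities
labels <- colnames(probs)[max.col(probs)]

# Compare class predictions against known truth with a confusion matrix
truth <- c("setosa", "versicolor", "virginica")
cm <- table(predicted = labels, actual = truth)
accuracy <- sum(diag(cm)) / sum(cm)
```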
### GPU utilities

```r
# Check Metal availability
metalAvailable()

# Preload model to GPU (for repeated predictions)
preloadToGPU(model)

# Check if model is on GPU
isOnGPU(model)

# Release model from GPU (free memory)
releaseFromGPU(model)
```

## Performance Tuning

For maximum speed (default):
```r
model <- randomForestMPS(x, y,
                         n_trees = 100,
                         use_mps = TRUE,
                         persistent_gpu = TRUE,
                         batch_size = 10000)
```

For CPU-only (fallback):

```r
model <- randomForestMPS(x, y,
                         n_trees = 100,
                         use_mps = FALSE)  # Disable GPU
```

For large datasets (>100K samples):

```r
model <- randomForestMPS(x, y,
                         n_trees = 100,
                         use_mps = TRUE,
                         batch_size = 50000)  # Larger batches
```

## Benchmarking

### Against scikit-learn
```sh
# Run comprehensive benchmark
./run_benchmark.sh

# Or manually with uv
uv pip install scikit-learn numpy pandas
uv run benchmark_sklearn.py
```

This will generate:

- `benchmark_sklearn_results.json` - machine-readable results
- Console output with comparison tables

### In R
```r
library(randomForestMPS)

# Generate large dataset
set.seed(42)
n <- 100000
x <- matrix(rnorm(n * 50), ncol = 50)
y <- sample(1:3, n, replace = TRUE)

# Benchmark training
start <- Sys.time()
model <- randomForestMPS(x, y, n_trees = 100, persistent_gpu = TRUE)
train_time <- difftime(Sys.time(), start, units = "secs")

# Benchmark prediction
start <- Sys.time()
preds <- predict(model, x)
predict_time <- difftime(Sys.time(), start, units = "secs")

cat(sprintf("Training: %.2f sec\n", train_time))
cat(sprintf("Prediction: %.2f sec (%.0f pred/sec)\n",
            predict_time, n / as.numeric(predict_time)))
```

## Architecture
```
R Interface
     ↓
Rcpp Bridge (C++17, thread-safe)
     ↓
Random Forest Engine
 ├── CPU Training (multi-threaded)
 └── GPU Prediction (persistent model)
     ↓
PersistentGPUPredictor (C++/Objective-C++)
 ├── Metal Device Management
 ├── Buffer Pooling
 ├── Kernel Selection
 └── Performance Stats
     ↓
Metal Compute Shaders (GPU)
 ├── predictRandomForestOptimized
 ├── predictSmallForest
 ├── predictBinaryForest
 └── predictTreeChunk
```

## Project Structure
```
randomForestMPS/
├── R/                               # R interface
│   ├── random_forest.R              # Main functions
│   └── zzz.R                        # Package initialization
├── src/                             # C++ source
│   ├── tree.cpp/h                   # Decision trees
│   ├── forest.cpp/h                 # Random forest
│   ├── persistent_gpu_predictor.mm  # GPU predictor (Objective-C++)
│   ├── metal_bridge.mm              # Metal bridge
│   ├── rcpp_interface.cpp           # Rcpp bindings
│   └── shaders/                     # Metal shaders
│       ├── predict.metal
│       └── predict_optimized.metal
├── inst/include/                    # Header files
│   ├── tree.h
│   ├── forest.h
│   └── persistent_gpu_predictor.h
├── tests/                           # Unit tests
├── benchmarks/                      # Benchmark scripts
│   ├── benchmark_sklearn.py
│   └── run_benchmark.sh
├── DESCRIPTION                      # Package metadata
├── NAMESPACE                        # R exports
├── LICENSE                          # MIT License
└── README.md                        # This file
```

## Troubleshooting
### metalAvailable() returns FALSE

Solutions:

- Ensure you're on Apple Silicon (not an Intel Mac)
- Check the macOS version with `sw_vers` (need 11.0+)
- Install Xcode Command Line Tools: `xcode-select --install`

### Error: GPU out of memory

Solutions:

- Reduce the batch size: `batch_size = 5000`
- Use the CPU fallback: `use_mps = FALSE`
- Process predictions in chunks
- Release the model from GPU memory: `releaseFromGPU(model)`
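The "process in chunks" suggestion can be sketched in plain R. Here `predict_in_chunks` and `chunk_size` are illustrative names rather than package API, and `stub_predict` stands in for `predict(model, ...)`:

```r
# Split the rows of x into batches and predict each batch separately,
# so at most chunk_size rows are sent to the GPU at a time
predict_in_chunks <- function(x, predict_fn, chunk_size = 5000) {
  groups <- ceiling(seq_len(nrow(x)) / chunk_size)
  idx <- split(seq_len(nrow(x)), groups)
  unlist(lapply(idx, function(i) predict_fn(x[i, , drop = FALSE])),
         use.names = FALSE)
}

# Stub standing in for predict(model, ...): one label per row
stub_predict <- function(m) rep(1L, nrow(m))
preds <- predict_in_chunks(matrix(0, nrow = 12, ncol = 2),
                           stub_predict, chunk_size = 5)
```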
### Build issues

```sh
# Clean and rebuild
R CMD INSTALL --preclean randomForestMPS
```

## Contributing

We welcome contributions! Please see our Contributing Guide for details.
### Development setup

```sh
# Clone repo
git clone https://github.com/yourusername/randomForestMPS.git
cd randomForestMPS

# Build and test
R CMD build .
R CMD check randomForestMPS_0.1.0.tar.gz

# Run tests
Rscript -e "devtools::test()"

# Run benchmarks
./run_benchmark.sh
```

## Documentation

- API Docs: see `man/randomForestMPS` in R
- Benchmarks: see `BENCHMARK_COMPREHENSIVE_RESULTS.md`
- Examples: see the `examples/` directory
## Citation

If you use randomForestMPS in your research, please cite:

```bibtex
@software{randomforestmps2024,
  title  = {randomForestMPS: High-Performance Random Forest with Metal GPU Acceleration},
  author = {Your Name},
  year   = {2024},
  url    = {https://github.com/yourusername/randomForestMPS}
}
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Inspired by randomForest and ranger
- Built with Rcpp and Metal
- Optimized for Apple Silicon
## Support

- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: [email protected]
Made with ❤️ for the R and Apple Silicon community
Transform your Random Forest workflows with GPU acceleration! 🚀