Local LLM Testing & Benchmarking for Apple Silicon | Community Leaderboard
Anubis is a native macOS app for benchmarking, comparing, and managing local large language models through any OpenAI-compatible endpoint — Ollama, MLX, LM Studio Server, OpenWebUI, Docker Models, etc. Built with SwiftUI for Apple Silicon, it provides real-time hardware telemetry correlated with inference performance, with every run saved to history — something no CLI tool or chat wrapper offers. Export benchmark images directly instead of screenshotting, export the raw data from history as Markdown or CSV, and even `ollama pull` models from within the app.
Push your Apple Silicon to its limits and observe power draw, thermal throttling, and frequency scaling under controlled load — all from within the Monitor.
- CPU stress — spawns `yes` processes per core (see the sketch after this list). Choose All Cores, P-Cores only, E-Cores only, or Single Core
- GPU stress — Metal compute shader renders a Mandelbrot fractal zoom in a separate window. Randomized zoom targets and color palettes on each run. Four intensity levels (Low / Medium / High / Extreme) control iterations, supersampling, and passes per frame
- Memory bandwidth stress — allocates memory, then continuously streams through it with `memcpy` to saturate the memory bus. Reports measured bandwidth in GB/s, directly comparable to your chip's theoretical max. Three pressure levels (Light 25% / Moderate 50% / Heavy 75% of free memory)
- Safety mechanisms — 5-minute auto-timeout, thermal watchdog (auto-stop at critical), GPU auto-downgrade if FPS drops below 5, cleanup on view disappear and app quit
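A rough sketch of the CPU stress approach described above (one `yes` process per core, output discarded so the load is pure CPU); `CPUStressor` is an illustrative name, not Anubis's actual type:

```swift
import Foundation

// Spawns one `yes` process per core and discards its output,
// keeping every selected core busy until stop() is called.
final class CPUStressor {
    private var workers: [Process] = []

    func start(coreCount: Int = ProcessInfo.processInfo.activeProcessorCount) throws {
        for _ in 0..<coreCount {
            let worker = Process()
            worker.executableURL = URL(fileURLWithPath: "/usr/bin/yes")
            worker.standardOutput = FileHandle.nullDevice  // pure CPU, no I/O pressure
            try worker.run()
            workers.append(worker)
        }
    }

    func stop() {
        workers.forEach { $0.terminate() }
        workers.removeAll()
    }
}
```

Targeting P-cores versus E-cores would additionally require scheduling hints (e.g., quality-of-service classes), which this sketch omits.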
A compact, frameless, always-on-top overlay showing live system metrics — launchable from any tab via the sidebar or from the Monitor's Float button.
- Dark glass material, draggable, visible on all Spaces
- Live CPU %, GPU %, memory, power, GPU frequency, and thermal state
- Hides the main window when launched from Monitor (detach mode) or stays alongside when launched from the sidebar
Five new built-in prompts covering causal reasoning, system design, dialogue writing, historical analysis, and constrained writing — bringing the total to 15 across five categories.
The local LLM ecosystem on macOS is fragmented:
- Chat wrappers (Ollama, LM Studio, Jan) focus on conversation, not systematic testing
- Performance monitors (asitop, macmon, mactop) are CLI-only and lack LLM context
- Evaluation frameworks (promptfoo) require YAML configs and terminal expertise
- No tool correlates hardware metrics (GPU / CPU / ANE / power / memory) with inference speed in real time
Anubis fills that gap — all in a native macOS app.
The leaderboard dataset is open source — check it out and please contribute!
Real-time performance dashboard for single-model testing.
- Select any model from any configured backend
- Stream responses with live metrics overlay
- 8 metric cards: Tokens/sec, GPU %, CPU %, Time to First Token, Process Memory, Model Memory, Thermal State, GPU Frequency
- 7 live charts: Tokens/sec, GPU utilization, CPU utilization, process memory, GPU/CPU/ANE/DRAM power, GPU frequency — all updating in real time
- Power telemetry: Real-time GPU, CPU, ANE, and DRAM power consumption in watts via IOReport
- Process monitoring: Auto-detects backend process by port (Ollama, LM Studio, mlx-lm, vLLM, etc.) with manual process picker
- Detailed session stats: average tok/s (total tokens / decode time), peak tok/s (highest instantaneous rate), TTFT, model load time, context length, eval duration, power averages (the two throughput figures are sketched after this list)
- Configurable parameters: temperature, top-p, max tokens, system prompt
- 15 prompt presets organized by category (Reasoning, Coding, Creative, Knowledge, Instruction)
- Session history with full replay, CSV export, and Markdown reports
- 3-column expanded dashboard: Full-screen metrics view showing all charts without scrolling — system info, utilization, cores, power, and frequency at a glance
- Image export: Copy to clipboard, save as PNG, or share — 2x retina rendering with watermark, respects light/dark mode
- Smart URL handling: Auto-strips the `/v1` suffix from backend URLs to prevent double-pathing errors
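To make the two throughput figures concrete: average tok/s divides total decoded tokens by the decode time, while peak tok/s is the highest rate observed between consecutive streamed chunks. A minimal sketch (illustrative types, not Anubis source):

```swift
import Foundation

// One entry per streamed chunk: when it arrived and how many tokens it carried.
struct TokenSample {
    let timestamp: TimeInterval
    let tokenCount: Int
}

func throughputStats(samples: [TokenSample],
                     decodeDuration: TimeInterval) -> (average: Double, peak: Double) {
    // Average: total decoded tokens / decode time
    let totalTokens = samples.reduce(0) { $0 + $1.tokenCount }
    let average = Double(totalTokens) / decodeDuration

    // Peak: highest instantaneous rate between consecutive chunks
    var peak = 0.0
    for (previous, next) in zip(samples, samples.dropFirst()) {
        let dt = next.timestamp - previous.timestamp
        guard dt > 0 else { continue }
        peak = max(peak, Double(next.tokenCount) / dt)
    }
    return (average, peak)
}
```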
Side-by-side A/B model comparison with the same prompt.
- Dual model selectors with independent backend selection
- Sequential mode (memory-safe, one at a time) or Parallel mode (both simultaneously)
- Shared prompt, system prompt, and generation parameters
- Real-time streaming in both panels
- Voting system: pick Model A, Model B, or Tie — votes are persisted
- Per-panel stats grid (9 metrics each)
- Model manager: view loaded models and unload to free memory
- Comparison history with voting records
Standalone real-time hardware monitoring dashboard — no benchmark required.
- One-click start: Begin recording CPU, GPU, memory, power, and thermal metrics
- 3-column live dashboard: All charts visible at once — CPU/GPU utilization, memory, per-core grids, power breakdown, GPU frequency
- Stress testing: CPU, GPU (Mandelbrot), and memory bandwidth stress tests with adjustable intensity
- Floating HUD: Detach a compact always-on-top metrics overlay while you work
- Accumulating charts: Data builds up over time with automatic downsampling for long sessions (one possible scheme is sketched after this list)
- System info card: Live readouts for CPU %, GPU %, memory, power draw, and thermal state
- No persistence: Data lives in memory only — nothing is saved when the monitor is closed
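One plausible downsampling scheme for the accumulating charts (an assumption, not necessarily what Anubis does): once a buffer exceeds its limit, drop every other sample so memory stays bounded while the curve's overall shape survives.

```swift
// Illustrative bounded metric buffer with halving downsampling.
struct MetricBuffer {
    private(set) var samples: [Double] = []
    let limit: Int

    mutating func append(_ value: Double) {
        samples.append(value)
        if samples.count > limit {
            // Keep every other sample; resolution halves, shape is preserved.
            samples = samples.enumerated()
                .filter { $0.offset.isMultiple(of: 2) }
                .map(\.element)
        }
    }
}
```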
Upload your benchmark results to the community leaderboard and see how your Mac stacks up against other Apple Silicon machines.
- One-click upload from the benchmark toolbar after a completed run
- Community rankings sorted by tokens/sec with full drill-down into performance, power, and hardware details
- Model quantization & format tracking — every submission records the quantization level (Q4_K_M, FP16, 4-bit, etc.) and model format (GGUF vs MLX) so you can compare apples to apples
- Filter by chip, model, quantization, or format to compare like-for-like
- Data Explorer — interactive pivot table and charting powered by FINOS Perspective
- Privacy-first: no accounts, no response text uploaded — just metrics and a display name
- HMAC-signed submissions with server-side rate limiting
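For context, HMAC signing of a payload looks roughly like this with CryptoKit; the key provisioning and payload schema here are assumptions, not Anubis's actual scheme:

```swift
import CryptoKit
import Foundation

// Computes a hex-encoded HMAC-SHA256 tag over a submission payload.
// The server recomputes the tag with the shared key and rejects mismatches.
func signature(for payload: Data, key: SymmetricKey) -> String {
    let tag = HMAC<SHA256>.authenticationCode(for: payload, using: key)
    return tag.map { String(format: "%02x", $0) }.joined()
}

// Placeholder key; a real client would use a provisioned key, not a fresh one.
let key = SymmetricKey(size: .bits256)
let tag = signature(for: Data("{\"tokensPerSec\":42.0}".utf8), key: key)
```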
Unified model management across all backends.
- Aggregated model list with search and backend filter chips
- Running models section with live VRAM usage
- Model inspector: size, parameters, quantization, format (GGUF/MLX), family, context window, architecture details, file path
- Automatic metadata enrichment for OpenAI-compatible models — parses model IDs for family and parameter count, scans `~/.lmstudio/models/` and `~/.cache/huggingface/hub/` for disk size, quantization, and path (a disk-scan sketch follows this list)
- Pull new models, delete existing ones, unload from memory
- Popular model suggestions for quick setup
- Total disk usage display
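The disk-size part of that scan can be sketched as a recursive walk summing regular-file sizes; the function below is illustrative, not Anubis's actual code:

```swift
import Foundation

// Sums the sizes of all regular files under a model directory.
func directorySize(at root: URL) -> Int64 {
    let keys: [URLResourceKey] = [.isRegularFileKey, .fileSizeKey]
    guard let enumerator = FileManager.default.enumerator(
        at: root, includingPropertiesForKeys: keys, options: [.skipsHiddenFiles]
    ) else { return 0 }

    var total: Int64 = 0
    for case let fileURL as URL in enumerator {
        guard let values = try? fileURL.resourceValues(forKeys: Set(keys)),
              values.isRegularFile == true else { continue }
        total += Int64(values.fileSize ?? 0)
    }
    return total
}

// Example: size of the LM Studio models folder mentioned above
let lmStudioModels = FileManager.default.homeDirectoryForCurrentUser
    .appendingPathComponent(".lmstudio/models")
let bytes = directorySize(at: lmStudioModels)
```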
Anubis checks for updates automatically via Sparkle and notifies you when a new version is available.
- Automatic checks on launch with user-controlled frequency
- Manual check via the app menu (Anubis OSS > Check for Updates...) or Settings > About
- Updates are code-signed, notarized, and verified with EdDSA before installation
Settings (add connections with quick presets)

Vault — view model details, unload, and pull models directly for Ollama

| Backend | Type | Default Port | Setup |
|---|---|---|---|
| Ollama | Native support | 11434 | Install from ollama.com — auto-detected on launch |
| LM Studio | OpenAI-compatible | 1234 | Enable local server in LM Studio settings |
| mlx-lm | OpenAI-compatible | 8080 | pip install mlx-lm && mlx_lm.server --model <model> |
| vLLM | OpenAI-compatible | 8000 | Add in Settings |
| LocalAI | OpenAI-compatible | 8080 | Add in Settings |
| Docker ModelRunner | OpenAI-compatible | user selected | Add in Settings |
Any OpenAI-compatible server can be added through Settings > Add OpenAI-Compatible Server with a name, URL, and optional API key.
Anubis captures Apple Silicon telemetry during inference via IOReport and system APIs:
| Metric | Source | Description |
|---|---|---|
| GPU Utilization | IOReport | GPU active residency percentage |
| CPU Utilization | `host_processor_info` | Usage across all cores |
| GPU Power | IOReport Energy Model | GPU power consumption in watts |
| CPU Power | IOReport Energy Model | CPU (E-cores + P-cores) power in watts |
| ANE Power | IOReport Energy Model | Neural Engine power consumption |
| DRAM Power | IOReport Energy Model | Memory subsystem power |
| GPU Frequency | IOReport GPU Stats | Weighted average from P-state residency |
| Process Memory | `proc_pid_rusage` | Backend process `phys_footprint` (includes Metal/GPU allocations) |
| Thermal State | `ProcessInfo.thermalState` | System thermal pressure level |
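The Process Memory row can be reproduced with `proc_pid_rusage`, which reports the same `phys_footprint` Activity Monitor shows. A minimal sketch, assuming the standard Darwin import (error handling elided):

```swift
import Darwin

// Reads a process's phys_footprint (bytes), including Metal/GPU allocations.
// RUSAGE_INFO_V4 is the C constant 4 from <sys/resource.h>.
func physFootprint(pid: pid_t) -> UInt64? {
    var info = rusage_info_v4()
    let status = withUnsafeMutablePointer(to: &info) { ptr in
        ptr.withMemoryRebound(to: rusage_info_t?.self, capacity: 1) {
            proc_pid_rusage(pid, RUSAGE_INFO_V4, $0)
        }
    }
    return status == 0 ? info.ri_phys_footprint : nil
}
```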
Anubis automatically detects which process is serving your model:
- Port-based detection: Uses `lsof` to find the PID listening on the inference port, called once per benchmark start (sketched after this list)
- Backend identification: Matches process path and command-line args to identify Ollama, LM Studio, mlx-lm, vLLM, LocalAI, llama.cpp
- Memory accounting: Uses `phys_footprint` (same as Activity Monitor), which includes Metal/GPU buffer allocations — critical for MLX and other GPU-accelerated backends
- LM Studio support: Walks Electron app bundle descendants to find the model-serving process
- Manual override: Process picker lets you select any process by name, sorted by memory usage
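A hedged sketch of the port-based lookup, shelling out to `lsof` with the terse flag so only PIDs are printed:

```swift
import Foundation

// Returns the first PID listening on the given TCP port, or nil.
// `lsof -nP -iTCP:<port> -sTCP:LISTEN -t` prints one PID per line.
func pidListening(onPort port: Int) throws -> pid_t? {
    let lsof = Process()
    lsof.executableURL = URL(fileURLWithPath: "/usr/sbin/lsof")
    lsof.arguments = ["-nP", "-iTCP:\(port)", "-sTCP:LISTEN", "-t"]

    let pipe = Pipe()
    lsof.standardOutput = pipe
    try lsof.run()
    lsof.waitUntilExit()

    let output = String(decoding: pipe.fileHandleForReading.readDataToEndOfFile(),
                        as: UTF8.self)
    return output.split(separator: "\n").first.flatMap { pid_t($0) }
}

// Example: find whatever is serving Ollama's default port
let ollamaPID = try? pidListening(onPort: 11434)
```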
Metrics degrade gracefully — if IOReport access is unavailable (e.g., in a VM), Anubis still shows inference-derived metrics.
- macOS 15.0 (Sequoia) or later
- Apple Silicon (M1 / M2 / M3 / M4 / M5+) — Intel is not supported
- 8 GB unified memory minimum (16 GB+ recommended for larger models)
- At least one inference backend installed (Ollama recommended)
# macOS — install Ollama
brew install ollama
# Start the server
ollama serve
# Pull a model
ollama pull llama3.2:3b

git clone https://github.com/uncSoft/anubis-oss.git
cd anubis-oss/anubis
open anubis.xcodeproj

In Xcode:
- Set your development team in Signing & Capabilities
- Build and run (`Cmd+R`)
Anubis will auto-detect Ollama on launch. Other backends can be added in Settings.
- Select a model from the dropdown
- Type a prompt or pick one from Presets
- Click Run
- Watch the metrics light up in real time
After a benchmark completes, click the Upload button in the benchmark toolbar to submit your results to the community leaderboard. Enter a display name and your run will appear in the rankings — no account required. Only performance metrics and hardware info are submitted; response text is never uploaded.
# Clone
git clone https://github.com/uncSoft/anubis-oss.git
cd anubis-oss/anubis
# Build via command line
xcodebuild -scheme anubis-oss -configuration Debug build
# Run tests
xcodebuild -scheme anubis-oss -configuration Debug test
# Or just open in Xcode
open anubis.xcodeproj

Resolved automatically by Swift Package Manager on first build:
| Package | Purpose | License |
|---|---|---|
| GRDB.swift | SQLite database | MIT |
| Sparkle | Auto-update framework | MIT |
| Swift Charts | Data visualization | Apple |
Anubis follows MVVM with a layered service architecture:
┌─────────────────────────────────────────────────────────────┐
│ PRESENTATION LAYER │
│ BenchmarkView ArenaView MonitorView VaultView Settings │
├─────────────────────────────────────────────────────────────┤
│ SERVICE LAYER │
│ MetricsService InferenceService ModelService Export │
├─────────────────────────────────────────────────────────────┤
│ INTEGRATION LAYER │
│ OllamaClient OpenAICompatibleClient IOReportBridge ProcessMonitor │
├─────────────────────────────────────────────────────────────┤
│ PERSISTENCE LAYER │
│ SQLite (GRDB) File System │
└─────────────────────────────────────────────────────────────┘
Views display data and delegate to ViewModels. ViewModels coordinate Services. Services are stateless and use async/await. Integrations are thin adapters wrapping external systems (Ollama API, IOReport, etc.).
anubis/
├── App/ # Entry point, app state, navigation
├── Features/
│ ├── Benchmark/ # Performance dashboard
│ ├── Arena/ # A/B model comparison
│ ├── Monitor/ # System monitor, stress tests, floating HUD
│ ├── Vault/ # Model management
│ └── Settings/ # Backend config, about, help, contact
├── Services/ # MetricsService, InferenceService, ExportService
├── Integrations/ # OllamaClient, OpenAICompatibleClient, IOReportBridge, ProcessMonitor
├── Models/ # Data models (BenchmarkSession, ModelInfo, etc.)
├── Database/ # GRDB setup & migrations
├── DesignSystem/ # Theme, colors, reusable components
├── Demo/ # Demo mode for App Store review
└── Utilities/ # Formatters, constants, logger, benchmark prompts
All inference backends implement a shared protocol, making it straightforward to add new ones:
protocol InferenceBackend {
var id: String { get }
var displayName: String { get }
var isAvailable: Bool { get async }
func listModels() async throws -> [ModelInfo]
func generate(prompt: String, parameters: GenerationParameters)
-> AsyncThrowingStream<InferenceChunk, Error>
}

All data is stored locally — nothing leaves your machine.
| Data | Location |
|---|---|
| Database | ~/Library/Application Support/Anubis/anubis.db |
| Exports | Generated on demand (CSV, Markdown) |
| Preferences | UserDefaults |
# Make sure Ollama is running
ollama serve
# Verify it's accessible
curl http://localhost:11434/api/tags

- GPU metrics require IOReport access via IOKit
- Some configurations or VMs may not expose these APIs
- Anubis will still show inference-derived metrics (tokens/sec, TTFT, etc.)
- Use Sequential mode in Arena to run one model at a time
- Unload unused models via Arena > Models > Unload All
- Choose smaller quantized models (Q4_K_M over Q8_0)
- Click Refresh Models in Settings
- Ensure the model is pulled: `ollama pull <model-name>`
- For OpenAI-compatible backends, verify the server is running and the URL is correct
Contributions are welcome. A few guidelines:
- Follow the existing patterns — MVVM, async/await, guard-let over force-unwrap
- Keep files under 300 lines — split if larger
- One feature per PR — small, focused changes are easier to review
- Test services and integrations — views are harder to unit test, but services should have coverage
- Handle errors gracefully — always provide `errorDescription` and `recoverySuggestion`
- Create a new file in `Integrations/` implementing `InferenceBackend`
- Register it in `InferenceService`
- Add configuration UI in `Settings/`
- That's it — the rest of the app works through the protocol (a minimal conformance sketch follows)
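A minimal conformance sketch with stubbed bodies; the backend name and probing logic are hypothetical placeholders:

```swift
import Foundation

struct MyBackend: InferenceBackend {
    let id = "my-backend"
    let displayName = "My Backend"

    var isAvailable: Bool {
        get async {
            // e.g., probe the server's health endpoint (stubbed here)
            true
        }
    }

    func listModels() async throws -> [ModelInfo] {
        // Query the server's model list and map it into ModelInfo values
        []
    }

    func generate(prompt: String, parameters: GenerationParameters)
        -> AsyncThrowingStream<InferenceChunk, Error> {
        AsyncThrowingStream { continuation in
            // Stream chunks from the server, yielding each as it arrives
            continuation.finish()
        }
    }
}
```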
If Anubis is useful to you, consider buying me a coffee on Ko-fi or sponsoring on GitHub. It helps fund continued development and new features.
A sandboxed, less feature-rich version is also available on the Mac App Store if you prefer a managed install.
GPL-3.0 License — see LICENSE for details.


