Same model. Same Mac. 30 vs 71 tok/s. That's why I built asiai.

🦞 I'm Jean-Marc (druide67) — I build tools for local LLM inference on Apple Silicon.

asiai — Benchmark, monitor & compare 6 inference engines (Ollama, LM Studio, mlx-lm, llama.cpp, vllm-mlx, Exo). One CLI. Real numbers.

Built because my AI agents needed to monitor their own inference. So I gave them asiai's API. They started monitoring themselves.
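A minimal sketch of what that self-monitoring loop could look like, assuming asiai serves stats over a local HTTP endpoint. The URL, route, and JSON field names below are illustrative assumptions, not asiai's documented API:

```python
# Hypothetical agent-side monitor: poll a local asiai stats endpoint and
# react to degraded inference speed. Endpoint and field names are assumed.
import json
import time
import urllib.request

ASIAI_URL = "http://localhost:8400/metrics"  # assumed local asiai endpoint

def current_throughput() -> float:
    """Fetch the latest tokens/sec figure (field name is an assumption)."""
    with urllib.request.urlopen(ASIAI_URL, timeout=2) as resp:
        return json.load(resp)["tokens_per_second"]

# The agent watches its own generation speed and flags a slow engine.
while True:
    tps = current_throughput()
    if tps < 30:  # the slow end of the 30 vs 71 tok/s spread above
        print(f"inference degraded: {tps:.1f} tok/s, consider switching engines")
    time.sleep(10)
```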

Bench your claw!

Recent discoveries

  • MLX is 2.3x faster than llama.cpp for MoE architectures on Apple Silicon
  • DeltaNet KV cache stays flat from 64k to 256k context (same VRAM!)
  • Same model, same Mac: 30 tok/s on one engine, 71 tok/s on another

Contributor to OpenClaw, a multi-agent AI assistant.

Strasbourg, France | asiai.dev | @jmn67 on X | LinkedIn
