Welcome to Squish AI. We build high-performance tools designed to make local AI development as fast and efficient as possible, specifically optimized for Apple Silicon and modern developer workflows.
Compress local LLMs once, run them forever at sub-second load times. Squish is a drop-in replacement for OpenAI + Ollama workflows on Apple Silicon. It focuses on eliminating the "cold start" problem of local LLMs.
- Performance: 54× faster cold starts compared to standard local runners.
- Accuracy: Statistically identical accuracy to original models.
- Compatibility: Seamless integration for existing Python-based AI applications.
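The cold-start speedup above is a wall-clock measurement: time how long it takes from process start until the model is ready to serve its first token. A minimal sketch of such a benchmark is below; the two loader functions are hypothetical stand-ins for illustration, not part of the squishai API:

```python
import time

def time_cold_start(load_fn):
    """Return wall-clock seconds for a single model load (cold start)."""
    start = time.perf_counter()
    load_fn()
    return time.perf_counter() - start

# Hypothetical stand-ins: swap in your actual model loaders.
def load_baseline():
    time.sleep(0.05)   # simulate streaming full-precision weights to VRAM

def load_compressed():
    time.sleep(0.001)  # simulate loading a pre-compressed checkpoint

baseline = time_cold_start(load_baseline)
compressed = time_cold_start(load_compressed)
print(f"cold-start speedup: {baseline / compressed:.0f}x")
```

Run each loader in a fresh process when benchmarking for real, so OS file caches don't turn a cold start into a warm one.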
The transformer is a brilliant hack scaled past its limits. DREX is what comes next — tiered memory, sparse execution, and a learned controller that knows what to remember.
Most local AI tools are bulky and slow to initialize. Squish AI projects are built with a "Speed First" mentality:
- Sub-second Load Times: No more waiting for weights to move to VRAM.
- Apple Silicon Native: Deeply optimized for Metal and unified memory architectures.
- Developer Experience: Simple APIs that don't require rewriting your entire codebase.
To get started with our flagship compression tool:
```shell
pip install squishai
```

Then, replace your existing client:

```python
from squish import SquishClient

# Drop-in client compatible with the OpenAI/Ollama API
client = SquishClient()
response = client.chat.completions.create(
    model="llama3-8b-squish",
    messages=[{"role": "user", "content": "Squeeze this data!"}]
)
```

- Issues: Please use the respective repository issue trackers for bugs and feature requests.
- Contributions: We welcome PRs! Please check the CONTRIBUTING.md in each repo.
- Location: United States 🇺🇸
Built for speed. Optimized for accuracy. Squeezed for performance.