Unix Reimagined | toast
Tools that save you time: simple, composable, and fun.
For humans and AI alike.
Join on Mac or Linux
curl -sSL linuxtoaster.com/install | sh
$20/yr PayGo — includes $20 in inference credits. Top off any time. BYOK and local inference are free. We collect anonymized usage data (model, token count) — never prompts.
BYOK support: OpenAI · Anthropic · Google · Mistral · Groq · Cerebras · Perplexity · xAI · OpenRouter · Together
Local support: Ollama · MLX · LM Studio · KoboldCpp · llama.cpp · vLLM · LocalAI · Jan
toast — AI in your terminal
With toast you get "sed with a brain" — pipe text in, get Unix knowledge out.
Understand anything
Legacy code. Config files. Cryptic logs. Get explanations.
Get the command you need
Describe what you want in plain English. Get the exact command.
Diagnose your system
Not sure which tab is burning the CPU? Ask.
PID 75517 — Safari WebContent, 45.3% CPU. Kill it: kill 75517
Mass updates
toast reads files, writes patches, and works with any format.
Terminal Chat
When you need a back-and-forth. Pull files into context with @.
> @models.py explain this
sure, the file contains...
> what does function...
Toast on Telegram
Talk to toast from your phone. Link your account, then message the bot.
Power Users
Simple for beginners. Deep for experts. The toaster grows with you.
Custom Personas
Drop a .persona file in any project. Toast picks it up automatically, zero config. Test it by chatting with the persona.
Pipe chains
Compose like Unix. Chain multiple transforms.
Project context
Drop a .crumbs file. AI knows your stack.
Edit a book
Iterative refinement. Each pass reads, learns, decides, refines. Gradient descent for prose.
Edit a book until done.
Let the AI decide when it's done. Loops until the command signals completion. Add a cap for safety.
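The loop-until-done pattern amounts to a bounded fixed-point loop. A sketch in Python (names here are illustrative, not toast's actual interface):

```python
def refine(text: str, step, done, cap: int = 10) -> str:
    """Apply step() repeatedly until done() signals completion or the cap is hit.

    step: one refinement pass (read, learn, decide, refine)
    done: the completion signal the AI emits
    cap:  the safety bound on total passes
    """
    for _ in range(cap):
        if done(text):
            break
        text = step(text)
    return text

# Toy run: trim a string until it is short enough.
print(refine("aaaa", lambda t: t[:-1], lambda t: len(t) <= 2))  # → aa
```

The cap guarantees termination even if the completion signal never fires.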
@file injection
In chat mode, pull files into context on the fly. Multi-file supported.
Any model
One interface, many providers. Compare models without changing your workflow.
Build a bot on your Mac in one line
Email, iMessage. One line. Your AI, your rules.
Local Inference
Start toasted and toast will use it for local inference. You can also use Ollama, MLX, LM Studio, KoboldCpp, llama.cpp, vLLM, LocalAI, or Jan as your inference provider. Full privacy, no internet required.
Usage stats
Token counts and latency per provider. Tracked locally via mmap, zero overhead.
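An mmap-backed counter, the technique named above, can be sketched like this (the file path and record layout are illustrative assumptions, not toast's actual format):

```python
import mmap
import os
import struct
import tempfile

STATS_FILE = os.path.join(tempfile.gettempdir(), "toast_stats.bin")  # illustrative path
RECORD = struct.Struct("<QQ")  # two u64 counters: total tokens, total latency (µs)

# Create the stats file once, sized to exactly one record.
if not os.path.exists(STATS_FILE) or os.path.getsize(STATS_FILE) < RECORD.size:
    with open(STATS_FILE, "wb") as f:
        f.write(b"\x00" * RECORD.size)

def bump(tokens: int, latency_us: int) -> tuple[int, int]:
    """Add to the running totals in place.

    Writes go through the page cache, so updating a counter costs no syscall
    per field — that is the "zero overhead" idea.
    """
    with open(STATS_FILE, "r+b") as f:
        with mmap.mmap(f.fileno(), RECORD.size) as m:
            t, l = RECORD.unpack_from(m, 0)
            RECORD.pack_into(m, 0, t + tokens, l + latency_us)
            return t + tokens, l + latency_us

print(bump(120, 350_000))
```

A real implementation would keep one record per provider and guard concurrent writers, but the mapping trick is the same.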
jam — The AI shell that doesn't fight you
No quoting nightmares. No expansion. No $ surprises. What you type is what you get. Type something that isn't a command, and the AI answers.
# Strings just work. No escaping.
🍞 echo "The price is $100"
The price is $100
# Environment variables. Explicit words, not sigils.
🍞 set API_KEY sk-abc123
🍞 get API_KEY
sk-abc123
# Built-in RPN for math. No bc, no expr.
🍞 100 2 / 3 *
150
# Not a command? AI answers instead of "command not found".
🍞 what processes are using port 8080
lsof -i :8080
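The built-in RPN mode is ordinary stack evaluation. A minimal sketch in Python (jam itself implements this in C):

```python
def rpn(expr: str) -> float:
    """Evaluate a space-separated RPN expression, e.g. '100 2 / 3 *'."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    stack: list[float] = []
    for tok in expr.split():
        if tok in ops:
            b, a = stack.pop(), stack.pop()  # right operand is on top
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]

print(rpn("100 2 / 3 *"))  # → 150.0
```

No operator precedence, no parentheses: the stack order is the evaluation order, which is why no bc or expr is needed.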
Loops in plain English
Gradient descent for documents. Bounded or unbounded. The AI can decide when it's done. Add a cap for safety.
AI helps use the terminal
Builtin → RPN → PATH → AI. Type eixt and the AI tells you it's exit. The shell understands intent, not just syntax.
Did you mean: exit
Per-project context
.persona, .crumbs, .history walk up from cwd. Different folder means different project, different history, different AI behavior. Zero config.
# per project directory
AIgents see each other on the network
UDP multicast. Every jam instance on the subnet hears it. No broker. No server. No configuration. This is the nervous system.
# Send a message to every machine on the network
🍞 send status deploying
# Listen for a specific key — blocks until match
🍞 listen status
web3:status deploying
# An AI agent that monitors and summarizes the network
🍞 while listen | toast "summarize this event"
# Wait for 3 nodes to report ready
🍞 3 times listen ready
Three linuxtoaster boxes running jam are three islands — unless they can talk to each other. send and listen turn them into a fleet. No etcd. No consul. No Kubernetes. Just multicast.
Four scopes. Each is a word. Each composes with pipes.
ito — Version control for AI, and humans
Git records the what and attaches a why. ito flips it — record intent, derive diffs. 15 commands instead of 150. Single C file, ~1,100 lines. No staging area. No detached HEAD. No .gitignore. Built for AI and humans alike.
# One command to save. Intent is the source of truth.
$ ito log "added rate limiting to prevent abuse"
# Interactive history navigator — arrow keys, side-by-side diffs.
$ ito changes
# Search by intent, not by grepping diffs.
$ ito search "auth" | toast "summarize the approach"
# Sync via rsync. No GitHub required.
$ ito sync user@server:/repos/project
Intent-first snapshots
Every snapshot is a moment — why you changed, not just what changed. Six months later, ito search finds the reasoning, not just "fix stuff".
Opt-in tracking
No .gitignore. A .ito/track file lists what to snapshot. Everything else is invisible. No 10 GB of build artifacts in version control.
*.c *.h Makefile *.md
15 commands, not 150
ito log instead of git add -A && git commit -m. ito on experiment instead of git checkout -b. ito undo instead of figuring out reset vs revert.
ito undo
ito merge alice
AI-native search
Search the why-layer directly. Agents ask "find everything related to auth performance" and get real answers. Composes with toast.
Whole-file merge
Designed for real AI projects with clear ownership. Side-by-side diff: pick ours, theirs, or make your own.
Single C file
~1,100 lines. No dependencies beyond rsync and diff. Content-addressed, immutable objects. Sync is safe by construction — rsync can only add, never corrupt.
toasted — A local brain for your laptop
A from-scratch inference daemon for Apple Silicon. ~1,800 lines of C++, no Python. A 30B-parameter model running at ~100 tok/s generation, ~400 tok/s prompt reading. Zero cost per token. Zero data exposure. 128 GB RAM supports 8-bit, 6-bit, and 4-bit quantization. 64 GB supports 4-bit.
# Start the daemon. Model loads once, stays hot in GPU memory.
$ toasted start
# toast auto-detects local inference. Same interface as cloud.
$ toast "explain quicksort"
# Pipe chains and chat work locally.
$ cat auth.py | Security "audit this"
$ git diff | Reviewer
~100 tok/s generation
Mixture-of-experts routes through 8 of 512 experts per token — the knowledge of all 512 at the cost of 8. Speeds typically associated with a 7B dense model, from a 30B-parameter brain.
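Top-k expert gating, the mechanism described above, sketched in Python (the scoring and shapes are illustrative; the real router runs inside the model's kernels):

```python
import math

def route(logits: list[float], k: int = 8) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    With 512 experts and k=8, only ~1.6% of expert parameters run per token,
    which is why a 30B-parameter model can move at dense-7B speeds.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    z = [math.exp(logits[i]) for i in top]
    total = sum(z)
    return [(i, w / total) for i, w in zip(top, z)]
```

Each token's output is the weighted sum of just those k expert outputs; the other 504 experts contribute nothing to that token's compute.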
~400 tok/s prompt reading
Chunked batch prefill processes context in 32-token chunks. A 17K-token prompt prefills in ~44 seconds instead of 7 minutes. 56× faster than our first implementation.
Session cache — 0.6s to first word
Only the last message is new. toasted hashes prior conversation, restores cached state, prefills just the delta. A 125× improvement in time-to-first-token.
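The prefix-hash idea behind the session cache, sketched in Python (the cache structure and names are illustrative, not toasted's internals):

```python
import hashlib

cache: dict[str, object] = {}  # hash of conversation prefix -> saved model state

def conversation_key(messages: list[str]) -> str:
    """Hash a conversation prefix into a stable cache key."""
    h = hashlib.sha256()
    for m in messages:
        h.update(m.encode())
        h.update(b"\x00")  # delimiter so ["ab"] != ["a", "b"]
    return h.hexdigest()

def delta_to_prefill(messages: list[str]):
    """Find the longest cached prefix; only the unseen tail needs prefill."""
    for cut in range(len(messages), 0, -1):
        key = conversation_key(messages[:cut])
        if key in cache:
            return cache[key], messages[cut:]
    return None, messages

def remember(messages: list[str], state) -> None:
    """After serving a request, save the state for the full conversation."""
    cache[conversation_key(messages)] = state
```

On a follow-up message, the entire prior conversation hits the cache and only the new message is prefilled — hence sub-second time-to-first-token.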
Written in C++, not Python
Built against Apple's MLX C++ API with a hand-tuned Metal kernel for DeltaNet. No Python startup, no fragile environments. The model is a single file.
True privacy
Air-gapped environments, regulated industries, security-conscious teams. Your code never leaves the machine. No API keys. No internet required.
Zero marginal cost
The daemon loads the model once into unified memory. Metal shaders stay compiled. Cache stays warm. Every subsequent request is free — just electricity.
Requires Apple Silicon Mac. 128 GB unified memory supports 8-bit, 6-bit, and 4-bit quantization. 64 GB supports 4-bit. When toasted is running, toast automatically defaults to local inference. Cloud models still available with -p provider.
Pricing
The future of software depends on getting the balance between deterministic and non-deterministic right. Traditional Unix is deterministic — predictable, composable, reliable. AI is non-deterministic — creative, adaptive, surprising. The hard part is making them work together. That's what we're building: Unix reimagined for both AI and people. Your membership funds that work.
$20/yr PayGo to start. $49/mo Member for the full stack. Founding Partner for teams.
PayGo
- toast
- $20 in AI credits included — top off any time
- Anonymized usage data only (model, token count — never prompts)
- Custom personas via .persona files
- BYOK & local models free
- All updates
- Community Support
Member
- Everything in PayGo
- toasted local inference daemon
- Qwen3-Next-Coder on M4 at ~100 tok/s — zero cost per token
- Ever-growing list of reimagined Unix tools: jam, ito, ...
- UDP networking for agents
- Priority Support
Founding Partner
- Everything in Member
- Fund the rewrite of Unix.
- You tell us what is missing. We implement. You get the credit.
- Priority & dedicated support
- Consulting, seminars & FDE options
FAQ
How does it work?
Lightweight toast talks to local toastd, which keeps an HTTP/2 connection pool to linuxtoaster.com. Written in C to minimize latency. With BYOK, toastd connects directly to your provider—your traffic never touches our servers.
What's BYOK?
Got a PROVIDER_API_KEY set for Anthropic, Cerebras, Google Gemini, Groq, OpenAI, OpenRouter, Together, Mistral, Perplexity, and/or xAI? Use toast -p provider. Zero config.
What is a Founding Partner?
Companies funding the rewrite of Unix. Your team gets a software license, priority support, and consulting options. You're funding tools that make software simpler for all LinuxToaster users. Talk to us.
Can I run it fully offline?
Yes. Use any local backend—Ollama, MLX, LM Studio, KoboldCpp, llama.cpp, vLLM, LocalAI, or Jan. No internet, no API keys, full privacy.
What's jam?
A shell rebuilt for AI. No quoting, no expansion, no $ syntax. Strings just work. Unrecognized input goes to the AI. Includes set/get for env vars, while/times for loops, RPN math, and a UDP multicast basket for multi-machine coordination.
What's toasted?
A from-scratch local inference daemon for Apple Silicon. Written in C++ against Apple's MLX API. Loads a 30B-parameter model once, serves requests via Unix socket. ~100 tok/s generation, ~400 tok/s prefill, 0.6s time-to-first-token with session caching. 128 GB supports 8/6/4-bit quantization, 64 GB supports 4-bit.
Where's my data stored?
Locally. Context in .crumbs, conversations in .chat. Version them, grep them, delete them. Your machine, your files.
macOS? Windows?
macOS and Linux today. Windows WSL works.
What about consulting?
Consulting is available for teams that want hands-on help with deployment, integration, or training. Enterprise accounts have a Forward Deployed Engineering option.
How does billing work?
$20/yr gets you a membership and $20 in AI credits — top off anytime. AI inference is charged based on use. BYOK and local inference carry no cost. We collect anonymized usage data (which model, token count) but never your prompts. Consulting and the monthly cost of an FDE are optional paid add-ons.