Universal LLM orchestrator for running a “planner + workers + synthesis” flow across multiple providers (Anthropic, OpenAI, Ollama, llama.cpp). It chooses between single‑shot or parallel execution, aggregates costs, and stores session logs locally.
- Provider‑agnostic: mix cloud and local models.
- Cost tracking per run with a breakdown.
- Session history saved to `~/.llm-use/sessions`.
- Works fully offline with Ollama.
- Optional real web scraping + caching.
- Optional MCP server (via PolyMCP).
- TUI chat mode with live logs.
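The planner + workers + synthesis flow can be sketched conceptually as follows. This is not the library's internal API; `plan`, `run_worker`, and `synthesize` are hypothetical stand-ins for calls to the orchestrator and worker models:

```python
from concurrent.futures import ThreadPoolExecutor

def plan(task: str) -> list[str]:
    # Hypothetical: the orchestrator model splits the task into subtasks.
    return [f"{task} (part {i})" for i in range(1, 4)]

def run_worker(subtask: str) -> str:
    # Hypothetical: a cheaper worker model handles one subtask.
    return f"result of {subtask!r}"

def synthesize(task: str, results: list[str]) -> str:
    # Hypothetical: the orchestrator merges worker outputs into one answer.
    return f"{task}: " + "; ".join(results)

def execute(task: str, max_workers: int = 4) -> str:
    subtasks = plan(task)
    if len(subtasks) <= 1:
        # Single-shot path: no decomposition needed.
        return run_worker(task)
    # Parallel path: fan subtasks out to workers, then synthesize.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_worker, subtasks))
    return synthesize(task, results)
```

The real orchestrator additionally tracks per-call costs and logs each session; this sketch only shows the control flow.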
- Python 3.10+
- Optional provider SDKs: `anthropic`, `openai`
- `requests` (for Ollama HTTP calls)
- Ollama installed and running for local models
- Optional: `beautifulsoup4` for scraping
- Optional: `polymcp` + `uvicorn` for the MCP server
```bash
pip install requests

# Optional: cloud providers
pip install anthropic openai

# Optional: Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Optional: scraping
pip install beautifulsoup4

# Optional: MCP server
pip install polymcp uvicorn

# Optional: Playwright (dynamic scraping)
pip install playwright
playwright install
```
```bash
# Install as a package (editable)
pip install -e .
```

Fully local run with Ollama:

```bash
ollama pull llama3.1:70b
ollama pull llama3.1:8b

python3 cli.py exec \
  --orchestrator ollama:llama3.1:70b \
  --worker ollama:llama3.1:8b \
  --task "Research AI from 5 sources"
```

Cloud orchestrator with a local worker:

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
ollama pull llama3.1:8b

python3 cli.py exec \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker ollama:llama3.1:8b \
  --task "Compare 10 products"
```

TUI chat mode:

```bash
python3 cli.py chat \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker ollama:llama3.1:8b
```

MCP server:

```bash
python3 cli.py mcp \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker ollama:llama3.1:8b \
  --host 127.0.0.1 \
  --port 8000
```

Install all optional dependencies:

```bash
python3 cli.py install --all
```

General form:

```bash
python3 cli.py exec \
  --orchestrator <provider>:<model> \
  --worker <provider>:<model> \
  --task "your task"
```

Routing with a dedicated router model:

```bash
python3 cli.py exec \
  --router ollama:llama3.1:8b \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker openai:gpt-4o-mini \
  --task "Explain TCP in 5 bullets"
```

Routing with a local llama.cpp router:

```bash
python3 cli.py exec \
  --router-path /path/to/your/router/model \
  --llama-cpp-url http://localhost:8080 \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker openai:gpt-4o-mini \
  --task "Explain TCP in 5 bullets"
```

If the router model fails or is unavailable, execution falls back to a heuristic router.
By default the heuristic uses only length and URL signals. You can add your own patterns in `router_rules.json` (or set `LLM_USE_ROUTER_RULES` to a custom path).
The router also learns from past tasks by storing (task, mode) pairs and using cosine similarity on token vectors. This is local, cheap, and improves routing over time. Clear the cache to reset (`~/.llm-use/cache.sqlite`).
Parallel workers:

```bash
python3 cli.py exec \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker anthropic:claude-3-5-haiku-20241022 \
  --max-workers 8 \
  --task "Summarize 20 documents"
```

Disable the cache:

```bash
python3 cli.py exec \
  --orchestrator openai:gpt-4o \
  --worker openai:gpt-4o-mini \
  --no-cache \
  --task "Draft a brief memo"
```

Enable web scraping:

```bash
python3 cli.py exec \
  --orchestrator openai:gpt-4o \
  --worker openai:gpt-4o-mini \
  --enable-scrape \
  --task "Find 3 sources about X and summarize them"
```

Scrape dynamic pages with Playwright:

```bash
python3 cli.py exec \
  --orchestrator openai:gpt-4o \
  --worker openai:gpt-4o-mini \
  --enable-scrape \
  --scrape-backend playwright \
  --task "Find 3 sources about X and summarize them"
```

Stats and router maintenance:

```bash
python3 cli.py stats
python3 cli.py router-reset
python3 cli.py router-export --out router_examples.json
python3 cli.py router-import --in router_examples.json
```

The export includes a created timestamp and, when available, a confidence score.
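Based on that description, an exported file might look like the following; field names beyond `task`, `mode`, `created`, and `confidence` (and the exact structure) are assumptions:

```json
[
  {
    "task": "Summarize 20 documents",
    "mode": "parallel",
    "created": "2025-01-15T10:30:00Z",
    "confidence": 0.92
  }
]
```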
After installing as a package, the `llm-use` entry point is available directly:

```bash
pip install -e .
llm-use exec --orchestrator ollama:llama3.1:70b --worker ollama:llama3.1:8b --task "Hello"
```

These examples show how to use the orchestrator as the "brain" that delegates work to cheaper or local workers.
```bash
python3 cli.py exec \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker openai:gpt-4o-mini \
  --task "Collect 8 reliable sources on X and produce a pros/cons summary"
```

```bash
python3 cli.py exec \
  --orchestrator openai:gpt-4o \
  --worker openai:gpt-4o-mini \
  --max-workers 6 \
  --task "Analyze 6 documents and return an executive brief with risks and opportunities"
```

```bash
python3 cli.py exec \
  --orchestrator ollama:qwen2.5:72b \
  --worker ollama:mistral:7b \
  --task "Extract requirements from internal notes and produce a checklist"
```

```bash
python3 cli.py exec \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker ollama:llama3.1:8b \
  --task "Generate 20 ideas, then pick the top 5 with brief rationale"
```

- Define the expected output format in the task (bullets, table, JSON).
- Avoid vague tasks: ask for decomposition and synthesis with clear criteria.
- Use cheaper workers for data gathering and a stronger orchestrator for synthesis.
- Set `--max-workers` based on rate limits and the number of subtasks.
- For sensitive data, prefer Ollama or isolated environments.
If your agent works on structured inputs, it helps to include the content directly in the prompt.
```bash
python3 cli.py exec \
  --orchestrator openai:gpt-4o \
  --worker openai:gpt-4o-mini \
  --task "Summarize in 5 bullets the content of this file:\n\n$(cat notes.txt)"
```

```bash
python3 cli.py exec \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker anthropic:claude-3-5-haiku-20241022 \
  --task "Analyze the CSV below, describe the schema and 3 insights:\n\n$(cat data.csv)"
```

```bash
python3 cli.py exec \
  --orchestrator ollama:llama3.1:70b \
  --worker ollama:llama3.1:8b \
  --task "Extract requirements in JSON with keys: title, priority, rationale:\n\n$(cat requirements.md)"
```

The following model names are recognized out of the box. You can also pass custom models with `provider:model`.
- Anthropic: `claude-3-5-haiku-20241022`, `claude-3-7-sonnet-20250219`, `claude-4-opus-20250514`
- OpenAI: `gpt-4o-mini`, `gpt-4o`, `o1`
- Ollama: `llama3.1:70b`, `llama3.1:8b`, `qwen2.5:72b`, `mistral:7b`
Use `llama_cpp:<model>` with a llama.cpp server that exposes `/v1/chat/completions`.
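For reference, a minimal direct call to such a server, assuming the default `http://localhost:8080` and the OpenAI-compatible response shape (this bypasses the orchestrator entirely):

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "default") -> dict:
    # OpenAI-compatible chat payload understood by llama.cpp's server.
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str, base_url: str = "http://localhost:8080") -> str:
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    # Standard OpenAI-style response: first choice's message content.
    return body["choices"][0]["message"]["content"]
```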
```python
from llm_use import Orchestrator, ModelConfig

orch = Orchestrator(
    orchestrator=ModelConfig(name="llama3.1:70b", provider="ollama"),
    worker=ModelConfig(name="llama3.1:8b", provider="ollama"),
)

result = orch.execute("Your task")
print(f"Cost: ${result['cost']:.6f}")
print(result["output"])
```

Costs are estimated using provider list prices per million tokens and token counts returned by the SDKs. For Ollama, cost is zero by default. Token usage for Ollama is estimated from word counts.
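A sketch of that kind of estimate follows; the per-million-token prices below are placeholders, not current list prices:

```python
# Placeholder prices in USD per million tokens -- NOT current list prices.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "llama3.1:8b": {"input": 0.0, "output": 0.0},  # local Ollama models cost nothing
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Unknown models default to zero cost, matching the Ollama behavior.
    p = PRICES.get(model, {"input": 0.0, "output": 0.0})
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def estimate_tokens(text: str) -> int:
    # Ollama-style fallback: no token counts from the API, so estimate from words.
    return len(text.split())
```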
Check that Ollama is running:

```bash
ollama serve
ollama list
```

Set API keys for cloud providers:

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
```

Run the tests:

```bash
pip install pytest
pytest
```

MIT
