Skip to content

llm-use/llm-use

Repository files navigation

llm-use_Logo

License Stars Python

Universal LLM orchestrator for running a “planner + workers + synthesis” flow across multiple providers (Anthropic, OpenAI, Ollama, llama.cpp). It chooses between single‑shot or parallel execution, aggregates costs, and stores session logs locally.

Highlights

  • Provider‑agnostic: mix cloud and local models.
  • Cost tracking per run with a breakdown.
  • Session history saved to ~/.llm-use/sessions.
  • Works fully offline with Ollama.
  • Optional real web scraping + caching.
  • Optional MCP server (via PolyMCP).
  • TUI chat mode with live logs.

Requirements

  • Python 3.10+
  • Optional provider SDKs: anthropic, openai
  • requests (for Ollama HTTP calls)
  • Ollama installed and running for local models
  • Optional: beautifulsoup4 for scraping
  • Optional: polymcp + uvicorn for MCP server

Installation

pip install requests

# Optional: cloud providers
pip install anthropic openai

# Optional: Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Optional: scraping
pip install beautifulsoup4

# Optional: MCP server
pip install polymcp uvicorn

# Optional: Playwright (dynamic scraping)
pip install playwright
playwright install

# Install as a package (editable)
pip install -e .

Quick Start (Local Only)

ollama pull llama3.1:70b
ollama pull llama3.1:8b

python3 cli.py exec \
  --orchestrator ollama:llama3.1:70b \
  --worker ollama:llama3.1:8b \
  --task "Research AI from 5 sources"

Quick Start (Hybrid)

export ANTHROPIC_API_KEY="sk-ant-..."
ollama pull llama3.1:8b

python3 cli.py exec \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker ollama:llama3.1:8b \
  --task "Compare 10 products"

TUI Chat

python3 cli.py chat \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker ollama:llama3.1:8b

MCP Server (PolyMCP)

python3 cli.py mcp \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker ollama:llama3.1:8b \
  --host 127.0.0.1 \
  --port 8000

Install Extras (Helper)

python3 cli.py install --all

Usage

Basic

python3 cli.py exec \
  --orchestrator <provider>:<model> \
  --worker <provider>:<model> \
  --task "your task"

Router (Cheap Model to Skip Orchestration)

python3 cli.py exec \
  --router ollama:llama3.1:8b \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker openai:gpt-4o-mini \
  --task "Explain TCP in 5 bullets"

Router via llama.cpp Local Path

python3 cli.py exec \
  --router-path /path/to/your/router/model \
  --llama-cpp-url http://localhost:8080 \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker openai:gpt-4o-mini \
  --task "Explain TCP in 5 bullets"

If the router model fails or is unavailable, it falls back to a heuristic router.

Heuristic Router Rules (No Hardcoded Keywords)

By default the heuristic uses only length + URL signals. You can add your own patterns in router_rules.json (or set LLM_USE_ROUTER_RULES to a custom path).

Learned Router (Lightweight ML)

The router also learns from past tasks by storing (task, mode) pairs and using cosine similarity on token vectors. This is local, cheap, and improves routing over time. Clear the cache to reset (~/.llm-use/cache.sqlite).

Parallel Worker Control

python3 cli.py exec \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker anthropic:claude-3-5-haiku-20241022 \
  --max-workers 8 \
  --task "Summarize 20 documents"

Disable Cache

python3 cli.py exec \
  --orchestrator openai:gpt-4o \
  --worker openai:gpt-4o-mini \
  --no-cache \
  --task "Draft a brief memo"

Real Scraping (Workers)

python3 cli.py exec \
  --orchestrator openai:gpt-4o \
  --worker openai:gpt-4o-mini \
  --enable-scrape \
  --task "Find 3 sources about X and summarize them"

Dynamic Scraping (Playwright)

python3 cli.py exec \
  --orchestrator openai:gpt-4o \
  --worker openai:gpt-4o-mini \
  --enable-scrape \
  --scrape-backend playwright \
  --task "Find 3 sources about X and summarize them"

Stats

python3 cli.py stats

Router Reset (Clear Learned Memory)

python3 cli.py router-reset

Router Export / Import

python3 cli.py router-export --out router_examples.json
python3 cli.py router-import --in router_examples.json

The export includes created timestamp and optional confidence if available.

Python Package

pip install -e .
llm-use exec --orchestrator ollama:llama3.1:70b --worker ollama:llama3.1:8b --task "Hello"

Concrete Examples (Agent Support)

These examples show how to use the orchestrator as the “brain” that delegates work to cheaper or local workers.

Multi‑source research with final synthesis

python3 cli.py exec \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker openai:gpt-4o-mini \
  --task "Collect 8 reliable sources on X and produce a pros/cons summary"

Concurrent document analysis (agent brief)

python3 cli.py exec \
  --orchestrator openai:gpt-4o \
  --worker openai:gpt-4o-mini \
  --max-workers 6 \
  --task "Analyze 6 documents and return an executive brief with risks and opportunities"

Privacy‑first local pipeline (offline agent)

python3 cli.py exec \
  --orchestrator ollama:qwen2.5:72b \
  --worker ollama:mistral:7b \
  --task "Extract requirements from internal notes and produce a checklist"

Brainstorm + validation (creative + critic)

python3 cli.py exec \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker ollama:llama3.1:8b \
  --task "Generate 20 ideas, then pick the top 5 with brief rationale"

Best Practices for Agents

  • Define the expected output format in the task (bullets, table, JSON).
  • Avoid vague tasks: ask for decomposition and synthesis with clear criteria.
  • Use cheaper workers for data gathering and a stronger orchestrator for synthesis.
  • Set --max-workers based on rate limits and the number of subtasks.
  • For sensitive data, prefer Ollama or isolated environments.

File/CSV Examples (Prompt‑In‑File)

If your agent works on structured inputs, it helps to include the content directly in the prompt.

Summarize a local file

python3 cli.py exec \
  --orchestrator openai:gpt-4o \
  --worker openai:gpt-4o-mini \
  --task "Summarize in 5 bullets the content of this file:\n\n$(cat notes.txt)"

CSV analysis (schema + insights)

python3 cli.py exec \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker anthropic:claude-3-5-haiku-20241022 \
  --task "Analyze the CSV below, describe the schema and 3 insights:\n\n$(cat data.csv)"

JSON output for agent pipelines

python3 cli.py exec \
  --orchestrator ollama:llama3.1:70b \
  --worker ollama:llama3.1:8b \
  --task "Extract requirements in JSON with keys: title, priority, rationale:\n\n$(cat requirements.md)"

Providers and Models

The following model names are recognized out of the box. You can also pass custom models with provider:model.

Anthropic

  • claude-3-5-haiku-20241022
  • claude-3-7-sonnet-20250219
  • claude-4-opus-20250514

OpenAI

  • gpt-4o-mini
  • gpt-4o
  • o1

Ollama

  • llama3.1:70b
  • llama3.1:8b
  • qwen2.5:72b
  • mistral:7b

llama.cpp (OpenAI-compatible server)

Use llama_cpp:<model> with a llama.cpp server that exposes /v1/chat/completions.

Python API

from llm_use import Orchestrator, ModelConfig

orch = Orchestrator(
    orchestrator=ModelConfig(name="llama3.1:70b", provider="ollama"),
    worker=ModelConfig(name="llama3.1:8b", provider="ollama")
)

result = orch.execute("Your task")
print(f"Cost: ${result['cost']:.6f}")
print(result["output"])

Cost Notes

Costs are estimated using provider list prices per million tokens and token counts returned by the SDKs. For Ollama, cost is zero by default. Token usage for Ollama is estimated from word counts.

Troubleshooting

Ollama not found

ollama serve
ollama list

Missing API keys

export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."

Testing

pip install pytest
pytest

License

MIT

About

LLM orchestration toolkit for agent workflows: planner + workers + synthesis, optional router (LLM + learned fallback), supports OpenAI/Anthropic/Ollama/llama.cpp, real scraping with caching, MCP server integration, and a TUI chat UI.

Topics

Resources

License

Stars

Watchers

Forks

Languages