A lightweight chat UI for running local language models via Ollama. Single HTML file, no frameworks, no cloud dependencies.
## Features

- Streaming chat with markdown, syntax highlighting, and LaTeX math
- Vision — upload images for models that support it (e.g. Qwen3.5, Gemma3)
- Agent tools — the model can autonomously use:
- Web search — DuckDuckGo, model decides when to search
- Browse URL — fetch and read any web page
- Calculator — evaluate math expressions (supports Python math functions)
- Python runner — execute code with matplotlib plot support (inline charts)
- Notes — save, list, and read local markdown notes
- PDF generation — create printable documents from markdown
- PDF upload — extract and summarise text from uploaded PDFs
- Thinking — toggle visible chain-of-thought reasoning
- Tool memory — tool calls and results persist across turns (collapsed, expandable in chat)
- Context counter — real token count from Ollama (warns at 70%, critical at 90%)
- Print chat — full conversation with expanded thinking blocks
- Configurable — model, temperature, context length, max tokens, system prompt
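The calculator tool above evaluates expressions with access to Python math functions. The actual `/calc` implementation in `serve_ui.py` is not shown here, but a safe evaluator for that kind of endpoint might look like the following sketch (the `safe_calc` name and the AST-based whitelist are illustrative assumptions, not the project's code):

```python
import ast
import math

# Names the evaluator may resolve: functions/constants from the math module only.
ALLOWED = {name: getattr(math, name) for name in dir(math) if not name.startswith("_")}

def safe_calc(expression: str) -> float:
    """Evaluate a math expression, allowing only literals, arithmetic,
    and calls to functions from Python's math module."""
    tree = ast.parse(expression, mode="eval")
    for node in ast.walk(tree):
        # Reject attribute access, subscripts, and lambdas outright.
        if isinstance(node, (ast.Attribute, ast.Subscript, ast.Lambda)):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        # Reject any bare name that is not a math function or constant.
        if isinstance(node, ast.Name) and node.id not in ALLOWED:
            raise ValueError(f"unknown name: {node.id}")
    # Empty __builtins__ prevents reaching eval, open, __import__, etc.
    return eval(compile(tree, "<calc>", "eval"), {"__builtins__": {}}, ALLOWED)
```

The AST pre-check plus an empty `__builtins__` dict is a common pattern for restricting `eval` to pure arithmetic; e.g. `safe_calc("sqrt(16) + 2")` returns `6.0`, while `safe_calc("__import__('os')")` raises `ValueError`.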
## Requirements

- Ollama installed and running
- Python 3.x
- `pdfplumber` and `matplotlib` for PDF extraction and plotting: `pip3 install pdfplumber matplotlib`
- At least one model pulled, e.g.:
  `ollama pull qwen3.5:9b-q8_0`
## Quick Start

```sh
# 1. Start Ollama (if not already running)
OLLAMA_KEEP_ALIVE=-1 ollama serve

# 2. Start the chat server
cd local_llm
python3 serve_ui.py &

# 3. Open in browser
open http://127.0.0.1:3000/
```

## Supported Models

| Model | Size | Vision | Tools | Notes |
|---|---|---|---|---|
| qwen3.5:9b-q8_0 | 10 GB | Yes | Yes | Best all-rounder, default |
| qwen3.5:35b-a3b | 23 GB | Yes | Yes | MoE, use context length 2048 |
| gemma3:27b | 17 GB | Yes | No | Good quality, no tool calling |
| qwq:latest | 19 GB | No | Yes | Good reasoning |
| deepseek-r1:14b | 9 GB | No | No | Fits easily |
| llama3.1:8b | 5 GB | No | Yes | Fits easily |
## Files

```
local_llm/
  chat-ui.html    # Complete chat UI (single file)
  serve_ui.py     # Proxy server + agent tool endpoints
  vendor/         # Bundled JS/CSS (marked, highlight.js, KaTeX)
  notes/          # Saved notes (created at runtime)
  generated/      # Generated PDF pages (created at runtime)
```
## Architecture

```
Browser (:3000) <--> serve_ui.py <--> Ollama (:11434)
                          |
                          +-- /search      (DuckDuckGo)
                          +-- /browse      (fetch web pages)
                          +-- /calc        (math expressions)
                          +-- /run_code    (Python + matplotlib)
                          +-- /save_note, /list_notes, /read_note
                          +-- /extract_pdf, /generate_pdf
```
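The proxy hop in the diagram (browser to `serve_ui.py` to Ollama) can be sketched with the standard library. This is an illustration of the pattern, not the actual `serve_ui.py`; `ProxyHandler` and `upstream_url` are hypothetical names:

```python
from http.server import BaseHTTPRequestHandler
from urllib.request import Request, urlopen

OLLAMA = "http://127.0.0.1:11434"

def upstream_url(path: str) -> str:
    """Map a browser-facing /api/* path onto the local Ollama server."""
    return OLLAMA + path

class ProxyHandler(BaseHTTPRequestHandler):
    """Forwards /api/* POSTs to Ollama; a real server would also serve
    the UI file on GET and implement the tool endpoints."""

    def do_POST(self):
        if not self.path.startswith("/api/"):
            self.send_error(404)
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        req = Request(upstream_url(self.path), data=body,
                      headers={"Content-Type": "application/json"})
        with urlopen(req) as resp:
            self.send_response(resp.status)
            self.send_header("Content-Type",
                             resp.headers.get("Content-Type", "application/json"))
            self.end_headers()
            # Relay Ollama's response line by line so streamed NDJSON
            # chunks reach the browser as they arrive.
            for chunk in resp:
                self.wfile.write(chunk)
```

Proxying through one local server keeps the UI on a single origin, so the browser never makes cross-origin requests to port 11434.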
- `serve_ui.py` serves the UI and proxies `/api/*` requests to Ollama
- Agent tool endpoints handle search, browsing, code execution, notes, and PDFs
- Tool calling is automatic — the model decides when to use tools
- All processing runs locally — no data leaves your machine (except search queries and browsed URLs)
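Automatic tool calling works through Ollama's `/api/chat` endpoint, which accepts a `tools` array of function schemas; when the model decides a tool is needed, its reply carries `tool_calls` instead of plain text. A minimal sketch of assembling such a request (the `web_search` tool name and helper functions are illustrative, not the schemas `serve_ui.py` actually registers):

```python
import json
from urllib.request import Request, urlopen

# One tool schema in the JSON-schema format Ollama's /api/chat accepts.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web for current information.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble a non-streaming chat request that advertises one tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [SEARCH_TOOL],
        "stream": False,
    }

def chat(payload: dict) -> dict:
    """POST the request to a local Ollama instance and parse the reply.
    If the model chose to search, the reply's message contains tool_calls."""
    req = Request("http://127.0.0.1:11434/api/chat",
                  data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)
```

The caller then executes any requested tool, appends the result as a `role: "tool"` message, and sends the conversation back so the model can produce its final answer.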
## Tips

- 32 GB RAM + a large model: set Context Length to 2048 in settings to avoid memory pressure
- Thinking toggle: turn off for faster responses when you don't need reasoning
- Vision: attach images via the paperclip button or drag & drop
- Search: the model automatically uses web search when it needs current information
- Plots: ask the model to create charts — matplotlib runs locally, plots display inline
- Context counter: shows real token usage from Ollama — start a new chat when it gets high
- Models without tool support (Gemma3, DeepSeek): tools are automatically disabled
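The context counter's thresholds (warn at 70%, critical at 90%) can be computed from the token counts Ollama reports in its chat responses (`prompt_eval_count` and `eval_count`). A small sketch under that assumption; the function name and severity labels are illustrative:

```python
def context_usage(prompt_eval_count: int, eval_count: int, num_ctx: int):
    """Return the fraction of the context window used and a severity
    label matching the UI's thresholds (warn at 70%, critical at 90%)."""
    used = (prompt_eval_count + eval_count) / num_ctx
    level = "critical" if used >= 0.90 else "warn" if used >= 0.70 else "ok"
    return used, level
```

For example, 3,800 tokens against a 4,096-token window is about 93% and would be flagged critical, which is the point at which starting a new chat avoids the model silently losing early turns.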