A lightweight chat UI for running local language models via Ollama. Single HTML file, no frameworks, no cloud dependencies.
## Features

- Streaming chat with markdown, syntax highlighting, and LaTeX math
- Vision — upload images for models that support it (e.g. Qwen3.5, Gemma3)
- Agent tools — the model can autonomously use:
- Web search — DuckDuckGo, model decides when to search
- Browse URL — fetch and read any web page
- Calculator — evaluate math expressions (supports Python math functions)
- Python runner — execute code with matplotlib plot support (inline charts)
- Notes — save, list, and read local markdown notes
- PDF generation — create printable documents from markdown
- PDF upload — extract and summarise text from uploaded PDFs
- Thinking — toggle visible chain-of-thought reasoning
- Tool memory — tool calls and results persist across turns (collapsed, expandable in chat)
- Context counter — real token count from Ollama (warns at 70%, critical at 90%)
- Print chat — full conversation with expanded thinking blocks
- Configurable — model, temperature, context length, max tokens, system prompt
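The calculator tool above evaluates expressions with access to Python math functions. The actual `/calc` implementation in `serve_ui.py` is not shown here, but a safe evaluator for that kind of endpoint might look like the following sketch (the `safe_calc` name and the AST-based whitelist are illustrative assumptions, not the project's code):

```python
import ast
import math

# Names the evaluator may resolve: functions/constants from the math module only.
ALLOWED = {name: getattr(math, name) for name in dir(math) if not name.startswith("_")}

def safe_calc(expression: str) -> float:
    """Evaluate a math expression, allowing only literals, arithmetic,
    and calls to functions from Python's math module."""
    tree = ast.parse(expression, mode="eval")
    for node in ast.walk(tree):
        # Reject attribute access, subscripts, and lambdas outright.
        if isinstance(node, (ast.Attribute, ast.Subscript, ast.Lambda)):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        # Reject any bare name that is not a math function or constant.
        if isinstance(node, ast.Name) and node.id not in ALLOWED:
            raise ValueError(f"unknown name: {node.id}")
    # Empty __builtins__ prevents reaching eval, open, __import__, etc.
    return eval(compile(tree, "<calc>", "eval"), {"__builtins__": {}}, ALLOWED)
```

The AST pre-check plus an empty `__builtins__` dict is a common pattern for restricting `eval` to pure arithmetic; e.g. `safe_calc("sqrt(16) + 2")` returns `6.0`, while `safe_calc("__import__('os')")` raises `ValueError`.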
## Requirements

- Ollama installed and running
- Python 3.x
- `pdfplumber` and `matplotlib` for PDF extraction and plotting: `pip3 install pdfplumber matplotlib`
- At least one model pulled, e.g.:
  `ollama pull qwen3.5:9b-q8_0`
## Quick Start

```sh
# 1. Start Ollama (if not already running)
OLLAMA_KEEP_ALIVE=-1 ollama serve

# 2. Start the chat server
cd local_llm
python3 serve_ui.py &

# 3. Open in browser
open http://127.0.0.1:3000/
```

## Supported Models

| Model | Size | Vision | Tools | Notes |
|---|---|---|---|---|
| qwen3.5:9b-q8_0 | 10 GB | Yes | Yes | Best all-rounder, default |
| qwen3.5:35b-a3b | 23 GB | Yes | Yes | MoE, use context length 2048 |
| gemma3:27b | 17 GB | Yes | No | Good quality, no tool calling |
| qwq:latest | 19 GB | No | Yes | Good reasoning |
| deepseek-r1:14b | 9 GB | No | No | Fits easily |
| llama3.1:8b | 5 GB | No | Yes | Fits easily |
## Files

```
local_llm/
  chat-ui.html    # Complete chat UI (single file)
  serve_ui.py     # Proxy server + agent tool endpoints
  vendor/         # Bundled JS/CSS (marked, highlight.js, KaTeX)
  notes/          # Saved notes (created at runtime)
  generated/      # Generated PDF pages (created at runtime)
```
## Architecture

```
Browser (:3000) <--> serve_ui.py <--> Ollama (:11434)
                          |
                          +-- /search      (DuckDuckGo)
                          +-- /browse      (fetch web pages)
                          +-- /calc        (math expressions)
                          +-- /run_code    (Python + matplotlib)
                          +-- /save_note, /list_notes, /read_note
                          +-- /extract_pdf, /generate_pdf
```
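The proxy hop in the diagram (browser to `serve_ui.py` to Ollama) can be sketched with the standard library. This is an illustration of the pattern, not the actual `serve_ui.py`; `ProxyHandler` and `upstream_url` are hypothetical names:

```python
from http.server import BaseHTTPRequestHandler
from urllib.request import Request, urlopen

OLLAMA = "http://127.0.0.1:11434"

def upstream_url(path: str) -> str:
    """Map a browser-facing /api/* path onto the local Ollama server."""
    return OLLAMA + path

class ProxyHandler(BaseHTTPRequestHandler):
    """Forwards /api/* POSTs to Ollama; a real server would also serve
    the UI file on GET and implement the tool endpoints."""

    def do_POST(self):
        if not self.path.startswith("/api/"):
            self.send_error(404)
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        req = Request(upstream_url(self.path), data=body,
                      headers={"Content-Type": "application/json"})
        with urlopen(req) as resp:
            self.send_response(resp.status)
            self.send_header("Content-Type",
                             resp.headers.get("Content-Type", "application/json"))
            self.end_headers()
            # Relay Ollama's response line by line so streamed NDJSON
            # chunks reach the browser as they arrive.
            for chunk in resp:
                self.wfile.write(chunk)
```

Proxying through one local server keeps the UI on a single origin, so the browser never makes cross-origin requests to port 11434.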
- `serve_ui.py` serves the UI and proxies `/api/*` requests to Ollama
- Agent tool endpoints handle search, browsing, code execution, notes, and PDFs
- Tool calling is automatic — the model decides when to use tools
- All processing runs locally — no data leaves your machine (except search queries and browsed URLs)
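Automatic tool calling works through Ollama's `/api/chat` endpoint, which accepts a `tools` array of function schemas; when the model decides a tool is needed, its reply carries `tool_calls` instead of plain text. A minimal sketch of assembling such a request (the `web_search` tool name and helper functions are illustrative, not the schemas `serve_ui.py` actually registers):

```python
import json
from urllib.request import Request, urlopen

# One tool schema in the JSON-schema format Ollama's /api/chat accepts.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web for current information.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble a non-streaming chat request that advertises one tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [SEARCH_TOOL],
        "stream": False,
    }

def chat(payload: dict) -> dict:
    """POST the request to a local Ollama instance and parse the reply.
    If the model chose to search, the reply's message contains tool_calls."""
    req = Request("http://127.0.0.1:11434/api/chat",
                  data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)
```

The caller then executes any requested tool, appends the result as a `role: "tool"` message, and sends the conversation back so the model can produce its final answer.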
## Tips

- 32 GB RAM + a large model: set Context Length to 2048 in settings to avoid memory pressure
- Thinking toggle: turn off for faster responses when you don't need reasoning
- Vision: attach images via the paperclip button or drag & drop
- Search: the model automatically uses web search when it needs current information
- Plots: ask the model to create charts — matplotlib runs locally, plots display inline
- Context counter: shows real token usage from Ollama — start a new chat when it gets high
- Models without tool support (Gemma3, DeepSeek): tools are automatically disabled
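The context counter's thresholds (warn at 70%, critical at 90%) can be computed from the token counts Ollama reports in its chat responses (`prompt_eval_count` and `eval_count`). A small sketch under that assumption; the function name and severity labels are illustrative:

```python
def context_usage(prompt_eval_count: int, eval_count: int, num_ctx: int):
    """Return the fraction of the context window used and a severity
    label matching the UI's thresholds (warn at 70%, critical at 90%)."""
    used = (prompt_eval_count + eval_count) / num_ctx
    level = "critical" if used >= 0.90 else "warn" if used >= 0.70 else "ok"
    return used, level
```

For example, 3,800 tokens against a 4,096-token window is about 93% and would be flagged critical, which is the point at which starting a new chat avoids the model silently losing early turns.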