gtgabriel/local_chat_agent

Local LLM Chat

A lightweight chat UI for running local language models via Ollama. Single HTML file, no frameworks, no cloud dependencies.

Features

  • Streaming chat with markdown, syntax highlighting, and LaTeX math
  • Vision — upload images for models that support it (e.g. Qwen3.5, Gemma3)
  • Agent tools — the model can autonomously use:
    • Web search — DuckDuckGo, model decides when to search
    • Browse URL — fetch and read any web page
    • Calculator — evaluate math expressions (supports Python math functions)
    • Python runner — execute code with matplotlib plot support (inline charts)
    • Notes — save, list, and read local markdown notes
    • PDF generation — create printable documents from markdown
  • PDF upload — extract and summarise text from uploaded PDFs
  • Thinking — toggle visible chain-of-thought reasoning
  • Tool memory — tool calls and results persist across turns (collapsed, expandable in chat)
  • Context counter — real token count from Ollama (warns at 70%, critical at 90%)
  • Print chat — full conversation with expanded thinking blocks
  • Configurable — model, temperature, context length, max tokens, system prompt
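The context-counter thresholds above (warn at 70%, critical at 90%) can be sketched as a small helper. The function name is illustrative; the actual logic lives in chat-ui.html and may differ:

```python
def context_status(used_tokens: int, context_length: int) -> str:
    """Classify context usage against the README's thresholds:
    'ok' below 70%, 'warn' from 70% to 90%, 'critical' at 90% or above.

    Hypothetical helper mirroring the documented behaviour, not code
    taken from the project.
    """
    ratio = used_tokens / context_length
    if ratio >= 0.90:
        return "critical"
    if ratio >= 0.70:
        return "warn"
    return "ok"
```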

Prerequisites

  • Ollama installed and running
  • Python 3.x
  • pdfplumber and matplotlib for PDF extraction and plotting:
    pip3 install pdfplumber matplotlib
  • At least one model pulled, e.g.:
    ollama pull qwen3.5:9b-q8_0

Quick Start

# 1. Start Ollama (if not already running)
OLLAMA_KEEP_ALIVE=-1 ollama serve

# 2. Start the chat server
cd local_llm
python3 serve_ui.py &

# 3. Open in browser
open http://127.0.0.1:3000/
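Before opening the UI, you can confirm Ollama is up and see which models are pulled by querying its standard /api/tags endpoint. The parsing helper below assumes the usual response shape ({"models": [{"name": ...}, ...]}); the live call is only a sketch:

```python
def model_names(tags_response: dict) -> list[str]:
    """Extract model names from an Ollama /api/tags response payload."""
    return [m["name"] for m in tags_response.get("models", [])]

# Live check (requires Ollama running on its default port):
#   import json, urllib.request
#   with urllib.request.urlopen("http://127.0.0.1:11434/api/tags") as resp:
#       print(model_names(json.load(resp)))
```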

Models Tested on 32GB Apple Silicon

Model             Size   Vision  Tools  Notes
qwen3.5:9b-q8_0   10 GB  Yes     Yes    Best all-rounder, default
qwen3.5:35b-a3b   23 GB  Yes     Yes    MoE, use context length 2048
gemma3:27b        17 GB  Yes     No     Good quality, no tool calling
qwq:latest        19 GB  No      Yes    Good reasoning
deepseek-r1:14b   9 GB   No      No     Fits easily
llama3.1:8b       5 GB   No      Yes    Fits easily

Project Structure

local_llm/
  chat-ui.html    # Complete chat UI (single file)
  serve_ui.py     # Proxy server + agent tool endpoints
  vendor/         # Bundled JS/CSS (marked, highlight.js, KaTeX)
  notes/          # Saved notes (created at runtime)
  generated/      # Generated PDF pages (created at runtime)

How It Works

Browser (:3000)  <-->  serve_ui.py  <-->  Ollama (:11434)
                          |
                      /search    (DuckDuckGo)
                      /browse    (fetch web pages)
                      /calc      (math expressions)
                      /run_code  (Python + matplotlib)
                      /save_note, /list_notes, /read_note
                      /extract_pdf, /generate_pdf
  • serve_ui.py serves the UI and proxies /api/* requests to Ollama
  • Agent tool endpoints handle search, browsing, code execution, notes, and PDFs
  • Tool calling is automatic — the model decides when to use tools
  • All processing runs locally — no data leaves your machine (except search queries and browsed URLs)
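The /calc endpoint's job, evaluating a math expression without handing the model a full eval, can be sketched with an AST walk. This is an assumption about the approach, not the code in serve_ui.py:

```python
import ast
import math
import operator

# Hypothetical sketch of a safe expression evaluator; the real /calc
# endpoint in serve_ui.py may be implemented differently.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_calc(expr: str) -> float:
    """Evaluate an arithmetic expression, allowing math-module functions."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            fn = getattr(math, node.func.id, None)
            if callable(fn):
                return fn(*[ev(a) for a in node.args])
        raise ValueError("disallowed expression")
    return ev(ast.parse(expr, mode="eval"))
```

Anything outside plain arithmetic and math functions (names, attribute access, imports) raises, so a malicious expression never reaches Python's evaluator.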

Tips

  • 32GB RAM + large model: set Context Length to 2048 in settings to avoid memory pressure
  • Thinking toggle: turn off for faster responses when you don't need reasoning
  • Vision: attach images via the paperclip button or drag & drop
  • Search: the model automatically uses web search when it needs current information
  • Plots: ask the model to create charts — matplotlib runs locally, plots display inline
  • Context counter: shows real token usage from Ollama — start a new chat when it gets high
  • Models without tool support (Gemma3, DeepSeek): tools are automatically disabled
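The last tip, auto-disabling tools for models without tool calling, can be sketched as a capability lookup. The family set below is an illustrative assumption drawn from the models table; the UI may detect support differently:

```python
# Illustrative capability table based on the "Models Tested" section;
# the real UI may probe the model for tool support instead.
TOOL_CAPABLE_FAMILIES = {"qwen3.5", "qwq", "llama3.1"}

def tools_enabled(model: str) -> bool:
    """Return True if the model family (the part before ':') supports tool calling."""
    family = model.split(":", 1)[0]
    return family in TOOL_CAPABLE_FAMILIES
```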
