This repository provides hands-on examples and learning resources for working with large language models (LLMs) in local development environments.
- Local inference with Ollama and llama.cpp
- Direct model loading with HuggingFace Transformers
- LangChain: prompt templates, output parsers, chains, and agents
- RAG (Retrieval-Augmented Generation) with pgvector
- Gradio web interfaces
- Prompting techniques: zero-shot, few-shot, chain-of-thought, ReAct
- 9 demos: chatbots, LangChain patterns, agents, RAG knowledge systems, fine-tuning & evaluation
- 8 slide decks: covering deployment, prompting, LangChain, fine-tuning, and evaluation
- 7 activities: hands-on exercises building on each demo
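As a taste of the prompting techniques listed above, a few-shot prompt is just a task instruction followed by labeled examples, with the new input left for the model to complete. A minimal sketch using only the standard library (the reviews and labels below are invented for illustration, not taken from the demos):

```python
# Labeled examples that demonstrate the task to the model.
# Invented for illustration.
EXAMPLES = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]

def few_shot_prompt(query: str) -> str:
    """Assemble an instruction, the labeled examples, and the new input."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in EXAMPLES:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final, unanswered block is what the model completes.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

print(few_shot_prompt("A fantastic surprise."))
```

Dropping the examples from the prompt turns the same scaffold into a zero-shot prompt.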
Complete documentation: https://gperdrizet.github.io/llms-demo
The documentation covers:
- Setup and installation
- Demo usage and concepts
- Inference server configuration
- Library reference with code examples
- Model specifications and serving commands
- Systemd deployment for production use
- Slide and activity guides
- Click Fork in the top-right corner of this repo on GitHub to create your own copy.
- Clone your fork:

  ```bash
  git clone https://github.com/<your-username>/llms-demo.git
  ```

- Open the cloned folder in VS Code.
- When prompted "Reopen in Container", click it, or run the command Dev Containers: Reopen in Container from the Command Palette (`Ctrl+Shift+P`).
- VS Code will build and start the container. This takes a few minutes the first time.
The dev container is based on the gperdrizet/llms-gpu image (NVIDIA GPU-enabled). On first creation, the postCreateCommand runs automatically and does the following:
| Step | What it does |
|---|---|
| `mkdir -p models/hugging_face && mkdir -p models/ollama` | Creates local directories for model storage |
| `pip install -r requirements.txt` | Installs Python dependencies: bert-score, evaluate, gradio, huggingface-hub, langchain-ollama, openai, peft, python-dotenv, trl, torch, transformers |
| `bash .devcontainer/install_ollama.sh` | Downloads and installs the Ollama CLI |
The container also pre-configures the following:
| Setting | Detail |
|---|---|
| GPU access | All host GPUs are passed through (--gpus all) |
| Python interpreter | /usr/bin/python is set as the default |
| `HF_HOME` | Points to `models/hugging_face` so Hugging Face downloads stay in the repo |
| `OLLAMA_MODELS` | Points to `models/ollama` so Ollama downloads stay in the repo |
| Port 7860 | Forwarded automatically for Gradio web UIs |
| VS Code extensions | Python, Jupyter, Code Spell Checker, and Marp (slide viewer) are installed |
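Taken together, these settings correspond to `devcontainer.json` entries along the following lines. This is an illustrative sketch reconstructed from the table above, not the repo's actual file; the exact field values may differ:

```jsonc
// .devcontainer/devcontainer.json (illustrative sketch)
{
  "image": "gperdrizet/llms-gpu",
  "runArgs": ["--gpus", "all"],              // pass all host GPUs through
  "containerEnv": {
    "HF_HOME": "${containerWorkspaceFolder}/models/hugging_face",
    "OLLAMA_MODELS": "${containerWorkspaceFolder}/models/ollama"
  },
  "forwardPorts": [7860],                    // Gradio web UIs
  "postCreateCommand": "mkdir -p models/hugging_face models/ollama && pip install -r requirements.txt && bash .devcontainer/install_ollama.sh"
}
```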
Once the container is ready, you can start running the demos; no extra setup is needed.
See the Demos documentation for detailed instructions on running each chatbot, including:
- Concepts covered in each demo
- Tools and libraries used
- Step-by-step setup and execution
Quick example - Ollama chatbot:

```bash
# 1. Start the Ollama server
ollama serve

# 2. Pull a model (in another terminal)
ollama pull qwen2.5:3b

# 3. Run the chatbot
python demos/chatbots/ollama_chatbot.py
```

For complete instructions on all four demos, visit the documentation.
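The same request flow can be reproduced in a few lines of Python against Ollama's local REST API. A sketch, not the demo's actual code: it assumes the default server address and uses Ollama's documented `/api/generate` endpoint with its `model`, `prompt`, and `stream` fields:

```python
import json
from urllib import request

# Default address of a locally running `ollama serve`
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    # stream=False asks for a single JSON response instead of chunks
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST a prompt to the Ollama server and return the response text."""
    req = request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

With `ollama serve` running and the model pulled, `generate("qwen2.5:3b", "Say hello in one word.")` returns the model's completion text.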