
LLM-RAG-Cheat-Sheet

A collection of practical examples and tools for building LLM (Large Language Model) applications, focusing on RAG (Retrieval-Augmented Generation) patterns. Created for the talk "Open Data / Freie Daten in KI Chatbots nutzen" (Open Data in AI Chatbots).

Requirements

  • Ollama running on localhost:11434 (or configure an alternative API endpoint)
  • Python 3.x for the Python examples
  • Models: Tested with llama3.2, gemma3:4b, qwen3, phi4

Quick Start

Terminal Chat

cd simple_shell_chat
pip install -r requirements.txt
python llm_shell.py --model llama3.2 --api http://localhost:11434

Commands:

  • clear - Reset conversation context
  • model - Show current model
  • model <name> - Switch to different model
  • ollama ls - List available models
  • ollama ps - Show running models
  • bye - Exit

Web Chat

cd simple_web_chat
./run_server.sh
# Open browser to http://localhost:8000

Directory Structure

/tools/ - Function/Tool Calling

Demonstrates how to implement and use tool calling with LLMs:

Function Calling Example:

# tools/function_calling.py
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": { "type": "string" }
            },
            "required": ["location"]
        }
    }
}]

payload = {
    "model": "llama3.2",
    "tools": tools,
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}]
}
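The payload above only declares the tool; the assistant's reply must then be checked for tool calls and each one dispatched. A minimal dispatch sketch, assuming an OpenAI-style response shape and a hypothetical `get_weather` implementation (the repo's `function_calling.py` may handle this differently):

```python
import json

def get_weather(location: str) -> str:
    # Hypothetical stand-in; a real version would call a weather API.
    return f"18 °C in {location}"

TOOL_FUNCTIONS = {"get_weather": get_weather}

def dispatch_tool_calls(message: dict) -> list[dict]:
    """Execute each tool call in an assistant message and return tool-role replies."""
    results = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        output = TOOL_FUNCTIONS[fn["name"]](**args)
        results.append({"role": "tool",
                        "tool_call_id": call.get("id"),
                        "content": output})
    return results

# Example assistant message (hypothetical, mirroring the OpenAI response shape)
msg = {"tool_calls": [{"id": "call_1", "function": {
    "name": "get_weather", "arguments": '{"location": "Paris"}'}}]}
print(dispatch_tool_calls(msg)[0]["content"])
```

The tool-role replies are appended to the conversation and sent back to the model so it can produce the final natural-language answer.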

Key Files:

  • function_calling.py - Basic tool calling with weather API
  • bash-agent.py - LLM agents that execute shell commands
  • rag_response.py - RAG pattern examples
  • tool_example.py - Various tool integration tests

/api/ - API Clients & Response Formatting

Multi-language API client implementations and structured output:

Structured JSON with Pydantic:

# api/json_format_pydantic.py
from pydantic import BaseModel

class TranslationFormat(BaseModel):
    german: str
    spanish: str
    english: str | None = None

schema = TranslationFormat.model_json_schema()
payload = {
    "model": "llama3.2",
    "response_format": {
        "type": "json_schema",
        "json_schema": {"strict": True, "schema": schema}
    }
}
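With strict schema enforcement, the returned `message.content` should be a JSON string matching `TranslationFormat`. A minimal parsing sketch, using a hypothetical response body shaped like an OpenAI-compatible chat completion:

```python
import json

# Hypothetical response body (the actual content comes from the model)
response = {"choices": [{"message": {
    "content": '{"german": "Hallo", "spanish": "Hola", "english": "Hello"}'}}]}

content = response["choices"][0]["message"]["content"]
translation = json.loads(content)
print(translation["german"])  # keys match the Pydantic model's fields
```

With Pydantic available, `TranslationFormat.model_validate_json(content)` yields a validated object instead of a plain dict.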

Key Files:

  • chat_python_httpclient.py / chat_python_langchain.py - Python HTTP clients
  • json_format_pydantic.py / json_format_langchain.py - Structured JSON output
  • chat_curl.sh / chat_curl_stream.sh - Curl-based API examples
  • mood_classifier.py - Sentiment/mood classification
  • Java examples: json_format_java.java, chat_java_simplejson.java

/context/ - RAG Context Management

Text retrieval and context management for RAG systems:

Vector Index with Multiple Retrievers:

# context/vectorindex.py
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.retrievers import BM25Retriever, TFIDFRetriever

# Split text into chunks
text_splitter = CharacterTextSplitter(chunk_size=512, chunk_overlap=64)
texts = text_splitter.split_documents(documents)

# Create multiple retrievers
vectorstore = FAISS.from_documents(texts, HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"))
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
tfidf_retriever = TFIDFRetriever.from_documents(texts)
bm25_retriever = BM25Retriever.from_documents(texts)
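The script builds three retrievers; their results can be merged in several ways (LangChain's `EnsembleRetriever` is one option). As a dependency-free illustration of the idea, reciprocal rank fusion scores each document by `1/(k + rank)` across the ranked lists:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids; earlier rank = higher score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ids as returned by the vector, TF-IDF, and BM25 retrievers
merged = reciprocal_rank_fusion([["d1", "d2", "d3"], ["d2", "d1"], ["d2", "d4"]])
print(merged[0])  # "d2" ranks first: it appears near the top of all three lists
```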

Key Files:

  • context_provider.py - Document loading and TF-IDF retrieval
  • vectorindex.py - FAISS, BM25, TF-IDF vector index implementation
  • wikipedia_news.py / wikipedia_news_de.py - Wikipedia revision tracking
  • rag_proxy.py - RAG proxy implementation
  • Sample texts: pg50221.txt, pg3008.txt (Project Gutenberg)

/image/ - Multimodal Vision Examples

Image analysis with vision-capable LLMs:

Image Analysis:

# image/image_basic_openai_api.py
import base64

with open("images/scalarproduct.png", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")

payload = {
    "model": "gemma3:4b",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Explain the image"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}}
        ]
    }]
}
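The key step is turning raw image bytes into a `data:` URL for the `image_url` block. A self-contained sketch of that conversion (using placeholder bytes instead of a file; the helper name is illustrative):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Base64-encode image bytes into a data URL usable in an image_url block."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{b64}"

url = to_data_url(b"\x89PNG\r\n")  # placeholder bytes, not a real image
print(url.startswith("data:image/png;base64,"))
```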

Key Files:

  • image_basic_ollama_api.py - Basic image analysis with Ollama
  • image_basic_openai_api.py - OpenAI-compatible API for image analysis
  • image_catalog.py - Image cataloging utilities
  • image_fetch.sh - Fetch images from URLs

/simple_web_chat/ - Web Chat Interface

Terminal-style browser-based chat client:

Features:

  • Streaming token-by-token output
  • Model switching (model <name>)
  • Host configuration (host <url>)
  • LocalStorage persistence
  • Markdown rendering
  • RAG context injection (via chat.html)

Run:

cd simple_web_chat
./run_server.sh
# Or manually: python -m http.server 8000

Commands:

  • reset - Clear conversation
  • model - Show current model
  • model <name> - Switch model
  • host <url> - Change API endpoint
  • ollama ls - List available models

/simple_shell_chat/ - Terminal Chat Interface

Command-line chat client for terminal interaction:

Features:

  • Streaming token-by-token output
  • Model management (switch, list, show)
  • Context persistence
  • Multi-line input (triple quote """)
  • Custom system prompts via config

Common Patterns

RAG with Context Injection

# Retrieve relevant documents
retrieved_docs = vector_retriever.get_relevant_documents(query)
context_text = "\n\n".join([doc.page_content for doc in retrieved_docs])

# Inject into prompt
messages = [{
    "role": "system",
    "content": f"Answer using this context:\n{context_text}"
}, {
    "role": "user",
    "content": query
}]
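Local models have limited context windows, so the joined context often needs trimming before injection. A simple character-budget sketch (not from the repo; a production version would count tokens instead of characters):

```python
def fit_context(chunks: list[str], budget: int = 4000) -> str:
    """Keep whole chunks, in retrieval order, until the character budget runs out."""
    kept, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > budget:
            break
        kept.append(chunk)
        used += len(chunk) + 2  # account for the "\n\n" separator
    return "\n\n".join(kept)

context_text = fit_context(["a" * 3000, "b" * 1500, "c" * 100], budget=4000)
print(len(context_text))  # 3000: only the first chunk fits
```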

Streaming Responses

# API request with streaming (http.client against an OpenAI-compatible endpoint)
import http.client
import json

conn = http.client.HTTPConnection("localhost", 11434)
body = {
    "model": "llama3.2",
    "stream": True,
    "messages": messages,
    "max_tokens": 8192
}
conn.request("POST", "/v1/chat/completions", json.dumps(body),
             headers={"Content-Type": "application/json"})
response = conn.getresponse()

# Process the SSE stream line by line (lines arrive as bytes)
for raw in response:
    line = raw.decode("utf-8").strip()
    if not line.startswith("data: "):
        continue
    if line == "data: [DONE]":
        break
    data = json.loads(line[len("data: "):])
    token = data["choices"][0]["delta"].get("content", "")
    print(token, end="", flush=True)

Configuration

Ollama Setup

# Install and start Ollama
brew install ollama  # macOS
ollama serve         # starts the API server on localhost:11434
ollama pull llama3.2 # in a second terminal, once the server is running

Environment

  • Default Ollama endpoint: http://localhost:11434
  • Use --api flag or host command to change endpoint
  • Default model configurable in scripts
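One common way to keep the default model and endpoint configurable without editing scripts is environment variables with fallbacks (the variable names here are illustrative, not something the repo defines):

```python
import os

# Fall back to the defaults listed above when the variables are unset
API_BASE = os.environ.get("OLLAMA_API", "http://localhost:11434")
MODEL = os.environ.get("OLLAMA_MODEL", "llama3.2")
print(API_BASE, MODEL)
```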

License

MIT License - see LICENSE.txt for details


Repository: https://github.com/orbiter/llm-rag-cheat-sheet
