This repository contains the materials for Session 5 of Applied NLP.
- Slides: see the slides/ folder
- Notebooks: see the notebooks/ folder
This session introduces Retrieval-Augmented Generation (RAG), a technique that combines information retrieval with large language models to answer questions based on external documents.
- Text Chunking — Split long documents into overlapping chunks for efficient retrieval
- Embeddings — Convert text chunks into semantic vector representations
- Vector Databases — Store and search embeddings using FAISS
- Retrieval Chains — Build a complete RAG pipeline connecting retrieval + generation
- Local LLM Integration — Use Ollama for API-free local inference
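To make the first topic concrete, here is a minimal sketch of overlapping chunking in plain Python. It is not the splitter used in the notebooks: the function name `chunk_text`, the character-based windows, and the default sizes are all assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of `chunk_size` that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "Alice was beginning to get very tired of sitting by her sister on the bank."
    for i, chunk in enumerate(chunk_text(sample, chunk_size=30, overlap=10)):
        print(i, repr(chunk))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, as long as it is shorter than the overlap. Splitters in real frameworks usually also try to break on paragraph or sentence boundaries rather than at fixed character offsets.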
As in all previous sessions, we use Lewis Carroll's two Alice books as our corpus:
- Alice's Adventures in Wonderland
- Through the Looking-Glass
- LangChain — Framework for building LLM applications
- FAISS — Facebook AI Similarity Search for vector storage
- Sentence Transformers — Embedding models (all-mpnet-base-v2)
- Ollama — Local LLM runtime (using the llama3.2 model)
- LangSmith — Prompt hub for retrieval templates
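How these pieces fit together can be sketched without installing any of them. The toy example below uses stand-ins throughout, and every name in it is hypothetical: word counts replace the Sentence Transformers embeddings, a brute-force cosine ranking replaces the FAISS index, a hand-written template replaces the LangSmith prompt, and the final call to the LLM is left out. The sample sentences are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank every chunk against the question (stand-in for a vector index lookup)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Hypothetical prompt template; the session pulls its template from a prompt hub."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


if __name__ == "__main__":
    chunks = [
        "The Cheshire Cat grinned at Alice.",
        "The Queen of Hearts shouted about tarts.",
        "Humpty Dumpty sat on a wall.",
    ]
    question = "Who grinned at Alice?"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

In a real pipeline the prompt would then be sent to the local model, and the word-count vectors would be replaced by dense embeddings, which can match a question to a passage even when the two share no words.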
Before starting, please fork this repository and create a fresh Python virtual environment.
All required libraries are listed in requirements.txt.
⚠️ If you encounter errors during pip install, try removing the version pinning for the failing package(s) in requirements.txt.
On Apple M1/M2 systems you may also need to install additional system packages (the “M1 shizzle”).
macOS / Linux:

# Select Python version (if using pyenv)
pyenv local 3.11.3
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate
# Upgrade pip and install dependencies
pip install --upgrade pip
pip install -r requirements.txt

Windows (PowerShell):

# Select Python version (if using pyenv)
pyenv local 3.11.3
# Create and activate virtual environment
python -m venv .venv
.venv\Scripts\Activate.ps1
# Upgrade pip and install dependencies
python -m pip install --upgrade pip
pip install -r requirements.txt

Windows (Git Bash):

# Select Python version (if using pyenv)
pyenv local 3.11.3
# Create and activate virtual environment
python -m venv .venv
source .venv/Scripts/activate
# Upgrade pip and install dependencies
python -m pip install --upgrade pip
pip install -r requirements.txt

You’re now ready to run the session notebooks!
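If you want to confirm the setup before opening a notebook, a small check like the following may help. It is a sketch, not part of the repository: the import names in `MODULES` are assumed from the tools listed in this README and may not match everything in requirements.txt, and `check_environment` is a hypothetical helper name.

```python
import importlib.util
import shutil
import sys

# Import names assumed from the tools listed in this README;
# adjust the list to match requirements.txt.
MODULES = ["langchain", "faiss", "sentence_transformers"]


def check_environment() -> dict[str, bool]:
    """Report whether a virtual environment, the Ollama CLI, and key modules are present."""
    report = {
        # Inside a venv, sys.prefix points at .venv while sys.base_prefix
        # points at the interpreter the venv was created from.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # Ollama is a separate program, not a pip package, so look for its CLI.
        "ollama_on_path": shutil.which("ollama") is not None,
    }
    for name in MODULES:
        # find_spec locates a top-level module without importing it.
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for key, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {key}")
```

Run it with the virtual environment activated. A MISSING entry for a module usually means the pip install step failed or ran outside the environment.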
Deactivate the environment when you’re done:
deactivate