A simple local RAG (Retrieval-Augmented Generation) guide you can follow to run LLMs locally for free.
- LocalRAG
- Guide
- System Requirements
- Guide 1: Run Local RAG Using AnythingLLM and OpenRouter
  - 1 Install AnythingLLM
  - 2 Generate API Keys for OpenRouter
  - 3 Launch AnythingLLM and Configure It to Use OpenRouter
  - 4 Create a Workspace
  - 5 Set Up the Vector Database and Embed Your Documents into Your RAG
  - 6 Upload Your Documents and Build Your Knowledge Base
  - 7 (Optional) Configure Query Mode
  - 8 Test the RAG System
- Guide 2: Run Local RAG Using AnythingLLM and a Self-Hosted Chat LLM Using LM Studio
- Links
- Models That Work on an NVIDIA RTX 4080 GPU
- Guide
This guide was tested with the following system specs:
- Windows 11
- NVIDIA GeForce RTX 4080 Laptop GPU
- 16 GB RAM
- Intel(R) Core(TM) i7-14650HX processor, 2200 MHz, 16 cores, 24 logical processors
Guide based on AnythingLLM v1.8.2
- (recommended) Download the latest AnythingLLM version from here: https://anythingllm.com/
OpenRouter provides access to a wide range of free LLMs from most of the big players (OpenAI, Meta, Google, xAI's Grok, and more).
- Head to https://openrouter.ai, sign up, and confirm your email
- Head to https://openrouter.ai/settings/keys and generate a key
- Head to Settings -> AI Providers -> LLM
- Select OpenRouter and set your API key
- Select a model marked as free
- In this example I have used meta-llama/llama-4-maverick:free, but other excellent choices include:
- mistralai/mistral-small-3.1-24b-instruct:free
- deepseek/deepseek-r1-distill-llama-70b:free
Now that you have added the LLM you should be able to start chatting right away.
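If you want to sanity-check your OpenRouter key outside AnythingLLM, you can call OpenRouter's OpenAI-compatible chat completions endpoint directly. A minimal sketch, assuming your key is in the `OPENROUTER_API_KEY` environment variable and using this guide's example model:

```python
import json
import os
import urllib.request

# OpenRouter exposes an OpenAI-compatible chat completions endpoint.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(model: str, prompt: str) -> str:
    """Send one chat message to OpenRouter and return the reply text."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__" and "OPENROUTER_API_KEY" in os.environ:
    # Requires network access and a valid key.
    print(ask("meta-llama/llama-4-maverick:free", "Say hello in one word."))
```

This is the same API AnythingLLM talks to under the hood once the provider is configured, so a working reply here means your key and model name are good.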
- Create a workspace
This is the most important step of the RAG system: embedding your documents into a vector database. Embedding essentially generates a vector of numbers (relations) from each document you upload. Similar documents generate similar vectors and are therefore grouped/linked/related more closely than others. The vector database serves as the knowledge base for your LLM (any LLM you plan to use will draw on this database, so you can switch between LLMs).
We require two machine learning models: one for producing the vector database/knowledge base (the embedding model) and one for chatting and answering from that knowledge base (the chat model).
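The "similar documents generate similar vectors" idea can be illustrated with plain cosine similarity. The 3-dimensional vectors below are toy values for illustration; real embedding models produce hundreds to thousands of dimensions:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: two documents about cats point in nearly the same
# direction, while a finance document points elsewhere.
doc_cats_1 = [0.9, 0.1, 0.0]
doc_cats_2 = [0.8, 0.2, 0.1]
doc_finance = [0.0, 0.1, 0.9]

print(cosine_similarity(doc_cats_1, doc_cats_2))   # close to 1.0
print(cosine_similarity(doc_cats_1, doc_finance))  # much smaller
```

This similarity score is what the vector database uses to decide which document chunks are most relevant to a query.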
- Set up the vector database (for a local setup the default option works fine)
- Select an embedding LLM (for a local setup the default option works fine).
Some other embedding options:
| Model/API | Dim | Free? | API Option | Notes |
|---|---|---|---|---|
| Cohere/embed-english-light-v3.0 | 1024 | ✅ 1M tokens/month | Cohere | Fast, accurate, multilingual |
| OpenAI/text-embedding-3-small | 1536 | ❌ after limit | OpenAI | Great but paid |
| Mistral Embed | 1024 | ✅ via Together.ai/OpenRouter | OpenRouter | Lightweight, decent for chatbots |
| BAAI/bge-base-en-v1.5 | 768 | ✅ HF / Local | HuggingFace, Ollama | Strong multilingual support |
| DeepSeek Embed | 1024 | ✅ via OpenRouter | OpenRouter | Matches DeepSeek RAG pipelines |
| InstructorXL | 768 | ✅ HuggingFace | HuggingFace Spaces | Instruction-based embedding model |
You may upload images, PDFs, text files, etc. to build your knowledge base.
- Head to the home screen and click on embed a document
- Upload your documents and assign them to your workspace
- Click on your workspace settings and change the chat mode to query
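In query mode the LLM answers only from retrieved context: each question is embedded, the most similar chunks are fetched from the vector database, and they are stitched into the prompt. A toy sketch of that retrieval step, using made-up chunk texts and 2-dimensional vectors (real stores hold high-dimensional embeddings):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_chunks(query_vec, store, k=2):
    """Rank stored (text, vector) chunks by similarity to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, context_chunks):
    """Query-mode style prompt: answer only from the supplied context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Toy vector store: (chunk text, embedding vector).
store = [
    ("AnythingLLM supports OpenRouter as an LLM provider.", [0.9, 0.1]),
    ("Vector databases rank chunks by similarity scores.",  [0.1, 0.9]),
    ("Workspaces group documents and chat settings.",       [0.7, 0.3]),
]

query_vec = [0.85, 0.15]  # pretend embedding of the user's question
chunks = top_k_chunks(query_vec, store, k=2)
print(build_prompt("Which providers does AnythingLLM support?", chunks))
```

Chat mode lets the model fall back to its general knowledge; query mode constrains it to the retrieved chunks, which is why it gives more grounded answers over your documents.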
- Coming Soon
- LM Studio
  - Try the TypeScript SDK (https://lmstudio.ai/docs/typescript) in a client app project: https://github.com/lmstudio-ai/lmstudio-js
  - https://hub.docker.com/r/noneabove1182/lmstudio-cuda
- AnythingLLM
  - https://github.com/mudler/LocalAI
| Model | Size | Notes |
|---|---|---|
| TinyLLaMA (1.1B) | 1.1B | Great for testing, good trade-off |
| LiteLLaMA (460M) | 460M | Ultra-light for toy use |
| MobileLLaMA (1.4B/2.7B) | 1.4–2.7B | Fast for real-time apps |
| Phi‑3‑mini (3.8B) | 3.8B | Benchmarks rival larger models |
| Llama 3.1‑Minitron (4B) | 4B | Distilled, high efficiency |







