rag-v2 is a document-grounded RAG project for LM Studio with two supported integration modes:
- LM Studio plugin mode: a prompt-preprocessor plugin that enriches chats using attached files
- MCP server mode: a stdio MCP server that exposes the same core retrieval logic as tools
The repository uses a workspace layout:
- packages/core: transport-agnostic retrieval, ranking, evidence, and policy logic
- packages/lmstudio-shared: LM Studio-specific shared helpers reused by plugin and MCP runtimes
- packages/adapter-lmstudio: LM Studio plugin adapter and prompt-preprocessor flow
- packages/mcp-server: MCP server adapter and stdio entrypoint
The goal is to keep one shared RAG implementation while supporting both LM Studio plugin workflows and MCP-hosted tool workflows.
When users provide documents, rag-v2 can:
- inject full content for small inputs
- retrieve and inject only the most relevant evidence for larger inputs
- avoid wasteful retrieval on conversational, ambiguous, or likely unanswerable prompts
- combine semantic retrieval with local lexical retrieval when helpful
- rerank, dedupe, sanitize, and package evidence before it reaches the model
This makes easy cases faster, large-document cases more selective, and weak-evidence cases more grounded.
The system chooses between:
- full-content injection for small files that fit comfortably in context
- retrieval-driven injection for larger files where only the best evidence should be sent to the model
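The decision above can be sketched as a simple token-budget check; `chooseInjectionMode` and the 4-characters-per-token estimate are illustrative assumptions, not rag-v2's actual API.

```typescript
// Injection-mode decision sketch (illustrative, not the real rag-v2 code):
// inject full content only when the estimated token total fits the budget.
type InjectionMode = "full-content" | "retrieval";

function estimateTokens(text: string): number {
  // Rough heuristic: ~4 characters per token.
  return Math.ceil(text.length / 4);
}

function chooseInjectionMode(docs: string[], contextBudget: number): InjectionMode {
  const total = docs.reduce((sum, d) => sum + estimateTokens(d), 0);
  return total <= contextBudget ? "full-content" : "retrieval";
}
```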
Before retrieval, prompts can be classified as:
- no-retrieval-needed
- ambiguous
- likely-unanswerable
- retrieval-useful
This reduces unnecessary retrieval and helps avoid confident answers that are not actually grounded in the provided documents.
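One way the gate labels above could map to actions is shown below; the labels mirror the list, but this policy table and the confidence fallback are illustrative assumptions, not the project's actual logic.

```typescript
// Gate-policy sketch (illustrative): map a classification label plus its
// confidence to a retrieval decision. Low-confidence labels fall back to
// retrieval rather than risk skipping useful evidence.
type GateLabel = "no-retrieval-needed" | "ambiguous" | "likely-unanswerable" | "retrieval-useful";
type GateAction = "skip-retrieval" | "ask-clarification" | "warn-and-retrieve" | "retrieve";

function gateAction(label: GateLabel, confidence: number, threshold = 0.6): GateAction {
  if (confidence < threshold) return "retrieve"; // not confident enough to act on the label
  switch (label) {
    case "no-retrieval-needed": return "skip-retrieval";
    case "ambiguous": return "ask-clarification";
    case "likely-unanswerable": return "warn-and-retrieve";
    case "retrieval-useful": return "retrieve";
  }
}
```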
For retrieval-heavy prompts, the system can generate multiple query variants such as:
- original
- keywords
- decomposed
- quoted-span
These variants are fused into a shared candidate pool to improve recall.
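Fusing variant result lists into one pool can be sketched as reciprocal rank fusion; `rrfFuse` and the `k = 60` constant are illustrative, not necessarily the fusion method rag-v2 ships.

```typescript
// Reciprocal rank fusion (RRF) sketch: each variant's ranked list of chunk
// IDs contributes 1 / (k + rank) to a chunk's fused score, so chunks that
// rank well under several query variants rise to the top.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```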
Hybrid mode combines:
- LM Studio semantic retrieval
- local lexical retrieval over parsed document content
This is especially useful for exact phrases, rare terms, section titles, and quoted spans.
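Merging the two candidate sources can be sketched as a weighted score sum; the weights and the assumption of comparable score scales are illustrative defaults, not rag-v2's actual values.

```typescript
// Hybrid merge sketch (illustrative): combine semantic and lexical candidate
// scores with configurable weights, then sort the merged pool.
interface Candidate { id: string; score: number; }

function hybridMerge(
  semantic: Candidate[],
  lexical: Candidate[],
  semanticWeight = 0.7,
  lexicalWeight = 0.3,
): Candidate[] {
  const merged = new Map<string, number>();
  for (const c of semantic) merged.set(c.id, (merged.get(c.id) ?? 0) + semanticWeight * c.score);
  for (const c of lexical) merged.set(c.id, (merged.get(c.id) ?? 0) + lexicalWeight * c.score);
  return [...merged.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```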
After retrieval, candidates can be:
- fused across query variants
- merged across semantic and lexical sources
- reranked with a lightweight heuristic reranker
The reranker favors answer-useful evidence, not just semantic similarity.
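A lightweight heuristic of this kind can be sketched as below; the specific signals (query-term overlap, a short-chunk penalty) and weights are assumptions for illustration, not the signals rag-v2 actually uses.

```typescript
// Heuristic rerank sketch: blend the retrieval score with simple
// answer-usefulness signals instead of relying on similarity alone.
interface Chunk { id: string; text: string; score: number; }

function heuristicRerank(query: string, chunks: Chunk[]): Chunk[] {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  return chunks
    .map((c) => {
      const lower = c.text.toLowerCase();
      // Fraction of query terms literally present in the chunk.
      const overlap = terms.filter((t) => lower.includes(t)).length / Math.max(terms.length, 1);
      // Very short chunks rarely contain enough context to answer with.
      const lengthPenalty = c.text.length < 40 ? 0.5 : 1;
      return { ...c, score: (0.5 * c.score + 0.5 * overlap) * lengthPenalty };
    })
    .sort((a, b) => b.score - a.score);
}
```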
Instead of injecting raw retrieved text, the system:
- removes near-duplicate evidence
- builds labeled evidence blocks
- includes provenance cues such as file name and score
- limits how many evidence blocks are injected
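The packaging steps above can be sketched as a dedupe-then-label pass; the token-set Jaccard dedupe, thresholds, and block format here are illustrative, not the project's exact implementation.

```typescript
// Evidence packaging sketch (illustrative): drop near-duplicate evidence via
// token-set Jaccard similarity, cap the block count, and emit labeled blocks
// with provenance (file name and score).
interface Evidence { file: string; text: string; score: number; }

function jaccard(a: string, b: string): number {
  const ta = new Set(a.toLowerCase().split(/\W+/).filter(Boolean));
  const tb = new Set(b.toLowerCase().split(/\W+/).filter(Boolean));
  const inter = [...ta].filter((t) => tb.has(t)).length;
  const union = new Set([...ta, ...tb]).size;
  return union === 0 ? 0 : inter / union;
}

function packageEvidence(items: Evidence[], maxBlocks = 4, dupeThreshold = 0.85): string[] {
  const kept: Evidence[] = [];
  for (const item of items) {
    if (kept.length >= maxBlocks) break;
    if (kept.some((k) => jaccard(k.text, item.text) >= dupeThreshold)) continue;
    kept.push(item);
  }
  return kept.map(
    (e, i) => `[Evidence ${i + 1} | ${e.file} | score=${e.score.toFixed(2)}]\n${e.text}`,
  );
}
```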
Retrieved content is treated as untrusted input. The system can:
- sanitize noisy text
- strip obvious instruction-like spans
- apply stricter grounding behavior
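Stripping instruction-like spans can be sketched as a line-level pattern filter; the patterns below are a small illustrative set, and rag-v2's actual filters may be broader.

```typescript
// Sanitization sketch (illustrative): collapse noisy whitespace and drop
// lines that look like prompt-injection attempts in retrieved text.
const INSTRUCTION_PATTERNS = [
  /ignore (all )?(previous|prior) instructions/i,
  /you are now/i,
  /system prompt/i,
];

function sanitizeRetrievedText(text: string): string {
  return text
    .split("\n")
    .filter((line) => !INSTRUCTION_PATTERNS.some((p) => p.test(line)))
    .map((line) => line.replace(/\s+/g, " ").trim())
    .filter(Boolean)
    .join("\n");
}
```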
Grounding modes currently include:
- off
- warn-on-weak-evidence
- require-evidence
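The modes could behave roughly as follows when the best evidence is weak; the mode names come from the list above, but the score threshold and behaviors are illustrative assumptions.

```typescript
// Grounding-mode sketch (illustrative): decide whether to proceed when the
// strongest retrieved evidence falls below a minimum score.
type GroundingMode = "off" | "warn-on-weak-evidence" | "require-evidence";

function applyGrounding(
  mode: GroundingMode,
  bestScore: number,
  minScore = 0.35,
): { proceed: boolean; warning?: string } {
  if (mode === "off" || bestScore >= minScore) return { proceed: true };
  if (mode === "warn-on-weak-evidence") {
    return { proceed: true, warning: "Evidence is weak; the answer may not be grounded." };
  }
  // require-evidence: refuse to answer without sufficiently strong evidence.
  return { proceed: false, warning: "No sufficiently strong evidence found." };
}
```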
```
packages/
  core/
  lmstudio-shared/
  adapter-lmstudio/
  mcp-server/
src/
  index.ts
scripts/
eval/
manual-tests/
examples/
```
Important note:
- src/index.ts is an intentional LM Studio plugin-root entry shim
- it forwards to packages/adapter-lmstudio/src/index.ts
- it remains at the repo root so LM Studio plugin tooling can treat the repository root as the plugin root
All other legacy compatibility shims have been removed.
At a high level:
- inspect the user prompt and available documents
- optionally run the answerability gate
- choose full-content injection or retrieval
- optionally generate multiple retrieval rewrites
- optionally merge semantic and lexical candidates
- fuse and rerank candidate evidence
- dedupe and sanitize evidence blocks
- inject grounded evidence plus the user query back into the prompt or return tool results
Most LM Studio-facing orchestration lives in packages/adapter-lmstudio/src/promptPreprocessor.ts. Shared retrieval and policy logic lives in packages/core/src/.
Use this when you want document-aware prompt preprocessing directly inside LM Studio.
Run the plugin in dev mode:
npm run dev:plugin
Publish or update the plugin:
npm run push:plugin
Use this when you want to expose RAG functionality as MCP tools to LM Studio MCP or another local MCP-compatible host.
Run the stdio MCP server:
npm run mcp:stdio
This exposes four tools:
- rag_answer
- rag_search
- corpus_inspect
- rerank_only
Important for MCP mode:
- do not print to stdout outside the MCP protocol
- use stderr for logs and debugging output
- the runtime supports inline documents, filesystem paths, and pre-supplied chunks
A ready-to-copy example is included at:
examples/lmstudio.mcp.json
```json
{
  "mcpServers": {
    "rag-v2-local": {
      "command": "npm",
      "args": ["run", "mcp:stdio"],
      "cwd": "C:\\Users\\user\\projects\\rag-v2"
    }
  }
}
```

```json
{
  "mcpServers": {
    "rag-v2-local": {
      "command": "wsl.exe",
      "args": [
        "-d",
        "Ubuntu-24.04-D",
        "bash",
        "-lc",
        "cd /home/user/projects/temp/ai-apps/rag-v2 && npm run mcp:stdio"
      ]
    }
  }
}
```

Replace the path values with the ones for your machine.
npm run dev:plugin
npm run push:plugin
npm run mcp:stdio
npm run typecheck
npm run eval
npm run typecheck:core
npm run typecheck:adapter
npm run typecheck:mcp
npm run typecheck:lmstudio-shared
npm run typecheck:packages
- typecheck:core validates the shared core package
- typecheck:adapter validates the LM Studio adapter package
- typecheck:mcp validates the MCP package
- typecheck:lmstudio-shared validates the LM Studio shared helper package
- typecheck:packages validates all workspace packages
- typecheck validates all packages plus the root plugin shim
npm run smoke:multi-query
npm run smoke:evidence
npm run smoke:safety
npm run smoke:rerank
npm run smoke:hybrid
npm run smoke:corrective
npm run smoke:core
npm run smoke:core-policy
npm run smoke:mcp
npm run smoke:mcp-filesystem
npm run smoke:model-rerank
npm run smoke:lmstudio-model-resolution
These are intended to verify deterministic slices of the pipeline quickly.
Run the lightweight regression harness with:
npm run eval
Inputs live in eval/cases/.
Latest aggregated output is written to:
eval/results/all-latest.json
Current suites include:
- basic.jsonl
- hard.jsonl
For live testing in LM Studio, use:
- LIVE_TEST_SCRIPT.md
- manual-tests/README.md
- manual-tests/fixtures/
The LM Studio plugin exposes configuration in the LM Studio UI.
- Embedding Model
- Manual Model ID
- Auto-Unload Model
- Retrieval Limit
- Retrieval Affinity Threshold
- Answerability Gate
- Gate Confidence Threshold
- Ambiguous Query Behavior
- Multi-Query Retrieval
- Multi-Query Count
- Fusion Method
- Max Candidates Before Rerank
- Max Evidence Blocks
- Dedupe Similarity Threshold
- Hybrid Retrieval
- Hybrid Semantic Weight
- Hybrid Lexical Weight
- Hybrid Candidate Count
- Rerank Retrieved Chunks
- Rerank Top K
- Rerank Strategy
- Model-Assisted Rerank
- Model Rerank Top K
- Rerank Model Source
- Manual Rerank Model ID
- Corrective Retrieval
- Corrective Max Attempts
- Sanitize Retrieved Text
- Strip Instructional Spans
- Strict Grounding Mode
- packages/adapter-lmstudio/src/promptPreprocessor.ts: main LM Studio prompt-preprocessor pipeline
- packages/adapter-lmstudio/src/: LM Studio adapter logic and adapter-only shaping/types
- packages/lmstudio-shared/src/: LM Studio shared model resolution, rerank, and bridge helpers used by plugin and MCP runtimes
- packages/core/src/: shared retrieval, ranking, evidence, and policy logic
- packages/mcp-server/src/: MCP contracts, handlers, runtimes, and stdio server entrypoints
- src/index.ts: intentional repo-root LM Studio plugin entry shim
- scripts/: smoke tests and eval runner
- examples/lmstudio.mcp.json: LM Studio MCP config example
- manual-tests/: live-test fixtures and guidance
The repository currently includes:
- answerability gating
- deterministic multi-query retrieval
- optional hybrid semantic-plus-lexical retrieval
- fusion, heuristic reranking, and optional model-assisted reranking
- evidence dedupe and packaging
- retrieved-text sanitization and grounding controls
- smoke tests for major pipeline slices
- regression eval coverage
- an SDK-backed MCP stdio server path with filesystem loading
- LM Studio-compatible MCP examples
- GitHub: AcidicSoil
- X: @d1rt7d4t4
- Discord: the_almighty_shade (187893603920642048)