rag-v2 is a document-grounded RAG project for LM Studio with two supported integration modes:
- LM Studio plugin mode: a prompt-preprocessor plugin that enriches chats using attached files
- MCP server mode: a stdio MCP server that exposes the same core retrieval logic as tools
The repository uses a workspace layout:
- packages/core: transport-agnostic retrieval, ranking, evidence, and policy logic
- packages/lmstudio-shared: LM Studio-specific shared helpers reused by plugin and MCP runtimes
- packages/adapter-lmstudio: LM Studio plugin adapter and prompt-preprocessor flow
- packages/mcp-server: MCP server adapter and stdio entrypoint
The goal is to keep one shared RAG implementation while supporting both LM Studio plugin workflows and MCP-hosted tool workflows.
When users provide documents, rag-v2 can:
- inject full content for small inputs
- retrieve and inject only the most relevant evidence for larger inputs
- avoid wasteful retrieval on conversational, ambiguous, or likely unanswerable prompts
- combine semantic retrieval with local lexical retrieval when helpful
- rerank, dedupe, sanitize, and package evidence before it reaches the model
This makes easy cases faster, large-document cases more selective, and weak-evidence cases more grounded.
The system chooses between:
- full-content injection for small files that fit comfortably in context
- retrieval-driven injection for larger files where only the best evidence should be sent to the model
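The decision above can be sketched as a simple token-budget check; `chooseInjectionMode` and the 4-characters-per-token estimate are illustrative assumptions, not rag-v2's actual API.

```typescript
// Injection-mode decision sketch (illustrative, not the real rag-v2 code):
// inject full content only when the estimated token total fits the budget.
type InjectionMode = "full-content" | "retrieval";

function estimateTokens(text: string): number {
  // Rough heuristic: ~4 characters per token.
  return Math.ceil(text.length / 4);
}

function chooseInjectionMode(docs: string[], contextBudget: number): InjectionMode {
  const total = docs.reduce((sum, d) => sum + estimateTokens(d), 0);
  return total <= contextBudget ? "full-content" : "retrieval";
}
```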
Before retrieval, prompts can be classified as:
- no-retrieval-needed
- ambiguous
- likely-unanswerable
- retrieval-useful
This reduces unnecessary retrieval and helps avoid confident answers that are not actually grounded in the provided documents.
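One way the gate labels above could map to actions is shown below; the labels mirror the list, but this policy table and the confidence fallback are illustrative assumptions, not the project's actual logic.

```typescript
// Gate-policy sketch (illustrative): map a classification label plus its
// confidence to a retrieval decision. Low-confidence labels fall back to
// retrieval rather than risk skipping useful evidence.
type GateLabel = "no-retrieval-needed" | "ambiguous" | "likely-unanswerable" | "retrieval-useful";
type GateAction = "skip-retrieval" | "ask-clarification" | "warn-and-retrieve" | "retrieve";

function gateAction(label: GateLabel, confidence: number, threshold = 0.6): GateAction {
  if (confidence < threshold) return "retrieve"; // not confident enough to act on the label
  switch (label) {
    case "no-retrieval-needed": return "skip-retrieval";
    case "ambiguous": return "ask-clarification";
    case "likely-unanswerable": return "warn-and-retrieve";
    case "retrieval-useful": return "retrieve";
  }
}
```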
For retrieval-heavy prompts, the system can generate multiple query variants such as:
- original
- keywords
- decomposed
- quoted-span
These variants are fused into a shared candidate pool to improve recall.
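Fusing variant result lists into one pool can be sketched as reciprocal rank fusion; `rrfFuse` and the `k = 60` constant are illustrative, not necessarily the fusion method rag-v2 ships.

```typescript
// Reciprocal rank fusion (RRF) sketch: each variant's ranked list of chunk
// IDs contributes 1 / (k + rank) to a chunk's fused score, so chunks that
// rank well under several query variants rise to the top.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```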
Hybrid mode combines:
- LM Studio semantic retrieval
- local lexical retrieval over parsed document content
This is especially useful for exact phrases, rare terms, section titles, and quoted spans.
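Merging the two candidate sources can be sketched as a weighted score sum; the weights and the assumption of comparable score scales are illustrative defaults, not rag-v2's actual values.

```typescript
// Hybrid merge sketch (illustrative): combine semantic and lexical candidate
// scores with configurable weights, then sort the merged pool.
interface Candidate { id: string; score: number; }

function hybridMerge(
  semantic: Candidate[],
  lexical: Candidate[],
  semanticWeight = 0.7,
  lexicalWeight = 0.3,
): Candidate[] {
  const merged = new Map<string, number>();
  for (const c of semantic) merged.set(c.id, (merged.get(c.id) ?? 0) + semanticWeight * c.score);
  for (const c of lexical) merged.set(c.id, (merged.get(c.id) ?? 0) + lexicalWeight * c.score);
  return [...merged.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```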
After retrieval, candidates can be:
- fused across query variants
- merged across semantic and lexical sources
- reranked with a lightweight heuristic reranker
The reranker favors answer-useful evidence, not just semantic similarity.
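A lightweight heuristic of this kind can be sketched as below; the specific signals (query-term overlap, a short-chunk penalty) and weights are assumptions for illustration, not the signals rag-v2 actually uses.

```typescript
// Heuristic rerank sketch: blend the retrieval score with simple
// answer-usefulness signals instead of relying on similarity alone.
interface Chunk { id: string; text: string; score: number; }

function heuristicRerank(query: string, chunks: Chunk[]): Chunk[] {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  return chunks
    .map((c) => {
      const lower = c.text.toLowerCase();
      // Fraction of query terms literally present in the chunk.
      const overlap = terms.filter((t) => lower.includes(t)).length / Math.max(terms.length, 1);
      // Very short chunks rarely contain enough context to answer with.
      const lengthPenalty = c.text.length < 40 ? 0.5 : 1;
      return { ...c, score: (0.5 * c.score + 0.5 * overlap) * lengthPenalty };
    })
    .sort((a, b) => b.score - a.score);
}
```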
Instead of injecting raw retrieved text, the system:
- removes near-duplicate evidence
- builds labeled evidence blocks
- includes provenance cues such as file name and score
- limits how many evidence blocks are injected
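The packaging steps above can be sketched as a dedupe-then-label pass; the token-set Jaccard dedupe, thresholds, and block format here are illustrative, not the project's exact implementation.

```typescript
// Evidence packaging sketch (illustrative): drop near-duplicate evidence via
// token-set Jaccard similarity, cap the block count, and emit labeled blocks
// with provenance (file name and score).
interface Evidence { file: string; text: string; score: number; }

function jaccard(a: string, b: string): number {
  const ta = new Set(a.toLowerCase().split(/\W+/).filter(Boolean));
  const tb = new Set(b.toLowerCase().split(/\W+/).filter(Boolean));
  const inter = [...ta].filter((t) => tb.has(t)).length;
  const union = new Set([...ta, ...tb]).size;
  return union === 0 ? 0 : inter / union;
}

function packageEvidence(items: Evidence[], maxBlocks = 4, dupeThreshold = 0.85): string[] {
  const kept: Evidence[] = [];
  for (const item of items) {
    if (kept.length >= maxBlocks) break;
    if (kept.some((k) => jaccard(k.text, item.text) >= dupeThreshold)) continue;
    kept.push(item);
  }
  return kept.map(
    (e, i) => `[Evidence ${i + 1} | ${e.file} | score=${e.score.toFixed(2)}]\n${e.text}`,
  );
}
```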
Retrieved content is treated as untrusted input. The system can:
- sanitize noisy text
- strip obvious instruction-like spans
- apply stricter grounding behavior
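Stripping instruction-like spans can be sketched as a line-level pattern filter; the patterns below are a small illustrative set, and rag-v2's actual filters may be broader.

```typescript
// Sanitization sketch (illustrative): collapse noisy whitespace and drop
// lines that look like prompt-injection attempts in retrieved text.
const INSTRUCTION_PATTERNS = [
  /ignore (all )?(previous|prior) instructions/i,
  /you are now/i,
  /system prompt/i,
];

function sanitizeRetrievedText(text: string): string {
  return text
    .split("\n")
    .filter((line) => !INSTRUCTION_PATTERNS.some((p) => p.test(line)))
    .map((line) => line.replace(/\s+/g, " ").trim())
    .filter(Boolean)
    .join("\n");
}
```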
Grounding modes currently include:
- off
- warn-on-weak-evidence
- require-evidence
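The modes could behave roughly as follows when the best evidence is weak; the mode names come from the list above, but the score threshold and behaviors are illustrative assumptions.

```typescript
// Grounding-mode sketch (illustrative): decide whether to proceed when the
// strongest retrieved evidence falls below a minimum score.
type GroundingMode = "off" | "warn-on-weak-evidence" | "require-evidence";

function applyGrounding(
  mode: GroundingMode,
  bestScore: number,
  minScore = 0.35,
): { proceed: boolean; warning?: string } {
  if (mode === "off" || bestScore >= minScore) return { proceed: true };
  if (mode === "warn-on-weak-evidence") {
    return { proceed: true, warning: "Evidence is weak; the answer may not be grounded." };
  }
  // require-evidence: refuse to answer without sufficiently strong evidence.
  return { proceed: false, warning: "No sufficiently strong evidence found." };
}
```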
```
packages/
  core/
  lmstudio-shared/
  adapter-lmstudio/
  mcp-server/
src/
  index.ts
scripts/
eval/
manual-tests/
examples/
```
Important note:
- src/index.ts is an intentional LM Studio plugin-root entry shim
- it forwards to packages/adapter-lmstudio/src/index.ts
- it remains at the repo root so LM Studio plugin tooling can treat the repository root as the plugin root
All other legacy compatibility shims have been removed.
At a high level:
- inspect the user prompt and available documents
- optionally run the answerability gate
- choose full-content injection or retrieval
- optionally generate multiple retrieval rewrites
- optionally merge semantic and lexical candidates
- fuse and rerank candidate evidence
- dedupe and sanitize evidence blocks
- inject grounded evidence plus the user query back into the prompt or return tool results
Most LM Studio-facing orchestration lives in packages/adapter-lmstudio/src/promptPreprocessor.ts. Shared retrieval and policy logic lives in packages/core/src/.
Use this when you want document-aware prompt preprocessing directly inside LM Studio.
Run the plugin in dev mode:
npm run dev:plugin
Publish or update the plugin:
npm run push:plugin
Use this when you want to expose RAG functionality as MCP tools to LM Studio MCP or another local MCP-compatible host.
Run the stdio MCP server:
npm run mcp:stdio
This exposes four tools:
- rag_answer
- rag_search
- corpus_inspect
- rerank_only
Important for MCP mode:
- do not print to stdout outside the MCP protocol
- use stderr for logs and debugging output
- the runtime supports inline documents, filesystem paths, and pre-supplied chunks
A ready-to-copy example is included at:
examples/lmstudio.mcp.json
```json
{
  "mcpServers": {
    "rag-v2-local": {
      "command": "npm",
      "args": ["run", "mcp:stdio"],
      "cwd": "C:\\Users\\user\\projects\\rag-v2"
    }
  }
}
```

```json
{
  "mcpServers": {
    "rag-v2-local": {
      "command": "wsl.exe",
      "args": [
        "-d",
        "Ubuntu-24.04-D",
        "bash",
        "-lc",
        "cd /home/user/projects/temp/ai-apps/rag-v2 && npm run mcp:stdio"
      ]
    }
  }
}
```

Replace the path values with the ones for your machine.
npm run dev:plugin
npm run push:plugin
npm run mcp:stdio
npm run typecheck
npm run eval
npm run typecheck:core
npm run typecheck:adapter
npm run typecheck:mcp
npm run typecheck:lmstudio-shared
npm run typecheck:packages
- typecheck:core validates the shared core package
- typecheck:adapter validates the LM Studio adapter package
- typecheck:mcp validates the MCP package
- typecheck:lmstudio-shared validates the LM Studio shared helper package
- typecheck:packages validates all workspace packages
- typecheck validates all packages plus the root plugin shim
npm run smoke:multi-query
npm run smoke:evidence
npm run smoke:safety
npm run smoke:rerank
npm run smoke:hybrid
npm run smoke:corrective
npm run smoke:core
npm run smoke:core-policy
npm run smoke:mcp
npm run smoke:mcp-filesystem
npm run smoke:model-rerank
npm run smoke:lmstudio-model-resolution
These are intended to verify deterministic slices of the pipeline quickly.
Run the lightweight regression harness with:
npm run eval
Inputs live in eval/cases/.
Latest aggregated output is written to:
eval/results/all-latest.json
Current suites include:
- basic.jsonl
- hard.jsonl
For live testing in LM Studio, use:
- LIVE_TEST_SCRIPT.md
- manual-tests/README.md
- manual-tests/fixtures/
The LM Studio plugin exposes configuration in the LM Studio UI.
- Embedding Model
- Manual Model ID
- Auto-Unload Model
- Retrieval Limit
- Retrieval Affinity Threshold
- Answerability Gate
- Gate Confidence Threshold
- Ambiguous Query Behavior
- Multi-Query Retrieval
- Multi-Query Count
- Fusion Method
- Max Candidates Before Rerank
- Max Evidence Blocks
- Dedupe Similarity Threshold
- Hybrid Retrieval
- Hybrid Semantic Weight
- Hybrid Lexical Weight
- Hybrid Candidate Count
- Rerank Retrieved Chunks
- Rerank Top K
- Rerank Strategy
- Model-Assisted Rerank
- Model Rerank Top K
- Rerank Model Source
- Manual Rerank Model ID
- Corrective Retrieval
- Corrective Max Attempts
- Sanitize Retrieved Text
- Strip Instructional Spans
- Strict Grounding Mode
- packages/adapter-lmstudio/src/promptPreprocessor.ts: main LM Studio prompt-preprocessor pipeline
- packages/adapter-lmstudio/src/: LM Studio adapter logic and adapter-only shaping/types
- packages/lmstudio-shared/src/: LM Studio shared model resolution, rerank, and bridge helpers used by plugin and MCP runtimes
- packages/core/src/: shared retrieval, ranking, evidence, and policy logic
- packages/mcp-server/src/: MCP contracts, handlers, runtimes, and stdio server entrypoints
- src/index.ts: intentional repo-root LM Studio plugin entry shim
- scripts/: smoke tests and eval runner
- examples/lmstudio.mcp.json: LM Studio MCP config example
- manual-tests/: live-test fixtures and guidance
The repository currently includes:
- answerability gating
- deterministic multi-query retrieval
- optional hybrid semantic-plus-lexical retrieval
- fusion, heuristic reranking, and optional model-assisted reranking
- evidence dedupe and packaging
- retrieved-text sanitization and grounding controls
- smoke tests for major pipeline slices
- regression eval coverage
- an SDK-backed MCP stdio server path with filesystem loading
- LM Studio-compatible MCP examples
- GitHub: AcidicSoil
- X: @d1rt7d4t4
- Discord: the_almighty_shade (187893603920642048)