Moss is the search runtime that lives inside your Conversational AI agent.
Index documents, query them semantically, and get results back in under 10 ms - fast enough for real-time conversation.
Install the Python SDK:

```shell
pip install inferedge-moss
```

```python
from inferedge_moss import MossClient, QueryOptions

client = MossClient("your_project_id", "your_project_key")

# Create an index and add documents
await client.create_index("support-docs", [
    {"id": "1", "text": "Refunds are processed within 3-5 business days."},
    {"id": "2", "text": "You can track your order on the dashboard."},
    {"id": "3", "text": "We offer 24/7 live chat support."},
])

# Load and query — results in <10 ms
await client.load_index("support-docs")
results = await client.query("support-docs", "how long do refunds take?", QueryOptions(top_k=3))
for doc in results.docs:
    print(f"[{doc.score:.3f}] {doc.text}")  # results.time_taken_ms holds the query time
```

Or the TypeScript SDK:

```shell
npm install @inferedge/moss
```

```typescript
import { MossClient } from "@inferedge/moss";

const client = new MossClient("your_project_id", "your_project_key");

// Create an index and add documents
await client.createIndex("support-docs", [
  { id: "1", text: "Refunds are processed within 3-5 business days." },
  { id: "2", text: "You can track your order on the dashboard." },
  { id: "3", text: "We offer 24/7 live chat support." },
]);

// Load and query — results in <10 ms
await client.loadIndex("support-docs");
const results = await client.query("support-docs", "how long do refunds take?", { topK: 3 });
results.docs.forEach((doc) => {
  console.log(`[${doc.score.toFixed(3)}] ${doc.text}`); // results.timeTakenInMs holds the query time
});
```

Get your project credentials at moss.dev - free tier available.
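The Python snippet above uses top-level `await`, which only works in a notebook or an async REPL (`python -m asyncio`). In a regular script, wrap the calls in an `async def main()` and hand it to `asyncio.run`. A minimal sketch of the pattern — `query_index` here is a stand-in coroutine, not the SDK; swap in the real `client.query` call from the quickstart:

```python
import asyncio


async def query_index(index: str, text: str) -> list[str]:
    # Stand-in for `await client.query(index, text, ...)` from the quickstart.
    await asyncio.sleep(0)  # simulates the awaited network round trip
    return [f"result for {text!r} from {index!r}"]


async def main() -> None:
    docs = await query_index("support-docs", "how long do refunds take?")
    for doc in docs:
        print(doc)


if __name__ == "__main__":
    asyncio.run(main())
```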
Vector databases were built for batch analytics. Moss was built for real-time agents.
If you're building a voice bot, a copilot, or any AI system that talks to humans, you need retrieval that keeps up with conversation. A 200-500 ms round trip to a vector database kills the experience. Moss delivers results in single-digit milliseconds - fast enough that retrieval disappears from the latency budget.
End-to-end query latency (embedding + search) on 100,000 documents, 750 measured queries, top_k=5. Measured on a MacBook Pro (M4 Pro, 24 GB).
| System | P50 | P95 | P99 | Mean |
|---|---|---|---|---|
| Moss | 3.1 ms | 4.3 ms | 5.4 ms | 3.3 ms |
| Pinecone | 432.6 ms | 732.1 ms | 934.2 ms | 485.8 ms |
| Qdrant | 597.8 ms | 775.0 ms | 1120.2 ms | 637.6 ms |
| ChromaDB | 351.8 ms | 423.5 ms | 538.5 ms | 358.0 ms |
Moss includes embedding in the measurement; the competitors use an external embedding service (Modal), and Pinecone uses its cloud-hosted search.
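For reference, percentiles like those in the table can be computed from raw per-query timings with a few lines of stdlib Python. This is a generic sketch, not the actual benchmark harness, and the timings below are synthetic:

```python
import statistics


def latency_summary(timings_ms: list[float]) -> dict[str, float]:
    """Summarize per-query latencies as P50/P95/P99/mean."""
    # quantiles(n=100) returns the 99 cut points p1..p99
    q = statistics.quantiles(sorted(timings_ms), n=100)
    return {
        "p50": q[49],
        "p95": q[94],
        "p99": q[98],
        "mean": statistics.fmean(timings_ms),
    }


# Example with synthetic timings (not the measured data):
print(latency_summary([3.0, 3.1, 3.2, 3.1, 4.5, 3.0, 3.3, 5.2, 3.1, 3.2]))
```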
Moss isn't a database! It's a search runtime. You don't manage clusters, tune HNSW parameters, or worry about sharding. You index documents, load them into the runtime, and query. That's it.
- Sub-10 ms semantic search - P99 of 8 ms
- Built-in embedding models - no OpenAI key required (or bring your own)
- Metadata filtering - filter with `$eq`, `$and`, `$in`, `$near` operators
- Document management - add, upsert, retrieve, and delete documents
- Python + TypeScript SDKs - async-first, type-safe
- Framework integrations - LangChain, DSPy, Pipecat, LiveKit, LlamaIndex
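The filter operators compose in the familiar Mongo-style query syntax. As an illustration of the semantics only — this is a local evaluator written for this README, not the SDK (`$near` is omitted since it needs the engine's distance computation); see `examples/python/metadata_filtering.py` for the real query syntax:

```python
def matches(metadata: dict, flt: dict) -> bool:
    """Evaluate a Mongo-style metadata filter against one document."""
    for key, cond in flt.items():
        if key == "$and":
            # Every sub-filter in the list must match
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif isinstance(cond, dict) and "$eq" in cond:
            if metadata.get(key) != cond["$eq"]:
                return False
        elif isinstance(cond, dict) and "$in" in cond:
            if metadata.get(key) not in cond["$in"]:
                return False
        else:  # bare value is shorthand for $eq
            if metadata.get(key) != cond:
                return False
    return True


doc = {"category": "billing", "region": "eu"}
flt = {"$and": [{"category": {"$eq": "billing"}}, {"region": {"$in": ["eu", "us"]}}]}
print(matches(doc, flt))  # → True
```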
This repo contains working examples you can copy straight into your project:
```
examples/
├── python/                      # Python SDK samples
│   ├── load_and_query_sample.py
│   ├── comprehensive_sample.py
│   ├── custom_embedding_sample.py
│   └── metadata_filtering.py
├── javascript/                  # TypeScript SDK samples
│   ├── load_and_query_sample.ts
│   ├── comprehensive_sample.ts
│   └── custom_embedding_sample.ts
└── cookbook/                    # Framework integrations
    ├── langchain/               # LangChain retriever
    └── dspy/                    # DSPy module

apps/
├── next-js/                     # Next.js semantic search UI
├── pipecat-moss/                # Pipecat voice agent with Moss retrieval
├── livekit-moss-vercel/         # LiveKit voice agent on Vercel
└── docker/                      # Dockerized examples (ECS/K8s pattern)
```
```shell
cd examples/python
pip install -r requirements.txt
cp ../../.env.example .env   # Add your credentials
python load_and_query_sample.py
```

```shell
cd examples/javascript
npm install
cp ../../.env.example .env   # Add your credentials
npx tsx load_and_query_sample.ts
```

```shell
cd apps/next-js
npm install
cp ../../.env.example .env   # Add your credentials
npm run dev                  # Open http://localhost:3000
```

Sub-10 ms retrieval plugged into Pipecat's real-time voice pipeline — a customer support agent that actually keeps up with conversation.

```shell
cd apps/pipecat-moss/pipecat-quickstart
# See README for setup and Pipecat Cloud deployment
```

```python
from inferedge_moss import MossClient, DocumentInfo, QueryOptions, MutationOptions, GetDocumentsOptions

client = MossClient(project_id, project_key)

# Index management
await client.create_index(name, documents, model_id="moss-minilm")
await client.get_index(name)
await client.list_indexes()
await client.delete_index(name)

# Document operations
await client.add_docs(name, documents, MutationOptions(upsert=True))
await client.get_docs(name)
await client.get_docs(name, GetDocumentsOptions(doc_ids=["id1", "id2"]))
await client.delete_docs(name, ["id1", "id2"])

# Search
await client.load_index(name)
results = await client.query(name, "your query", QueryOptions(top_k=5))
# results.docs[0].id, .text, .score, .metadata
# results.time_taken_ms
```

```typescript
import { MossClient, DocumentInfo } from "@inferedge/moss";

const client = new MossClient(projectId, projectKey);

// Index management
await client.createIndex(name, documents, { modelId: "moss-minilm" });
await client.getIndex(name);
await client.listIndexes();
await client.deleteIndex(name);

// Document operations
await client.addDocs(name, documents, { upsert: true });
await client.getDocs(name);
await client.getDocs(name, { docIds: ["id1", "id2"] });
await client.deleteDocs(name, ["id1", "id2"]);

// Search
await client.loadIndex(name);
const results = await client.query(name, "your query", { topK: 5 });
// results.docs[0].id, .text, .score, .metadata
// results.timeTakenInMs
```

| Framework | Status | Example |
|---|---|---|
| LangChain | Available | examples/cookbook/langchain/ |
| DSPy | Available | examples/cookbook/dspy/ |
| Pipecat | Available | apps/pipecat-moss/ |
| LiveKit | Available | apps/livekit-moss-vercel/ |
| Next.js | Available | apps/next-js/ |
| VitePress | Available | packages/vitepress-plugin-moss/ |
| Vercel AI SDK | Coming soon | — |
| CrewAI | Coming soon | — |
```
┌─────────────────────────────────────────────────┐
│                Your Application                 │
│       (Voice bot, Copilot, Chat agent)          │
└────────────────────┬────────────────────────────┘
                     │
          ┌──────────▼──────────┐
          │      Moss SDK       │
          │(Python / TypeScript)│
          └──────────┬──────────┘
                     │ HTTPS
          ┌──────────▼──────────┐
          │    Moss Runtime     │
          │  ┌───────────────┐  │
          │  │   Embedding   │  │
          │  │    Engine     │  │
          │  └───────┬───────┘  │
          │  ┌───────▼───────┐  │
          │  │    Search     │  │
          │  │    Runtime    │◄─┼── Sub-10 ms queries
          │  └───────────────┘  │
          └─────────────────────┘
```
The SDKs in this repo are thin clients that talk to the Moss runtime over HTTPS. The runtime handles embedding, indexing, and search — you don't need to manage any infrastructure.
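Because the SDK call is a plain awaitable over HTTPS, you can bound it with standard asyncio tooling when retrieval sits inside a strict voice-pipeline budget. A sketch using `asyncio.wait_for`, with a hypothetical `fake_query` coroutine standing in for the real `client.query` call:

```python
import asyncio


async def fake_query(text: str) -> list[str]:
    # Stand-in for `await client.query(...)`; swap in the real SDK call.
    await asyncio.sleep(0.001)
    return [f"doc for {text!r}"]


async def query_with_budget(text: str, budget_s: float = 0.05) -> list[str]:
    """Return results, or an empty list if the latency budget is blown."""
    try:
        return await asyncio.wait_for(fake_query(text), timeout=budget_s)
    except asyncio.TimeoutError:
        return []  # degrade gracefully: the agent answers without retrieval


print(asyncio.run(query_with_budget("how long do refunds take?")))
```

Returning an empty result set on timeout is one reasonable policy for a conversational agent; retrying or falling back to a cached answer are alternatives.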
We welcome contributions! Here's where the community can have the most impact:
- New SDK bindings — Swift, Go, Elixir, and more
- Framework integrations — Vercel AI SDK, CrewAI, Haystack, AutoGen
- Reranking support — plug in cross-encoder rerankers
- Doc-parsing connectors — PDF, DOCX, HTML, Markdown ingestion
- Examples and tutorials — if you build something with Moss, we'd love to feature it
See our Contributing Guide for setup instructions and our Roadmap for what's planned.
Check out issues labeled `good first issue` to get started.
- Discord — ask questions, share what you're building
- GitHub Issues — bug reports and feature requests
- Twitter — announcements and updates
BSD 2-Clause License — the SDKs, examples, and integrations in this repo are fully open source.

