spaaleks/vectory

Vectory

Vectory is a Node.js service built on LangChain.js for document indexing and retrieval-augmented generation (RAG). It exposes a single API for semantic search, multi-provider OCR, and structured metadata extraction, so you can turn unstructured documents into a queryable knowledge base.


Features

  • Retrieval-Augmented Generation (RAG) API: OpenAI-compatible chat completion endpoints with built-in semantic search.
  • Multi-Provider OCR: Integrates with Docling and Spal-OCR for PDF parsing.
  • Smart Image Handling: Automatically analyzes, filters, and uploads OCR-extracted images to S3 storage.
  • Flexible Vector Search: Hybrid search that combines trigram matching, full-text search (FTS), and vector similarity, using PostgreSQL and pgvector.
  • Advanced Metadata Tagging: Extensible tagging system for automatic document classification based on path patterns and file attributes.
  • Observability: Built-in Langfuse integration for tracing and monitoring chat performance.
  • YAML-Driven Configuration: Modular configuration system with support for fragments, environment variables, and hot-merging.

Docker

Vectory is designed to run in a containerized environment.

Standalone

docker run --rm \
  -p 3000:3000 \
  -v "$(pwd)/config:/app/config:ro" \
  -e CONFIG_PATH=/app/config \
  docker.io/spaleks/vectory:latest

Docker Compose

services:
  vectory:
    image: spaleks/vectory:latest
    container_name: vectory
    restart: unless-stopped
    environment:
      - LOG_LEVEL=info
      - CONFIG_PATH=/app/config
    volumes:
      - ./config:/app/config:ro
    ports:
      - "3000:3000"

Configuration

Vectory loads every RAG config it finds in a directory. Point CONFIG_PATH to a directory, and Vectory reads every *.yml file in its rag/ subfolder (or the top-level YAML files if no rag/ folder exists). This lets you run multiple RAGs in one process.
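The lookup rule above can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not Vectory's actual loader: the function name listRagConfigFiles is hypothetical, and the sketch matches only the *.yml extension mentioned above.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper (not part of Vectory): returns the YAML files that the
// rule described above would select for a given CONFIG_PATH directory.
export function listRagConfigFiles(configDir: string): string[] {
  const ragDir = path.join(configDir, "rag");

  // Prefer rag/ when it exists; otherwise fall back to the top-level directory.
  const hasRagDir = fs.existsSync(ragDir) && fs.statSync(ragDir).isDirectory();
  const searchDir = hasRagDir ? ragDir : configDir;

  return fs
    .readdirSync(searchDir, { withFileTypes: true })
    // Keep regular files ending in .yml; subfolders such as shared/ are skipped.
    .filter((entry) => entry.isFile() && entry.name.endsWith(".yml"))
    .map((entry) => path.join(searchDir, entry.name))
    .sort();
}
```

Calling listRagConfigFiles("/app/config") on a directory that contains a rag/ subfolder returns only the files inside rag/, in sorted order.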

Folder Layout (example)

config/
  chat_tokens.yml
  shared/
    database.yml
    model.yml
    services.yml
  rag/
    cd1.yml
    cd2.yml

Steps

  1. Adapt config/ or copy it to your own config directory.
  2. Update credentials and model keys.
  3. Add one or more RAG files under rag/ (e.g., cd1.yml, cd2.yml).
  4. Set CONFIG_PATH to your config directory (not a single file).

Example: minimal RAG file

global:
  embedding_model: "mxbai-embed-large"
  chat_model: "gemini-2.5-flash"

aggregators:
  - name: "notes"
    type: "folder"
    path: "/data/notes"
    pattern: "**/*.md"
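As a rough aid for checking a RAG file before starting the service, the sketch below validates an already-parsed config object against the fields shown in the example. It is not Vectory's own validation: checkRagConfig is a hypothetical name, YAML parsing is left to whichever parser you use, and treating every field from the example as a required string is an assumption.

```typescript
// Narrow an unknown value to a plain key/value mapping.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical check (not part of Vectory): returns a list of problems found
// in a parsed RAG config. An empty list means the shape matches the example.
// Assumption: every field shown in the example is a required string.
export function checkRagConfig(config: unknown): string[] {
  if (!isRecord(config)) return ["config must be a mapping"];
  const problems: string[] = [];

  const globalSection = config.global;
  if (!isRecord(globalSection)) {
    problems.push("global must be a mapping");
  } else {
    for (const key of ["embedding_model", "chat_model"]) {
      if (typeof globalSection[key] !== "string") {
        problems.push(`global.${key} must be a string`);
      }
    }
  }

  const aggregators = config.aggregators;
  if (!Array.isArray(aggregators)) {
    problems.push("aggregators must be a list");
  } else {
    aggregators.forEach((aggregator: unknown, index: number) => {
      for (const key of ["name", "type", "path", "pattern"]) {
        if (!isRecord(aggregator) || typeof aggregator[key] !== "string") {
          problems.push(`aggregators[${index}].${key} must be a string`);
        }
      }
    });
  }

  return problems;
}
```

Returning a list of problems rather than throwing on the first one makes it easier to fix several mistakes in a file in one pass.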

Environment Variables

Core

  • CONFIG_PATH – Path to the config directory (loads all rag/*.yml, or the top-level YAML files if there is no rag/ folder).
  • PORT – Port to bind the HTTP service (default: 3000).
  • LOG_LEVEL – Logging verbosity (debug, info, warn, error).
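To illustrate how the core variables above fit together, here is a minimal TypeScript sketch that reads them with the documented default port. It is not Vectory's startup code: readCoreSettings is a hypothetical name, and two behaviors are assumptions not stated in this README, namely that CONFIG_PATH is required and that LOG_LEVEL falls back to info.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";
const LOG_LEVELS: readonly string[] = ["debug", "info", "warn", "error"];

interface CoreSettings {
  configPath: string;
  port: number;
  logLevel: LogLevel;
}

// Hypothetical helper (not part of Vectory): validates the core variables.
export function readCoreSettings(
  env: Record<string, string | undefined>,
): CoreSettings {
  // Assumption: CONFIG_PATH is treated as required.
  const configPath = env.CONFIG_PATH;
  if (!configPath) {
    throw new Error("CONFIG_PATH must point to a config directory");
  }

  // PORT defaults to 3000, as documented above.
  const port = env.PORT ? Number(env.PORT) : 3000;
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  // Assumption: "info" is used when LOG_LEVEL is unset.
  const logLevel = env.LOG_LEVEL ?? "info";
  if (!LOG_LEVELS.includes(logLevel)) {
    throw new Error(`Invalid LOG_LEVEL: ${logLevel}`);
  }

  return { configPath, port, logLevel: logLevel as LogLevel };
}
```

In a Node.js process this would typically be called as readCoreSettings(process.env); passing the environment in as an argument keeps the function easy to test.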

Observability

  • LANGFUSE_PUBLIC_KEY
  • LANGFUSE_SECRET_KEY
  • LANGFUSE_ENDPOINT

License

MIT
