
vectorlessflow/vectorless

Vectorless

Document Engine for AI


Vectorless is a reasoning-native document engine, with its core written in Rust, designed to be the foundational layer for AI applications that need structured access to documents. It uses no vector databases, embeddings, or similarity search. Instead, it reasons through your structured documents (PDFs, Markdown, reports, contracts) and retrieves only what's relevant. Nothing more, nothing less.

How It Works

[Diagram: Vectorless workflow]
[Demo: Vectorless in action]

Quick Start

Rust

Add the dependencies to Cargo.toml (tokio is required for the #[tokio::main] runtime):

[dependencies]
vectorless = "0.1"
tokio = { version = "1", features = ["full"] }

Then in main.rs:

use vectorless::client::{EngineBuilder, IndexContext, QueryContext};

#[tokio::main]
async fn main() -> vectorless::Result<()> {
    let engine = EngineBuilder::new()
        .with_key("sk-...")
        .with_model("gpt-4o")
        .with_endpoint("https://api.openai.com/v1")
        .build()
        .await?;

    // Index a document
    let result = engine.index(IndexContext::from_path("./report.pdf")).await?;
    let doc_id = result.doc_id().unwrap();

    // Query
    let result = engine.query(
        QueryContext::new("What is the total revenue?")
            .with_doc_ids(vec![doc_id.to_string()])
    ).await?;
    println!("{}", result.content);

    Ok(())
}

Python

pip install vectorless
import asyncio
from vectorless import Engine, IndexContext, QueryContext

async def main():
    engine = Engine(api_key="sk-...", model="gpt-4o", endpoint="https://api.openai.com/v1")

    # Index a document
    result = await engine.index(IndexContext.from_path("./report.pdf"))
    doc_id = result.doc_id

    # Query
    result = await engine.query(
        QueryContext("What is the total revenue?").with_doc_ids([doc_id])
    )
    print(result.single().content)

asyncio.run(main())

Core Concepts

Semantic Tree Index

When you index a document, Vectorless builds a tree structure that mirrors the document's hierarchy:

Annual Report 2024
├── Executive Summary
│   ├── Financial Highlights
│   └── Strategic Outlook
├── Financial Statements
│   ├── Revenue Analysis        ← "What is the total revenue?" lands here
│   ├── Operating Expenses
│   └── Net Income
└── Risk Factors
    ├── Market Risks
    └── Regulatory Risks

Each node contains a summary generated by the LLM. During retrieval, the engine uses these summaries to reason about which path to follow — just like a human would scan a table of contents.
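That traversal can be pictured with a short, self-contained Python sketch. This is an illustrative simulation, not the Vectorless API: keyword overlap stands in for the LLM's relevance judgment, and the Node class and document data here are hypothetical.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    summary: str
    children: list = field(default_factory=list)

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))

def score(query: str, node: Node) -> int:
    # Stand-in for the LLM's judgment: count words shared
    # between the query and the node's title + summary.
    return len(tokens(query) & tokens(node.title + " " + node.summary))

def retrieve(query: str, node: Node) -> Node:
    # Descend the tree, at each level following the child whose
    # summary best matches the query, like scanning a table of contents.
    while node.children:
        node = max(node.children, key=lambda c: score(query, c))
    return node

tree = Node("Annual Report 2024", "Company results for fiscal 2024", [
    Node("Executive Summary", "High-level financial and strategic overview"),
    Node("Financial Statements", "Detailed revenue, expenses, and income", [
        Node("Revenue Analysis", "Total revenue by segment and quarter"),
        Node("Operating Expenses", "Cost breakdown by category"),
        Node("Net Income", "Profit after all expenses"),
    ]),
    Node("Risk Factors", "Market and regulatory risks"),
])

print(retrieve("What is the total revenue?", tree).title)  # Revenue Analysis
```

The key property is that only a handful of summaries are read per level, so the cost of answering a query scales with the tree's depth rather than the document's length.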

Cross-Document Graph

When multiple documents are indexed, Vectorless builds a relationship graph connecting them through shared keywords and concepts. This enables queries across your entire document collection.

# Query across all indexed documents
result = await engine.query(
    QueryContext("Compare revenue trends across all reports")
)
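The graph itself can be pictured with a minimal sketch (hypothetical data and structure, not the engine's internal representation): documents are nodes, and any shared keyword adds an edge between them.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical keyword sets extracted during indexing
docs = {
    "report_2023.pdf": {"revenue", "growth", "emea"},
    "report_2024.pdf": {"revenue", "growth", "apac"},
    "risk_memo.pdf": {"regulation", "apac"},
}

# Undirected graph: connect any two documents with overlapping keywords
edges = defaultdict(set)
for (a, kw_a), (b, kw_b) in combinations(docs.items(), 2):
    if kw_a & kw_b:
        edges[a].add(b)
        edges[b].add(a)

print(sorted(edges["report_2024.pdf"]))  # ['report_2023.pdf', 'risk_memo.pdf']
```

A query with no doc_ids can then start from any matching document and follow these edges to related ones, which is what makes collection-wide questions like the one above possible.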

Workspace Persistence

Indexed documents are stored in a workspace — there's no need to reprocess files between sessions:

engine = Engine(api_key="sk-...", model="gpt-4o", endpoint="https://api.openai.com/v1")

# List all indexed documents
docs = await engine.list()
for doc in docs:
    print(f"{doc.name} ({doc.format}) — {doc.page_count} pages")

What It's For

Vectorless is designed for applications that need precise document retrieval:

  • Financial analysis — Extract specific figures from reports, compare across filings
  • Legal research — Find relevant clauses, trace definitions across documents
  • Technical documentation — Navigate large manuals, locate specific procedures
  • Academic research — Cross-reference findings across papers
  • Compliance — Audit trails with source references for every answer

Examples

See examples/ for complete usage patterns.

Contributing

Contributions welcome! If you find this useful, please ⭐ the repo — it helps others discover it.


License

Apache License 2.0