**Vectorless** is a reasoning-native document engine designed to be the foundational layer for AI applications that need structured access to documents, with the core written in Rust. It does not use vector databases, embeddings, or similarity search. Instead, it transforms documents into hierarchical semantic trees and uses the LLM itself to navigate and retrieve — purely LLM-guided, from indexing to querying.

---

## Why Vectorless
Most document retrieval solutions rely on vector similarity — splitting documents into chunks, embedding them, and searching by cosine distance. This works for rough topic matching, but breaks down when you need **precision**: specific numbers, cross-section references, or multi-step reasoning across a document.

Vectorless takes a different approach. No vectors at all. It builds a **semantic tree index** of each document — preserving the original hierarchy — and uses the LLM itself to navigate that structure. The LLM generates the tree during indexing and reasons through it during retrieval. Pure LLM guidance, end to end.
## Quick Start

### Rust

```bash
cargo add vectorless
```

```rust
use vectorless::{Engine, IndexContext, QueryContext};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create engine: api_key and model are required
    let engine = Engine::new("sk-...", "gpt-4o");

    // Index a document
    let result = engine.index(IndexContext::from_path("./report.pdf")).await?;
    let doc_id = result.doc_id;

    // Query
    let result = engine.query(
        QueryContext::new("What is the total revenue?")
            .with_doc_ids(vec![doc_id.to_string()])
    ).await?;
    println!("{}", result.content);

    Ok(())
}
```

### Python

```bash
pip install vectorless
```

```python
import asyncio
from vectorless import Engine, IndexContext, QueryContext

async def main():
    engine = Engine(api_key="sk-...", model="gpt-4o")

    # Index a document
    result = await engine.index(IndexContext.from_path("./report.pdf"))
    doc_id = result.doc_id

    # Query
    result = await engine.query(
        QueryContext("What is the total revenue?").with_doc_ids([doc_id])
    )
    print(result.single().content)

asyncio.run(main())
```

## Core Concepts
### Semantic Tree Index

When you index a document, Vectorless builds a tree structure that mirrors the document's hierarchy:

```
Annual Report 2024
├── Executive Summary
│   ├── Financial Highlights
│   └── Strategic Outlook
├── Financial Statements
│   ├── Revenue Analysis      ← "What is the total revenue?" lands here
│   ├── Operating Expenses
│   └── Net Income
└── Risk Factors
    ├── Market Risks
    └── Regulatory Risks
```

Each node contains a summary generated by the LLM. During retrieval, the engine uses these summaries to reason about which path to follow — just like a human would scan a table of contents.
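
That navigation step can be pictured as a descent over node summaries. The sketch below substitutes a trivial keyword-overlap heuristic for the LLM's judgment, and the `Node` shape, `navigate`, and `keyword_choose` names are illustrative only, not part of the Vectorless API:

```python
import re
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    summary: str
    content: str = ""
    children: list["Node"] = field(default_factory=list)

def navigate(node: Node, question: str, choose) -> Node:
    # Descend the tree: at each level, ask `choose` (the LLM, in Vectorless)
    # which child's summary best matches the question.
    while node.children:
        idx = choose(question, [c.summary for c in node.children])
        node = node.children[idx]
    return node

# Toy tree mirroring the example above (contents are made up)
tree = Node("Annual Report 2024", "full report", children=[
    Node("Executive Summary", "high-level overview"),
    Node("Financial Statements", "revenue, expenses, income", children=[
        Node("Revenue Analysis", "total and segment revenue", "Total revenue: $4.2B"),
        Node("Net Income", "profit after expenses"),
    ]),
])

# Stand-in for the LLM call: pick the child whose summary shares
# the most words with the question.
def words(s):
    return set(re.findall(r"\w+", s.lower()))

def keyword_choose(question, summaries):
    scores = [len(words(question) & words(s)) for s in summaries]
    return scores.index(max(scores))

leaf = navigate(tree, "What is the total revenue?", keyword_choose)
print(leaf.title)    # → Revenue Analysis
print(leaf.content)  # → Total revenue: $4.2B
```

The real engine reasons over summaries with the LLM instead of counting shared words, which is what lets it resolve paraphrases and multi-step questions.
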

### Cross-Document Graph

When multiple documents are indexed, Vectorless builds a relationship graph connecting them through shared keywords and concepts. This enables queries across your entire document collection.

```python
# Query across all indexed documents
result = await engine.query(
    QueryContext("Compare revenue trends across all reports")
)
```

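
One way to picture that graph: documents become nodes, and any two documents sharing keywords get an edge weighted by the overlap. A minimal sketch with made-up keyword sets — in Vectorless the keywords and relationships are extracted by the LLM during indexing:

```python
from itertools import combinations

# Hypothetical per-document keyword sets
doc_keywords = {
    "report_2023.pdf": {"revenue", "expenses", "guidance"},
    "report_2024.pdf": {"revenue", "expenses", "risk"},
    "handbook.md": {"onboarding", "policy"},
}

# Undirected edges weighted by the number of shared keywords
edges = {
    (a, b): len(doc_keywords[a] & doc_keywords[b])
    for a, b in combinations(doc_keywords, 2)
    if doc_keywords[a] & doc_keywords[b]
}
print(edges)  # → {('report_2023.pdf', 'report_2024.pdf'): 2}
```

A cross-collection query can then start from the best-matching document and follow these edges to related ones.
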
### Workspace Persistence
Indexed documents are stored in a workspace — there's no need to reprocess files between sessions: