Commit 7341d96

Merge pull request #71 from vectorlessflow/dev
docs(readme): update project description and add workflow diagram
2 parents 5c2fca6 + c9b1cc0 commit 7341d96

3 files changed

Lines changed: 352 additions & 38 deletions

File tree

README.md

Lines changed: 102 additions & 38 deletions
@@ -1,8 +1,8 @@
11
<div align="center">
22

3-
<img src="https://vectorless.dev/img/with-title.png" alt="Vectorless" width="400" style="vertical-align:middle;">
3+
<img src="https://vectorless.dev/img/with-title.png" alt="Vectorless" width="400">
44

5-
<h1>Reasoning-native Document Intelligence Engine</h1>
5+
<h1>Document Engine for AI</h1>
66

77
[![PyPI](https://img.shields.io/pypi/v/vectorless.svg)](https://pypi.org/project/vectorless/)
88
[![PyPI Downloads](https://static.pepy.tech/badge/vectorless/month)](https://pepy.tech/projects/vectorless)
@@ -13,45 +13,27 @@
1313

1414
</div>
1515

16-
**Vectorless** is a reasoning-native document intelligence engine written in Rust: **no vector database, no embeddings, no similarity search**. It transforms documents into hierarchical semantic trees and uses LLMs to navigate the structure, retrieving the most relevant content through deep contextual understanding instead of vector math.
16+
**Vectorless** is a reasoning-native document engine, with its core written in Rust, designed as a foundational layer for AI applications that need structured access to documents. It does not use vector databases, embeddings, or similarity search. Instead, it transforms documents into hierarchical semantic trees and uses the LLM itself to navigate them and retrieve content, so the pipeline is LLM-guided from indexing to querying.
1717

18+
---
1819

19-
## Quick Start
20-
21-
### Install
20+
## Why Vectorless
2221

23-
```bash
24-
pip install vectorless
25-
```
26-
27-
### Index and Query
28-
29-
```python
30-
import asyncio
31-
from vectorless import Engine, IndexContext, QueryContext
22+
Most document retrieval solutions rely on vector similarity — splitting documents into chunks, embedding them, and searching by cosine distance. This works for rough topic matching, but breaks down when you need **precision**: specific numbers, cross-section references, or multi-step reasoning across a document.
3223

33-
async def main():
34-
    # Create engine — api_key and model are required
35-
    engine = Engine(
36-
        api_key="sk-...",
37-
        model="gpt-4o",
38-
    )
24+
Vectorless takes a different approach. No vectors at all. It builds a **semantic tree index** of each document — preserving the original hierarchy — and uses the LLM itself to navigate that structure. The LLM generates the tree during indexing and reasons through it during retrieval. Pure LLM guidance, end to end.
3925

40-
    # Index a document (PDF or Markdown)
41-
    result = await engine.index(IndexContext.from_path("./report.pdf"))
42-
    doc_id = result.doc_id
26+
<div align="center">
27+
<img src="https://vectorless.dev/img/workflow.svg" alt="Vectorless Workflow" width="720">
28+
</div>
4329

44-
    # Query
45-
    result = await engine.query(
46-
        QueryContext("What is the total revenue?").with_doc_ids([doc_id])
47-
    )
48-
    print(result.single().content)
30+
<div align="center">
31+
<img src="https://vectorless.dev/img/demo.gif" alt="Vectorless Demo" width="720">
32+
</div>
4933

50-
asyncio.run(main())
51-
```
34+
## Quick Start
5235

53-
<details>
54-
<summary><b>Rust</b></summary>
36+
### Rust
5537

5638
```toml
5739
[dependencies]
@@ -69,24 +51,106 @@ async fn main() -> vectorless::Result<()> {
6951
        .build()
7052
        .await?;
7153

72-
    // Index
54+
    // Index a document
7355
    let result = engine.index(IndexContext::from_path("./report.pdf")).await?;
7456
    let doc_id = result.doc_id().unwrap();
7557

7658
    // Query
7759
    let result = engine.query(
78-
        QueryContext::new("What is the total revenue?").with_doc_ids(vec![doc_id.to_string()])
60+
        QueryContext::new("What is the total revenue?")
61+
            .with_doc_ids(vec![doc_id.to_string()])
7962
    ).await?;
80-
    println!("Answer: {}", result.content);
63+
    println!("{}", result.content);
8164

8265
    Ok(())
8366
}
8467
```
85-
</details>
68+
69+
### Python
70+
71+
```bash
72+
pip install vectorless
73+
```
74+
75+
```python
76+
import asyncio
77+
from vectorless import Engine, IndexContext, QueryContext
78+
79+
async def main():
80+
    engine = Engine(api_key="sk-...", model="gpt-4o")
81+
82+
    # Index a document
83+
    result = await engine.index(IndexContext.from_path("./report.pdf"))
84+
    doc_id = result.doc_id
85+
86+
    # Query
87+
    result = await engine.query(
88+
        QueryContext("What is the total revenue?").with_doc_ids([doc_id])
89+
    )
90+
    print(result.single().content)
91+
92+
asyncio.run(main())
93+
```
94+
95+
## Core Concepts
96+
97+
### Semantic Tree Index
98+
99+
When you index a document, Vectorless builds a tree structure that mirrors the document's hierarchy:
100+
101+
```
102+
Annual Report 2024
103+
├── Executive Summary
104+
│   ├── Financial Highlights
105+
│   └── Strategic Outlook
106+
├── Financial Statements
107+
│   ├── Revenue Analysis        ← "What is the total revenue?" lands here
108+
│   ├── Operating Expenses
109+
│   └── Net Income
110+
└── Risk Factors
111+
    ├── Market Risks
112+
    └── Regulatory Risks
113+
```
114+
115+
Each node contains a summary generated by the LLM. During retrieval, the engine uses these summaries to reason about which path to follow, much as a person scans a table of contents to find the right section.
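
The summary-guided descent described above can be illustrated with a small, self-contained sketch. This is not the Vectorless API: `Node`, `score`, and `navigate` are hypothetical names, and a simple word-overlap count stands in for the LLM's judgment about which child to follow. It only shows the shape of the idea: at each level, pick the child whose title and summary best match the question, then repeat until a leaf is reached.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a section title plus a short summary."""
    title: str
    summary: str
    children: list["Node"] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    # Lowercase alphabetic words only, so punctuation does not affect matching.
    return set(re.findall(r"[a-z]+", text.lower()))


def score(query: str, node: Node) -> int:
    # Stand-in for the LLM: count query words that appear in the title or summary.
    return len(tokens(query) & tokens(f"{node.title} {node.summary}"))


def navigate(root: Node, query: str) -> list[str]:
    """Walk from the root to a leaf, choosing the best-scoring child at each level."""
    path = [root.title]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: score(query, child))
        path.append(node.title)
    return path


# Illustrative tree; the summaries are made up for this example.
report = Node("Annual Report 2024", "Company annual report", [
    Node("Executive Summary", "Highlights and strategic outlook", [
        Node("Financial Highlights", "Key figures at a glance"),
        Node("Strategic Outlook", "Plans for next year"),
    ]),
    Node("Financial Statements", "Detailed figures for total revenue, expenses, and income", [
        Node("Revenue Analysis", "Total revenue by segment and region"),
        Node("Operating Expenses", "Costs of running the business"),
        Node("Net Income", "Profit after expenses and taxes"),
    ]),
    Node("Risk Factors", "Market and regulatory risks", [
        Node("Market Risks", "Exposure to demand and pricing changes"),
        Node("Regulatory Risks", "Exposure to new rules and compliance costs"),
    ]),
])

print(" > ".join(navigate(report, "What is the total revenue?")))
```

In a real system the `score` function would presumably be replaced by an LLM call that reads the child summaries and returns a choice; the traversal loop itself would look much the same.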
116+
117+
### Cross-Document Graph
118+
119+
When multiple documents are indexed, Vectorless builds a relationship graph connecting them through shared keywords and concepts. This enables queries across your entire document collection.
120+
121+
```python
122+
# Query across all indexed documents
123+
result = await engine.query(
124+
    QueryContext("Compare revenue trends across all reports")
125+
)
126+
```
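
As a rough illustration of connecting documents through shared keywords, the sketch below builds an edge between any two documents whose keyword sets overlap. How Vectorless actually extracts keywords and builds its graph is not described here, so `build_graph`, the `min_shared` threshold, and the file names are all assumptions made for the example.

```python
from itertools import combinations


def build_graph(
    doc_keywords: dict[str, set[str]],
    min_shared: int = 1,
) -> dict[tuple[str, str], set[str]]:
    """Return edges keyed by document pair, each labeled with the shared keywords."""
    edges: dict[tuple[str, str], set[str]] = {}
    # Sort names so each pair is produced once, in a stable order.
    for a, b in combinations(sorted(doc_keywords), 2):
        shared = doc_keywords[a] & doc_keywords[b]
        if len(shared) >= min_shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical documents and keywords, for illustration only.
docs = {
    "report_2023.pdf": {"revenue", "growth", "cloud"},
    "report_2024.pdf": {"revenue", "cloud", "layoffs"},
    "handbook.md": {"vacation", "benefits"},
}

graph = build_graph(docs)
for (a, b), shared in graph.items():
    print(f"{a} <-> {b}: {sorted(shared)}")
```

With edges like these, a cross-document query could start from one document and follow links to related ones instead of scanning the whole collection.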
127+
128+
### Workspace Persistence
129+
130+
Indexed documents are stored in a workspace — there's no need to reprocess files between sessions:
131+
132+
```python
133+
engine = Engine(api_key="sk-...", model="gpt-4o")
134+
135+
# List all indexed documents
136+
docs = await engine.list()
137+
for doc in docs:
138+
    print(f"{doc.name} ({doc.format}) — {doc.page_count} pages")
139+
```
140+
141+
## What It's For
142+
143+
Vectorless is designed for applications that need **precise** document retrieval:
144+
145+
- **Financial analysis** — Extract specific figures from reports, compare across filings
146+
- **Legal research** — Find relevant clauses, trace definitions across documents
147+
- **Technical documentation** — Navigate large manuals, locate specific procedures
148+
- **Academic research** — Cross-reference findings across papers
149+
- **Compliance** — Audit trails with source references for every answer
86150

87151
## Examples
88152

89-
See [examples](examples/) for more and stay tuned.
153+
See [examples/](examples/) for complete usage patterns.
90154

91155
## Contributing
92156

docs/static/img/demo.gif

1.47 MB