Embedded vector database + living context engine — multimodal pockets, context graph, adaptive decay, MCP server
Feather DB
Embedded vector database + self-aligned context engine
Part of Hawky.ai — AI-Native Development Tools
Feather DB is an embedded vector database and living context engine — zero-server, file-based, with a built-in knowledge graph, adaptive memory decay, LLM agent connectors, and a self-aligned ingestion engine that organises data automatically.
What's Inside
| Capability | Description |
|---|---|
| ANN Search | Approximate nearest-neighbor search via HNSW (0.5–1.5 ms at k=10; see Performance) |
| Multimodal Pockets | Text, image, audio vectors per entity under a single ID |
| Context Graph | Typed + weighted edges, reverse index, auto-link by similarity |
| Context Chain | One call: vector search + n-hop BFS graph expansion |
| Living Context | Recall-count stickiness — frequently accessed items resist temporal decay |
| Namespace / Entity / Attributes | Generic partition + subject + KV metadata for any domain |
| Graph Visualizer | Self-contained D3 force-graph HTML — fully offline, no CDN |
| LLM Agent Connectors | Claude, OpenAI, Gemini tool-use/function-calling with 14 Feather tools |
| MCP Server | feather-serve — connects Feather to Claude Desktop, Cursor, and any MCP client |
| LangChain / LlamaIndex | Drop-in FeatherVectorStore, FeatherMemory, FeatherRetriever adapters |
| Self-Aligned Context Engine | LLM-powered ingestion: auto-classifies, scores, links, and namespaces every record |
| Single-file persistence | .feather binary format (v5); v3/v4 files load transparently |
Installation
pip install feather-db # core
pip install "feather-db[all]" # + langchain, llamaindex, mcp extras
CLI (Rust):
cargo install feather-db-cli
Build from source:
git clone https://github.com/feather-store/feather
cd feather
python setup.py build_ext --inplace
Quick Start
import feather_db
import numpy as np
# Open or create a database
db = feather_db.DB.open("context.feather", dim=768)
# Add a vector with metadata
meta = feather_db.Metadata()
meta.content = "User prefers dark mode"
meta.importance = 0.9
db.add(id=1, vec=np.random.rand(768).astype(np.float32), meta=meta)
# Semantic search
results = db.search(np.random.rand(768).astype(np.float32), k=5)
for r in results:
    print(r.id, r.score, r.metadata.content)
db.save()
Self-Aligned Context Engine (v0.7.0)
The ContextEngine wraps DB with an LLM-powered ingestion pipeline. Drop in any text — the engine classifies it, scores it, links it to related records, and stores it in the right namespace. No schema to define upfront.
from feather_db import ContextEngine, ClaudeProvider
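import hashlib
import numpy as np
def embed(text: str) -> np.ndarray:
    # Placeholder embedder: hashes each token into one of 768 buckets,
    # so texts with different words get different vectors.
    # Replace with your real embedding model.
    vec = np.zeros(768, dtype=np.float32)
    for tok in text.split():
        bucket = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % 768
        vec[bucket] += 1.0
    n = np.linalg.norm(vec)
    return vec / n if n > 0 else vec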
engine = ContextEngine(
    db_path = "knowledge.feather",
    dim = 768,
    provider = ClaudeProvider(), # or OpenAIProvider, GeminiProvider, OllamaProvider, None
    embedder = embed,
    namespace = "myapp",
)
nid = engine.ingest(
"Competitor X launched a developer SDK with MIT license — 10k GitHub stars in 24 hours."
)
The engine automatically:
- Classifies entity type (competitor_intel, user_feedback, strategy_brief, …)
- Scores importance (0–1) and confidence (0–1)
- Assigns TTL and namespace
- Suggests and creates graph edges to related records
Works offline too — pass provider=None for a built-in heuristic classifier (no API key needed):
engine = ContextEngine(db_path="k.feather", dim=768, provider=None, embedder=embed)
Supported Providers
from feather_db import ClaudeProvider, OpenAIProvider, OllamaProvider, GeminiProvider
ClaudeProvider(model="claude-haiku-4-5-20251001") # Anthropic
OpenAIProvider(model="gpt-4o-mini") # OpenAI
OpenAIProvider(model="llama-3.3-70b-versatile", # Groq
               base_url="https://api.groq.com/openai/v1",
               api_key=GROQ_KEY) # GROQ_KEY: a variable holding your Groq API key
OllamaProvider(model="llama3.1:8b") # Ollama (local, no key)
GeminiProvider(model="gemini-2.0-flash") # Google Gemini
All providers share the same LLMProvider interface — swap at any time without changing the rest of your code.
LLM Agent Connectors (v0.6.0)
Give any LLM agent native access to Feather DB via 14 built-in tools.
Claude (tool_use)
import anthropic
from feather_db import ClaudeConnector
connector = ClaudeConnector(db=db, embedder=embed)
client = anthropic.Anthropic()
messages = [{"role": "user", "content": "What competitor moves should I watch?"}]
reply = connector.run_loop(client, messages, model="claude-opus-4-6")
print(reply)
OpenAI / Groq / vLLM (function_calling)
from openai import OpenAI
from feather_db import OpenAIConnector
connector = OpenAIConnector(db=db, embedder=embed)
client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Find records about onboarding friction"}],
    tools=connector.tools(),
)
msg = resp.choices[0].message
if msg.tool_calls:  # the model may also answer directly without calling a tool
    call = msg.tool_calls[0]
    result = connector.handle(call.function.name, call.function.arguments)
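To make the hand-off to connector.handle more concrete, here is a minimal, library-independent sketch of routing a function-calling result to a handler. It is not Feather's implementation: `dispatch_tool_call`, `fake_search`, and the `query`/`k` parameter names are hypothetical. The one detail taken from the OpenAI API is that `function.arguments` arrives as a JSON string.

```python
import json

def dispatch_tool_call(name, arguments_json, handlers):
    """Route one tool call to a handler and return the result as a JSON string."""
    handler = handlers.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    kwargs = json.loads(arguments_json)  # OpenAI-style arguments are a JSON string
    return json.dumps(handler(**kwargs))

# Hypothetical stand-in for a real search handler.
def fake_search(query, k=5):
    return {"query": query, "k": k, "hits": []}

handlers = {"feather_search": fake_search}
reply = dispatch_tool_call(
    "feather_search", '{"query": "onboarding friction", "k": 3}', handlers
)
print(reply)
```

In a full loop the returned string would be sent back to the model as a tool-role message so it can produce its final answer.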
Available Tools (14)
| Tool | Description |
|---|---|
| feather_search | Semantic vector search |
| feather_context_chain | Vector search + graph BFS expansion |
| feather_get_node | Retrieve a single record by ID |
| feather_get_related | Get all graph-linked records |
| feather_add_intel | Store a new record with metadata |
| feather_link_nodes | Create a typed weighted edge |
| feather_timeline | Time-ordered records in a range |
| feather_forget | Drop a record by ID |
| feather_health | Database health report |
| feather_why | Explain why a record was retrieved |
| feather_mmr_search | Maximal marginal relevance search |
| feather_consolidate | Merge near-duplicate records |
| feather_episode_get | Retrieve an episode by ID |
| feather_expire | Purge records past their TTL |
MCP Server (v0.6.0)
Connect Feather DB to Claude Desktop, Cursor, or any MCP-compatible client:
pip install "feather-db[mcp]"
feather-serve --db knowledge.feather --dim 768
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "feather": {
      "command": "feather-serve",
      "args": ["--db", "/path/to/knowledge.feather", "--dim", "768"]
    }
  }
}
All 14 tools become available to Claude Desktop immediately — no code required.
LangChain Integration (v0.6.0)
from feather_db.integrations import FeatherVectorStore, FeatherMemory, FeatherRetriever
# Drop-in VectorStore
store = FeatherVectorStore(db=db, embedder=embed)
retriever = store.as_retriever(search_kwargs={"k": 5})
# Semantic conversation memory with adaptive decay
memory = FeatherMemory(db=db, embedder=embed, k=5)
# context_chain retriever
retriever = FeatherRetriever(db=db, embedder=embed, k=5, hops=2)
LlamaIndex Integration (v0.6.0)
from feather_db.integrations import FeatherVectorStoreIndex, FeatherReader
# Index documents
index = FeatherVectorStoreIndex.from_documents(documents, db=db, embed_model=embed_model)
query_engine = index.as_query_engine()
response = query_engine.query("What is our retention strategy?")
# Load existing Feather DB as LlamaIndex Documents
reader = FeatherReader(db=db)
docs = reader.load_data()
Core Features
Multimodal Pockets
Each named modality gets its own independent HNSW index and dimensionality. A single entity ID can hold text, visual, and audio vectors simultaneously.
db.add(id=42, vec=text_vec, modality="text") # 768-dim
db.add(id=42, vec=image_vec, modality="visual") # 512-dim
db.add(id=42, vec=audio_vec, modality="audio") # 256-dim
results = db.search(query_vec, k=10, modality="visual")
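As a mental model for pockets, the following toy sketch keeps one vector table per modality, each with its own dimensionality, under a shared entity ID. It is an illustration only: `PocketStore` and its methods are hypothetical, there is no HNSW index here, and fixing a pocket's dimension from its first vector is an assumption rather than documented behaviour.

```python
class PocketStore:
    """Toy model of multimodal pockets (hypothetical; not Feather's implementation)."""

    def __init__(self):
        # modality name -> {"dim": int, "vectors": {entity_id: vector}}
        self.pockets = {}

    def add(self, entity_id, vec, modality="text"):
        # Assumption: a pocket's dimensionality is fixed by its first vector.
        pocket = self.pockets.setdefault(modality, {"dim": len(vec), "vectors": {}})
        if len(vec) != pocket["dim"]:
            raise ValueError(f"{modality!r} expects dim {pocket['dim']}, got {len(vec)}")
        pocket["vectors"][entity_id] = list(vec)

    def modalities_of(self, entity_id):
        # Every modality that holds a vector for this entity, in sorted order.
        return sorted(m for m, p in self.pockets.items() if entity_id in p["vectors"])

store = PocketStore()
store.add(42, [0.0] * 768, modality="text")
store.add(42, [0.0] * 512, modality="visual")
store.add(42, [0.0] * 256, modality="audio")
print(store.modalities_of(42))
```

Keeping the tables separate is what lets a search against one modality ignore the vectors, and the dimensionality, of the others.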
Context Graph
Typed, weighted edges between records. Nine built-in relationship types plus free-form strings.
from feather_db import RelType
db.link(from_id=1, to_id=2, rel_type=RelType.CAUSED_BY, weight=0.9)
db.link(from_id=1, to_id=3, rel_type=RelType.SUPPORTS, weight=0.7)
edges = db.get_edges(1) # outgoing edges
incoming = db.get_incoming(2) # reverse index
db.auto_link(modality="text", threshold=0.85, rel_type=RelType.RELATED_TO)
Built-in types: related_to, derived_from, caused_by, contradicts, supports, precedes, part_of, references, multimodal_of.
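The idea behind auto-linking by similarity can be sketched without the library. The version below is a brute-force illustration under stated assumptions: it uses cosine similarity and compares every pair, whereas the real auto_link may use a different metric and HNSW neighbours. `auto_link_sketch` and `cosine` are hypothetical names.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def auto_link_sketch(vectors, threshold):
    """Return (id_a, id_b, similarity) for every pair at or above `threshold`."""
    links = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(sorted(vectors.items()), 2):
        sim = cosine(vec_a, vec_b)
        if sim >= threshold:
            links.append((id_a, id_b, sim))
    return links

toy_vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0]}
links = auto_link_sketch(toy_vectors, threshold=0.85)
print(links)
```

With these toy vectors only records 1 and 2 are similar enough to be linked; record 3 is orthogonal to record 1 and stays unconnected.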
Context Chain
One call that combines semantic search with n-hop BFS graph traversal:
result = db.context_chain(query=query_vec, k=5, hops=2, modality="text")
for node in result.nodes:
    print(node.id, node.score, node.hop_distance)
for edge in result.edges:
    print(edge.source_id, "->", edge.target_id, edge.rel_type)
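The graph half of a context chain is an n-hop breadth-first expansion from the search hits. A minimal sketch of that step, assuming traversal follows outgoing edges only (the library may also follow incoming edges); `expand_hops` and the toy graph are hypothetical:

```python
from collections import deque

def expand_hops(seed_ids, edges, hops):
    """Return {node_id: hop_distance} for all nodes within `hops` of any seed."""
    distance = {sid: 0 for sid in seed_ids}
    queue = deque(seed_ids)
    while queue:
        current = queue.popleft()
        if distance[current] == hops:
            continue  # do not expand past the hop limit
        for neighbour in edges.get(current, []):
            if neighbour not in distance:  # first visit is the shortest path in BFS
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance

# Toy graph: 1 -> 2 -> 3 -> 4, plus 1 -> 5
toy_edges = {1: [2, 5], 2: [3], 3: [4]}
reached = expand_hops(seed_ids=[1], edges=toy_edges, hops=2)
print(reached)
```

With hops=2, node 4 is left out because it sits three hops from the seed. The recorded distance plays the same role as hop_distance in the result nodes.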
Filtered Search
from feather_db import FilterBuilder
results = db.search(
    query_vec, k=10,
    filter=FilterBuilder()
        .namespace("acme")
        .entity("user_123")
        .attribute("channel", "instagram")
        .importance_gte(0.5)
        .build()
)
Living Context / Adaptive Decay
from feather_db import ScoringConfig
cfg = ScoringConfig(half_life=30.0, weight=0.3, min=0.0)
results = db.search(query_vec, k=10, scoring=cfg)
Formula:
stickiness = 1 + log(1 + recall_count)
effective_age = age_in_days / stickiness
recency = 0.5 ^ (effective_age / half_life_days)
final_score = ((1 - time_weight) * similarity + time_weight * recency) * importance
touch() is called automatically on every search hit.
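The formula can be traced by hand with a short, library-independent sketch. The function name `living_score` is hypothetical, and two points are assumptions rather than documented behaviour: that log is the natural logarithm, and that half_life_days and time_weight correspond to the half_life and weight fields of ScoringConfig.

```python
import math

def living_score(similarity, importance, age_in_days, recall_count,
                 half_life_days=30.0, time_weight=0.3):
    """Hypothetical re-implementation of the README formula (not the library's code)."""
    # Frequently recalled items age more slowly.
    stickiness = 1.0 + math.log(1.0 + recall_count)  # natural log assumed
    effective_age = age_in_days / stickiness
    # Half-life decay applied to the stickiness-adjusted age.
    recency = 0.5 ** (effective_age / half_life_days)
    # Blend similarity with recency, then scale by importance.
    return ((1.0 - time_weight) * similarity + time_weight * recency) * importance

fresh = living_score(similarity=0.8, importance=1.0, age_in_days=0, recall_count=0)
stale = living_score(similarity=0.8, importance=1.0, age_in_days=30, recall_count=0)
sticky = living_score(similarity=0.8, importance=1.0, age_in_days=30, recall_count=10)
print(fresh, stale, sticky)
```

Under these assumptions a brand-new record scores 0.86, the same record after one half-life with no recalls scores 0.71, and a 30-day-old record recalled ten times scores roughly 0.80, which shows how recall count offsets age.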
Memory Layer (v0.6.0)
from feather_db import MemoryManager
mm = MemoryManager(db)
print(mm.health_report()) # cluster stats + stale records
diverse = mm.search_mmr(vec, k=10) # maximal marginal relevance
mm.consolidate(threshold=0.95) # merge near-duplicate records
mm.assign_tiers() # hot / warm / cold tiering by recall_count
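Maximal marginal relevance trades relevance to the query against redundancy with results already chosen. The sketch below is a generic greedy MMR over cosine similarity, not the code behind search_mmr; `mmr_select`, `cosine`, and the `lambda_` parameter are hypothetical names.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_select(query, candidates, k, lambda_=0.5):
    """Greedily pick up to k ids, balancing relevance against redundancy."""
    remaining = dict(candidates)
    selected = []
    while remaining and len(selected) < k:
        def mmr_score(cid):
            relevance = cosine(query, remaining[cid])
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(
                (cosine(remaining[cid], candidates[s]) for s in selected),
                default=0.0,
            )
            return lambda_ * relevance - (1.0 - lambda_) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected

query = [1.0, 0.0]
candidates = {"a": [1.0, 0.0], "b": [0.99, 0.01], "c": [0.6, 0.8]}
picked = mmr_select(query, candidates, k=2, lambda_=0.3)
print(picked)
```

Here "b" is almost identical to "a", so once "a" is chosen the less similar but more novel "c" wins the second slot. A higher `lambda_` weights relevance more heavily and reduces that diversity effect.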
Episodes (v0.6.0)
Group related records into named episodes:
from feather_db import EpisodeManager
em = EpisodeManager(db)
eid = em.begin_episode("onboarding_analysis")
em.add_to_episode(eid, node_id)
ep = em.get_episode(eid)
em.close_episode(eid)
Triggers & Contradiction Detection (v0.6.0)
from feather_db import WatchManager, ContradictionDetector
wm = WatchManager(db)
wm.watch(namespace="acme", callback=lambda record: print("New:", record.content))
cd = ContradictionDetector(db)
conflicts = cd.check(new_meta) # returns list of conflicting record IDs
Namespace / Entity / Attributes
meta = feather_db.Metadata()
meta.namespace_id = "acme"
meta.entity_id = "user_123"
meta.set_attribute("channel", "instagram") # use this, NOT meta.attributes['k'] = v
val = meta.get_attribute("channel")
Domain profiles for typed helpers:
from feather_db import MarketingProfile
p = MarketingProfile()
p.set_brand("nike")
p.set_user("user_8821")
p.set_channel("instagram")
p.set_ctr(0.045)
meta = p.to_metadata()
Graph Visualization
from feather_db.graph import visualize, export_graph
visualize(db, output_path="/tmp/graph.html") # self-contained D3 HTML
data = export_graph(db, namespace_filter="nike") # Python dict for D3/Cytoscape
Rust CLI
feather add --db my.feather --id 1 --vec "0.1,0.2,0.3" --modality text
feather search --db my.feather --vec "0.1,0.2,0.3" --k 5
feather link --db my.feather --from 1 --to 2
feather save --db my.feather
Performance
| Metric | Value |
|---|---|
| Add rate | 2,000–5,000 vectors/sec |
| Search latency (k=10) | 0.5–1.5 ms |
| Max vectors per modality | 1,000,000 (configurable) |
| HNSW params | M=16, ef_construction=200 |
| File format | Binary .feather v5 |
SIMD (AVX2/AVX512) optimizations are available in space_l2.h. Enable with -DUSE_AVX -march=native in setup.py.
Cloud Deployment (Azure / Docker)
Feather DB ships with a production-ready FastAPI wrapper and Gradio management dashboard you can deploy on any Linux VM.
git clone https://github.com/feather-store/feather.git
cd feather
export FEATHER_API_KEY="your-secret-key"
docker compose -f feather-api/docker-compose.yml up -d --build
| URL | Description |
|---|---|
| http://<VM_IP>:8000/health | Health check |
| http://<VM_IP>:8000/docs | Swagger / OpenAPI |
| http://<VM_IP>:8000/dashboard | Management Dashboard UI |
Full Azure deployment guide → docs/deploy-azure.md
Data is stored in a Docker named volume (feather-data → /data) and persists across restarts and rebuilds.
Examples
| File | Description |
|---|---|
| examples/context_engine_demo.py | Self-Aligned Context Engine: all four providers + heuristic fallback |
| examples/context_graph_demo.py | Context graph: auto-link, context_chain, D3 HTML export |
| examples/marketing_living_context.py | Multi-brand namespace/entity/attribute filtering |
| examples/feather_inspector.py | Local HTTP inspector: force graph, PCA scatter, edit, delete |
Run:
python setup.py build_ext --inplace
python3 examples/context_engine_demo.py
# With a provider:
ANTHROPIC_API_KEY=sk-ant-... python3 examples/context_engine_demo.py
OLLAMA_MODEL=mistral:7b python3 examples/context_engine_demo.py
Architecture
[Generic Core: C++17]
feather::DB
├── modality_indices_ (unordered_map<string, ModalityIndex>): one HNSW per modality
├── metadata_store_ (unordered_map<uint64_t, Metadata>): shared metadata by ID
└── Methods: add, search, link, context_chain, auto_link, export_graph_json, …
[Python Layer: feather_db]
├── DB, Metadata, ContextType, ScoringConfig
├── Edge, IncomingEdge, ContextNode, ContextEdge, ContextChainResult
├── FilterBuilder: fluent search filter helper
├── DomainProfile: generic namespace/entity/attributes base class
├── MarketingProfile: digital marketing typed adapter
├── RelType: standard relationship type constants
├── graph.visualize(): D3 force-graph HTML exporter
├── MemoryManager: health reports, MMR, consolidate, tiering
├── WatchManager: namespace/entity watch callbacks
├── ContradictionDetector: conflict detection on ingest
├── EpisodeManager: grouped episode records
├── merge(): merge two .feather files
├── LLMProvider / ClaudeProvider / OpenAIProvider / OllamaProvider / GeminiProvider
├── ContextEngine: self-aligned LLM-powered ingestion pipeline
└── integrations/
    ├── ClaudeConnector: Claude tool_use with 14 Feather tools
    ├── OpenAIConnector: OpenAI/Groq/Mistral function_calling
    ├── GeminiConnector: Gemini function_calling + GeminiEmbedder
    ├── FeatherVectorStore / FeatherMemory / FeatherRetriever (LangChain)
    ├── FeatherVectorStoreIndex / FeatherReader (LlamaIndex)
    └── mcp_server: feather-serve MCP endpoint
[Rust CLI]
feather-db-cli (FFI via extern "C" from src/feather_core.cpp)
File Format
[magic: 4B = "FEAT"] [version: 4B = 5]
--- Metadata Section ---
[meta_count: 4B]
for each record:
    [id: 8B] [serialized Metadata including namespace/entity/attributes/edges]
--- Modality Indices Section ---
[modal_count: 4B]
for each modality:
    [name_len: 2B] [name: N bytes]
    [dim: 4B] [element_count: 4B]
    for each element:
        [id: 8B] [float32 vector: dim * 4 bytes]
v3 and v4 files load transparently — missing fields default to empty.
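For readers who want to inspect a file by hand, the first eight bytes of the layout can be read with the standard struct module. This is a sketch under assumptions the layout does not state: that the magic is stored as the ASCII bytes "FEAT" and that the version is a little-endian unsigned 32-bit integer. `read_header` is a hypothetical helper, and the example builds the header in memory rather than opening a real .feather file.

```python
import io
import struct

MAGIC = b"FEAT"

def read_header(stream):
    """Read the magic and version fields; raise ValueError if the magic is wrong."""
    magic = stream.read(4)
    if magic != MAGIC:
        raise ValueError(f"not a .feather file: magic={magic!r}")
    # Little-endian uint32 is an assumption; the layout only says "4B".
    (version,) = struct.unpack("<I", stream.read(4))
    return version

# Build an in-memory header instead of touching a real file.
fake_file = io.BytesIO(MAGIC + struct.pack("<I", 5))
version = read_header(fake_file)
print(version)
```

Checking the version up front is also where a reader could branch to handle older v3 and v4 files.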
Known Limitations
| Issue | Detail |
|---|---|
| No concurrent writes | HNSW is not thread-safe for simultaneous adds |
| No vector deletion | HNSW marks deletions; data stays until compaction |
| Max 1M vectors/modality | Hardcoded in get_or_create_index; increase max_elements to raise |
| meta.attributes['k'] = v silent no-op | pybind11 returns a copy of the map, so the assignment is lost; use meta.set_attribute(k, v) |
| Rust CLI missing v0.6.0+ features | namespace/entity/context_chain/integrations are Python-only |
Contributing
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
License
MIT — see LICENSE
Acknowledgments