Feather DB

Embedded vector database + self-aligned context engine

Part of Hawky.ai — AI-Native Development Tools

Feather DB is an embedded vector database and living context engine. It runs without a server, stores everything in a single file, and includes a built-in knowledge graph, adaptive memory decay, LLM agent connectors, and a self-aligned ingestion engine that organises data automatically.

What's Inside

Capability	Description
ANN Search	Approximate nearest-neighbor search via HNSW, typically 0.5–1.5 ms at k=10 (see Performance)
Multimodal Pockets	Text, image, audio vectors per entity under a single ID
Context Graph	Typed + weighted edges, reverse index, auto-link by similarity
Context Chain	One call: vector search + n-hop BFS graph expansion
Living Context	Recall-count stickiness — frequently accessed items resist temporal decay
Namespace / Entity / Attributes	Generic partition + subject + KV metadata for any domain
Graph Visualizer	Self-contained D3 force-graph HTML — fully offline, no CDN
LLM Agent Connectors	Claude, OpenAI, Gemini tool-use/function-calling with 14 Feather tools
MCP Server	`feather-serve` — connects Feather to Claude Desktop, Cursor, and any MCP client
LangChain / LlamaIndex	Drop-in `FeatherVectorStore`, `FeatherMemory`, `FeatherRetriever` adapters
Self-Aligned Context Engine	LLM-powered ingestion: auto-classifies, scores, links, and namespaces every record
Single-file persistence	`.feather` binary format (v5); v3/v4 files load transparently

Installation

pip install feather-db            # core
pip install "feather-db[all]"     # + langchain, llamaindex, mcp extras

CLI (Rust):

cargo install feather-db-cli

Build from source:

git clone https://github.com/feather-store/feather
cd feather
python setup.py build_ext --inplace

Quick Start

import feather_db
import numpy as np

# Open or create a database
db = feather_db.DB.open("context.feather", dim=768)

# Add a vector with metadata
meta = feather_db.Metadata()
meta.content = "User prefers dark mode"
meta.importance = 0.9
db.add(id=1, vec=np.random.rand(768).astype(np.float32), meta=meta)

# Semantic search
results = db.search(np.random.rand(768).astype(np.float32), k=5)
for r in results:
    print(r.id, r.score, r.metadata.content)

db.save()

Self-Aligned Context Engine (v0.7.0)

The ContextEngine wraps DB with an LLM-powered ingestion pipeline. Drop in any text — the engine classifies it, scores it, links it to related records, and stores it in the right namespace. No schema to define upfront.

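import hashlib

import numpy as np
from feather_db import ContextEngine, ClaudeProvider

def embed(text: str) -> np.ndarray:
    # Placeholder hashing embedder: replace with your real embedder.
    vec = np.zeros(768, dtype=np.float32)
    for tok in text.lower().split():
        # Hash each token into one of 768 buckets so identical words map to the same dimension.
        bucket = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % 768
        vec[bucket] += 1.0
    n = np.linalg.norm(vec)
    return vec / n if n > 0 else vec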

engine = ContextEngine(
    db_path  = "knowledge.feather",
    dim      = 768,
    provider = ClaudeProvider(),   # or OpenAIProvider, GeminiProvider, OllamaProvider, None
    embedder = embed,
    namespace = "myapp",
)

nid = engine.ingest(
    "Competitor X launched a developer SDK with MIT license — 10k GitHub stars in 24 hours."
)

The engine automatically:

Classifies entity type (competitor_intel, user_feedback, strategy_brief, …)
Scores importance (0–1) and confidence (0–1)
Assigns TTL and namespace
Suggests and creates graph edges to related records

Works offline too — pass provider=None for a built-in heuristic classifier (no API key needed):

engine = ContextEngine(db_path="k.feather", dim=768, provider=None, embedder=embed)

Supported Providers

from feather_db import ClaudeProvider, OpenAIProvider, OllamaProvider, GeminiProvider

ClaudeProvider(model="claude-haiku-4-5-20251001")         # Anthropic
OpenAIProvider(model="gpt-4o-mini")                        # OpenAI
OpenAIProvider(model="llama-3.3-70b-versatile",            # Groq
               base_url="https://api.groq.com/openai/v1",
               api_key=GROQ_KEY)                           # GROQ_KEY: your Groq API key, defined elsewhere
OllamaProvider(model="llama3.1:8b")                        # Ollama (local, no key)
GeminiProvider(model="gemini-2.0-flash")                   # Google Gemini

All providers share the same LLMProvider interface — swap at any time without changing the rest of your code.

LLM Agent Connectors (v0.6.0)

Give any LLM agent native access to Feather DB via 14 built-in tools.

Claude (tool_use)

import anthropic
from feather_db import ClaudeConnector

connector = ClaudeConnector(db=db, embedder=embed)
client    = anthropic.Anthropic()

messages = [{"role": "user", "content": "What competitor moves should I watch?"}]
reply    = connector.run_loop(client, messages, model="claude-opus-4-6")
print(reply)

OpenAI / Groq / vLLM (function_calling)

from openai import OpenAI
from feather_db import OpenAIConnector

connector = OpenAIConnector(db=db, embedder=embed)
client    = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Find records about onboarding friction"}],
    tools=connector.tools(),
)
result = connector.handle(resp.choices[0].message.tool_calls[0].function.name,
                          resp.choices[0].message.tool_calls[0].function.arguments)
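
The snippet above handles only the first tool call and assumes the model returned one. A minimal sketch of a more defensive dispatch follows. `dispatch_tool_calls` is a hypothetical helper, not part of Feather DB; it assumes only that the handler accepts a tool name and a JSON argument string, the way `connector.handle` is called above. The stand-in message and the `query` argument name are illustrative.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List


def dispatch_tool_calls(message: Any, handler: Callable[[str, str], Any]) -> List[Dict[str, str]]:
    """Run every tool call on an assistant message and return `tool` role messages.

    Hypothetical helper. `handler` is assumed to take (name, arguments_json),
    mirroring how connector.handle is invoked in the example above.
    """
    # tool_calls is None when the model answers directly instead of calling a tool.
    tool_calls = getattr(message, "tool_calls", None) or []
    replies = []
    for call in tool_calls:
        result = handler(call.function.name, call.function.arguments)
        replies.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result if isinstance(result, str) else str(result),
        })
    return replies


# Stand-in object shaped like an OpenAI chat completion message (argument names are illustrative).
fake_message = SimpleNamespace(tool_calls=[
    SimpleNamespace(id="call_1", function=SimpleNamespace(
        name="feather_search", arguments='{"query": "onboarding friction"}')),
])
tool_replies = dispatch_tool_calls(fake_message, lambda name, args: f"{name} got {args}")
no_replies = dispatch_tool_calls(SimpleNamespace(tool_calls=None), lambda name, args: "")
```

In real use, pass `resp.choices[0].message` and `connector.handle`, then append the returned messages to the conversation before the next completion request.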

Available Tools (14)

Tool	Description
`feather_search`	Semantic vector search
`feather_context_chain`	Vector search + graph BFS expansion
`feather_get_node`	Retrieve a single record by ID
`feather_get_related`	Get all graph-linked records
`feather_add_intel`	Store a new record with metadata
`feather_link_nodes`	Create a typed weighted edge
`feather_timeline`	Time-ordered records in a range
`feather_forget`	Drop a record by ID
`feather_health`	Database health report
`feather_why`	Explain why a record was retrieved
`feather_mmr_search`	Maximal marginal relevance search
`feather_consolidate`	Merge near-duplicate records
`feather_episode_get`	Retrieve an episode by ID
`feather_expire`	Purge records past their TTL

MCP Server (v0.6.0)

Connect Feather DB to Claude Desktop, Cursor, or any MCP-compatible client:

pip install "feather-db[mcp]"
feather-serve --db knowledge.feather --dim 768

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "feather": {
      "command": "feather-serve",
      "args": ["--db", "/path/to/knowledge.feather", "--dim", "768"]
    }
  }
}

All 14 tools become available to Claude Desktop immediately — no code required.

LangChain Integration (v0.6.0)

from feather_db.integrations import FeatherVectorStore, FeatherMemory, FeatherRetriever

# Drop-in VectorStore
store     = FeatherVectorStore(db=db, embedder=embed)
retriever = store.as_retriever(search_kwargs={"k": 5})

# Semantic conversation memory with adaptive decay
memory = FeatherMemory(db=db, embedder=embed, k=5)

# context_chain retriever
retriever = FeatherRetriever(db=db, embedder=embed, k=5, hops=2)

LlamaIndex Integration (v0.6.0)

from feather_db.integrations import FeatherVectorStoreIndex, FeatherReader

# Index documents
index = FeatherVectorStoreIndex.from_documents(documents, db=db, embed_model=embed_model)
query_engine = index.as_query_engine()
response = query_engine.query("What is our retention strategy?")

# Load existing Feather DB as LlamaIndex Documents
reader = FeatherReader(db=db)
docs   = reader.load_data()

Core Features

Multimodal Pockets

Each named modality gets its own independent HNSW index and dimensionality. A single entity ID can hold text, visual, and audio vectors simultaneously.

db.add(id=42, vec=text_vec,   modality="text")    # 768-dim
db.add(id=42, vec=image_vec,  modality="visual")  # 512-dim
db.add(id=42, vec=audio_vec,  modality="audio")   # 256-dim

results = db.search(query_vec, k=10, modality="visual")  # query_vec must match the visual index (512-dim here)

Context Graph

Typed, weighted edges between records. Nine built-in relationship types plus free-form strings.

from feather_db import RelType

db.link(from_id=1, to_id=2, rel_type=RelType.CAUSED_BY, weight=0.9)
db.link(from_id=1, to_id=3, rel_type=RelType.SUPPORTS,  weight=0.7)

edges    = db.get_edges(1)      # outgoing edges
incoming = db.get_incoming(2)   # reverse index

db.auto_link(modality="text", threshold=0.85, rel_type=RelType.RELATED_TO)

Built-in types: related_to, derived_from, caused_by, contradicts, supports, precedes, part_of, references, multimodal_of.
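
To make the idea behind `auto_link` concrete, here is a brute-force sketch of threshold-based linking. It is not Feather's implementation: it compares every pair of vectors, assumes cosine similarity as the metric, and `auto_link_sketch` is a hypothetical name. The real method presumably uses the HNSW index and may use a different distance.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def auto_link_sketch(vectors: Dict[int, Sequence[float]],
                     threshold: float) -> List[Tuple[int, int, float]]:
    """Return (from_id, to_id, similarity) for every pair at or above threshold."""
    ids = sorted(vectors)
    edges = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim >= threshold:
                edges.append((a, b, sim))
    return edges


vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0]}
edges = auto_link_sketch(vectors, threshold=0.85)  # only records 1 and 2 are similar enough
```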

Context Chain

One call that combines semantic search with n-hop BFS graph traversal:

result = db.context_chain(query=query_vec, k=5, hops=2, modality="text")

for node in result.nodes:
    print(node.id, node.score, node.hop_distance)
for edge in result.edges:
    print(edge.source_id, "->", edge.target_id, edge.rel_type)
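
The graph half of a context chain can be pictured as a breadth-first expansion from the search hits. The sketch below is an illustration under stated assumptions, not the library's code: `expand_hops` is a hypothetical function, the seed IDs stand in for the top-k vector search results, and only outgoing edges are followed (whether Feather also follows incoming edges is not stated here).

```python
from collections import deque
from typing import Dict, List


def expand_hops(seeds: List[int], adjacency: Dict[int, List[int]], hops: int) -> Dict[int, int]:
    """Breadth-first expansion from seed IDs; returns {node_id: hop_distance}."""
    distance = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Outgoing edges only: 1 -> 2 -> 3 -> 4, and 5 -> 1.
adjacency = {1: [2], 2: [3], 3: [4], 5: [1]}
reached = expand_hops(seeds=[1], adjacency=adjacency, hops=2)
```

With `hops=2`, node 4 is three hops away and node 5 only points into the seed, so neither is included.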

Filtered Search

from feather_db import FilterBuilder

results = db.search(
    query_vec, k=10,
    filter=FilterBuilder()
        .namespace("acme")
        .entity("user_123")
        .attribute("channel", "instagram")
        .importance_gte(0.5)
        .build()
)

Living Context / Adaptive Decay

from feather_db import ScoringConfig

cfg     = ScoringConfig(half_life=30.0, weight=0.3, min=0.0)
results = db.search(query_vec, k=10, scoring=cfg)

Formula:

stickiness    = 1 + log(1 + recall_count)
effective_age = age_in_days / stickiness
recency       = 0.5 ^ (effective_age / half_life_days)
final_score   = ((1 - time_weight) * similarity + time_weight * recency) * importance

touch() is called automatically on every search hit.
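
A worked example helps show how recall count offsets age. The function below transcribes the formula directly; it assumes `log` is the natural logarithm and does not model the `min` field of `ScoringConfig`. The name `final_score` and the sample inputs are illustrative.

```python
import math


def final_score(similarity: float, importance: float, age_in_days: float,
                recall_count: int, half_life_days: float = 30.0,
                time_weight: float = 0.3) -> float:
    """Direct transcription of the formula above (natural log assumed)."""
    stickiness = 1.0 + math.log(1.0 + recall_count)
    effective_age = age_in_days / stickiness
    recency = 0.5 ** (effective_age / half_life_days)
    return ((1.0 - time_weight) * similarity + time_weight * recency) * importance


fresh = final_score(similarity=0.8, importance=1.0, age_in_days=0, recall_count=0)
stale = final_score(similarity=0.8, importance=1.0, age_in_days=30, recall_count=0)
sticky = final_score(similarity=0.8, importance=1.0, age_in_days=30, recall_count=10)
```

Here `fresh` is 0.86 and `stale` is 0.71, because one half-life halves the recency term. The 30-day-old record recalled ten times (`sticky`) lands between the two, since its effective age shrinks to roughly 8.8 days.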

Memory Layer (v0.6.0)

from feather_db import MemoryManager

mm = MemoryManager(db)
print(mm.health_report())          # cluster stats + stale records
diverse = mm.search_mmr(vec, k=10) # maximal marginal relevance
mm.consolidate(threshold=0.95)     # merge near-duplicate records
mm.assign_tiers()                  # hot / warm / cold tiering by recall_count
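
For readers unfamiliar with maximal marginal relevance, the following sketch shows the standard greedy formulation. It is not taken from `MemoryManager`: `mmr_select`, the `lam` trade-off parameter, and the use of cosine similarity are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def mmr_select(query: Sequence[float], candidates: Dict[int, Sequence[float]],
               k: int, lam: float = 0.5) -> List[int]:
    """Greedy MMR: each step picks the candidate with the best
    lam * relevance - (1 - lam) * (max similarity to anything already picked)."""
    selected: List[int] = []
    remaining = set(candidates)

    def score(cid: int) -> float:
        relevance = cosine(query, candidates[cid])
        redundancy = max((cosine(candidates[cid], candidates[s]) for s in selected),
                         default=0.0)
        return lam * relevance - (1.0 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.3]
candidates = {1: [1.0, 0.0], 2: [0.99, 0.05], 3: [0.6, 0.8]}
picked = mmr_select(query, candidates, k=2, lam=0.5)
```

Records 1 and 2 are the two closest to the query but nearly identical to each other, so after picking record 2 the second slot goes to record 3 rather than record 1.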

Episodes (v0.6.0)

Group related records into named episodes:

from feather_db import EpisodeManager

em  = EpisodeManager(db)
eid = em.begin_episode("onboarding_analysis")
em.add_to_episode(eid, node_id)
ep  = em.get_episode(eid)
em.close_episode(eid)

Triggers & Contradiction Detection (v0.6.0)

from feather_db import WatchManager, ContradictionDetector

wm = WatchManager(db)
wm.watch(namespace="acme", callback=lambda record: print("New:", record.content))

cd = ContradictionDetector(db)
conflicts = cd.check(new_meta)     # returns list of conflicting record IDs

Namespace / Entity / Attributes

meta = feather_db.Metadata()
meta.namespace_id = "acme"
meta.entity_id    = "user_123"
meta.set_attribute("channel", "instagram")   # use this, NOT meta.attributes['k'] = v
val = meta.get_attribute("channel")

Domain profiles for typed helpers:

from feather_db import MarketingProfile

p = MarketingProfile()
p.set_brand("nike")
p.set_user("user_8821")
p.set_channel("instagram")
p.set_ctr(0.045)
meta = p.to_metadata()

Graph Visualization

from feather_db.graph import visualize, export_graph

visualize(db, output_path="/tmp/graph.html")          # self-contained D3 HTML
data = export_graph(db, namespace_filter="nike")      # Python dict for D3/Cytoscape

Rust CLI

feather add    --db my.feather --id 1 --vec "0.1,0.2,0.3" --modality text
feather search --db my.feather --vec "0.1,0.2,0.3" --k 5
feather link   --db my.feather --from 1 --to 2
feather save   --db my.feather

Performance

Metric	Value
Add rate	2,000–5,000 vectors/sec
Search latency (k=10)	0.5–1.5 ms
Max vectors per modality	1,000,000 (default; raise `max_elements` in `get_or_create_index`)
HNSW params	M=16, ef_construction=200
File format	Binary `.feather` v5

SIMD (AVX2/AVX512) optimizations are available in space_l2.h. Enable with -DUSE_AVX -march=native in setup.py.

Cloud Deployment (Azure / Docker)

Feather DB ships with a production-ready FastAPI wrapper and Gradio management dashboard you can deploy on any Linux VM.

git clone https://github.com/feather-store/feather.git
cd feather
export FEATHER_API_KEY="your-secret-key"
docker compose -f feather-api/docker-compose.yml up -d --build

URL	Description
`http://<VM_IP>:8000/health`	Health check
`http://<VM_IP>:8000/docs`	Swagger / OpenAPI
`http://<VM_IP>:8000/dashboard`	Management Dashboard UI

Full Azure deployment guide → docs/deploy-azure.md

Data is stored in a Docker named volume (feather-data → /data) and persists across restarts and rebuilds.

Examples

File	Description
`examples/context_engine_demo.py`	Self-Aligned Context Engine — all four providers + heuristic fallback
`examples/context_graph_demo.py`	Context graph — auto-link, context_chain, D3 HTML export
`examples/marketing_living_context.py`	Multi-brand namespace/entity/attribute filtering
`examples/feather_inspector.py`	Local HTTP inspector — force graph, PCA scatter, edit, delete

Run:

python setup.py build_ext --inplace
python3 examples/context_engine_demo.py

# With a provider:
ANTHROPIC_API_KEY=sk-ant-... python3 examples/context_engine_demo.py
OLLAMA_MODEL=mistral:7b       python3 examples/context_engine_demo.py

Architecture

[Generic Core — C++17]
feather::DB
  ├── modality_indices_  (unordered_map<string, ModalityIndex>)  — one HNSW per modality
  ├── metadata_store_    (unordered_map<uint64_t, Metadata>)     — shared metadata by ID
  └── Methods: add, search, link, context_chain, auto_link, export_graph_json, …

[Python Layer — feather_db]
  ├── DB, Metadata, ContextType, ScoringConfig
  ├── Edge, IncomingEdge, ContextNode, ContextEdge, ContextChainResult
  ├── FilterBuilder         — fluent search filter helper
  ├── DomainProfile         — generic namespace/entity/attributes base class
  ├── MarketingProfile      — digital marketing typed adapter
  ├── RelType               — standard relationship type constants
  ├── graph.visualize()     — D3 force-graph HTML exporter
  ├── MemoryManager         — health reports, MMR, consolidate, tiering
  ├── WatchManager          — namespace/entity watch callbacks
  ├── ContradictionDetector — conflict detection on ingest
  ├── EpisodeManager        — grouped episode records
  ├── merge()               — merge two .feather files
  ├── LLMProvider / ClaudeProvider / OpenAIProvider / OllamaProvider / GeminiProvider
  ├── ContextEngine         — self-aligned LLM-powered ingestion pipeline
  └── integrations/
      ├── ClaudeConnector   — Claude tool_use with 14 Feather tools
      ├── OpenAIConnector   — OpenAI/Groq/Mistral function_calling
      ├── GeminiConnector   — Gemini function_calling + GeminiEmbedder
      ├── FeatherVectorStore / FeatherMemory / FeatherRetriever  (LangChain)
      ├── FeatherVectorStoreIndex / FeatherReader                 (LlamaIndex)
      └── mcp_server        — feather-serve MCP endpoint

[Rust CLI]
feather-db-cli (FFI via extern "C" from src/feather_core.cpp)

File Format

[magic: 4B = "FEAT"] [version: 4B = 5]
--- Metadata Section ---
[meta_count: 4B]
  for each record:
    [id: 8B] [serialized Metadata including namespace/entity/attributes/edges]
--- Modality Indices Section ---
[modal_count: 4B]
  for each modality:
    [name_len: 2B] [name: N bytes]
    [dim: 4B] [element_count: 4B]
    for each element:
      [id: 8B] [float32 vector: dim * 4 bytes]

v3 and v4 files load transparently — missing fields default to empty.
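
The first eight bytes of the layout can be checked with Python's `struct` module. This is a sketch only: the layout above does not state a byte order, so little-endian is assumed, and `write_header` / `read_header` are hypothetical helpers rather than Feather's own reader.

```python
import io
import struct

MAGIC = b"FEAT"


def write_header(stream, version: int = 5) -> None:
    # "<I" = little-endian unsigned 32-bit; the byte order is an assumption.
    stream.write(MAGIC)
    stream.write(struct.pack("<I", version))


def read_header(stream) -> int:
    """Validate the 4-byte magic and return the format version."""
    magic = stream.read(4)
    if magic != MAGIC:
        raise ValueError(f"not a .feather file: magic={magic!r}")
    (version,) = struct.unpack("<I", stream.read(4))
    return version


# Round-trip through an in-memory buffer instead of a real file.
buf = io.BytesIO()
write_header(buf, version=5)
buf.seek(0)
version = read_header(buf)
```

A loader that supports older files would branch on the returned version before parsing the metadata section.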

Known Limitations

Issue	Detail
No concurrent writes	HNSW is not thread-safe for simultaneous adds
No physical vector deletion	HNSW only marks records as deleted; data stays until compaction
Max 1M vectors/modality	Hardcoded in `get_or_create_index`; increase `max_elements` to raise
`meta.attributes['k'] = v` silent no-op	pybind11 map copy; use `meta.set_attribute(k, v)`
Rust CLI missing v0.6.0+ features	namespace/entity/context_chain/integrations are Python-only
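
One common way to live with the concurrent-write limitation inside a single process is to funnel all adds through a lock. The wrapper below is a sketch, not a Feather feature: `SerializedWriter` and `FakeDB` are hypothetical, and the wrapper assumes only that the wrapped object exposes an `add` method. It does nothing for multiple processes writing to the same file, and whether searches are safe during a write is not covered here.

```python
import threading
from typing import Any


class SerializedWriter:
    """Funnel all writes through one lock so only one add runs at a time."""

    def __init__(self, db: Any) -> None:
        self._db = db
        self._lock = threading.Lock()

    def add(self, *args: Any, **kwargs: Any) -> Any:
        with self._lock:
            return self._db.add(*args, **kwargs)


class FakeDB:
    """Stand-in for a database handle, used only to make this sketch runnable."""

    def __init__(self) -> None:
        self.ids = []

    def add(self, id: int, vec: Any = None, meta: Any = None) -> None:
        self.ids.append(id)


writer = SerializedWriter(FakeDB())
threads = [threading.Thread(target=writer.add, kwargs={"id": i}) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
added = sorted(writer._db.ids)
```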

Contributing

Fork the repository
Create a feature branch
Make your changes with tests
Submit a pull request

License

MIT — see LICENSE

Acknowledgments

HNSW algorithm: hnswlib
Python bindings: pybind11
Rust CLI: clap
Graph visualization: D3.js
