
Feather DB

Embedded vector database + self-aligned context engine

Part of Hawky.ai — AI-Native Development Tools


Feather DB is an embedded vector database and living context engine — zero-server, file-based, with a built-in knowledge graph, adaptive memory decay, LLM agent connectors, and a self-aligned ingestion engine that organises data automatically.


What's Inside

  • ANN Search: sub-millisecond approximate nearest-neighbor search via HNSW
  • Multimodal Pockets: text, image, and audio vectors per entity under a single ID
  • Context Graph: typed and weighted edges, reverse index, auto-link by similarity
  • Context Chain: one call combining vector search with n-hop BFS graph expansion
  • Living Context: recall-count stickiness — frequently accessed items resist temporal decay
  • Namespace / Entity / Attributes: generic partition, subject, and KV metadata for any domain
  • Graph Visualizer: self-contained D3 force-graph HTML; fully offline, no CDN
  • LLM Agent Connectors: Claude, OpenAI, and Gemini tool-use/function-calling with 14 Feather tools
  • MCP Server: feather-serve connects Feather to Claude Desktop, Cursor, and any MCP client
  • LangChain / LlamaIndex: drop-in FeatherVectorStore, FeatherMemory, and FeatherRetriever adapters
  • Self-Aligned Context Engine: LLM-powered ingestion that auto-classifies, scores, links, and namespaces every record
  • Single-file persistence: .feather binary format (v5); v3/v4 files load transparently

Installation

pip install feather-db            # core
pip install "feather-db[all]"     # + langchain, llamaindex, mcp extras

CLI (Rust):

cargo install feather-db-cli

Build from source:

git clone https://github.com/feather-store/feather
cd feather
python setup.py build_ext --inplace

Quick Start

import feather_db
import numpy as np

# Open or create a database
db = feather_db.DB.open("context.feather", dim=768)

# Add a vector with metadata
meta = feather_db.Metadata()
meta.content = "User prefers dark mode"
meta.importance = 0.9
db.add(id=1, vec=np.random.rand(768).astype(np.float32), meta=meta)

# Semantic search
results = db.search(np.random.rand(768).astype(np.float32), k=5)
for r in results:
    print(r.id, r.score, r.metadata.content)

db.save()

Self-Aligned Context Engine (v0.7.0)

The ContextEngine wraps DB with an LLM-powered ingestion pipeline. Drop in any text — the engine classifies it, scores it, links it to related records, and stores it in the right namespace. No schema to define upfront.

from feather_db import ContextEngine, ClaudeProvider
import numpy as np, hashlib

def embed(text: str) -> np.ndarray:
    # replace with your real embedder
    vec = np.zeros(768, dtype=np.float32)
    for i, tok in enumerate(text.split()[:768]):
        vec[i % 768] += 1.0
    n = np.linalg.norm(vec)
    return vec / n if n > 0 else vec

engine = ContextEngine(
    db_path  = "knowledge.feather",
    dim      = 768,
    provider = ClaudeProvider(),   # or OpenAIProvider, GeminiProvider, OllamaProvider, None
    embedder = embed,
    namespace = "myapp",
)

nid = engine.ingest(
    "Competitor X launched a developer SDK with MIT license — 10k GitHub stars in 24 hours."
)

The engine automatically:

  • Classifies entity type (competitor_intel, user_feedback, strategy_brief, …)
  • Scores importance (0–1) and confidence (0–1)
  • Assigns TTL and namespace
  • Suggests and creates graph edges to related records

Works offline too — pass provider=None for a built-in heuristic classifier (no API key needed):

engine = ContextEngine(db_path="k.feather", dim=768, provider=None, embedder=embed)
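The built-in heuristic itself isn't documented here; as a rough mental model only, a keyword-scoring fallback can be sketched like this (the function name, keyword buckets, and scoring constants below are purely illustrative, not the shipped implementation):

```python
# Hypothetical stand-in for the offline heuristic classifier: keyword buckets
# map free text to an entity type and a crude 0-1 importance score.
def classify(text: str) -> dict:
    buckets = {
        "competitor_intel": ("competitor", "launched", "rival", "market"),
        "user_feedback":    ("user", "feedback", "complaint", "prefers"),
        "strategy_brief":   ("strategy", "roadmap", "quarter", "goal"),
    }
    lower = text.lower()
    scores = {etype: sum(kw in lower for kw in kws) for etype, kws in buckets.items()}
    etype = max(scores, key=scores.get)   # best-matching bucket
    hits = scores[etype]
    return {
        "entity_type": etype if hits else "note",     # no hits -> generic type
        "importance": min(1.0, 0.3 + 0.2 * hits),     # more hits -> higher score
    }
```

An LLM provider replaces this with a real classification call; the interface to the rest of the engine stays the same.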

Supported Providers

from feather_db import ClaudeProvider, OpenAIProvider, OllamaProvider, GeminiProvider

ClaudeProvider(model="claude-haiku-4-5-20251001")         # Anthropic
OpenAIProvider(model="gpt-4o-mini")                        # OpenAI
OpenAIProvider(model="llama-3.3-70b-versatile",            # Groq
               base_url="https://api.groq.com/openai/v1",
               api_key=GROQ_KEY)
OllamaProvider(model="llama3.1:8b")                        # Ollama (local, no key)
GeminiProvider(model="gemini-2.0-flash")                   # Google Gemini

All providers share the same LLMProvider interface — swap at any time without changing the rest of your code.


LLM Agent Connectors (v0.6.0)

Give any LLM agent native access to Feather DB via 14 built-in tools.

Claude (tool_use)

import anthropic
from feather_db import ClaudeConnector

connector = ClaudeConnector(db=db, embedder=embed)
client    = anthropic.Anthropic()

messages = [{"role": "user", "content": "What competitor moves should I watch?"}]
reply    = connector.run_loop(client, messages, model="claude-opus-4-6")
print(reply)

OpenAI / Groq / vLLM (function_calling)

from openai import OpenAI
from feather_db import OpenAIConnector

connector = OpenAIConnector(db=db, embedder=embed)
client    = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Find records about onboarding friction"}],
    tools=connector.tools(),
)
result = connector.handle(resp.choices[0].message.tool_calls[0].function.name,
                          resp.choices[0].message.tool_calls[0].function.arguments)

Available Tools (14)

  • feather_search: semantic vector search
  • feather_context_chain: vector search + graph BFS expansion
  • feather_get_node: retrieve a single record by ID
  • feather_get_related: get all graph-linked records
  • feather_add_intel: store a new record with metadata
  • feather_link_nodes: create a typed weighted edge
  • feather_timeline: time-ordered records in a range
  • feather_forget: drop a record by ID
  • feather_health: database health report
  • feather_why: explain why a record was retrieved
  • feather_mmr_search: maximal marginal relevance search
  • feather_consolidate: merge near-duplicate records
  • feather_episode_get: retrieve an episode by ID
  • feather_expire: purge records past their TTL

MCP Server (v0.6.0)

Connect Feather DB to Claude Desktop, Cursor, or any MCP-compatible client:

pip install "feather-db[mcp]"
feather-serve --db knowledge.feather --dim 768

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "feather": {
      "command": "feather-serve",
      "args": ["--db", "/path/to/knowledge.feather", "--dim", "768"]
    }
  }
}

All 14 tools become available to Claude Desktop immediately — no code required.


LangChain Integration (v0.6.0)

from feather_db.integrations import FeatherVectorStore, FeatherMemory, FeatherRetriever

# Drop-in VectorStore
store     = FeatherVectorStore(db=db, embedder=embed)
retriever = store.as_retriever(search_kwargs={"k": 5})

# Semantic conversation memory with adaptive decay
memory = FeatherMemory(db=db, embedder=embed, k=5)

# context_chain retriever
retriever = FeatherRetriever(db=db, embedder=embed, k=5, hops=2)

LlamaIndex Integration (v0.6.0)

from feather_db.integrations import FeatherVectorStoreIndex, FeatherReader

# Index documents
index = FeatherVectorStoreIndex.from_documents(documents, db=db, embed_model=embed_model)
query_engine = index.as_query_engine()
response = query_engine.query("What is our retention strategy?")

# Load existing Feather DB as LlamaIndex Documents
reader = FeatherReader(db=db)
docs   = reader.load_data()

Core Features

Multimodal Pockets

Each named modality gets its own independent HNSW index and dimensionality. A single entity ID can hold text, visual, and audio vectors simultaneously.

db.add(id=42, vec=text_vec,   modality="text")    # 768-dim
db.add(id=42, vec=image_vec,  modality="visual")  # 512-dim
db.add(id=42, vec=audio_vec,  modality="audio")   # 256-dim

results = db.search(query_vec, k=10, modality="visual")
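To make the "one index per modality" idea concrete, here is a toy in-memory stand-in: brute-force cosine search instead of HNSW, with `PocketStore` and its internals invented for illustration, not the real API:

```python
import numpy as np

# Toy model of pockets: one vector table per modality, all keyed by the
# same entity ID. Each modality can have its own dimensionality.
class PocketStore:
    def __init__(self):
        self.pockets: dict[str, dict[int, np.ndarray]] = {}

    def add(self, id: int, vec: np.ndarray, modality: str = "text") -> None:
        # Normalize on insert so the dot product below is cosine similarity.
        self.pockets.setdefault(modality, {})[id] = vec / np.linalg.norm(vec)

    def search(self, query: np.ndarray, k: int, modality: str = "text"):
        pocket = self.pockets.get(modality, {})
        q = query / np.linalg.norm(query)
        scored = sorted(((float(v @ q), i) for i, v in pocket.items()), reverse=True)
        return [(i, s) for s, i in scored[:k]]  # (id, score) pairs, best first
```

The real store swaps the brute-force scan for an HNSW index per modality, which is what keeps search sub-millisecond.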

Context Graph

Typed, weighted edges between records. Nine built-in relationship types plus free-form strings.

from feather_db import RelType

db.link(from_id=1, to_id=2, rel_type=RelType.CAUSED_BY, weight=0.9)
db.link(from_id=1, to_id=3, rel_type=RelType.SUPPORTS,  weight=0.7)

edges    = db.get_edges(1)      # outgoing edges
incoming = db.get_incoming(2)   # reverse index

db.auto_link(modality="text", threshold=0.85, rel_type=RelType.RELATED_TO)

Built-in types: related_to, derived_from, caused_by, contradicts, supports, precedes, part_of, references, multimodal_of.

Context Chain

One call that combines semantic search with n-hop BFS graph traversal:

result = db.context_chain(query=query_vec, k=5, hops=2, modality="text")

for node in result.nodes:
    print(node.id, node.score, node.hop_distance)
for edge in result.edges:
    print(edge.source_id, "->", edge.target_id, edge.rel_type)
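Conceptually, the graph half of context_chain is a bounded breadth-first search seeded by the vector-search hits. A minimal sketch, using a plain adjacency dict in place of the real edge store:

```python
from collections import deque

# Sketch of the BFS expansion step: start from vector-search hit IDs and
# walk the edge list outward, at most `hops` hops.
def expand(seeds: list[int], edges: dict[int, list[int]], hops: int) -> dict[int, int]:
    """Return {node_id: hop_distance} for everything reachable within `hops`."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == hops:      # don't expand past the hop budget
            continue
        for nbr in edges.get(node, []):
            if nbr not in dist:     # first visit is always the shortest path
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist
```

This is why `hop_distance` appears on each returned node: it records how far the record sits from the nearest semantic hit.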

Filtered Search

from feather_db import FilterBuilder

results = db.search(
    query_vec, k=10,
    filter=FilterBuilder()
        .namespace("acme")
        .entity("user_123")
        .attribute("channel", "instagram")
        .importance_gte(0.5)
        .build()
)
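The fluent style works because each method returns `self`. As a model of the idea only (the real FilterBuilder's internals and full method set may differ), a toy version that builds a predicate over plain dicts:

```python
# Illustrative reimplementation of the fluent-builder pattern; not the
# actual FilterBuilder, which targets the native search engine.
class Filter:
    def __init__(self):
        self.checks = []

    def namespace(self, ns):
        self.checks.append(lambda m: m.get("namespace_id") == ns)
        return self  # returning self is what enables chaining

    def importance_gte(self, x):
        self.checks.append(lambda m: m.get("importance", 0.0) >= x)
        return self

    def build(self):
        # AND all accumulated checks into a single predicate.
        return lambda m: all(c(m) for c in self.checks)
```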

Living Context / Adaptive Decay

from feather_db import ScoringConfig

cfg     = ScoringConfig(half_life=30.0, weight=0.3, min=0.0)
results = db.search(query_vec, k=10, scoring=cfg)

Formula:

stickiness    = 1 + log(1 + recall_count)
effective_age = age_in_days / stickiness
recency       = 0.5 ^ (effective_age / half_life_days)
final_score   = ((1 - time_weight) * similarity + time_weight * recency) * importance

touch() is called automatically on every search hit.
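The formula transcribes directly into Python; this sketch restates it for experimentation, with defaults mirroring the ScoringConfig example above:

```python
import math

# The adaptive-decay formula from above, verbatim: frequent recalls grow
# stickiness, which divides the age and so slows the exponential decay.
def final_score(similarity, importance, age_days, recall_count,
                half_life_days=30.0, time_weight=0.3):
    stickiness = 1 + math.log(1 + recall_count)
    effective_age = age_days / stickiness
    recency = 0.5 ** (effective_age / half_life_days)
    return ((1 - time_weight) * similarity + time_weight * recency) * importance
```

A record recalled 50 times ages roughly five times slower than one never recalled, so heavily used context stays retrievable long after idle records have faded.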

Memory Layer (v0.6.0)

from feather_db import MemoryManager

mm = MemoryManager(db)
print(mm.health_report())          # cluster stats + stale records
diverse = mm.search_mmr(vec, k=10) # maximal marginal relevance
mm.consolidate(threshold=0.95)     # merge near-duplicate records
mm.assign_tiers()                  # hot / warm / cold tiering by recall_count

Episodes (v0.6.0)

Group related records into named episodes:

from feather_db import EpisodeManager

em  = EpisodeManager(db)
eid = em.begin_episode("onboarding_analysis")
em.add_to_episode(eid, node_id)
ep  = em.get_episode(eid)
em.close_episode(eid)

Triggers & Contradiction Detection (v0.6.0)

from feather_db import WatchManager, ContradictionDetector

wm = WatchManager(db)
wm.watch(namespace="acme", callback=lambda record: print("New:", record.content))

cd = ContradictionDetector(db)
conflicts = cd.check(new_meta)     # returns list of conflicting record IDs

Namespace / Entity / Attributes

meta = feather_db.Metadata()
meta.namespace_id = "acme"
meta.entity_id    = "user_123"
meta.set_attribute("channel", "instagram")   # use this, NOT meta.attributes['k'] = v
val = meta.get_attribute("channel")

Domain profiles for typed helpers:

from feather_db import MarketingProfile

p = MarketingProfile()
p.set_brand("nike")
p.set_user("user_8821")
p.set_channel("instagram")
p.set_ctr(0.045)
meta = p.to_metadata()

Graph Visualization

from feather_db.graph import visualize, export_graph

visualize(db, output_path="/tmp/graph.html")          # self-contained D3 HTML
data = export_graph(db, namespace_filter="nike")      # Python dict for D3/Cytoscape

Rust CLI

feather add    --db my.feather --id 1 --vec "0.1,0.2,0.3" --modality text
feather search --db my.feather --vec "0.1,0.2,0.3" --k 5
feather link   --db my.feather --from 1 --to 2
feather save   --db my.feather

Performance

  • Add rate: 2,000–5,000 vectors/sec
  • Search latency (k=10): 0.5–1.5 ms
  • Max vectors per modality: 1,000,000 (configurable)
  • HNSW params: M=16, ef_construction=200
  • File format: binary .feather v5

SIMD (AVX2/AVX512) optimizations are available in space_l2.h. Enable with -DUSE_AVX -march=native in setup.py.


Cloud Deployment (Azure / Docker)

Feather DB ships with a production-ready FastAPI wrapper and Gradio management dashboard you can deploy on any Linux VM.

git clone https://github.com/feather-store/feather.git
cd feather
export FEATHER_API_KEY="your-secret-key"
docker compose -f feather-api/docker-compose.yml up -d --build

  • http://<VM_IP>:8000/health (health check)
  • http://<VM_IP>:8000/docs (Swagger / OpenAPI)
  • http://<VM_IP>:8000/dashboard (management dashboard UI)

Full Azure deployment guide → docs/deploy-azure.md

Data is stored in a Docker named volume (feather-data → /data) and persists across restarts and rebuilds.


Examples

  • examples/context_engine_demo.py: Self-Aligned Context Engine with all four providers and the heuristic fallback
  • examples/context_graph_demo.py: context graph auto-link, context_chain, and D3 HTML export
  • examples/marketing_living_context.py: multi-brand namespace/entity/attribute filtering
  • examples/feather_inspector.py: local HTTP inspector with force graph, PCA scatter, edit, and delete

Run:

python setup.py build_ext --inplace
python3 examples/context_engine_demo.py

# With a provider:
ANTHROPIC_API_KEY=sk-ant-... python3 examples/context_engine_demo.py
OLLAMA_MODEL=mistral:7b       python3 examples/context_engine_demo.py

Architecture

[Generic Core — C++17]
feather::DB
  ├── modality_indices_  (unordered_map<string, ModalityIndex>)  — one HNSW per modality
  ├── metadata_store_    (unordered_map<uint64_t, Metadata>)     — shared metadata by ID
  └── Methods: add, search, link, context_chain, auto_link, export_graph_json, …

[Python Layer — feather_db]
  ├── DB, Metadata, ContextType, ScoringConfig
  ├── Edge, IncomingEdge, ContextNode, ContextEdge, ContextChainResult
  ├── FilterBuilder         — fluent search filter helper
  ├── DomainProfile         — generic namespace/entity/attributes base class
  ├── MarketingProfile      — digital marketing typed adapter
  ├── RelType               — standard relationship type constants
  ├── graph.visualize()     — D3 force-graph HTML exporter
  ├── MemoryManager         — health reports, MMR, consolidate, tiering
  ├── WatchManager          — namespace/entity watch callbacks
  ├── ContradictionDetector — conflict detection on ingest
  ├── EpisodeManager        — grouped episode records
  ├── merge()               — merge two .feather files
  ├── LLMProvider / ClaudeProvider / OpenAIProvider / OllamaProvider / GeminiProvider
  ├── ContextEngine         — self-aligned LLM-powered ingestion pipeline
  └── integrations/
      ├── ClaudeConnector   — Claude tool_use with 14 Feather tools
      ├── OpenAIConnector   — OpenAI/Groq/Mistral function_calling
      ├── GeminiConnector   — Gemini function_calling + GeminiEmbedder
      ├── FeatherVectorStore / FeatherMemory / FeatherRetriever  (LangChain)
      ├── FeatherVectorStoreIndex / FeatherReader                 (LlamaIndex)
      └── mcp_server        — feather-serve MCP endpoint

[Rust CLI]
feather-db-cli (FFI via extern "C" from src/feather_core.cpp)

File Format

[magic: 4B = "FEAT"] [version: 4B = 5]
--- Metadata Section ---
[meta_count: 4B]
  for each record:
    [id: 8B] [serialized Metadata including namespace/entity/attributes/edges]
--- Modality Indices Section ---
[modal_count: 4B]
  for each modality:
    [name_len: 2B] [name: N bytes]
    [dim: 4B] [element_count: 4B]
    for each element:
      [id: 8B] [float32 vector: dim * 4 bytes]

v3 and v4 files load transparently — missing fields default to empty.
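For intuition, the header portion of the layout round-trips with Python's struct module. The byte order is not stated above, so little-endian here is an assumption, and these helpers are illustrative rather than part of the library:

```python
import struct

# Hypothetical round-trip of the first 12 bytes of the layout:
# [magic "FEAT"] [version u32 = 5] [meta_count u32], little-endian assumed.
def write_header(meta_count: int) -> bytes:
    return b"FEAT" + struct.pack("<II", 5, meta_count)

def read_header(buf: bytes) -> tuple[int, int]:
    if buf[:4] != b"FEAT":
        raise ValueError("not a .feather context file")
    version, meta_count = struct.unpack("<II", buf[4:12])
    return version, meta_count
```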


Known Limitations

  • No concurrent writes: HNSW is not thread-safe for simultaneous adds.
  • No true vector deletion: HNSW only marks records as deleted; the data remains until compaction.
  • Max 1M vectors per modality: the default is hardcoded in get_or_create_index; raise max_elements to increase it.
  • meta.attributes['k'] = v is a silent no-op: pybind11 hands back a copy of the map, so use meta.set_attribute(k, v) instead.
  • The Rust CLI lacks v0.6.0+ features: namespace/entity/context_chain and the integrations are Python-only.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Submit a pull request

License

MIT — see LICENSE

