This document is the single authoritative guide for understanding, modifying, and extending Feather DB. Read it fully before making any changes.
Feather DB is a lightweight, embedded vector database and living context engine written in C++17 with Python and Rust bindings — zero-server, file-based, designed for AI context and embeddings.
Core Value Proposition:
- Sub-millisecond ANN (Approximate Nearest Neighbour) search via HNSW
- Native multimodal support (text, image, audio vectors per entity)
- Embedded Contextual Graph — typed+weighted edges, reverse index, auto-link by similarity
- Adaptive Decay / Living Context — frequently accessed items resist temporal decay
- Namespace + Entity + Attributes — generic partition + subject + KV metadata for any domain
- Graph visualizer — self-contained D3 force-graph HTML, fully offline
- Single file persistence (
.featherbinary format, v5; v3/v4 files load transparently)
Version: 0.5.0 (Context Graph + Living Context Engine)
feather/
├── include/ # C++ headers (core logic lives here)
│ ├── feather.h # ← MAIN DB CLASS (read this first)
│ ├── metadata.h # Metadata struct + ContextType enum + Edge struct
│ ├── scoring.h # Scorer + ScoringConfig (decay logic)
│ ├── filter.h # SearchFilter struct
│ ├── hnswalg.h # HNSW index algorithm (hnswlib fork)
│ ├── hnswlib.h # hnswlib base interfaces
│ ├── space_l2.h # L2/Euclidean distance (SIMD optimized)
│ ├── space_ip.h # Inner Product distance space
│ ├── bruteforce.h # Brute force fallback index
│ ├── visited_list_pool.h # Visited node pool for HNSW search
│ └── stop_condition.h # Search stop condition interface
├── src/
│ ├── feather_core.cpp # C-compatible extern "C" wrappers (for Rust CLI)
│ ├── metadata.cpp # Metadata serialize/deserialize
│ ├── filter.cpp # Filter logic
│ └── scoring.cpp # Scorer (thin wrapper; logic in .h)
├── bindings/
│ └── feather.cpp # pybind11 Python bindings (the Python API bridge)
├── feather_db/
│ ├── __init__.py # Python package exports (DB, Metadata, RelType, etc.)
│ ├── filter.py # Python FilterBuilder helper class
│ ├── domain_profiles.py # DomainProfile base + MarketingProfile adapter
│ ├── graph.py # export_graph(), visualize(), RelType constants
│ └── d3.min.js # D3.js v7.9.0 inlined for offline visualization
├── feather-cli/ # Rust CLI crate (feather-db-cli)
│ ├── src/main.rs # CLI entry point
│ ├── src/lib.rs # CLI command implementations
│ ├── build.rs # Rust build script (links C++ core)
│ └── Cargo.toml # Rust package manifest (v0.5.0)
├── feather-api/ # FastAPI Cloud wrapper
│ ├── app/main.py # FastAPI app
│ ├── app/db_manager.py # DB lifecycle management
│ ├── app/models.py # Pydantic models
│ ├── Dockerfile # Multi-stage Docker build
│ └── docker-compose.yml # Local compose stack
├── examples/ # Working Python usage examples
│ ├── context_graph_demo.py # Full context graph demo
│ ├── marketing_living_context.py # Namespace/entity/attribute filtering
│ └── feather_inspector.py # Local HTTP inspector app
├── benchmarks/ # Performance benchmarks
│ └── stress_test.py # 100k vector stress test
├── real_data/ # Real dataset files (not committed)
├── p-test/ # Rust CLI integration tests
├── setup.py # Python build (compiles C++ extension)
├── pyproject.toml # PEP 517 metadata (version: 0.5.0)
├── MANIFEST.in # Source distribution manifest
└── CHANGELOG.md # Version history
This is the single most important file. All logic flows through here.
namespace feather {
class DB {
private:
// Each named modality (e.g., "text", "visual", "audio") has its own HNSW index.
// This is the "Multimodal Pocket" design.
std::unordered_map<std::string, ModalityIndex> modality_indices_;
std::string path_;
// Centralized metadata store keyed by entity ID (uint64_t).
// A record can span multiple modality indices but shares ONE Metadata entry.
std::unordered_map<uint64_t, Metadata> metadata_store_;
// Reverse edge index (built in-memory on load)
std::unordered_map<uint64_t, std::vector<IncomingEdge>> incoming_index_;
};
}Key Design Decisions:
- Multimodal via multiple HNSW indices: each call to
add(id, vec, meta, modality)routes to the correct namedModalityIndex. New modalities are created on-demand. - Shared metadata by ID: a single
Metadataobject tracks all cross-modal data (edges, recall_count, importance, timestamps) for a given entity ID. - HNSW params:
M=16,ef_construction=200,max_elements=1,000,000per modality index (hardcoded inget_or_create_index). ef(search beam width) defaults to10. Higher = more accurate but slower.- Reverse edge index: rebuilt from
metadata_store_edges on everyload(). Not persisted separately.
[magic: 4B = 0x46454154 "FEAT"] [version: 4B = 5]
--- Metadata Section ---
[meta_count: 4B]
for each record:
[id: 8B]
[timestamp: 8B] [importance: 4B] [type: 4B]
[source_len: 2B][source: N]
[content_len: 4B][content: N]
[tags_len: 2B][tags_json: N]
[edge_count: 2B]
for each edge: [target_id: 8B][rel_len: 1B][rel_type: N][weight: 4B]
[recall_count: 4B] [last_recalled_at: 8B]
[ns_len: 2B][namespace_id: N]
[eid_len: 2B][entity_id: N]
[attr_count: 2B]
for each attr: [key_len: 2B][key: N][val_len: 4B][val: N]
--- Modality Indices Section ---
[modal_count: 4B]
for each modality:
[name_len: 2B] [name: N bytes]
[dim: 4B] [element_count: 4B]
for each element:
[id: 8B] [float32 vector: dim * 4 bytes]
Backward compatibility: v3 and v4 files load transparently — missing fields default to empty via if (is.read(...)) guards in metadata.cpp.
struct Edge {
uint64_t target_id;
std::string rel_type; // e.g. "caused_by", "supports", free-form strings ok
float weight; // [0.0-1.0]
};
struct IncomingEdge {
uint64_t source_id;
std::string rel_type;
float weight;
};
struct Metadata {
int64_t timestamp; // Unix timestamp of creation
float importance; // Relevance weight [0.0–1.0], default 1.0
ContextType type; // FACT | PREFERENCE | EVENT | CONVERSATION
std::string source; // Origin identifier (e.g., "gpt-4o", "user")
std::string content; // Human-readable text content
std::string tags_json; // JSON array string: '["tag1","tag2"]'
// Context Graph (v0.5.0)
std::vector<Edge> edges; // Typed weighted outgoing edges (replaces flat links)
// Living Context
uint32_t recall_count; // Incremented on every search hit (via touch())
uint64_t last_recalled_at; // Unix timestamp of last retrieval
// Namespace / Entity / Attributes (v0.4.0)
std::string namespace_id; // partition key
std::string entity_id; // subject key
std::unordered_map<std::string,std::string> attributes; // domain KV pairs
};CRITICAL GOTCHA: meta.attributes['k'] = v silently does nothing in Python (pybind11 returns a copy of the map). Always use meta.set_attribute(key, value) and meta.get_attribute(key).
Backward-compat: Python meta.links property still works (returns list of target IDs from edges).
| Value | int | Meaning |
|---|---|---|
FACT |
0 | Static knowledge |
PREFERENCE |
1 | User preference or setting |
EVENT |
2 | Time-bound occurrence |
CONVERSATION |
3 | Dialog turn or message |
All fields are std::optional — only set fields are evaluated.
struct SearchFilter {
optional<vector<ContextType>> types;
optional<string> source;
optional<string> source_prefix;
optional<int64_t> timestamp_after;
optional<int64_t> timestamp_before;
optional<float> importance_gte;
optional<vector<string>> tags_contains;
// v0.4.0 additions:
optional<string> namespace_id; // exact namespace match
optional<string> entity_id; // exact entity match
optional<unordered_map<string,string>> attributes_match; // all KV must match
};The Adaptive Decay formula:
stickiness = 1 + log(1 + recall_count) # grows with access frequency
effective_age = age_in_days / stickiness # sticky items age slower
recency = 0.5 ^ (effective_age / half_life_days)
final_score = ((1 - time_weight) * similarity + time_weight * recency) * importance
Default config: half_life=30 days, time_weight=0.3, min_weight=0.0
pip install feather-db # v0.5.0 on PyPI
# or from source:
python setup.py build_ext --inplacefrom feather_db import (
DB, ContextType, Metadata, ScoringConfig,
Edge, IncomingEdge,
ContextNode, ContextEdge, ContextChainResult,
FilterBuilder,
DomainProfile, MarketingProfile,
visualize, export_graph, RelType,
)import feather_db, numpy as np, time
db = feather_db.DB.open("my_context.feather", dim=768)
# --- Metadata ---
meta = feather_db.Metadata()
meta.timestamp = int(time.time())
meta.importance = 0.9
meta.type = feather_db.ContextType.FACT
meta.source = "pipeline-v1"
meta.content = "User prefers dark mode"
meta.namespace_id = "acme"
meta.entity_id = "user_123"
meta.set_attribute("channel", "instagram") # ← use this, NOT meta.attributes['k'] = v
db.add(id=42, vec=np.random.rand(768).astype(np.float32), meta=meta)
# --- Multimodal ---
db.add(id=42, vec=np.random.rand(512).astype(np.float32), modality="visual")
# --- Search ---
results = db.search(query_vec, k=10)
results = db.search(query_vec, k=5, modality="visual")
# --- Filtered search ---
from feather_db import FilterBuilder
f = FilterBuilder().namespace("acme").entity("user_123").attribute("channel", "instagram").build()
results = db.search(query_vec, k=10, filter=f)
# --- Scored search (adaptive decay) ---
cfg = feather_db.ScoringConfig(half_life=30.0, weight=0.3, min=0.0)
results = db.search(query_vec, k=10, scoring=cfg)
# SearchResult fields: r.id, r.score, r.metadata
for r in results:
print(r.id, r.score, r.metadata.content)
# --- Typed graph edges ---
db.link(from_id=1, to_id=2, rel_type="caused_by", weight=0.9)
edges = db.get_edges(1) # list[Edge]
incoming = db.get_incoming(2) # list[IncomingEdge]
# --- Auto-link by similarity ---
db.auto_link(modality="text", threshold=0.85, rel_type="related_to")
# --- Context chain (vector search + BFS graph expansion) ---
result = db.context_chain(query=query_vec, k=5, hops=2, modality="text")
for node in result.nodes:
print(node.id, node.score, node.hop_distance)
# --- Export / import ---
json_str = db.export_graph_json(namespace_filter="acme", entity_filter="")
vec = db.get_vector(id=42, modality="text") # returns np.ndarray
ids = db.get_all_ids(modality="visual") # returns list[int]
# --- Metadata-only updates (no HNSW touch) ---
db.update_metadata(id=42, meta=new_meta)
db.update_importance(id=42, importance=0.95)
# --- Salience ---
db.touch(id=42) # manual boost; called automatically on search hits
meta = db.get_metadata(42)
db.save()from feather_db import MarketingProfile
p = MarketingProfile()
p.set_brand("nike")
p.set_user("user_8821")
p.set_channel("instagram")
p.set_ctr(0.045)
p.set_roas(3.2)
meta = p.to_metadata()from feather_db.graph import visualize, export_graph
visualize(db, output_path="/tmp/graph.html") # self-contained D3 HTML
data = export_graph(db, namespace_filter="nike") # Python dictCrate on Crates.io: feather-db-cli v0.5.0
feather add --db my.feather --id 1 --vec "0.1,0.2,0.3" --modality text
feather search --db my.feather --vec "0.1,0.2,0.3" --k 5
feather link --db my.feather --from 1 --to 2
feather save --db my.featherThe Rust CLI calls the extern "C" C ABI from src/feather_core.cpp via the FFI bridge in build.rs. The CLI does not yet expose v0.4.0/v0.5.0 graph features (namespace, attributes, context_chain) — those are Python-only for now.
Key points:
DBis bound withpy::nodeleteto prevent Python from double-deleting (destructor callssave()).add()acceptspy::array_t<float>(NumPy arrays) and copies data intostd::vector<float>.search()accepts optional raw pointers toSearchFilterandScoringConfig.get_vector()returnspy::array_t<float>.meta.attributesmap mutation viameta.attributes['k'] = vsilently does nothing — pybind11 returns a copy. Usemeta.set_attribute(k, v).
# Development build (inplace, fastest)
python setup.py build_ext --inplace
# Production wheel
python setup.py sdist bdist_wheelsetup.py compiles:
bindings/feather.cpp(pybind11 entry point)src/filter.cpp,src/metadata.cpp,src/scoring.cpp- Flags:
-O3 -std=c++17
# In setup.py extra_compile_args:
"-DUSE_AVX", "-march=native", "-ffast-math"cd feather-cli
cargo build --release
cargo publish # requires cargo loginEach modality gets its own HNSW index. You cannot search text vectors against the visual index.
db.add(id=1, vec=np.rand(768), modality="text") # 768-dim
db.add(id=1, vec=np.rand(512), modality="visual") # 512-dim, independent# WRONG — silently does nothing
meta.attributes["channel"] = "instagram"
# CORRECT
meta.set_attribute("channel", "instagram")
value = meta.get_attribute("channel", default="")Every search() increments recall_count for all returned records. Call touch() manually only to boost salience outside of search.
importlib.reload() does NOT reload compiled .so files. Start a fresh Python process to pick up recompiled bindings.
Each modality index is initialized with max_elements=1,000,000. Inserting beyond this crashes. Modify get_or_create_index() in include/feather.h to change this limit.
feather::DB::~DB() calls save(). Call db.save() explicitly in long-running processes.
If a record exists in the edge list but not in the metadata store (e.g., added without metadata), export_graph_json filters those dangling edges automatically via the exported_ids set.
When adding a new feature to Feather DB, touch these files in order:
include/metadata.h— Add new field toMetadatastructsrc/metadata.cpp— Updateserialize()anddeserialize()include/feather.h— Add new method toDBclasssrc/feather_core.cpp— Addextern "C"wrapper for Rust/FFIbindings/feather.cpp— Expose to Python via pybind11feather_db/__init__.py— Export from Python packagefeather-cli/src/lib.rs— Add CLI command in Rustexamples/— Add a usage exampleCHANGELOG.md— Document the change
| Issue | Details |
|---|---|
| No concurrent writes | HNSW is not thread-safe for simultaneous addPoint calls |
| No vector deletion | HNSW marks deletions but data stays |
tags_json is a raw string |
Tag filtering uses substring search, not JSON parsing |
| Max 1M vectors per modality | Hardcoded in get_or_create_index |
meta.attributes['k'] = v no-op |
pybind11 map copy; use set_attribute() |
| Load time for large attribute DBs | v4/v5 attribute map deserialization is O(n * attrs) |
| Rust CLI missing v0.5.0 features | namespace/entity/context_chain are Python-only for now |
source repro_venv/bin/activate
python3 examples/context_graph_demo.py
python3 examples/marketing_living_context.py
python3 examples/feather_inspector.py # local inspector at http://localhost:7777
python3 benchmarks/stress_test.py
cd p-test && ./run_tests.sh # Rust CLI tests| Phase | Status | Highlights |
|---|---|---|
| Phase 1 | Done | Basic HNSW, L2 search, .feather file format |
| Phase 2 | Done | Context Engine: metadata, types, time-decay scoring, filtered search |
| Phase 3 | Done | Multimodal pockets, graph links, adaptive decay |
| Phase 4a | Done | Generic namespace/entity/attributes (v0.4.0) |
| Phase 4b | Done | FastAPI + Docker cloud wrapper in feather-api/ |
| Phase 5 | Done | Typed edges, reverse index, auto_link, context_chain, D3 visualizer (v0.5.0) |
| Phase 6 | Planned | SIMD tuning, parallel ingestion, load-time optimization |