Comparing changes

base repository: feather-store/feather
base: master@{1day}
head repository: feather-store/feather
compare: master

6 commits · 17 files changed · 1 contributor
Commits on Apr 26, 2026
arXiv paper: add LongMemEval section + 4 new refs; draft blog post
Paper updates (docs/featherdb_paper.tex + .pdf):
- New §4.7 "End-to-End Memory Benchmark: LongMemEval" with the 0.657 S-variant headline, configuration, per-axis breakdown, three supported claims, and a reproduction pointer.
- Comparison table referencing Mem0/Zep/Supermemory/full-context numbers.
- Bibliography: new entries for Xu et al. 2024 (LongMemEval), Rasmussen et al. 2025 (Zep), the Mem0 token-efficiency blog post, and the Supermemory research page.
- Recompiled with tectonic (97 KB).

docs/blog/longmemeval-results.md:
- Public-facing draft post. Headline: "Feather DB beats GPT-4o full-context on LongMemEval — using a free-tier model".
- Three claims, three caveats, a reproduction command, a per-axis table vs Zep + Supermemory, and a pointer to docs/benchmarks/longmemeval.md for the long-form report.
Commit a5f0958
Add Azure OpenAI chat provider for benchmark answerer/judge
- bench/providers_azure.py: AzureChatProvider implementing the LLMProvider.complete() interface, env-driven via AZURE_OPENAI_CHAT_{ENDPOINT,API_KEY,DEPLOYMENT,API_VERSION}. Falls back to AZURE_OPENAI_{ENDPOINT,API_KEY} so a single Azure resource works for both embeddings and chat without renaming.
- bench/judges_llm.py: accept 'azure' / 'azure-openai' as valid provider names in _provider_from_name(). Lazy import keeps the openai SDK optional for users on Gemini/Claude.
- bench/__main__.py: extend --judge-provider / --answerer-provider choices.

Smoke test: GPT-4o on Azure replies '43' to '75 minus 32'. Ready to run LongMemEval_S with the GPT-4o answerer to measure how much of the 0.657 -> 0.816 gap to Supermemory is model class.
Commit 83af5a0
LongMemEval_S with GPT-4o answerer: 0.693 (+3.6pp over gemini-flash)
Same retrieval pipeline (Feather + Azure text-embedding-3-small + adaptive decay), GPT-4o answerer instead of gemini-2.5-flash. Wall clock ~272 min, 5/500 failures (same embedder context-length issue as before), ~$7-9 total.

Per-axis vs the Gemini-Flash run (same retrieval):

                           flash   gpt-4o   Δ
  overall                  0.657   0.693    +3.6pp
  information-extraction   0.896   0.942    +4.6pp
  knowledge-updates        0.714   0.714    +0.0pp (unchanged)
  multi-session            0.583   0.606    +2.3pp
  temporal-reasoning       0.417   0.477    +6.0pp

By question_type:

  single-session-user        0.941   1.000   +5.9pp (perfect)
  single-session-assistant   0.964   0.964   tie
  single-session-preference  0.667   0.767   +10.0pp
  knowledge-update           0.714   0.714   unchanged
  multi-session              0.583   0.606   +2.3pp
  temporal-reasoning         0.417   0.477   +6.0pp

vs Supermemory + GPT-4o (same model class):

                             ours    theirs
  overall                    0.693   0.816   -12.3pp (Supermemory leads)
  single-session-user        1.000   0.971   +2.9pp (we win)
  single-session-assistant   0.964   0.964   tie
  single-session-preference  0.767   0.700   +6.7pp (we win)
  knowledge-update           0.714   0.885   -17.1pp
  multi-session              0.606   0.714   -10.8pp
  temporal-reasoning         0.477   0.767   -29.0pp

Diagnostic: Supermemory's lead is concentrated in the three reasoning axes (knowledge-update + multi-session + temporal). Knowledge-update is unchanged across model classes for us, indicating a *structural* gap (lack of LLM fact extraction at ingest), not an answerer-capability gap. Closing the gap requires Phase 9 (LLM extractors) and decay-aware retrieval (surface old + new facts in parallel for temporal questions).

Updates:
- bench/results/longmemeval__s__*.json: GPT-4o run added.
- docs/benchmarks/longmemeval.md: TL;DR, results table, comparison table, and "what we don't beat" section all reflect both runs.
- docs/featherdb_paper.tex: §4.7 results paragraph + table updated with GPT-4o numbers. PDF recompiled.
- README.md: Benchmarks table now lists both runs, GPT-4o first.
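The per-axis deltas above are plain percentage-point differences between the two runs. A minimal check, using only the numbers quoted in this commit (no result files are read; the dict layout is illustrative):

```python
# Per-axis accuracies from the two runs quoted above.
flash = {
    "overall": 0.657,
    "information-extraction": 0.896,
    "knowledge-updates": 0.714,
    "multi-session": 0.583,
    "temporal-reasoning": 0.417,
}
gpt4o = {
    "overall": 0.693,
    "information-extraction": 0.942,
    "knowledge-updates": 0.714,
    "multi-session": 0.606,
    "temporal-reasoning": 0.477,
}


def delta_pp(a: float, b: float) -> float:
    """Percentage-point difference b - a, rounded to one decimal place."""
    return round((b - a) * 100, 1)


for axis in flash:
    print(f"{axis:24s} {delta_pp(flash[axis], gpt4o[axis]):+.1f}pp")
```

Running this reproduces the Δ column of the first table (+3.6pp overall, +0.0pp on knowledge-updates, and so on).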
Commit bb57d62
Marketing pack for v0.8.0 LongMemEval launch
Launch assets ready for Claude Cowork to execute:

docs/marketing/gtm-plan.md
- Positioning, ICP, three-claim core message, channel strategy, 90-day KPIs, founder talking points, asset checklist.
- Conversion goal: Cloud waitlist email capture.
- Explicitly: no Supermemory head-to-head in launch creative.

docs/blog/longmemeval-publish.md
- Public-ready article (800-1200 words). Headline: "You don't need GPT-4o full-context for AI memory — Feather DB beats it for $2.40".
- Lists Feather + GPT-4o (0.693), Feather + Gemini-Flash (0.657), full-context ceilings, and naive RAG. Does NOT list Mem0/Zep/Supermemory.
- Includes the reproduce command, per-axis tables, and a Phase 9 + Cloud teaser.
- Note: docs/blog/longmemeval-results.md (the original draft, with Supermemory) is left in place as the internal-only, detailed version.

docs/marketing/twitter-thread.md
- 7-tweet thread, image spec for the headline chart, posting timing, reply templates for predictable questions.

docs/marketing/hn-submission.md
- Title (70 chars), submission URL placeholder, first-comment context template, 7 reply templates for predictable HN questions.

GTM hand-off: this pack is what the marketing function works from.
Commit 786a705
Publish round 2: HF Dataset + README badges + arXiv submission package
HF Dataset created: https://huggingface.co/datasets/Hawky-ai/feather-db-benchmarks
- 22 result JSONs (LongMemEval oracle/S, SIFT1M, synthetic).
- Dataset card with schema documentation, headline numbers, and the reproduce command. Loadable via datasets.load_dataset(...).

README header badges:
- LongMemEval_S: 0.693 (GPT-4o) and 0.657 (Gemini-Flash)
- SIFT1M p50 = 0.19 ms
- Recall@10 = 0.972
- HF benchmarks dataset link
- Updated HF Space link from Sri-Vigneshwar-DJ to Hawky-ai (org-owned)

docs/arxiv-submission/:
- featherdb_paper.tex + featherdb_paper.pdf (verified compile)
- SUBMISSION_GUIDE.md: manual upload runbook for arxiv.org, since the arXiv replace-article flow is web-form-only.
Commit 5df6473
Add CONTENT_INDEX.json + front-matter for website auto-pull
The Feather website's Claude-code agent can now watch docs/CONTENT_INDEX.json as the single manifest for all publishable content. The manifest lists every blog post, marketing asset, technical report, and paper with:
- canonical raw_url (raw.githubusercontent.com)
- status (ready / internal)
- channels (where to publish)
- tags, summary, cover_image_spec
- do_not_mention rules (e.g. no Supermemory head-to-heads)
- headline_metrics (single source of truth for numbers)

Also added rules_for_consumers (governance), watcher_recipe (how to detect changes), and always_include_links (canonical URLs).

YAML front-matter added to docs/blog/longmemeval-publish.md so the blog post is also self-describing for any consumer that fetches it directly. Other files will get front-matter as they become publish-ready.

The website agent flow:
1. Watch CONTENT_INDEX.json for SHA changes (15-min cadence).
2. For each item with status=ready, fetch raw_url.
3. Parse front-matter + body, render per channel.
4. Re-publish on any detected SHA change.
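A consumer following the flow above might filter the manifest like this. The field names (status, raw_url, channels) come from the commit message; the top-level "items" key and everything else in the sample are assumptions about the schema, not the real CONTENT_INDEX.json.

```python
import json


def publishable_items(manifest: dict) -> list[dict]:
    """Return manifest entries marked ready, skipping internal-only ones.
    The 'items' key is an assumed schema detail."""
    return [it for it in manifest.get("items", []) if it.get("status") == "ready"]


# Illustrative manifest snippet in the shape the commit describes.
manifest = json.loads("""
{
  "items": [
    {"title": "LongMemEval results post", "status": "ready",
     "raw_url": "https://raw.githubusercontent.com/.../longmemeval-publish.md",
     "channels": ["blog"]},
    {"title": "Internal detailed draft", "status": "internal",
     "raw_url": "https://raw.githubusercontent.com/.../longmemeval-results.md",
     "channels": []}
  ]
}
""")

for item in publishable_items(manifest):
    print(item["title"], "->", item["channels"])
```

Step 2 of the agent flow then fetches each returned raw_url; only ready items ever leave the repo, which is how the internal-only Supermemory draft stays unpublished.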
Commit 7f812d2
Note: GitHub could not render this comparison (too big, or an issue with the repository). To see it locally:

  git diff master@{1day}...master