Inspiration
Our knowledge is scattered. Notes in Obsidian. Docs in Notion. Code in VSCode. Tasks in Linear. Emails in Gmail. Calendar in Google. Each tool excels at its domain - but they don't talk to each other. When you need to answer "What's Tyler working on this week?" you open five apps, search five times, and stitch the answer together in your head.
This isn't just inconvenient - it's a fundamental limitation of how we build personal software. Tools treat data as isolated silos. The connections between them exist only in your memory.
We started building Filegraph to answer a different question: what if your filesystem was a queryable knowledge graph? What if every file, note, entity, and calendar event was a node, the relationships between them were edges, and a local AI agent could reason over the whole thing - with inspectable evidence, not hallucination?
That vision demanded a new kind of AI integration: not a chatbot bolted onto a sidebar, but an agent that genuinely inhabits the workspace. One that can see your spatial canvas, navigate your file tree, manipulate your UI, and reason over your personal semantic graph - in real time, by voice.
The Gemini Live API made the final piece possible: real-time voice interaction with function calling, closing the loop between talking about your workspace and acting on it.
What it does
Filegraph is a local-first desktop workspace - part Obsidian, part Notion, part VSCode - where your files are nodes in a semantic knowledge graph, and a Gemini-powered AI agent can observe, navigate, and directly manipulate every surface of the UI through natural voice conversation.
The workspace
- Vault-based knowledge graph - Index any folder. Every file becomes a node with facts (`name`, `modified`, `size`, `type`) and edges (`fs:contains`, `ref:links`, `data:ref`). The graph is built from your plain files; backup is `cp -r`.
- Entity system - Structured data (`@entities/`, `@calendar/`, `@finance/`) lives as JSON-LD files with human-readable IDs (`person:sarah:001`, `proj:website-redesign:001`). No database. No lock-in.
- Bidirectional linking - `[[wikilinks]]` and entity ID references in any file create automatic backlinks across your entire vault, like Obsidian but for all file types.
- Spatial canvas - A ReactFlow-powered home canvas lets you arrange files, notes, images, embeds, and code into a visual dashboard - your personal command center.
- Calendar + tasks + finance - Events, deadlines, bills, and paydays live as files too, with a Kanban/calendar/month view. Everything is temporal and queryable.
- Integrated terminal - Shell commands, dev environments, and process management built directly into the workspace, like a lightweight VSCode.
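The fact-and-edge model above can be sketched as a tiny entity-attribute-value (EAV) store. This is an illustrative reconstruction, not Filegraph's actual indexer; the `GraphIndex` class and its method names are our own:

```typescript
// Minimal EAV sketch: every file or entity is a node identified by a
// human-readable ID; every property or relationship is a fact (triple).
type Fact = { entity: string; attribute: string; value: string | number };

class GraphIndex {
  private facts: Fact[] = [];

  addFact(entity: string, attribute: string, value: string | number): void {
    this.facts.push({ entity, attribute, value });
  }

  // All attributes of one node
  getFacts(entity: string): Fact[] {
    return this.facts.filter((f) => f.entity === entity);
  }

  // All nodes whose attribute value satisfies a predicate
  findByAttribute(
    attribute: string,
    pred: (v: string | number) => boolean,
  ): string[] {
    return this.facts
      .filter((f) => f.attribute === attribute && pred(f.value))
      .map((f) => f.entity);
  }
}

// Index a file and an entity the way a vault indexer might:
const g = new GraphIndex();
g.addFact("file:notes/atlas.md", "name", "atlas.md");
g.addFact("file:notes/atlas.md", "ref:links", "proj:atlas:001");
g.addFact("proj:atlas:001", "name", "Atlas");
```

Because every node type goes through the same triple shape, files, entities, and calendar events all answer the same queries.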
How it compares
- vs. Obsidian / Logseq / Roam - Those tools are document-centric: notes are the primitive. Filegraph is fact-centric: triples are the primitive. Documents are just one node type among many.
- vs. graph databases (Neo4j, Neptune) - Those are server-oriented, operationally heavyweight, and assume the database is the source of truth. Filegraph is local-first and file-native - the filesystem is the database.
- vs. RAG-first "chat over your docs" systems - RAG retrieves text, not relationships. Filegraph builds durable, queryable structure and exposes inspectable graph traversals. You can ask "Why?" and see the exact query path.
- vs. AI agent platforms (Claude Cowork, OpenClaw) - Those are file-aware but not graph-structured: they give AI the ability to act on files without modeling semantic relationships. Filegraph's wager is that the missing layer is a queryable semantic graph where the agent's reasoning is explicit traversals, not text generation.
The AI UI Navigator
The agent doesn't just answer questions - it acts on the interface:
- Voice-driven UI control - "Switch to the calendar app", "Open my project notes", "Show me files modified this week" - executed instantly via Gemini Live API with real-time function calling
- Spatial canvas manipulation - Add, remove, rearrange, align, group, and connect nodes on the canvas by voice. "Lay everything out in a grid" → done.
- Inspectable reasoning - The agent never guesses. Every answer traces to explicit tool calls: `resolve_entity` → `query_graph` → structured result. You can ask "Why?" and see the exact graph traversal - not a generated explanation, a real query path.
- Visual understanding - Analyze images and videos on the canvas with Gemini vision. Ask "What's in that screenshot?" and get a grounded, tool-backed answer.
- Full graph query access - `get_facts`, `get_links`, `find_by_attribute`, `aggregate` - the agent issues real EAV queries over your vault, not RAG over text chunks.
- 50+ tool functions - Across 10 domains: vault, canvas, shell, calendar, UI, media, system, memory, search, widgets - all bridged to Gemini Live for real-time execution.
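The `resolve_entity` → `query_graph` loop might look like the following sketch over an in-memory triple list. The function names mirror the tool names above, but the implementation and the sample data are hypothetical:

```typescript
// Sketch of inspectable agent reasoning: every answer is a recorded
// sequence of graph operations, so "Why?" replays the exact query path.
type Triple = [subject: string, predicate: string, object: string];

const triples: Triple[] = [
  ["person:tyler:001", "name", "Tyler"],
  ["person:tyler:001", "worksOn", "proj:atlas:001"],
  ["proj:atlas:001", "name", "Atlas"],
];

const trace: string[] = []; // the inspectable evidence trail

// resolve_entity: map a spoken name to a stable entity ID
function resolveEntity(name: string): string | undefined {
  trace.push(`resolve_entity("${name}")`);
  return triples.find(([, p, o]) => p === "name" && o === name)?.[0];
}

// query_graph: follow an edge out of a resolved node
function queryGraph(subject: string, predicate: string): string[] {
  trace.push(`query_graph(${subject}, ${predicate})`);
  return triples
    .filter(([s, p]) => s === subject && p === predicate)
    .map(([, , o]) => o);
}

const tyler = resolveEntity("Tyler");
const projects = tyler ? queryGraph(tyler, "worksOn") : [];
console.log(projects, trace); // the answer plus the query path behind it
```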
The experience:
- Press `⌘⇧L` or click the microphone icon to activate Live Mode
- A pulsing audio orb appears, streaming your mic to Gemini
- Speak naturally: "Show me my recent notes and add them to the canvas"
- Watch the agent traverse the graph, find matching files, and place them as nodes - narrating its reasoning as it goes
- Interrupt anytime (VAD), ask follow-ups, or switch tasks mid-sentence
How we built it
Architecture
- Architecture Diagram: http://turtle.tech/reference/filegraph-architecture-diagram
- Explainer Video: http://turtle.tech/reference/filegraph-explainer-video
- Blog Post: https://brew.build/posts/filegraph
Key technical decisions:
- Client-to-server WebSocket - Browser connects directly to Gemini Live API for lowest latency audio streaming. No proxy server needed.
- Ephemeral tokens - API keys never touch the frontend. The Rust backend provisions short-lived tokens via a `get_ephemeral_token` Tauri command.
- Tool bridge pattern - All 50+ existing agent tools (defined in OpenAI function-calling schema) are automatically converted to Gemini `functionDeclarations` format. Zero duplication.
- System context injection - Before each session, the agent receives a snapshot of the current UI state (active app, open files, canvas nodes, viewport, theme) so it can make contextual decisions.
- AudioWorklet processing - Mic capture runs in a dedicated audio thread via `audio-worklet-processor.js`, ensuring zero UI jank during voice streaming.
- Domain-based tool architecture - Tools are organized into 10 domain files (vault, canvas, UI, shell, etc.) with a barrel export. Adding a new tool = adding a definition + handler to one file.
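The schema conversion at the heart of the tool bridge can be sketched roughly as below. The field shapes follow the public OpenAI and Gemini function-calling formats, but `toGeminiDeclaration` is our illustration, not the project's `toolBridge.ts`:

```typescript
// Sketch: convert an OpenAI-style tool schema into Gemini's
// functionDeclarations shape, upcasing JSON Schema type names
// ("string" -> "STRING") as the Live API expects.
interface OpenAITool {
  name: string;
  description: string;
  parameters: {
    type: string;
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

function toGeminiDeclaration(tool: OpenAITool) {
  const upper = (t: string) => t.toUpperCase();
  return {
    name: tool.name,
    description: tool.description,
    parameters: {
      type: upper(tool.parameters.type), // usually "OBJECT"
      properties: Object.fromEntries(
        Object.entries(tool.parameters.properties).map(([key, prop]) => [
          key,
          { type: upper(prop.type), description: prop.description },
        ]),
      ),
      required: tool.parameters.required ?? [],
    },
  };
}
```

Running the whole tool registry through one converter like this is what keeps a single definition usable by multiple providers.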
Stack:
- Frontend: React 18, TypeScript, Vite, Tailwind CSS, shadcn/ui, ReactFlow
- Backend: Tauri 2.0 (Rust), Tokio async runtime
- AI: Gemini 2.5 Flash Native Audio (`@google/genai` SDK), Gemini Live API
- Audio: Web Audio API, AudioWorklet, PCM 16kHz/24kHz
- State: Zustand stores with persistence middleware
Challenges we ran into
- Audio threading - Browser audio processing on the main thread caused UI freezes. We moved PCM capture to an AudioWorklet processor running in its own thread, which required careful message passing and buffer management.
- Tool schema translation - Our 50+ tools were defined in OpenAI's function-calling format. Gemini expects uppercase type names (`STRING` vs `string`), doesn't support union types, and handles nullability differently. Our `toolBridge.ts` had to normalize all of this automatically.
- VAD interruption handling - When the user interrupts the agent mid-speech, we need to simultaneously flush the audio playback queue, cancel pending tool calls, and update the session context. Getting this state machine right took significant iteration.
- Ephemeral token lifecycle - Tokens expire, sessions can receive `GoAway` messages, and reconnection needs to preserve conversation context. We implemented auto-reconnect with session resumption to handle this gracefully.
- Context window pressure - With 50+ tool declarations consuming tokens, plus system context, plus conversation history, we had to be strategic about which tools to expose per session and how to compress context.
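The three-step interruption sequence described above can be sketched as a small state transition. `LiveSession` and its fields are hypothetical stand-ins for the real session object:

```typescript
// Sketch of VAD interruption handling: when the user speaks over the
// agent, flush queued audio, drop in-flight tool calls, and hand the
// floor back to the user, all in one atomic step.
type SessionState = "idle" | "speaking" | "listening";

class LiveSession {
  state: SessionState = "speaking";
  playbackQueue: ArrayBuffer[] = [new ArrayBuffer(8), new ArrayBuffer(8)];
  pendingToolCalls = new Set<string>(["query_graph#42"]);

  onUserInterrupt(): void {
    this.playbackQueue.length = 0; // 1. stop any queued agent audio
    this.pendingToolCalls.clear(); // 2. cancel in-flight tool calls
    this.state = "listening";      // 3. mark the user as the speaker
  }
}
```

Doing all three in one handler avoids the half-interrupted states (stale audio playing, orphaned tool results) that make voice UIs feel broken.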
Accomplishments that we're proud of
- Zero-config voice UI control - Press ⌘⇧L and start talking. No setup, no configuration, no training. The agent immediately understands your workspace and can act on it.
- 50+ real tool functions - This isn't a demo with 3 hardcoded actions. The agent can genuinely navigate files, manipulate canvas layouts, run shell commands, create calendar events, analyze images, manage themes, and more.
- Tool bridge with zero duplication - Every tool works identically in text chat (via OpenAI/Ollama/Anthropic) and voice mode (via Gemini Live). One definition, multiple providers.
- Sub-second response to UI actions - Voice command → tool execution → UI update happens fast enough to feel like direct manipulation.
- Hybrid text + voice - During a Live session, you can still type text commands that get injected into the conversation via `sendClientContent`. Voice and text coexist seamlessly.
- Local-first architecture - Your files never leave your machine. The agent reads your local filesystem, manipulates your local canvas, and stores everything as plain JSON and Markdown files.
What we learned
Everything is a node - Files, entities, relationships, memories, even agent actions. This uniform model simplifies the architecture dramatically. There's no "files vs. database" distinction - files are the database. Once you see this, you can't unsee it.
Files as persistence, graph as index - The filesystem is the source of truth. The graph is a materialized view, rebuilt from files on startup. This inverts the typical model (database as truth, files as export) and enables radical simplicity: backup is `cp -r`, version control is `git`, editing is any text editor.
Inspectable AI reasoning builds trust - When the agent answers "Tyler is working on the Atlas project," it's not generating text - it's traversing the graph. You can ask "Why?" and see the query path: `person:tyler worksOn project:atlas`. This transparency builds trust in a way RAG-based systems fundamentally can't match.
Emergent schema beats schema-first - You don't define a schema upfront. The AI extracts entities from your data. The schema emerges from what's actually there. This is the opposite of traditional databases and more aligned with how knowledge actually works.
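The rebuild-on-startup idea - filesystem as truth, graph as disposable index - can be sketched like this, with an in-memory map standing in for a vault folder. For brevity the extraction step handles only `[[wikilinks]]`, and all names are illustrative:

```typescript
// Sketch: rebuild the graph index from plain files on startup. The
// index can be thrown away at any time; the files remain the truth.
type Edge = { from: string; rel: string; to: string };

function rebuildIndex(files: Map<string, string>): Edge[] {
  const edges: Edge[] = [];
  for (const [path, body] of files) {
    // Extract [[wikilink]] references into ref:links edges
    for (const m of body.matchAll(/\[\[([^\]]+)\]\]/g)) {
      edges.push({ from: `file:${path}`, rel: "ref:links", to: m[1] });
    }
  }
  return edges;
}

const vault = new Map<string, string>([
  ["notes/atlas.md", "Working with [[person:sarah:001]] on the launch."],
]);
const index = rebuildIndex(vault); // cheap enough to run on every startup
console.log(index);
```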
Audio UX is a different discipline entirely - With text you can scan, re-read, and skim. With voice, timing, interruption handling, and real-time feedback become the entire product. The pulsing orb, transcript overlay, and VAD state machine aren't polish - they're what make the experience feel like a conversation instead of a command line.
What's next for Filegraph
Filegraph's north star is a single app that makes Obsidian, Notion, and VSCode redundant - a local-first semantic workspace where every piece of your knowledge is connected, queryable, and voice-navigable.
Replacing Obsidian as my second brain
- Full TQL query language - A human-facing, Datalog-inspired query interface (`file.modified > now() - 7d`, graph traversals, aggregations) giving you Obsidian's graph power with SQL-like expressiveness
- Graph visualization canvas - A live, zoomable knowledge graph where you fly through connections between notes, people, projects, and tasks
- Statement-level provenance - Every derived fact knows its source file and line. You can dispute and edit the graph at the fact level, not just the document level
- Plugin API - Register custom file parsers, viewers, and agent tools that automatically become voice-accessible
Replacing Notion for personal productivity
- Unified workspace canvas - Every workspace tab becomes a canvas. File browser clicks add nodes to the canvas instead of opening editor tabs. The canvas is the view.
- Richer data views - Table, grid, kanban, gantt, and graph views over any `.data` collection, driven by the same EAV graph index
- Collaborative graph sync - CRDTs over the vault for multiplayer editing without a central server. Sync is additive; conflicts are resolved at the fact level
- Email + Drive ingestion - Pull Gmail threads and Google Drive files into the vault as queryable entities, connected to your existing knowledge graph
Replacing VSCode for building experimental vibe-coding projects
- Workspace-native terminal - Each workspace gets a persistent terminal with auto-start commands, environment variables, and process monitoring
- Dev workspace presets - One-click setup for Node, Rust, Python, Go projects - the agent scaffolds the workspace, installs dependencies, and starts the dev server
- Code-aware graph edges - Import relationships, function call graphs, and test coverage woven into the same EAV store as your notes and entities
- Screen capture integration - Feed Gemini a live capture stream so the agent can truly see your editor state, not just query file metadata
Filegraph as a BaaS
- We're exploring Filegraph as a reusable local-first semantic runtime (storage + graph engine + query + provenance + agent substrate) that any developer can build on top of - the way PocketBase is a backend, but for knowledge-graph applications
- Google Cloud Run deployment - Move ephemeral token provisioning and multi-user session management to Cloud Run for teams and production deployments
- Mobile voice companion - Voice-control your entire desktop vault from your phone via a Gemini Live API relay, keeping the graph and files local
Built With
- jsonld
- nix
- react
- reactflow
- rust
- shadcn
- tailwindcss
- tauri
- tokio
- typescript
- vite
- vitest
- web-audio-api
- zustand
