Build a self-hosted AI meeting delegate with two modes:
- Ghost Mode: listen + transcribe + summarize
- Agent Mode: respond in-meeting with policy guardrails
Step 1 — Architecture & scope
What
- Finalize architecture, scope, and mode split
- Define out-of-scope for v0.1
DoD
- requirements.md updated and approved
- This 10-step plan documented
Step 2 — Audio capture pipeline
What
- Chrome extension captures tab audio
- Streams chunks to local backend via WebSocket
- Backend stores chunks and exposes basic stats
DoD
- Start capture from extension popup
- backend receives chunks (`/ws/audio`) and writes `.webm`
- `GET /api/sessions` shows received bytes/chunks
Step 3 — Incremental transcription
What
- Convert incoming audio chunks to transcribable frames
- Run faster-whisper incremental transcription
DoD
- transcript lines appear in backend logs/API in near-real-time
- configurable model (`tiny`/`base`/`small`)
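One way to structure the incremental loop is to buffer converted frames and transcribe fixed windows as they fill. A sketch, with the transcriber injected as a callable so a wrapper around faster-whisper's `WhisperModel.transcribe` can be dropped in; the window size and names here are assumptions, not the library's API:

```python
from typing import Callable, List

class IncrementalTranscriber:
    """Buffers audio frames and emits a transcript line per full window."""

    def __init__(self, transcribe: Callable[[bytes], str], window_bytes: int = 32000):
        self.transcribe = transcribe      # e.g. wraps a faster-whisper model call
        self.window_bytes = window_bytes  # ~1 s of 16 kHz 16-bit mono PCM
        self.buffer = b""
        self.lines: List[str] = []

    def feed(self, frames: bytes) -> List[str]:
        """Append frames; transcribe and return any newly completed windows."""
        self.buffer += frames
        emitted = []
        while len(self.buffer) >= self.window_bytes:
            window = self.buffer[:self.window_bytes]
            self.buffer = self.buffer[self.window_bytes:]
            line = self.transcribe(window)
            if line:
                self.lines.append(line)
                emitted.append(line)
        return emitted
```

Fixed windows keep latency bounded; a later refinement could re-transcribe an overlapping tail to repair words cut at window edges.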
Step 4 — Live transcript UI
What
- Side panel for live transcript view
- Speaker segments + timestamps
DoD
- captions update in UI while meeting audio is streaming
Step 5 — Post-meeting summary
What
- Post-meeting summary generation
- Decisions / Action items / Follow-ups format
DoD
- summary generated from transcript via API
- deterministic output schema
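The deterministic output schema could be pinned down as a dataclass that coerces whatever the model emits into fixed keys; the field names mirror the Decisions / Action items / Follow-ups format, but the class itself is a sketch:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class MeetingSummary:
    decisions: list = field(default_factory=list)
    action_items: list = field(default_factory=list)
    follow_ups: list = field(default_factory=list)

    @classmethod
    def from_model_json(cls, data: dict) -> "MeetingSummary":
        """Coerce model output into the schema; unknown keys are dropped."""
        return cls(
            decisions=[str(x) for x in data.get("decisions", [])],
            action_items=[str(x) for x in data.get("action_items", [])],
            follow_ups=[str(x) for x in data.get("follow_ups", [])],
        )

    def to_dict(self) -> dict:
        return asdict(self)  # same keys, same order, every time
```

Validating at the boundary means the API response shape never depends on how the LLM happened to phrase its JSON.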
Step 6 — Pre-meeting briefing
What
- Pre-meeting context form:
- meeting topic
- user role
- talking points
- constraints
DoD
- summary/responses incorporate briefing context
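The briefing form maps naturally onto a small context object that renders into a prompt preamble for summary/response generation; a minimal sketch, with the rendering format as an assumption:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Briefing:
    topic: str
    role: str
    talking_points: List[str]
    constraints: List[str]

    def to_prompt(self) -> str:
        """Render the briefing as a preamble prepended to generation prompts."""
        lines = [f"Meeting topic: {self.topic}", f"My role: {self.role}"]
        if self.talking_points:
            lines.append("Talking points: " + "; ".join(self.talking_points))
        if self.constraints:
            lines.append("Constraints: " + "; ".join(self.constraints))
        return "\n".join(lines)
```

Keeping the briefing structured (rather than free text) also lets the policy layer in the next step inspect constraints directly.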
Step 7 — Response policy layer
What
- Build response policy layer:
- when to respond
- safe fallback lines
- no unauthorized commitments
DoD
- policy tests pass on sample scenarios
- generated responses include confidence/fallback when uncertain
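The three policy rules above can be sketched as a single gate that every drafted response passes through before it is spoken; the keywords, threshold, and fallback line are placeholders to be tuned against the sample scenarios:

```python
from typing import Optional

FALLBACK = "Let me check and follow up after the meeting."
COMMITMENT_PHRASES = ("we will deliver", "i promise", "guaranteed by", "we commit")

def apply_policy(draft: str, confidence: float, directly_addressed: bool) -> Optional[str]:
    """Return the line to speak, a safe fallback, or None to stay silent."""
    if not directly_addressed:
        return None                      # when-to-respond rule: only if addressed
    if confidence < 0.6:
        return FALLBACK                  # uncertain -> safe fallback line
    lowered = draft.lower()
    if any(p in lowered for p in COMMITMENT_PHRASES):
        return FALLBACK                  # block unauthorized commitments
    return draft
```

Because the gate is a pure function, the "policy tests pass on sample scenarios" DoD reduces to a table of (draft, confidence, addressed) → expected-output cases.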
Step 8 — Voice output (optional)
What
- TTS + local virtual audio routing path
- optional (off by default)
DoD
- one-click “speak response” works following the local setup guide
Step 9 — Setup & docs
What
- one-command backend run
- extension load instructions
- sample config and troubleshooting
DoD
- fresh machine setup in under 10 minutes
Step 10 — Demo & release
What
- produce polished demo video
- README narrative + architecture diagram + examples
DoD
- public repo ready
- demo covers Ghost mode end-to-end
Current scope
- ✅ Do only Step 1 and Step 2
- ❌ Do not implement STT/summary yet