Inspiration

Every meeting ends the same way — someone has to manually write up the notes, create the tickets, send the recap email, and remember what was decided three weeks ago. We wanted to eliminate that entirely. The idea was simple: what if your meeting had an AI co-pilot that did things the moment you asked, without ever leaving the call?

What it does

CallClaw is an AI agent that joins your Google Meet as a bot. Say "Hey CallClaw" followed by a command, and it acts immediately:

  • 🔍 Web Search — looks up information and speaks the answer back into the call
  • 🎫 Create Ticket — opens a Linear issue with title, description, and priority in ~3 seconds
  • 📧 Send Email — drafts and sends a real email via Gmail API
  • 📝 Create Doc — generates a structured Notion page from the conversation
  • 🧠 Recall Memory — remembers decisions and action items from previous meetings and surfaces them on demand

The killer feature is cross-call memory — CallClaw remembers what was decided last week and uses that context to inform this week's meeting.

How we built it

The architecture uses a two-phase response pattern to eliminate awkward silence:

  • Phase 1 — The bot instantly speaks a pre-cached confirmation ("Let me look that up...") via ElevenLabs TTS
  • Phase 2 — The actual action runs in the background (Linear API, Gmail, OpenClaw agent), then the result is spoken back
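The two phases can be sketched as a single async handler: speak a canned confirmation immediately, then do the real work and speak the result. This is a minimal illustration, not the production code; `speak` and `run_action` are hypothetical stand-ins for ElevenLabs TTS playback and the backend call (Linear, Gmail, OpenClaw).

```python
import asyncio

# Pre-cached confirmations spoken instantly in Phase 1 (examples, not the real set)
CACHED_CONFIRMATIONS = {
    "web_search": "Let me look that up...",
    "create_ticket": "Creating that ticket now...",
}

spoken = []  # captures TTS output for illustration

async def speak(text: str) -> None:
    # stand-in for ElevenLabs TTS playback into the call
    spoken.append(text)

async def run_action(intent: str, utterance: str) -> str:
    # stand-in for the real backend call, which takes a few seconds
    await asyncio.sleep(0.01)
    return f"Done: {intent}"

async def handle_command(intent: str, utterance: str) -> None:
    # Phase 1: instant pre-cached confirmation, no waiting on the backend
    await speak(CACHED_CONFIRMATIONS.get(intent, "On it..."))
    # Phase 2: real work runs, then the result is spoken back
    result = await run_action(intent, utterance)
    await speak(result)

asyncio.run(handle_command("web_search", "hey callclaw, who won the cup?"))
print(spoken)
```

The point is that the user hears something within milliseconds, so the multi-second backend latency never reads as dead air.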

Routing: Mistral Small classifies the intent in ~1s. Mistral Large formulates the spoken response. Direct API calls handle Linear and Gmail for speed (~2-3s). OpenClaw handles complex multi-step tasks like web search and Notion.
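A rough sketch of that routing split, with a keyword matcher standing in for the Mistral Small classifier (the intent labels and route table here are illustrative assumptions, not the actual prompt or schema):

```python
# Fast path: hit Linear/Gmail APIs directly (~2-3s).
# Slow path: hand off to the OpenClaw agent for multi-step work.
FAST_PATH = {"create_ticket", "send_email"}
SLOW_PATH = {"web_search", "create_doc"}

def classify_intent(utterance: str) -> str:
    # stand-in for Mistral Small, which returns one intent label in ~1s
    if "ticket" in utterance:
        return "create_ticket"
    if "search" in utterance or "look up" in utterance:
        return "web_search"
    return "unknown"

def route(utterance: str) -> str:
    intent = classify_intent(utterance)
    if intent in FAST_PATH:
        return f"direct:{intent}"   # direct API call, skip the agent
    if intent in SLOW_PATH:
        return f"agent:{intent}"    # multi-step OpenClaw task
    return "ignore"

print(route("hey callclaw, create a ticket for the login bug"))
```

Keeping the classifier tiny and the dispatch table explicit is what makes the fast path fast.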

Memory: Redis stores a 2-minute sliding transcript window per bot, plus cross-call memory (last 10 meetings, 30-day TTL). Recall.ai provides the real-time WebSocket transcript stream.

Challenges

  • Audio injection into Google Meet — Recall.ai runs a headless Chromium browser. Getting PCM audio from ElevenLabs to play into the call (not just to the user) required using AudioContext in the bot's camera page.
  • Latency — Users expect near-instant responses in a live call. The two-phase pattern + direct API bypasses (skipping OpenClaw for simple actions) brought response time from ~20s down to ~3s for common commands.
  • Transcript noise — the intent classifier fires on every utterance, so overlapping transcript fragments can repeat a command. A short cooldown lock in Redis prevents the bot from double-triggering on the same phrase.

What we learned

Real-time agentic systems require careful separation between fast paths (direct API calls) and slow paths (multi-step agents). The architecture decision to route simple actions (Linear, Gmail) directly rather than through an agent framework was the single biggest performance win.
