Inspiration
Phone trees and long hold times make booking appointments frustrating. We wanted a voice AI that people could talk to naturally and that would handle scheduling in a single call. That led to Dialyn, an AI receptionist that places outbound calls and books doctor appointments over the phone.
What it does
Dialyn is an AI receptionist that places outbound calls and books doctor appointments by voice. It connects Twilio, Deepgram, OpenAI, and ElevenLabs in one real-time pipeline: Twilio carries the call audio, Deepgram transcribes it, the LLM runs the conversation, and ElevenLabs generates speech. As soon as the user finishes speaking, Dialyn streams its reply with low latency. At the end of the call, another LLM pass extracts booking details from the transcript and logs them automatically.
How we built it
We used Twilio Media Streams for real-time audio, Deepgram for streaming speech-to-text, OpenAI for the LLM, and ElevenLabs’ multi-context WebSocket for low-latency TTS. A custom call orchestrator coordinates these services and runs a shared receive loop, so audio from each turn is streamed to the right place. After the call, the LLM is invoked once more to parse the transcript and write structured booking data to JSON storage.
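The post-call step can be pictured with a minimal sketch like the one below. It assumes the LLM has been prompted to return a JSON object; the `Booking` fields, the function names, and the list-in-one-file layout are illustrative assumptions, not the project's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path

# Hypothetical field names; the real booking schema is not given in this write-up.
REQUIRED_FIELDS = ("patient_name", "doctor", "appointment_time")


@dataclass
class Booking:
    patient_name: str
    doctor: str
    appointment_time: str  # kept as an ISO 8601 string for simplicity


def parse_booking(llm_output: str) -> Booking:
    """Validate the JSON text returned by the LLM and convert it to a Booking."""
    data = json.loads(llm_output)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if not data.get(name)]
    if missing:
        raise ValueError(f"booking is missing fields: {missing}")
    return Booking(**{name: str(data[name]) for name in REQUIRED_FIELDS})


def append_booking(path: Path, booking: Booking) -> None:
    """Append one booking to a JSON file that holds a list of bookings."""
    bookings = json.loads(path.read_text()) if path.exists() else []
    bookings.append(asdict(booking))
    path.write_text(json.dumps(bookings, indent=2))
```

Validating before writing means a call where no appointment was agreed on raises an error instead of logging a half-empty record.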
Challenges we ran into
Keeping the ElevenLabs WebSocket open across turns was hard. The single-context API closed the connection after each turn, so we switched to the multi-context WebSocket API and added a shared receive loop. We also hit a bug in our Deepgram integration: the utterance_end handling mixed wall-clock time with stream-relative time, so some end-of-utterance triggers were missed. The fix was to use a single time base for every comparison. Setting up ngrok for local testing also cost extra debugging time, because the public URL changes on the free tier.
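To illustrate the time-base issue, here is a minimal sketch of end-of-utterance detection that stays entirely in stream-relative time. It assumes word end times arrive as offsets in seconds from the start of the audio stream, and it derives the current stream position from the number of audio bytes forwarded (Twilio Media Streams carry 8 kHz mu-law audio, one byte per sample). The class and method names are hypothetical and this is not the project's actual fix.

```python
from dataclasses import dataclass
from typing import Optional

SAMPLE_RATE_HZ = 8000  # Twilio Media Streams: 8 kHz mu-law
BYTES_PER_SAMPLE = 1   # mu-law uses one byte per sample


@dataclass
class UtteranceEndDetector:
    """Hypothetical helper: every timestamp is seconds since stream start."""

    silence_threshold_s: float = 1.0
    bytes_sent: int = 0
    last_word_end_s: Optional[float] = None

    def on_audio_sent(self, chunk: bytes) -> None:
        # Advance the stream clock by the amount of audio forwarded to STT.
        self.bytes_sent += len(chunk)

    def on_word(self, end_s: float) -> None:
        # end_s is assumed to be stream-relative, like the stream clock.
        self.last_word_end_s = end_s

    @property
    def stream_time_s(self) -> float:
        return self.bytes_sent / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)

    def utterance_ended(self) -> bool:
        if self.last_word_end_s is None:
            return False  # nothing has been said yet
        silence = self.stream_time_s - self.last_word_end_s
        return silence >= self.silence_threshold_s
```

Comparing `last_word_end_s` against a wall-clock value such as `time.time()` would yield a huge or meaningless gap. Deriving both sides from the stream keeps the subtraction valid even when audio arrives in bursts.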
Accomplishments that we're proud of
We built a full voice agent pipeline (STT to LLM to TTS) with streaming and low latency. The ElevenLabs multi-context integration keeps the WebSocket open across turns, and we implemented post-call booking extraction so appointments are created automatically when the call ends.
What we learned
Streaming voice agents depend on more than APIs: async coordination, WebSocket lifecycle, and prompt design all matter. Deepgram’s event model is subtle and needs careful handling. In our pipeline, reliable multi-turn TTS required a shared receive loop and careful context management.
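The shared receive loop idea can be sketched as follows: one coroutine is the only reader of the long-lived connection, and it routes each message to a per-turn queue by context id. The message keys (`context_id`, `audio`, `is_final`) and the fake socket are assumptions made for illustration, not the ElevenLabs wire format.

```python
import asyncio
from typing import AsyncIterator, Dict


class SharedReceiver:
    """Routes messages from one long-lived connection to per-context queues."""

    def __init__(self) -> None:
        self._queues: Dict[str, asyncio.Queue] = {}

    def open_context(self, context_id: str) -> None:
        # Call from inside a running event loop, once per conversational turn.
        self._queues[context_id] = asyncio.Queue()

    async def run(self, messages: AsyncIterator[dict]) -> None:
        """The single receive loop: the only place that reads the connection."""
        async for msg in messages:
            queue = self._queues.get(msg["context_id"])
            if queue is None:
                continue  # message for a closed or unknown context
            if msg.get("is_final"):
                await queue.put(None)  # sentinel: this turn's audio is done
            else:
                await queue.put(msg["audio"])

    async def audio_for(self, context_id: str) -> AsyncIterator[bytes]:
        """Yield audio chunks for one turn until its sentinel arrives."""
        queue = self._queues[context_id]
        while True:
            chunk = await queue.get()
            if chunk is None:
                break
            yield chunk
        del self._queues[context_id]


async def demo() -> Dict[str, list]:
    async def fake_socket() -> AsyncIterator[dict]:
        # Stand-in for a WebSocket: messages for two turns arrive interleaved.
        yield {"context_id": "turn-1", "audio": b"a1"}
        yield {"context_id": "turn-2", "audio": b"b1"}
        yield {"context_id": "turn-1", "audio": b"a2"}
        yield {"context_id": "turn-1", "is_final": True}
        yield {"context_id": "turn-2", "is_final": True}

    receiver = SharedReceiver()
    receiver.open_context("turn-1")
    receiver.open_context("turn-2")
    await receiver.run(fake_socket())
    return {
        "turn-1": [chunk async for chunk in receiver.audio_for("turn-1")],
        "turn-2": [chunk async for chunk in receiver.audio_for("turn-2")],
    }
```

Running `asyncio.run(demo())` shows each turn receiving only its own chunks even though the messages were interleaved on the wire. In a live call, `run` would be started as a background task so audio can be consumed while it is still arriving.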
What's next for Dialyn
Planned next steps include user interrupt handling (barge-in), optional SMS confirmations, support for multiple clinics and Google Calendar, and a production deployment beyond ngrok for reliable HTTPS and uptime.
Built With
- deepgram
- elevenlabs
- fastapi
- javascript
- openai
- python
- react.js
- twilio