About the Project
Inspiration
The recent passing of UnitedHealthcare's CEO Brian Thompson and growing concerns about insurance claim fairness highlighted the need for more transparent and equitable insurance processes. These events inspired us to create an AI voice agent that serves as a bridge between customers and insurance companies, ensuring fair and consistent handling of auto insurance claims while maintaining the human touch that's crucial in stressful situations. We decided to start by specializing in auto insurance and then continue building tools for more situations in the insurance ecosystem.
On the technology side, this project was born out of our desire to modernize the insurance claims process by leveraging cutting-edge AI and real-time communication technologies. We were inspired by the challenge of creating a voice assistant that not only understands natural speech but also dynamically guides users through a decision-making process. The idea was to blend the reliability of traditional telephony (via Twilio) with the innovation of AI-driven transcription and response generation (using OpenAI’s realtime API and Whisper), ultimately creating a smoother and more user-friendly experience for customers filing claims.
What We've Learned
- Real-Time Audio Processing: We gained hands-on experience with real-time audio streaming and processing, managing audio buffers, and ensuring seamless interaction between live user input and AI responses.
- WebSocket Communication: Working with WebSockets in a production-like setting taught us how to maintain robust, bidirectional communication channels for streaming data.
- API Integration: Integrating multiple external services—Twilio for telephony, OpenAI for text/audio generation, and Whisper for transcription—enhanced our understanding of asynchronous programming and error handling in a complex ecosystem.
- Dynamic Decision Trees: Designing a conversational decision tree to capture claim information improved our skills in state management and user interaction flow, ensuring the assistant can adapt to various inputs and scenarios.
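The heart of the bidirectional streaming we describe above is translating messages between the two WebSocket dialects. As a minimal sketch (function name is ours; the event shapes follow Twilio's media stream frames and the realtime API's audio events), a Twilio `media` frame becomes an `input_audio_buffer.append` event for OpenAI:

```javascript
// Translate one raw Twilio media-stream frame into the event we forward
// to OpenAI's realtime WebSocket. Non-media frames (start, mark, stop)
// are ignored here and handled elsewhere.
function twilioMediaToOpenAi(rawFrame) {
  const msg = JSON.parse(rawFrame);
  if (msg.event !== 'media') return null;
  return {
    type: 'input_audio_buffer.append',
    // Twilio delivers base64-encoded G.711 mu-law audio in media.payload
    audio: msg.media.payload,
  };
}
```

The reverse direction works the same way: audio deltas from OpenAI are wrapped in Twilio `media` frames and written back to the call's socket.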
How We Built the Project
Tech Stack:
The project is built on Node.js with the Fastify framework, chosen for its performance and minimal overhead. We also integrated WebSocket libraries to manage real-time connections.
Twilio Integration:
The /incoming-call endpoint serves TwiML responses that instruct Twilio to stream audio data via WebSockets to the server. This forms the backbone of the voice interaction.
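A minimal sketch of the TwiML this endpoint serves (the /media-stream path is an assumed name for our WebSocket route): Twilio fetches this XML when a call comes in and is told to open a media stream to our server.

```javascript
// Build the TwiML returned by /incoming-call. Taking the host from the
// request means the same code works behind an ngrok tunnel or in production.
function buildIncomingCallTwiml(host) {
  return `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://${host}/media-stream" />
  </Connect>
</Response>`;
}

// In the Fastify handler this is returned as text/xml, roughly:
//   fastify.all('/incoming-call', (req, reply) =>
//     reply.type('text/xml').send(buildIncomingCallTwiml(req.headers.host)));
```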
OpenAI Realtime API & Whisper:
For each call, a dedicated WebSocket connection to OpenAI's realtime API is established. This API handles both text generation (via a decision tree logic) and audio synthesis using the gpt-4o-mini-realtime-preview-2024-12-17 model. Additionally, OpenAI Whisper processes incoming audio streams, converting them to text for natural language processing.
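The per-call session is configured with a single session.update event once the WebSocket opens. A sketch of that payload (field names follow the realtime API's session.update event; the voice choice and instructions string are illustrative):

```javascript
// Build the session.update event sent after the per-call realtime
// WebSocket opens. G.711 mu-law matches what Twilio media streams carry,
// and whisper-1 transcribes the caller's audio into text.
function buildSessionUpdate(instructions) {
  return {
    type: 'session.update',
    session: {
      instructions,
      voice: 'alloy',
      input_audio_format: 'g711_ulaw',
      output_audio_format: 'g711_ulaw',
      input_audio_transcription: { model: 'whisper-1' },
      turn_detection: { type: 'server_vad' },
    },
  };
}

// Opening the connection itself looks roughly like:
//   const ws = new WebSocket(
//     'wss://api.openai.com/v1/realtime?model=gpt-4o-mini-realtime-preview-2024-12-17',
//     { headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
//                  'OpenAI-Beta': 'realtime=v1' } });
//   ws.on('open', () => ws.send(JSON.stringify(buildSessionUpdate(PROMPT))));
```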
Decision Tree Logic:
The assistant guides the user through a series of questions (starting with a policy ID and claim type) and then branches into specialized sub-flows (e.g., car accident, theft, vandalism) based on the user's input. This dynamic approach ensures that only relevant questions are asked, improving both efficiency and user experience.
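A simplified sketch of this branching (node names and question wording are illustrative, not the project's actual flow): each node asks one question and picks the next node from the caller's answer, with unclear answers re-asking the same question.

```javascript
// Minimal claim decision tree: state is just the current node name.
const claimTree = {
  start: {
    question: 'What is your policy ID?',
    next: () => 'claimType',
  },
  claimType: {
    question: 'Is this claim for an accident, theft, or vandalism?',
    next: (answer) => {
      const a = answer.toLowerCase();
      if (a.includes('accident')) return 'accident';
      if (a.includes('theft')) return 'theft';
      if (a.includes('vandalism')) return 'vandalism';
      return 'claimType'; // unclear answer: re-ask the same question
    },
  },
  accident: { question: 'Was anyone injured?', next: () => 'done' },
  theft: { question: 'When did you last see the vehicle?', next: () => 'done' },
  vandalism: { question: 'Please describe the damage.', next: () => 'done' },
  done: { question: 'Thank you, your claim has been recorded.', next: () => 'done' },
};

// Advance the conversation one step given the caller's transcribed answer.
function advance(state, answer) {
  return claimTree[state].next(answer);
}
```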
Audio Management:
Special handling is implemented to manage real-time audio responses. For instance, if the user starts speaking while the assistant is still delivering a response, the system can truncate the AI's audio to prevent interruptions, ensuring a smooth conversation flow.
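Barge-in handling boils down to two messages: one to OpenAI truncating the in-flight assistant item at the point the caller has already heard, and one to Twilio flushing any audio still queued for playback. A sketch (event shapes follow the realtime API's conversation.item.truncate and Twilio's clear message; variable names are illustrative):

```javascript
// Compute the two events to send when the caller interrupts the assistant.
function handleBargeIn({ lastAssistantItemId, responseStartMs, nowMs, streamSid }) {
  const elapsedMs = nowMs - responseStartMs; // audio already played to the caller
  return {
    // Sent to OpenAI: cut the assistant's audio at the elapsed playback time
    truncateEvent: {
      type: 'conversation.item.truncate',
      item_id: lastAssistantItemId,
      content_index: 0,
      audio_end_ms: elapsedMs,
    },
    // Sent to Twilio: drop any audio still buffered for this stream
    clearEvent: { event: 'clear', streamSid },
  };
}
```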
Challenges Faced
Real-Time Synchronization:
Coordinating audio streams between Twilio and OpenAI in real time was challenging. Managing precise timing (e.g., marking audio segments and truncating responses when needed) required careful design and testing.
Asynchronous API Handling:
Dealing with asynchronous events from multiple APIs (Twilio, OpenAI Realtime, Whisper) introduced complexity in error handling and state management. Ensuring that the conversation state remained consistent across these events was a significant hurdle.
Dynamic Conversation State:
Implementing a flexible, stateful decision tree that could adapt to varied user inputs (including interruptions or unclear responses) demanded a robust state management system. This was crucial to maintain the flow of conversation without losing context.
Scalability and Isolation:
Ensuring that each call maintained its own isolated conversation state while sharing the same server resources was a key architectural challenge, especially under scenarios of high call volume.
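Our approach to this isolation can be sketched as a map of per-call session objects keyed by Twilio's stream SID, so concurrent calls never touch each other's state (field names here are illustrative):

```javascript
// One session object per active call, keyed by the Twilio streamSid.
const sessions = new Map();

// Fetch or lazily create the isolated state for a given call.
function getSession(streamSid) {
  if (!sessions.has(streamSid)) {
    sessions.set(streamSid, { node: 'start', answers: {}, lastItemId: null });
  }
  return sessions.get(streamSid);
}

// Tear the state down when the call's WebSocket closes.
function endSession(streamSid) {
  sessions.delete(streamSid);
}
```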
Overall, this project has been an exciting journey into the integration of real-time audio processing, conversational AI, and modern web technologies. It has not only deepened our technical skills but also provided valuable insights into designing user-centric solutions in the fast-evolving landscape of AI and telecommunications.
Built With
- dotenv
- fastify
- fastify-formbody
- fastify-ws
- javascript
- ngrok
- node.js
- openai-realtime-api
- openai-whisper
- twilio
- ws