Inspiration

Every second counts in an emergency, and getting people the help they need as soon as possible can quite literally save lives. While the industry standard is to answer 911 calls within 15 seconds, data from Ontario's largest hub shows that during peak periods, only 33% of calls meet that mark. Two-thirds of emergencies are left in a digital queue, listening to an automated voice message on repeat, because 911 dispatchers are severely under-resourced and understaffed. Not to mention the toll the intense workload takes on dispatchers' mental health: 48.4% of dispatchers screen positive for at least one common mental health disorder.

One of our teammates almost got carjacked last month in Scarborough, and got to experience first-hand the wait time on a 911 call. He wasn't in any immediate danger, so he was fine waiting ~3 minutes, but what if it had been someone in critical condition on the other end of the line? We felt we had to do something about this, so we made Delta Dispatch to help the limited number of dispatchers spend their valuable time on those who need it most, as quickly and efficiently as possible. The root issue is a human bottleneck: dispatchers don't scale, so the problem cannot be easily resolved by staffing alone. We propose a new system that lightens the load on 911 dispatchers and gets aid to those who need it most, without any victims or callers being left on the sidelines.

What it does

In an (NG) 911 system, callers are put on hold if no dispatchers are free to take the call, with an automated message that basically says: “If this is not an emergency, hang up”. That is valuable time wasted for both the dispatcher and the caller, so we've redesigned the 911 queue as an intelligent queuing system built on AI agents that 1) uses the hold time to extract key contextual information from the caller, rather than wasting it, and 2) lets dispatchers, well, dispatch, reducing their cognitive load as much as possible. Our system augments, not automates, the important work dispatchers undertake daily, so the 911 emergency system no longer faces the human bottleneck that leaves people who urgently need help on hold. The trivial and menial work, such as extracting information, is handled by a 3-step agent pipeline, while the human stays in the loop as the brains of the operation, making the key decisions.

How it works, and how we built it

To show that this could work in a real-life call centre, we built a 911 call simulator on top of our agentic logic to demo our proof of concept. The UI puts the user in the seat of a 911 dispatcher working in a call centre alongside coworkers, all sharing a single live queue. We used the Twilio API to connect phone calls directly to our server.

We connected the Twilio API to a custom FastAPI backend through a webhook, allowing us to run custom logic on calls received by our number. Three layers of AI agents, orchestrated through LangGraph, transcribe calls, identify important information, and evaluate it against custom criteria. Notably, the Gemini 3.0 Flash model let us transcribe calls quickly and accurately, greatly improving the performance of our project. Using an embedding model, we store the extracted information in a vector database, which helps us identify and classify incident types. A similarity check lets us deterministically catch potential misclassifications and mislabellings. Vector embeddings also let us accurately detect duplicate reports of the same incident, further reducing the workload of dispatchers.
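The duplicate check boils down to a nearest-neighbour comparison over embeddings. Here's a minimal, self-contained sketch of the idea (the vectors, threshold, and function names are illustrative, not our exact code; in the real pipeline the vectors come from the embedding model and live in the vector database):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_duplicate(new_vec: list[float], recent_vecs: list[list[float]],
                 threshold: float = 0.92) -> bool:
    """Flag a report as a duplicate if it is near-identical to a recent one.
    The 0.92 threshold is a placeholder, not our tuned value."""
    return any(cosine_similarity(new_vec, v) >= threshold for v in recent_vecs)

# Two callers reporting the same incident should embed close together;
# an unrelated incident should not.
fire_a = [0.9, 0.1, 0.0]
fire_b = [0.88, 0.12, 0.01]
robbery = [0.1, 0.9, 0.2]
print(is_duplicate(fire_b, [fire_a]))   # near-duplicate report
print(is_duplicate(robbery, [fire_a]))  # distinct incident
```

In practice a vector database (e.g. Pinecone) does this nearest-neighbour search for you; the threshold is the knob that trades missed duplicates against falsely merged incidents.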

We used Redis as both our primary data store and our synchronization mechanism. We didn't reach for Redis at first (see below), but it let us receive and classify multiple calls concurrently while maintaining a consistent view for every dispatcher, even during high-volume scenarios.

We built the frontend with Next.js and Tailwind CSS, and used TanStack Query to poll the Redis sorted set and cache requests, reducing server workload.

Challenges we ran into, and what we learned

  • (Redis cont.) We initially struggled to scale past a single agent assessing the threat level of incoming calls and mutating the queue. Our first design tracked a pushdown_count counter on each queue entry so lower-priority calls could be pushed back, which resulted in data races. We then found that Redis has a built-in sorted set data structure whose ranking works perfectly for distributed systems, so there was no need to reinvent the wheel. We pivoted from our custom asyncio.Queue reordering logic to a Redis zset, and created a scoring formula that factors in both the caller's time waited and the assessed severity, so every user who needs help is queued equitably, without less urgent calls being pushed down indefinitely (our teammate who got carjacked wouldn't like that!).

  • Another challenge was the Twilio API, which we used to take and record calls to simulate a 911 call. We initially believed the Twilio dashboard would contain all the functionality for taking and recording calls, but discovered that Twilio requires a webhook to receive them. We hadn't planned on deploying our backend, yet suddenly we needed publicly reachable API endpoints. We could have spent an exorbitant amount of time deploying to a service like Vercel, but Thomson remembered the MLH demo during opening ceremonies and suggested ngrok instead. ngrok tunnels a local port to a temporary public URL, exposing our API endpoints to the internet without a full deployment, which let us continue development smoothly and simulate real calls.
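For the curious: a Twilio voice webhook just answers Twilio's POST with TwiML, an XML vocabulary telling Twilio what to do with the live call. A stdlib-only sketch of the kind of response our endpoint returns (the greeting wording and callback URL are made up; `recordingStatusCallback` and `maxLength` are real attributes of TwiML's `<Record>` verb):

```python
import xml.etree.ElementTree as ET

def intake_twiml(recording_callback_url: str) -> str:
    """Build a TwiML response that greets the caller and records the call.
    Twilio later POSTs the recording's URL to recording_callback_url."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    say.text = "You have reached emergency dispatch. Please describe your emergency."
    # <Record> captures the caller's audio so the agent pipeline can transcribe it.
    ET.SubElement(response, "Record", {
        "recordingStatusCallback": recording_callback_url,
        "maxLength": "120",
    })
    return ET.tostring(response, encoding="unicode")

print(intake_twiml("https://example.ngrok.io/recording-done"))
```

In the real backend, the FastAPI route serves this string with an `application/xml` content type; ngrok is what makes that route reachable by Twilio's servers.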

  • After building a call-centre simulator powered by a multi-agent distributed system with Redis and TanStack Query, we found that while the RAG pipeline still needs some fine-tuning, the project turned out to be very rewarding, with real potential to augment and speed up urgent 911 calls. On average, our system took ~3 s to run the 3-step process using a large, pretrained (not fine-tuned) model, 5x faster than the 15-second gold standard for 911 answer times. In that window, our system can capture the caller's contextual information and relay it, along with suggested actions, to the human dispatcher, leaving little need for further back-and-forth between dispatcher and caller after that golden ~15 s. So we'd say we were pretty satisfied with this project, and had a lot of fun!

Accomplishments we’re proud of (from most to least)

Within 24 hours, we:

  • Designed a multi-agent system to efficiently handle volumes of incoming calls that human dispatchers could never triage manually and synchronously
  • Found optimizations we could apply to make our system more robust and reliable, and applied them
  • Built a simulator to prove our queuing system works in large call volume scenarios, and made it safe against data races and parallel and asynchronous calls
  • Learnt new technologies that enhance our tech stack and our capability as engineers (Redis, TanStack Query, Pinecone)

Next steps

For next steps, we will implement real features used in existing (NG) 911 systems, and make the project robust and reliable enough to be used in the field by dispatchers:

  • Fine-tune the AI agent models to be cheaper, faster, and more predictable, e.g. by using NVIDIA Nemotron to power our agents
  • Make the system truly distributed (e.g. over a Tailscale tailnet), since we are already using Redis. We ran out of time, and for the demo a full FastAPI backend, a Redis server, a Next.js frontend, ~30 simultaneous requests, AND five Gemini tabs were all running on one computer, so we kept getting constrained by 429s
  • Implement NG 911 features like geolocation and voice over IP in the web frontend via WebRTC
  • Include DeX for identity verification
  • Scale the dispatch centre horizontally across multiple servers, since we are already using async and Redis

We would definitely like to take this project beyond DeltaHacks and solve this real optimization problem that can save people’s lives. If you’re interested, shoot us a DM on LinkedIn!
