Inspiration
For decades, we've seen how meetings are a cornerstone of nearly every industry. Coordination among colleagues is how efforts scale from a single CEO to tens of thousands of employees. Yet despite their importance, most meetings remain passive experiences: time-consuming, stressful, and frequently redundant. Critical ideas get buried across sprawling agendas, context from past discussions is forgotten, and participants struggle to stay aligned as conversations diverge and change context. We saw an opportunity to change that.
We wanted to build meeting software that goes beyond simply connecting people. Something that actively understands the flow of conversation, recalls what matters from previous sessions, and helps every voice contribute meaningfully, whether it's a small study group, a company-wide webinar, or a fast-paced team standup.
Zoom has become a critical component of modern internet and business infrastructure, so a native solution that addresses these problems could have a broad impact.
What It Does
Our solution is a Zoom-integrated meeting copilot that transforms live conversations into actionable intelligence. It captures and transcribes meetings in real time, summarizes topics as they emerge, and lets participants query anything that's been said, all from within the Zoom interface. Beyond the current meeting, it connects to the collective knowledge of your organization: when a new topic surfaces, it consults AI agents representing team members outside the room and surfaces only the most surprising or actionable perspectives. The result is a meeting experience where context is never lost, insights arrive in real time, and every discussion is informed by the full breadth of your organization's thinking.
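One way to picture the "most surprising perspectives" step is as a novelty filter over embeddings. The sketch below is illustrative only and is not our production logic: it assumes each agent perspective and the current meeting context have already been embedded as vectors, and the function names, the `max_similarity` threshold, and the `limit` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_surprising(
    context_vec: Sequence[float],
    perspectives: List[Tuple[str, Sequence[float]]],
    max_similarity: float = 0.8,  # hypothetical cutoff: above this, treat as redundant
    limit: int = 3,               # hypothetical cap on how many perspectives to surface
) -> List[str]:
    """Return labels of the perspectives least similar to the meeting context."""
    scored = [(cosine_similarity(context_vec, vec), label) for label, vec in perspectives]
    # Drop perspectives that mostly repeat what the room already said.
    novel = [(score, label) for score, label in scored if score < max_similarity]
    # Least similar (most novel) first.
    novel.sort(key=lambda pair: pair[0])
    return [label for _, label in novel[:limit]]


context = [1.0, 0.0]
candidates = [
    ("echo", [1.0, 0.0]),     # identical to the context, filtered out
    ("tangent", [0.0, 1.0]),  # orthogonal to the context, most novel
    ("related", [1.0, 1.0]),  # partially overlapping
]
surfaced = most_surprising(context, candidates)
```

Low similarity is only a rough proxy for "surprising"; judging whether a perspective is actionable would likely still need an LLM pass on top of a filter like this.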
How We Built It
Constant iteration! Our tech stack combines real-time streaming, semantic search, and AI inference.
Zoom App SDK + Immersive View: The app runs natively inside the Zoom client as a sidebar and immersive overlay. Hosted and served from Render.
Zoom RTMS (Realtime Media Streams) SDK: Live meeting audio is transcribed in real time and sent via Zoom's WebSocket API, producing a continuous stream of speaker-attributed transcript segments.
Elasticsearch (on Render): Powers transcript storage, indexing, and retrieval using Jina AI embeddings for semantic vector search. The Kibana API is used for agent interactions and Elastic Workflows for agentic pipelines.
AI Inference (Claude): Claude handles real-time summarization, topic detection, and natural language querying via Elasticsearch's open inference API.
Render: Hosts all backend services, orchestrates AI inference and storage infrastructure, and serves rendered interfaces to the Zoom client. It is the operational backbone of our system.
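To make the data flow above more concrete, here is a minimal sketch of how speaker-attributed transcript segments might be merged into turns and turned into Elasticsearch documents and a kNN search request. The `TranscriptSegment` shape, the field names (`embedding`, `meeting_id`, and so on), and the helper functions are assumptions for illustration, not the actual RTMS payload format or our index mapping. The embedding vectors are taken as given; in our stack they would come from Jina AI.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TranscriptSegment:
    """One speaker-attributed chunk of transcript (hypothetical shape)."""
    speaker: str
    text: str
    start_ms: int


def merge_consecutive(segments: List[TranscriptSegment]) -> List[TranscriptSegment]:
    """Merge back-to-back segments from the same speaker into a single turn."""
    merged: List[TranscriptSegment] = []
    for seg in segments:
        if merged and merged[-1].speaker == seg.speaker:
            last = merged[-1]
            # Keep the start time of the first segment in the turn.
            merged[-1] = TranscriptSegment(last.speaker, f"{last.text} {seg.text}", last.start_ms)
        else:
            merged.append(seg)
    return merged


def to_index_doc(seg: TranscriptSegment, meeting_id: str, embedding: List[float]) -> Dict:
    """Build the document body to index for one turn (assumed field names)."""
    return {
        "meeting_id": meeting_id,
        "speaker": seg.speaker,
        "text": seg.text,
        "start_ms": seg.start_ms,
        "embedding": embedding,
    }


def build_knn_query(query_vector: List[float], k: int = 5) -> Dict:
    """Build an Elasticsearch kNN search body over the assumed `embedding` field."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": max(k * 10, 50),
        },
        "_source": ["speaker", "text", "start_ms"],
    }


segments = [
    TranscriptSegment("Ana", "We should ship", 0),
    TranscriptSegment("Ana", "on Friday.", 900),
    TranscriptSegment("Raj", "Agreed.", 2000),
]
turns = merge_consecutive(segments)
doc = to_index_doc(turns[0], meeting_id="demo-meeting", embedding=[0.1, 0.2])
query = build_knn_query([0.1, 0.2], k=3)
```

Merging consecutive segments before indexing keeps each stored document a complete thought, which tends to make both retrieval and summarization less noisy than indexing every fragment separately.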
Challenges We Ran Into
Learning how to develop for platforms that were new to us, like Zoom Apps, RTMS, and Elasticsearch.
Acquiring the permissions necessary for development.
Connecting our front-end components (Zoom App) with the back end (RTMS, Elasticsearch, Render).
Making sure the goals we wanted to achieve could not already be met easily by existing commercial tools. We looked at tools like Zoom's own meeting AI summarizer, Otter AI, and Granola AI. A key differentiating factor we wanted to bring was our ability to use Zoom's real-time media streams to bring participants' separate contexts and summaries to users.
Accomplishments That We're Proud Of
Building out high-volume data pipelines on free-tier subscriptions (not without its challenges!).
Developing a low-latency Realtime Media Streams (RTMS) to inference pipeline that delivers topic summaries to meeting attendees extremely quickly.
Interactive immersive view and integration with the native client.
Hosting our entire application on Render.
Open-source implementation.
What We Learned
How to build engaging applications for Zoom.
How to manage and feed data to power LLM agents, and how to use those agents to enable novel workflows.
How to work together under pressure to deliver features that both we and Zoom were excited about.
What's Next
We would love to see our technology stack shipped as part of the native Zoom experience. We believe real-time in-meeting context summarization and agentic knowledge retrieval are key innovations in the space of online meeting infrastructure and should be widely adopted.
Releasing a guide on best practices for Zoom's developer tools, especially Realtime Media Streams.
Building agent information silos (for organizations where agents might be limited in what they can share, depending on who asks).
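As a rough illustration of the silo idea, the sketch below tags each piece of an agent's knowledge with the groups allowed to see it and filters by the requester's groups. This is future work, so nothing here exists in our codebase yet; `KnowledgeItem`, `AgentSilo`, and the group names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class KnowledgeItem:
    """A piece of knowledge plus the groups allowed to receive it (hypothetical)."""
    text: str
    allowed_groups: Set[str]


@dataclass
class AgentSilo:
    """Knowledge held by one team member's agent (hypothetical)."""
    owner: str
    items: List[KnowledgeItem] = field(default_factory=list)

    def shareable_with(self, requester_groups: Set[str]) -> List[str]:
        """Return only the items whose allowed groups overlap the requester's groups."""
        return [
            item.text
            for item in self.items
            if item.allowed_groups & requester_groups
        ]


silo = AgentSilo(
    owner="dana",
    items=[
        KnowledgeItem("Q3 roadmap draft", {"product", "leadership"}),
        KnowledgeItem("Salary bands", {"hr"}),
    ],
)
visible_to_product = silo.shareable_with({"product"})
visible_to_guest = silo.shareable_with({"guest"})
```

Filtering before any text reaches the LLM, rather than asking the model to withhold information, would keep restricted content out of prompts entirely.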
Built With
- elasticsearch
- python
- render
- zoom
