"A voice 'pho' you"
Inspiration
We saw a world where a simple customer service call is an impossible wall for those with speech and hearing disabilities, for those with autism who struggle to understand speech in phone conversations, and for those with phone anxiety. We knew this hackathon was an opportunity to build a bridge from phone to text.
The Problem: The Auditory Barrier
Deaf & Hard of Hearing: Traditional calls offer no visual or text-based alternative for real-time dialogue.
Auditory Processing (ASD): Many individuals with autism, like mentor Logan, rely on lip-reading to understand speech. Without visual cues, accents or background noise can make audio unintelligible.
Speech & Anxiety Barriers: Non-verbal individuals or those with severe phone anxiety often cannot use voice-only phone calls to reach essential services.
What it does
Phogent (Phone Agent) provides a dual-channel bridge that removes the requirement for verbal speech and auditory hearing. Phogent lets you make and receive real phone calls via a chat interface. You type, and an AI voice speaks your text to the caller in real time while transcribing everything the caller says back to you in real time. For you, it's texting. For the caller, it's a call.
Speech-to-Text (Transcription): Converts the caller's voice into a live text stream with ElevenLabs. This allows users to read the conversation in real-time, bypassing the need for lip-reading or hearing.
Text-to-Speech (Voice Synthesis): Allows the user to type their responses. ElevenLabs then speaks those responses to the person on the other end of the line in a natural voice.
Universal Accessibility: By neutralizing the challenges that come with a phone call (e.g. accents and social pressure), Phogent ensures that communication is defined by the message, not the medium.
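The dual-channel idea above can be sketched as two concurrent relay loops: one turns the caller's audio into text for the user, the other turns the user's typed text into audio for the caller. This is a minimal Python sketch of that shape only; `fake_stt` and `fake_tts` are hypothetical stand-ins for the real ElevenLabs calls, not part of the actual codebase.

```python
import asyncio

async def caller_audio_to_text(audio_chunks, transcript_out):
    # Speech-to-text leg: each audio chunk from the call becomes a
    # line of text in the user's chat. fake_stt is a placeholder.
    async for chunk in audio_chunks:
        transcript_out.append(fake_stt(chunk))

async def user_text_to_audio(typed_messages, audio_out):
    # Text-to-speech leg: each typed message is synthesized into
    # audio played into the call. fake_tts is a placeholder.
    async for message in typed_messages:
        audio_out.append(fake_tts(message))

def fake_stt(chunk: bytes) -> str:
    return f"[transcribed {len(chunk)} bytes]"

def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")

async def stream(items):
    for item in items:
        yield item
        await asyncio.sleep(0)  # yield control, as real network I/O would

async def main():
    transcript, outgoing_audio = [], []
    # Both legs run at the same time, so neither side waits on the other.
    await asyncio.gather(
        caller_audio_to_text(stream([b"\x00" * 160]), transcript),
        user_text_to_audio(stream(["Hi, I'd like to book a table."]), outgoing_audio),
    )
    return transcript, outgoing_audio

transcript, outgoing_audio = asyncio.run(main())
print(transcript)
```

Running the two legs with `asyncio.gather` is what makes the bridge feel live: the caller keeps hearing synthesized speech while transcription continues in parallel.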
How we built it
- Frontend: React with Tailwind CSS for a real-time chat UI
- Backend: Python (FastAPI + WebSockets) to orchestrate the data pipeline
- Telephony: Twilio Programmable Voice & Media Streams for inbound/outbound calls
- Voice AI: ElevenLabs for TTS and STT (μ-law 8000 Hz audio)
- AI: Google Gemini for conversational response generation
- Database: MongoDB to store call records and transcripts
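To show how the Twilio and ElevenLabs pieces of the stack meet, here is a sketch of unpacking one Twilio Media Streams WebSocket frame before the audio would be forwarded for transcription. Twilio sends JSON frames, and "media" events carry a base64-encoded payload of 8 kHz μ-law audio; the stream SID below is a made-up placeholder, and the surrounding FastAPI handler is omitted.

```python
import base64
import json

def handle_twilio_frame(raw: str):
    frame = json.loads(raw)
    if frame["event"] != "media":
        return None  # "start"/"stop"/"mark" events carry no audio
    # Decode the base64 payload into raw mu-law bytes: one byte per
    # sample at 8000 Hz, so 160 bytes is about 20 ms of audio.
    return base64.b64decode(frame["media"]["payload"])

# A fabricated example frame in the shape Twilio sends over the socket.
sample = json.dumps({
    "event": "media",
    "streamSid": "MZ0000000000000000000000000000000000",  # placeholder SID
    "media": {"payload": base64.b64encode(b"\xff" * 160).decode("ascii")},
})
audio = handle_twilio_frame(sample)
print(len(audio))
```

Keeping the audio in μ-law 8000 Hz end to end avoids resampling, since ElevenLabs can consume and produce that same format.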
Challenges we ran into
- Coordinating API keys across the team was tricky
- We struggled with GitHub merge conflicts.
- We were running out of free credits in our personal Gemini accounts.
- Coordinating team tasks, since API usage and access to the AI coding agent were limited.
- Code generated by Antigravity was often buggy and needed manual fixes.
Accomplishments that we're proud of
We built a functional real-time voice pipeline from scratch in a hackathon timeframe. Beyond the code, we genuinely bonded as a team and left as closer friends.
What we learned
We learned how to integrate the ElevenLabs API for real-time TTS/STT, handle WebSocket audio streams with Twilio, and coordinate a multi-service AI pipeline end-to-end. We also learned to code effectively with Google Antigravity and to manage a busy GitHub repository.
What's next for Phogent
- Add a dedicated phone number per user for persistent inbound calls
- Improve interruption handling and reduce latency further
- Build mobile wrappers (Android/iOS) around the existing web app
- Expand visual accessibility features for users
Built With
- elevenlabs-api
- fastapi
- google-gemini-api
- javascript
- mongodb
- python
- react
- tailwind-css
- twilio-media-streams
- twilio-programmable-voice
- websockets