Inspiration
While Silicon Valley optimizes for 4K video calls, 3 billion people struggle to see their families clearly due to poor internet infrastructure. This became personal when video calls with my parents in India consistently degraded to pixelated messes. I realized current video calling treats every pixel equally, but humans don't - we care about faces and expressions. This sparked the idea: what if AI could intelligently prioritize what matters most in video calls?
What it does
Thirai uses AI to democratize video calling quality through two breakthrough approaches:
- Automated Personalized Compression: Uses optimized SDXL VAE models to learn individual facial features and compress them intelligently while preserving recognizable details
- Smart Bandwidth Allocation: Real-time face detection automatically sends high-quality face patches through WebRTC data channels while maintaining low-quality backgrounds, achieving 3x better face quality at the same bitrate
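The allocation idea above can be sketched with plain JPEG encoding (a minimal sketch using Pillow; the helper names are hypothetical, and the real system ships these payloads over WebRTC data channels rather than returning bytes):

```python
from io import BytesIO

from PIL import Image


def encode_with_face_priority(frame, face_box, face_q=90, bg_q=20):
    """Split one frame into two JPEG payloads: a high-quality face patch
    and a low-quality full frame for the background."""
    x, y, w, h = face_box
    face_patch = frame.crop((x, y, x + w, y + h))

    face_buf, bg_buf = BytesIO(), BytesIO()
    face_patch.save(face_buf, format="JPEG", quality=face_q)
    frame.save(bg_buf, format="JPEG", quality=bg_q)
    return face_buf.getvalue(), bg_buf.getvalue()


def composite(bg_bytes, face_bytes, face_box):
    """Receiver side: paste the sharp face patch over the blurry frame."""
    bg = Image.open(BytesIO(bg_bytes)).convert("RGB")
    face = Image.open(BytesIO(face_bytes)).convert("RGB")
    bg.paste(face, (face_box[0], face_box[1]))
    return bg


# Toy demo on a synthetic frame; real input comes from the camera.
frame = Image.new("RGB", (320, 240), (90, 110, 130))
face_bytes, bg_bytes = encode_with_face_priority(frame, (100, 60, 80, 80))
received = composite(bg_bytes, face_bytes, (100, 60, 80, 80))
```

Because JPEG quality scales bytes superlinearly, spending the bit budget on the small face crop instead of the whole frame is what buys the quality gain at a fixed bitrate.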
How we built it
- Built a complete real-time video conferencing platform from scratch
- Trained custom autoencoders and systematically evaluated 15 different ML models including SDXL, Real-ESRGAN, and proprietary approaches
- Optimized SDXL VAE inference pipeline to achieve 120ms latency using CoreML and Metal Performance Shaders
- Implemented MediaPipe face detection integrated with custom WebRTC data channel architecture
- Created real-time compositing system that blends high-quality face patches with low-resolution background video
- Cold-called researchers to access unreleased model checkpoints for comprehensive evaluation
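The compositing step above amounts to upscaling the low-resolution background and overwriting the detected face region. A minimal NumPy sketch (nearest-neighbor upscale; the function name is illustrative, and the real compositor blends patch edges to hide seams):

```python
import numpy as np


def composite_frame(bg_lowres, face_patch, box, scale=4):
    """Upscale the low-res background by integer-repeating pixels, then
    overwrite the face region with the high-quality patch.

    box is (x, y, w, h) in full-resolution coordinates."""
    bg = np.repeat(np.repeat(bg_lowres, scale, axis=0), scale, axis=1)
    x, y, w, h = box
    bg[y:y + h, x:x + w] = face_patch
    return bg


# Toy demo: a 60x80 low-res background and a 64x64 face patch.
bg = np.zeros((60, 80, 3), dtype=np.uint8)
face = np.full((64, 64, 3), 255, dtype=np.uint8)
out = composite_frame(bg, face, box=(40, 20, 64, 64), scale=4)
```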
Challenges we ran into
- Custom autoencoders produced poor quality at usable compression ratios - even at 16KB, faces looked unrecognizable
- SDXL VAE latent vectors turned out larger than expected (~50KB), barely competitive with standard JPEG compression (12KB)
- Achieving real-time inference on complex diffusion models required extensive optimization and architecture redesign
- Balancing latency vs. quality - adding AI processing while maintaining conversational flow
- WebRTC data channel synchronization with video streams for seamless compositing
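The latent-size problem above follows from first principles. A back-of-the-envelope calculation (illustrative assumptions: SDXL-style 8x spatial downsampling, 4 latent channels, float16 storage; the measured ~50KB depends on actual resolution and precision) shows why raw VAE latents struggle to beat JPEG:

```python
def vae_latent_bytes(height, width, channels=4, downsample=8, bytes_per_value=2):
    """Raw payload size of an SDXL-style VAE latent: the encoder shrinks
    each spatial dimension 8x but keeps 4 channels per latent pixel."""
    return (height // downsample) * (width // downsample) * channels * bytes_per_value


# A single 720p frame's latent alone is ~112 KiB at float16:
print(vae_latent_bytes(720, 1280))  # 115200 bytes
# Even a 512x512 crop costs 32 KiB before any entropy coding:
print(vae_latent_bytes(512, 512))   # 32768 bytes
```

This is what pushed the design toward sending only small face patches through the learned pipeline instead of full frames.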
Accomplishments that we're proud of
- Optimized a component of a massive diffusion model (the SDXL VAE) to run in 120ms - production-grade real-time performance
- Built custom WebRTC data channel pipeline enabling intelligent bandwidth allocation
- Achieved 3x better face quality at the same bitrate through a smart JPEG-patching approach
- Systematic evaluation methodology testing 15 state-of-the-art models with rigorous performance metrics
- Created functioning real-time ML inference pipeline that works on consumer hardware
- Demonstrated that AI can add meaningful intelligence to video calling infrastructure
What we learned
- Personalized compression is harder than expected - at comparable file sizes, generic JPEG often outperforms custom autoencoders
- The real value isn't in compression ratios but in semantic understanding of what humans prioritize
- Infrastructure and engineering optimization matter more than model architecture choice
- Real-time AI requires fundamental rethinking of model deployment, not just faster hardware
- Current video calling solutions have significant room for AI-powered improvements
- Building production-ready ML systems requires balancing multiple complex trade-offs
What's next for Thirai
- Partner with telecom providers in emerging markets to deploy intelligent video calling at scale
- Expand beyond faces to automatic detection of shared screens, documents, and other priority content
- Develop region-specific models optimized for different lighting conditions, skin tones, and cultural contexts
- Integration with existing video platforms through browser extensions and APIs
- Research into attention-based models that predict where viewers will look to guide quality allocation
- Build the intelligence layer that becomes standard for next-generation video communication infrastructure