Videmo
A video messaging platform with AI-powered summarization.
Howdy!
Modern async communication relies heavily on video, but I wanted to understand what goes into building a platform like Loom from the ground up. This meant tackling video capture, upload pipelines, real-time processing, AI integration, and multi-platform deployment, all while maintaining a clean user experience.
What I Built
Cross-Platform Recording
Built a desktop app using Electron that captures screen and camera footage, then streams uploads directly to the backend. The recorder runs on macOS, Windows, and Linux, and handles video encoding and chunked uploads itself.
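To make the chunked-upload idea concrete, here is a minimal sketch of how a recording could be split into fixed-size byte ranges before sending. The names `ChunkRange`, `planChunks`, and `sliceChunks` are hypothetical, and the real recorder may chunk differently (for example, by streaming encoder output as it is produced).

```typescript
// Hypothetical chunk descriptor; field names are illustrative, not Videmo's actual API.
interface ChunkRange {
  index: number; // position of the chunk in upload order
  start: number; // inclusive byte offset
  end: number;   // exclusive byte offset
}

// Split a payload of `totalBytes` into consecutive ranges of at most `chunkSize` bytes.
function planChunks(totalBytes: number, chunkSize: number): ChunkRange[] {
  if (!Number.isInteger(totalBytes) || totalBytes < 0) {
    throw new RangeError("totalBytes must be a non-negative integer");
  }
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: ChunkRange[] = [];
  for (let start = 0, index = 0; start < totalBytes; start += chunkSize, index++) {
    chunks.push({ index, start, end: Math.min(start + chunkSize, totalBytes) });
  }
  return chunks;
}

// Produce views over the original buffer (no copying) for each planned range.
function sliceChunks(data: Uint8Array, chunkSize: number): Uint8Array[] {
  return planChunks(data.byteLength, chunkSize).map(({ start, end }) =>
    data.subarray(start, end)
  );
}
```

Each range can then be sent as its own request, so a failed chunk can be retried without restarting the whole upload.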
Web Dashboard
Created a Next.js web application where users can view, manage, and share their recordings. The interface displays video metadata, auto-generated subtitles, and AI summaries inline, making it easy to scan through content without watching entire videos.
Key Features:
- Clean, modern UI built with Tailwind and Radix components
- Secure authentication via Clerk
- Real-time upload progress and processing status
- Share links for easy distribution
Processing Pipeline
Designed a Node.js backend that orchestrates the entire video lifecycle:
- Receives uploads from desktop and web clients
- Stores videos in S3-compatible storage (MinIO)
- Triggers AI processing workflows
- Broadcasts real-time updates via Socket.io
- Manages metadata and user permissions
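The lifecycle described in the list above can be sketched as a small state machine that notifies a listener on every change. This is an assumption-laden illustration: the status names, the `VideoLifecycle` class, and the `StatusListener` callback are all hypothetical, with the callback standing in for whatever the backend emits over Socket.io.

```typescript
// Hypothetical status names; the actual pipeline stages may differ.
type VideoStatus =
  | "uploading"
  | "stored"
  | "transcribing"
  | "summarizing"
  | "ready"
  | "failed";

// Allowed transitions: each stage moves forward one step or fails.
const NEXT: Record<VideoStatus, VideoStatus[]> = {
  uploading: ["stored", "failed"],
  stored: ["transcribing", "failed"],
  transcribing: ["summarizing", "failed"],
  summarizing: ["ready", "failed"],
  ready: [],
  failed: [],
};

// Stand-in for a real-time broadcast (e.g. a Socket.io emit in the real backend).
type StatusListener = (videoId: string, status: VideoStatus) => void;

class VideoLifecycle {
  private status: VideoStatus = "uploading";
  private readonly videoId: string;
  private readonly notify: StatusListener;

  constructor(videoId: string, notify: StatusListener) {
    this.videoId = videoId;
    this.notify = notify;
  }

  get current(): VideoStatus {
    return this.status;
  }

  // Move to the next stage, rejecting transitions the table does not allow.
  advance(next: VideoStatus): void {
    if (!NEXT[this.status].includes(next)) {
      throw new Error(`Invalid transition: ${this.status} -> ${next}`);
    }
    this.status = next;
    this.notify(this.videoId, next);
  }
}
```

Keeping the transitions in one table means clients only ever see statuses in a valid order, which simplifies progress indicators on the dashboard.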
AI Integration
Integrated Whisper for accurate speech-to-text transcription and Mistral for generating concise summaries. This automation removes the manual work of documenting video content and makes recordings searchable.
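As a rough sketch of how the two models can be chained, the function below takes the transcription and summarization steps as injected functions rather than calling any real API. `Transcriber`, `Summarizer`, and `describeRecording` are hypothetical names; in practice the first would wrap a Whisper call and the second a Mistral call.

```typescript
// Hypothetical function types; real implementations would wrap the Whisper and Mistral clients.
type Transcriber = (audio: Uint8Array) => Promise<string>;
type Summarizer = (transcript: string) => Promise<string>;

interface VideoInsights {
  transcript: string;
  summary: string | null; // null when there was no speech to summarize
}

// Run transcription first, then summarize only if the transcript has content.
async function describeRecording(
  audio: Uint8Array,
  transcribe: Transcriber,
  summarize: Summarizer
): Promise<VideoInsights> {
  const transcript = (await transcribe(audio)).trim();
  if (transcript.length === 0) {
    // Skip the summarization call entirely for silent recordings.
    return { transcript, summary: null };
  }
  const summary = (await summarize(transcript)).trim();
  return { transcript, summary };
}
```

Injecting the two steps keeps the chaining logic testable with fakes and makes it straightforward to swap either model later.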
Technical Architecture
Frontend Stack:
- Next.js 15 with React 19 for the web app
- Electron + Vite for the desktop recorder
- TypeScript throughout for type safety
- Radix UI primitives for accessible components
Backend Infrastructure:
- Express.js API server
- Socket.io for real-time communication
- Prisma ORM for database operations
- AWS SDK for S3-compatible storage
AI & Processing:
- Whisper API for transcription
- Mistral for summarization
- Automated subtitle generation
- Background job processing
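Automated subtitle generation can be illustrated by converting timed transcript segments into a WebVTT file. This is a minimal sketch assuming the transcription step yields segments with start and end times in seconds; the `Segment` shape and the function names are hypothetical, and the sketch does not escape cue text that contains blank lines or the `-->` sequence.

```typescript
// Hypothetical segment shape; assumes times are given in seconds.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Format seconds as a WebVTT timestamp: hh:mm:ss.ttt
function formatTimestamp(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Build a WebVTT document: a header line, then one cue per segment separated by blank lines.
function toWebVtt(segments: Segment[]): string {
  const cues = segments.map(
    (seg) =>
      `${formatTimestamp(seg.start)} --> ${formatTimestamp(seg.end)}\n${seg.text.trim()}`
  );
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}
```

The resulting string can be stored alongside the video and loaded by a standard HTML `<track>` element in the web dashboard.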
What I Learned
Building Videmo end-to-end taught me about:
- Video processing complexity: Handling different formats, codecs, and streaming uploads
- Multi-platform development: Shipping a consistent experience across web and desktop
- Real-time architecture: Using WebSockets to keep clients synchronized during long-running operations
- AI integration: Chaining transcription and summarization models into a smooth workflow
- Performance optimization: Managing large file uploads and video streaming efficiently
Project Goals
This was never meant to be a commercial product; it’s a learning project where I could experiment with modern tooling and ship something complete. The goal was to refresh my full-stack skills while diving deep into domains I hadn’t explored much before: video processing, desktop apps, and AI pipelines.
Tech Stack
Frontend: Next.js - React 19 - TypeScript - Tailwind CSS - Radix UI
Desktop: Electron - Vite - TypeScript
Backend: Node.js - Express - Socket.io - Prisma
Storage: MinIO (S3-compatible)
AI: Whisper - Mistral
Auth: Clerk
Database: PostgreSQL (via Prisma)
Current Status
Videmo is functional but intentionally kept as a side project. It successfully demonstrates the core concepts I wanted to explore: multi-platform development, video processing, and AI integration. The codebase serves as both a portfolio piece and a reference for future projects.
This project represents my approach to learning: identify an interesting problem, build a complete solution, and share what I learned along the way.