Howdy!

Modern async communication relies heavily on video, but I wanted to understand what goes into building a platform like Loom from the ground up. This meant tackling video capture, upload pipelines, real-time processing, AI integration, and multi-platform deployment-all while maintaining a clean user experience.

What I Built

Cross-Platform Recording

Built a native desktop app using Electron that captures screen and camera footage, then streams uploads directly to the backend. The recorder works seamlessly across macOS, Windows, and Linux, handling video encoding and chunked uploads efficiently.

Web Dashboard

Created a Next.js web application where users can view, manage, and share their recordings. The interface displays video metadata, auto-generated subtitles, and AI summaries inline-making it easy to scan through content without watching entire videos.

Key Features:

Clean, modern UI built with Tailwind and Radix components
Secure authentication via Clerk
Real-time upload progress and processing status
Share links for easy distribution

Processing Pipeline

Designed a Node.js backend that orchestrates the entire video lifecycle:

Receives uploads from desktop and web clients
Stores videos in S3-compatible storage (MinIO)
Triggers AI processing workflows
Broadcasts real-time updates via Socket.io
Manages metadata and user permissions

AI Integration

Integrated Whisper for accurate speech-to-text transcription and Mistral for generating concise summaries. This automation removes the manual work of documenting video content and makes recordings searchable.

Technical Architecture

Frontend Stack:

Next.js 15 with React 19 for the web app
Electron + Vite for the desktop recorder
TypeScript throughout for type safety
Radix UI primitives for accessible components

Backend Infrastructure:

Express.js API server
Socket.io for real-time communication
Prisma ORM for database operations
AWS SDK for S3-compatible storage

AI & Processing:

Whisper API for transcription
Mistral for summarization
Automated subtitle generation
Background job processing

What I Learned

Building Videmo end-to-end taught me about:

Video processing complexity: Handling different formats, codecs, and streaming uploads
Multi-platform development: Shipping a consistent experience across web and desktop
Real-time architecture: Using WebSockets to keep clients synchronized during long-running operations
AI integration: Chaining transcription and summarization models into a smooth workflow
Performance optimization: Managing large file uploads and video streaming efficiently

Project Goals

This was never meant to be a commercial product - it’s a learning project where I could experiment with modern tooling and ship something complete. The goal was to refresh my full-stack skills while diving deep into domains I hadn’t explored much before: video processing, desktop apps, and AI pipelines.

Tech Stack

Frontend:    Next.js - React 19 - TypeScript - Tailwind CSS - Radix UI
Desktop:     Electron - Vite - TypeScript
Backend:     Node.js - Express - Socket.io - Prisma
Storage:     MinIO (S3-compatible)
AI:          Whisper - Mistral
Auth:        Clerk
Database:    PostgreSQL (via Prisma)

Current Status

Videmo is functional but intentionally kept as a side project. It successfully demonstrates the core concepts I wanted to explore: multi-platform development, video processing, and AI integration. The codebase serves as both a portfolio piece and a reference for future projects.

This project represents my approach to learning: identify an interesting problem, build a complete solution, and share what I learned along the way.