Inspiration
The world is obsessed with "generative" AI tools that create fake videos from scratch. But for creators who already have hours of raw footage, the real bottleneck isn't content generation—it's the tedious manual labor of editing.
Traditional editing requires expensive software and thousands of clicks to perform simple tasks. We built PixelCut to eliminate the friction. We believe you shouldn't need a complex timeline to make a cut; you should just be able to talk to your video.
What it does
PixelCut is a "Chat-to-Edit" platform that replaces manual scrubbing with a simple conversation.
- Remove Stutters: Just say "Delete all the 'ums' and 'ahs'."
- Surgical Cuts: "Remove the intro up to where I start talking."
- Mute & Speed: "Mute the background noise" or "Speed up the boring parts by 2x."
- Text Overlays: "Add a caption that says 'Pro Tip' in the middle."
Unlike generative tools, PixelCut operates on your actual footage: every frame in the output comes from your source video, so there is nothing to hallucinate.
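One way to picture this: before any video is touched, the agent resolves a chat message into a structured edit operation. The `EditOp` schema below is purely illustrative (our internal format is not shown here):

```python
from dataclasses import dataclass, field

@dataclass
class EditOp:
    """A single edit resolved from a chat request (illustrative schema)."""
    action: str                 # e.g. "cut", "mute", "speed", "overlay_text"
    start: float                # seconds on the current timeline
    end: float                  # seconds on the current timeline
    params: dict = field(default_factory=dict)

# "Speed up the boring parts by 2x" might resolve to something like:
op = EditOp(action="speed", start=42.0, end=97.5, params={"factor": 2.0})
```

Once the request is in this shape, executing it is a deterministic FFmpeg invocation rather than a generative step.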
How we built it
We engineered a high-performance, AI-native video operating system.
- The Interface: We used Lovable to rapidly prototype a premium UI, then built the production frontend using Vite, Tailwind CSS, and TypeScript for a lightning-fast, type-safe experience.
- The Intelligence: We utilized Gemini 3 Flash as our sole engine. Its massive context window allows it to "watch" the entire video at once, and its high-speed reasoning powers our LangChain agent to execute edits in real-time.
- The Muscle: The backend is built with FastAPI (Python 3.11) on a DigitalOcean Ubuntu Droplet. We built a custom FFmpeg CLI wrapper to execute surgical, low-level commands like stream copying (`-c copy`) for near-instant trimming.
- The State Engine: We used MySQL to manage persistent "Edit Ledgers," ensuring every session is saved and every edit is tracked.
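As a rough sketch of the kind of command such a wrapper might emit (the `build_trim_command` helper and file names are hypothetical, not our actual code):

```python
def build_trim_command(src: str, dst: str, start: float, end: float) -> list[str]:
    """Build an ffmpeg argument list that keeps [start, end) without re-encoding."""
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",        # fast input seek to the cut's start
        "-i", src,
        "-t", f"{end - start:.3f}",   # duration to keep after the seek point
        "-c", "copy",                 # stream copy: no decode/encode round trip
        "-avoid_negative_ts", "make_zero",  # keep timestamps sane after seeking
        dst,
    ]

cmd = build_trim_command("raw.mp4", "clip.mp4", 10.0, 25.0)
# e.g. subprocess.run(cmd, check=True) on a machine with ffmpeg installed
```

Because `-c copy` skips re-encoding, the cut is limited by container I/O rather than codec speed; the trade-off is that cut points snap to the nearest keyframe.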
Challenges we ran into
The biggest challenge was Timestamp Drift. When you remove a 5-second clip at the 10-second mark, the original 15-second mark becomes the new 10-second mark. We solved this by keeping a persistent Edit History in MySQL and forcing Gemini 3 Flash to recalculate the relative timeline before every FFmpeg execution.
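The recalculation step boils down to arithmetic over the deletion ledger. A minimal sketch, where `remap_timestamp` and the `(start, end)` span format are illustrative rather than our actual schema:

```python
def remap_timestamp(original_t: float, deletions: list[tuple[float, float]]) -> float:
    """Map a timestamp on the ORIGINAL timeline to the CURRENT timeline,
    given earlier deletions as (start, end) spans on the original timeline."""
    shift = 0.0
    for start, end in deletions:
        if end <= original_t:
            shift += end - start          # whole cut lies before this point
        elif start < original_t:
            shift += original_t - start   # point fell inside a cut; clamp to its edge
    return original_t - shift

# Removing the 10s-15s span: the original 15-second mark is now at 10 seconds.
remap_timestamp(15.0, [(10.0, 15.0)])  # → 10.0
```

Running this remap before every FFmpeg call keeps the model's cut points aligned with the video as it currently exists, not as it was first uploaded.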
Accomplishments that we're proud of
- Flash-Speed Performance: Using only Gemini 3 Flash, we achieved near-instant video indexing and reasoning.
- Sub-Second Trimming: Optimized FFmpeg commands that avoid re-encoding, making edits feel like a text change.
- Semantic Cleanup: Successfully mapping vague requests like "remove the awkward silence" to precise FFmpeg filter chains.
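For the silence case, FFmpeg's `silencedetect` audio filter does the heavy lifting: it logs silent spans to stderr, which can then be paired into cut candidates. A sketch under that assumption (the helper names are illustrative):

```python
import re

def silence_detect_command(src: str, noise_db: int = -30, min_dur: float = 0.5) -> list[str]:
    """ffmpeg invocation that logs silent spans without writing an output file."""
    return [
        "ffmpeg", "-i", src,
        "-af", f"silencedetect=noise={noise_db}dB:d={min_dur}",
        "-f", "null", "-",   # decode audio only; discard the output
    ]

def parse_silences(ffmpeg_log: str) -> list[tuple[float, float]]:
    """Pair silence_start/silence_end entries from ffmpeg's log into spans."""
    starts = [float(m) for m in re.findall(r"silence_start: ([\d.]+)", ffmpeg_log)]
    ends = [float(m) for m in re.findall(r"silence_end: ([\d.]+)", ffmpeg_log)]
    return list(zip(starts, ends))
```

The `noise` threshold and minimum duration are the knobs the agent can tune when a request like "remove the awkward silence" is more or less aggressive.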
What we learned
We learned that the future of AI isn't just in creating pixels, but in orchestrating battle-tested tools like FFmpeg. By leveraging Lovable for the frontend, we were able to shift 80% of our time to the core technical challenge: building a state-aware AI agent.
What's next for PixelCut
- Multi-Track Support: Handling B-roll and background music overlays via chat.
- Auto-Shorts: One-click conversion from landscape to 9:16 vertical clips with face-tracking.
- Style Transfer: "Make this video look like a 90s VHS tape" using advanced FFmpeg filter graphs.
Built With
- ai-agent
- digitalocean
- fastapi
- ffmpeg
- gemini
- langchain
- lovable
- mysql
- python
- tailwind
- typescript
- ubuntu
- vite