## Inspiration
We've all been there: staring at a cryptic assembly manual, trying to figure
out which screw goes where. IKEA furniture, BBQ grills, electronics, even Lego
sets. The instructions feel like they're working against you. Tiny 2D
diagrams. Ambiguous arrows. Part numbers that mean nothing. One mistake, and
you're disassembling everything to start over.
We asked ourselves: What if understanding was guaranteed? What if you
could see every step in an interactive 3D model, hear clear voice guidance,
and ask questions when you're stuck? That's the experience we set out to
build.
## What it does
ManualX transforms any PDF instruction manual into an interactive 3D
assembly guide with AI voice assistance.
- Upload any PDF manual
- AI extracts the steps, detects components, and generates 3D models
- View each step as an interactive 3D scene — rotate, zoom, see exactly
  how parts connect
- Listen to natural voice narration explaining each step in plain language
- Ask questions using the built-in voice assistant when you need help
No more squinting at tiny diagrams. No more guessing. Just clear, visual,
voice-guided instructions that anyone can follow.
## How we built it
Frontend:
- Next.js 15 with React 19
- React Three Fiber for 3D rendering
- Vapi for real-time voice AI conversations
- ElevenLabs for text-to-speech narration
- react-pdf for side-by-side PDF viewing
Backend:
- FastAPI for the REST API
- Google Gemini for vision AI — detecting steps, identifying components,
  generating descriptions, and estimating 3D positions
- Tripo AI for converting 2D component images into 3D GLB models
- rembg for background removal to clean component images before 3D generation
- SQLite for data persistence
Pipeline:
PDF → Extract Images/Text → Detect Steps (Gemini) → Detect Components (Gemini)
→ Crop & Clean Images (rembg) → Generate 3D Models (Tripo)
→ Analyze Positions (Gemini) → Generate Voice Audio (ElevenLabs) → JSON
Output
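The pipeline above can be sketched as a simple orchestrator. This is an illustrative stand-in, not our actual backend code: every stage function here (`detect_steps`, `detect_components`, and the stubbed URLs) is a hypothetical placeholder for the real Gemini, Tripo, and ElevenLabs calls.

```python
# Hypothetical sketch of the ManualX pipeline; each stage is a stub
# standing in for a real service call (Gemini, rembg, Tripo, ElevenLabs).

def detect_steps(pages):
    # Real version: send each page image to Gemini vision. Stubbed here.
    return [{"step": i + 1, "page": p} for i, p in enumerate(pages)]

def detect_components(step):
    # Real version: ask Gemini for component crops in the step image.
    return [{"name": f"part_{step['step']}_a"}]

def process_manual(pages):
    """Run every stage and return the JSON-ready guide structure."""
    guide = []
    for step in detect_steps(pages):
        components = detect_components(step)
        for c in components:
            # Real version: crop, clean with rembg, then generate a GLB
            # via Tripo; here we just record where the model would live.
            c["model_url"] = f"/models/{c['name']}.glb"
        step["components"] = components
        # Real version: ElevenLabs TTS for the step narration.
        step["audio_url"] = f"/audio/step_{step['step']}.mp3"
        guide.append(step)
    return guide

print(process_manual(["page1.png", "page2.png"]))
```

The final `guide` list is what gets serialized to the JSON output that the frontend's 3D viewer consumes.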
## Challenges we ran into
3D Position Estimation: Getting AI to understand spatial relationships
from 2D manual images and translate them into accurate 3D coordinates was
harder than expected. We iterated on our Gemini prompts extensively to improve
accuracy.
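One thing that helped was asking the model for structured JSON rather than free-form text, then validating it before use. The schema and the clamping guard below are illustrative assumptions, not our exact prompt contract:

```python
import json

def parse_positions(gemini_response: str) -> dict:
    """Parse a position-estimation reply into {component: (x, y, z)}.

    Assumed (illustrative) reply schema:
    [{"component": "leg_a", "position": {"x": 0.1, "y": 0.0, "z": -0.4}}, ...]
    """
    positions = {}
    for item in json.loads(gemini_response):
        p = item["position"]
        # Illustrative sanity guard: clamp to a unit workspace so one bad
        # estimate can't fling a part off-screen in the 3D scene.
        xyz = tuple(max(-1.0, min(1.0, float(p[axis]))) for axis in ("x", "y", "z"))
        positions[item["component"]] = xyz
    return positions

reply = '[{"component": "leg_a", "position": {"x": 0.1, "y": 0.0, "z": -2.5}}]'
print(parse_positions(reply))  # → {'leg_a': (0.1, 0.0, -1.0)}
```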
Background Removal: Our initial approach used Stable Diffusion for
"cleaning" component images, but it was actually regenerating them, causing
inconsistencies. Switching to rembg gave us clean, transparent PNGs that Tripo
could work with properly.
Browser Audio Autoplay: Chrome's autoplay policies blocked our TTS from
playing automatically. We had to implement proper user interaction handling
and add manual replay controls.
Real-time Voice AI: Coordinating between the step narration TTS and Vapi's
voice assistant required careful state management to prevent audio overlap
and ensure smooth transitions.
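The core of that state management is an ownership rule: only one audio source may hold the "speaker" at a time, and the assistant may preempt narration but not the reverse. A minimal sketch of that rule, written in Python for illustration (the real coordination lives in the React frontend, and all names here are hypothetical):

```python
class AudioCoordinator:
    """Single-owner audio slot shared by narration TTS and the assistant."""

    def __init__(self):
        self.owner = None  # None, "narration", or "assistant"

    def request(self, source: str) -> bool:
        """Grant playback only if nothing else holds the audio slot."""
        if self.owner is None:
            self.owner = source
            return True
        return False

    def interrupt_for_assistant(self):
        """The voice assistant preempts step narration, never vice versa."""
        self.owner = "assistant"

    def release(self, source: str):
        if self.owner == source:
            self.owner = None

coord = AudioCoordinator()
assert coord.request("narration")      # narration starts
assert not coord.request("assistant")  # blocked while narration plays
coord.interrupt_for_assistant()        # user asks a question: preempt
coord.release("assistant")
assert coord.request("narration")      # narration can resume afterwards
```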
## Accomplishments that we're proud of
- End-to-end automation: Drop a PDF, get a fully interactive 3D guide — no
manual work required
- Multi-modal AI pipeline: Successfully chained Gemini (vision), Tripo (3D
generation), and ElevenLabs (voice) into a cohesive experience
- Natural voice interaction: Users can have real conversations about the
  assembly process, not just listen to pre-recorded audio
- Clean, intuitive UI: Split-view with 3D scene and original PDF,
  glassmorphism design, smooth animations
## What we learned
- AI is surprisingly good at understanding instructions, but only if you're very specific: vague diagrams in, vague results out
- Building a complex, multi-service AI pipeline from scratch is really hard
- Three.js is also pretty sick
## What's next for ManualX
- AR mode: View the 3D assembly overlaid on your real workspace using your
phone camera
- Community library: Share processed manuals so others don't have to
  re-process the same IKEA instructions
- Hardware integration: Connect with smart tools that can verify you're
  using the right screws