Inspiration

Every year, 50 million tons of e-waste hit landfills - not because devices are irreparable, but because people don't know where to start. A flickering lamp gets tossed. A coffee maker with a blown fuse gets replaced. The knowledge gap between "something's wrong" and "I can fix this" is where most of that waste is created.

We wanted to close that gap with a tool that feels less like a manual and more like a mentor. Fix-It-Flow was built for the person who has never picked up a screwdriver in their life but is willing to try - if someone just tells them what to do, step by step, hands-free.

What it does

Fix-It-Flow is a voice-first repair agent that uses your device's camera and microphone to guide you through fixing broken household items - and, when a fix isn't feasible, suggests sustainable alternatives like recycling instead of throwing them away.

It runs two flows:

  • Inspection Mode: You describe the problem out loud while pointing your camera at the item. Gemini Vision analyzes each live frame and describes what it sees. Featherless AI (Llama 3.1) then reasons over the visual description and your conversation to diagnose the issue. The app asks up to 3 targeted clarifying questions, then commits to a specific sustainable recommendation - repair, replace only the broken part, repurpose, donate, or recycle. If the fix is too complex for home repair, it suggests concrete alternatives like repair cafes or manufacturer spare parts rather than defaulting to "buy new."
  • Repair Mode: Once the device and broken part are identified, the app generates a step-by-step repair walkthrough. ElevenLabs reads each instruction aloud so your hands stay free. You say "done" to advance, "repeat" to hear a step again, or ask any follow-up question mid-repair and the AI answers in context before resuming where you left off.
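The Repair Mode command loop above can be sketched as a small state machine. This is a hypothetical sketch, not our exact implementation: `RepairState`, `Action`, and `handleUtterance` are illustrative names, and the real app also wires in the Web Speech API and ElevenLabs playback around this logic.

```typescript
// Sketch of Repair Mode's voice-command handling: "done" advances,
// "repeat" re-reads the current step, and any other utterance is
// treated as a mid-repair question answered before resuming.

type RepairState = { steps: string[]; index: number };

type Action =
  | { kind: "advance"; say: string }
  | { kind: "repeat"; say: string }
  | { kind: "question"; resumeAt: number }
  | { kind: "finished" };

function handleUtterance(state: RepairState, utterance: string): Action {
  const text = utterance.trim().toLowerCase();
  if (text === "done" || text === "next") {
    const next = state.index + 1;
    if (next >= state.steps.length) return { kind: "finished" };
    state.index = next;
    return { kind: "advance", say: state.steps[next] };
  }
  if (text === "repeat") {
    return { kind: "repeat", say: state.steps[state.index] };
  }
  // Anything else is a follow-up question: the caller sends it to the
  // LLM with the session context, speaks the answer, then resumes at
  // the same step index.
  return { kind: "question", resumeAt: state.index };
}
```

Keeping this logic pure (no API calls inside) makes it easy to test and keeps the speech layer swappable.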

How we built it

  • Frontend: Next.js 14 PWA with Tailwind CSS, running in both mobile and desktop browsers.
  • Voice input: Web Speech API for real-time speech-to-text and keyword detection ("done", "next", "repeat", "help", "end session").
  • Voice output: ElevenLabs streaming TTS with the eleven_turbo_v2_5 model for low-latency spoken guidance.
  • Vision: Google Gemini Vision (gemini-2.0-flash) analyzes each live camera frame and describes what is visibly broken.
  • Reasoning: Featherless.AI running Llama 3.1 8B takes Gemini's visual description plus the full conversation history to generate sustainability-first recommendations and repair steps.
  • Session persistence: AWS DynamoDB stores conversation history, captured frames, and repair state so sessions survive page refreshes.
  • Pipeline: Gemini runs first on each turn (vision only, ~3s), then passes its output to Featherless (reasoning, ~10s) - this ensures the LLM always has fresh visual context rather than last turn's summary.
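The per-turn pipeline can be sketched as a single async function. This is a simplified illustration: `describeFrame` and `reasonAbout` are stand-in names for the real Gemini and Featherless calls, injected here so the ordering logic is visible and testable.

```typescript
// Sketch of the sequential per-turn pipeline: vision runs first so the
// reasoning model always receives a description of the *current* frame,
// never last turn's summary.

type Turn = { userText: string; frame: string };

async function runTurn(
  turn: Turn,
  history: string[],
  describeFrame: (frame: string) => Promise<string>, // stand-in for Gemini Vision
  reasonAbout: (visual: string, history: string[]) => Promise<string>, // stand-in for Featherless
): Promise<string> {
  // 1. Vision first (~3s in production): fresh visual context.
  const visual = await describeFrame(turn.frame);
  // 2. Reasoning second (~10s): sees this turn's frame description.
  const reply = await reasonAbout(visual, [...history, turn.userText]);
  history.push(turn.userText, reply);
  return reply; // 3. In the real app, this string is streamed to ElevenLabs TTS.
}
```

Running the two calls in parallel would shave latency on paper, but the reasoning step would have to consume the previous frame's description - exactly the staleness problem described under Challenges.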

Challenges we ran into

  • Latency was the hardest problem. The initial pipeline ran Gemini and Featherless sequentially with tight timeouts, causing frequent failures. We experimented with running them in parallel, but that introduced a worse problem: Featherless would receive last turn's visual context instead of the current frame. We landed on sequential execution with relaxed timeouts - Gemini runs first so Featherless always has fresh vision data.
  • LLM prompt tuning took more iterations than expected. Llama 3.1 8B defaulted to asking excessive clarifying questions and repeating the same generic suggestions. We introduced a question counter (hard cap at 3), explicit anti-repetition instructions (passing previous suggestions in the prompt), and a turn-aware system prompt that shifts behavior from "gather evidence" to "commit to a recommendation" as the conversation progresses.
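The turn-aware prompt described above can be sketched as a small builder function. The function name and wording are hypothetical; the real prompt is longer, but the mechanism is the same: a hard question cap and an explicit list of suggestions to avoid repeating.

```typescript
// Sketch of a turn-aware system prompt: early turns may ask questions
// (hard cap at 3); once the cap is hit, the model must commit to one
// recommendation. Previous suggestions are passed in to curb repetition.

const MAX_QUESTIONS = 3;

function buildSystemPrompt(
  questionsAsked: number,
  previousSuggestions: string[],
): string {
  const parts = ["You are a sustainability-first repair assistant."];
  if (questionsAsked < MAX_QUESTIONS) {
    parts.push(
      `You may ask at most ${MAX_QUESTIONS - questionsAsked} more clarifying question(s).`,
    );
  } else {
    parts.push(
      "Do not ask further questions. Commit to one specific recommendation: " +
        "repair, replace only the broken part, repurpose, donate, or recycle.",
    );
  }
  if (previousSuggestions.length > 0) {
    parts.push(
      "Do not repeat these earlier suggestions: " + previousSuggestions.join("; "),
    );
  }
  return parts.join("\n");
}
```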

Accomplishments that we're proud of

  • A fully functional voice pipeline: speak → camera captures frame → Gemini describes it → Featherless reasons about it → ElevenLabs speaks the response back - end to end in ~13 seconds.
  • A side-by-side desktop layout with the live camera on the left and the conversation panel sticky on the right, making the diagnostic flow easy to follow in real time.

What we learned

  • Multimodal LLM pipelines require careful thought about data freshness - running calls in parallel looks faster on paper but can silently degrade context quality.
  • The "sustainability angle" needs to be explicit in the prompt - left to default behavior, LLMs suggest the most common fix, which is often "replace it," not the most sustainable one.

What's next for Fix-It-Flow

  • Repair manual ingestion: upload device PDFs to S3, parse them with the LLM, and generate repair steps grounded in the actual manufacturer documentation.
  • Repairability score: after inspection, show a 1–10 score for how fixable the item is, what it would cost in parts, and the estimated CO₂ saved vs. buying new.
