Inspiration

Modern professionals spend up to 40% of their day "context switching": jumping between tabs to look up terms, summarize videos, or find tutorials. Google gives us information, but we still have to do the heavy lifting to turn that information into a plan. We don't need more tabs; we need a faster bridge to execution.

What it does

  1. YouTube Smart-Tutor: Mindy doesn’t just summarize a video; she lives in the timeline. Ask specific questions about what’s happening at 02:45, and Mindy provides real-time explanations, acting as your personal academic or technical tutor.

  2. Visual Deep-Dive (Tab Capture): Mindy can "see" your current tab. By capturing your screen, she analyzes the visual layout and answers questions based on the live context. If you wander off-topic, Mindy stays grounded, warning you that the information isn't in the image so her answers stay factually transparent.

  3. Jargon Crusher (Right-Click Simplicity): Reading a PhD-level research paper, or confused by Gen-Alpha slang? Highlight the text, and Mindy is just one right-click away. She crushes complex jargon into simple, human sentences instantly, so you never have to leave your reading flow.

  4. Blueprint Vision (The Executioner): Upload a photo of a broken appliance, a complex recipe, or a technical diagram. Mindy identifies the core problem and generates a step-by-step execution guide, complete with automated research links to YouTube and Google so you can start fixing, cooking, or building immediately.
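
Under the hood, the Visual Deep-Dive flow is roughly: capture the visible tab, strip the data-URL header, and attach the image to a Gemini request. A minimal sketch, where the helper names and exact payload shape are illustrative rather than the shipped code:

```javascript
// Split a data URL ("data:image/png;base64,....") into the inline-image
// part shape that Gemini's generateContent REST endpoint expects.
function toInlineImagePart(dataUrl) {
  const [header, base64Data] = dataUrl.split(",");
  const mimeType = header.slice("data:".length, header.indexOf(";"));
  return { inline_data: { mime_type: mimeType, data: base64Data } };
}

// Capture the visible tab and ask Gemini a question about it.
// Only callable inside an extension context where `chrome` exists.
async function askAboutCurrentTab(question, apiKey) {
  const dataUrl = await chrome.tabs.captureVisibleTab({ format: "png" });
  const body = {
    contents: [{ parts: [{ text: question }, toInlineImagePart(dataUrl)] }],
  };
  const res = await fetch(
    "https://generativelanguage.googleapis.com/v1beta/models/" +
      `gemini-2.5-flash:generateContent?key=${apiKey}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    }
  );
  const json = await res.json();
  return json.candidates?.[0]?.content?.parts?.[0]?.text;
}
```

The base64 conversion is free here because `captureVisibleTab` already returns a data URL; the helper only has to peel off the `data:image/png;base64,` prefix.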

How we built it

Mindy AI is built as a Chrome Extension (Manifest V3) so it sits as close to the user's workflow as possible. The "brain" is powered by the Google Gemini API, specifically the Gemini 2.5 Flash model, chosen for its speed and multimodal capabilities. On the backend, a JavaScript service worker handles the heavier logic: scraping YouTube JSON3 caption transcripts and capturing screenshots of the active tab.
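
To give a flavor of the transcript side, here is a minimal sketch of turning a JSON3 caption track into timestamped lines. The event shape shown (`tStartMs` plus `utf8` segments) follows YouTube's json3 caption layout; fetching the track itself is omitted:

```javascript
// Format milliseconds as an "MM:SS" timestamp like the ones in the video UI.
function formatTimestamp(ms) {
  const totalSec = Math.floor(ms / 1000);
  const mm = String(Math.floor(totalSec / 60)).padStart(2, "0");
  const ss = String(totalSec % 60).padStart(2, "0");
  return `${mm}:${ss}`;
}

// Flatten a JSON3 caption object into { time, text } lines the AI can cite.
function transcriptLines(json3) {
  return (json3.events || [])
    .filter((e) => e.segs) // some events are metadata-only, with no text
    .map((e) => ({
      time: formatTimestamp(e.tStartMs),
      text: e.segs.map((s) => s.utf8).join("").trim(),
    }))
    .filter((line) => line.text.length > 0);
}
```

Lines shaped this way let the prompt say "at 02:45 the narrator says …", which is what makes timestamp-specific questions answerable.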

Challenges we ran into

The biggest hurdle was Multimodal Synchronization: getting the extension to capture a live tab, convert it to a Base64 string, and send it to Gemini alongside a specific system prompt, all fast enough to feel instant, was a race against latency. Managing API rate limits was also a constant battle. I had to optimize the code so that only the most relevant context is sent to the API, preventing truncated ("chopped") responses and ensuring every request counts during high-pressure usage.
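
The rate-limit fix boils down to windowing: only transcript lines near the current playback position go into the prompt, not the whole video. A sketch, with the ±90-second window as an assumed tuning value:

```javascript
// Keep only transcript lines within ±windowSec of the current playback
// position, so each Gemini request stays small and relevant.
// `lines` are { startSec, text } objects; 90 s is an assumed default.
function contextWindow(lines, currentSec, windowSec = 90) {
  return lines.filter(
    (l) => Math.abs(l.startSec - currentSec) <= windowSec
  );
}
```

Trimming the context this way both respects per-request token budgets and keeps the model focused on the moment the user is actually asking about.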

Accomplishments that we're proud of

I am incredibly proud of the "Symbiotic Intelligence" between the YouTube Context and Visual Deep-Dive features. Creating a system where the AI can "hear" the narrator via transcripts and "see" the screen via Vision, and then cross-reference both to answer a user's question, feels like a glimpse into the future of browsing. Also, successfully building and deploying a fully functional, high-fidelity AI tool in just 6 hours is a personal milestone.

What we learned

A ton! This was my first time ever building a Chrome Extension. I had to learn how extensions talk to web pages, how to stay within Google's newer Manifest V3 security rules, and how to pull YouTube timestamps accurately in real time.
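
The "extensions talk to web pages" part is Manifest V3 message passing: a content script sends a message, and the service worker answers. A simplified sketch, with the message types being illustrative rather than Mindy's actual protocol:

```javascript
// Pure router so the logic is testable outside the browser.
function routeMessage(msg) {
  switch (msg.type) {
    case "SIMPLIFY_TEXT": // e.g. from the right-click "Jargon Crusher"
      return { action: "callGemini", prompt: `Explain simply: ${msg.text}` };
    case "GET_TIMESTAMP": // ask the YouTube page for the playback position
      return { action: "queryVideoTime" };
    default:
      return { action: "ignore" };
  }
}

// In the service worker, wire the router to the extension runtime.
// Only callable where `chrome` exists.
function registerListener() {
  chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
    sendResponse(routeMessage(msg));
  });
}
```

Keeping the routing logic pure made it much easier to reason about, since the `chrome.*` APIs themselves can only run inside the extension.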

What's next for Mindy AI

  1. PDF Brain: I want her to be able to "see" through 100-page manuals or research papers.
  2. Talk to Mindy: Adding voice commands so you can ask questions hands-free while you're cooking or fixing stuff.
  3. Save to Notion: One click to send those "Blueprints" straight to your notes so you don't lose them.