Inspiration
What it does
Lucid Browsing is an AI-powered browser extension that gives you total control over the web interface through voice and text commands. It transforms the browser from a passive screen into an active workspace.
Key capabilities include:
Natural Language UI Control: You can say "Remove the sidebar," "Hide all video recommendations," or "Draft an email to the person shown on my LinkedIn profile for a job application for an AI engineer role," and the system generates and executes the code to modify the page in real time.
Context-Aware "Local Eyes": Unlike chatbots that hallucinate because they can't see your screen, Lucid Browsing captures the actual DOM structure of your active tab, ensuring the AI understands exactly what you are looking at.
Voice Command: Built-in speech-to-text allows hands-free interaction with the web.
Workflow Integration (Powered by Composio): It breaks the wall between the browser and your apps. You can say "Save this research to Notion" or "Draft an email about this," and the agent extracts the relevant content and pushes it to your external tools.
How we built it
I built an agentic pipeline that splits the work across specialized AI agents:
The Frontend (Chrome Extension): Built with JavaScript using Lovable. It acts as the system's "Sensors," capturing the DOM snapshot (HTML structure) and user audio, and as its "Actuator," executing the final DOM manipulation scripts.
The Backend (FastAPI & Python): This is the brain of the operation.
The Agent Architecture (Google ADK): We utilized a multi-agent system:
-> Planner Agent: Deconstructs the user's vague intent into specific technical goals.
-> Writer Agent: Generates precise JavaScript code to manipulate the specific HTML elements found on the page.
-> Validator Agent: Runs the code in a sandbox (Daytona) to check for safety and syntax errors before it ever touches the user's browser.
-> Executor Agent: The final bridge that orchestrates the action.
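The hand-off between the four agents can be sketched as a chain of plain functions. This is a minimal sketch, not the Google ADK API: `run_pipeline`, `PipelineResult`, and the `stub_*` functions are hypothetical names, and the stubs stand in for LLM-backed agents and the sandbox check.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PipelineResult:
    goals: List[str]   # technical goals produced by the Planner
    script: str        # JavaScript produced by the Writer
    approved: bool     # verdict from the Validator


def run_pipeline(
    command: str,
    dom_snapshot: str,
    plan: Callable[[str, str], List[str]],
    write: Callable[[List[str], str], str],
    validate: Callable[[str], bool],
) -> PipelineResult:
    """Planner -> Writer -> Validator. The caller (the Executor role)
    should only send the script to the browser when approved is True."""
    goals = plan(command, dom_snapshot)
    script = write(goals, dom_snapshot)
    approved = validate(script)
    return PipelineResult(goals=goals, script=script, approved=approved)


# Hypothetical stand-ins for the real agents (assumptions for illustration).
def stub_plan(command: str, dom_snapshot: str) -> List[str]:
    return ["remove the element matching #sidebar"]


def stub_write(goals: List[str], dom_snapshot: str) -> str:
    return "document.querySelector('#sidebar')?.remove();"


def stub_validate(script: str) -> bool:
    # Placeholder rule only; the project describes a sandboxed run instead.
    return "eval(" not in script


result = run_pipeline(
    "Remove the sidebar",
    '<div id="sidebar"></div>',
    stub_plan,
    stub_write,
    stub_validate,
)
```

Passing the agents in as callables keeps each stage replaceable, so a stub can be swapped for a model-backed agent without touching the orchestration.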
Integration Layer (Composio): I used Composio to give our agents "Tools." This allows the backend to authenticate and communicate with APIs like Notion, Gmail, and Google Docs without writing custom boilerplate for every service.
Challenges we ran into
The biggest technical hurdle was the "Ghost Browser" problem. Initially, the backend would launch a headless browser to "see" the website. However, modern dynamic websites serve different content to a headless browser than they do to a logged-in user, so the AI would write code to remove an ad that didn't exist in the user's session.
The Fix: I pivoted to a "Local Eyes" architecture. I rewrote the extension to capture a lightweight snapshot of the user's actual DOM and send it to the backend. This ensured the AI was writing code for the exact page the user was seeing.
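What "lightweight" might mean for such a snapshot can be illustrated with a small reducer. The real capture happens in the extension's JavaScript; the sketch below uses Python's standard `html.parser` only to show the idea, and it assumes a snapshot that keeps tag names plus `id` and `class` while dropping scripts, styles, and text. `OutlineBuilder` and `outline` are hypothetical names.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = {"script", "style", "noscript"}   # content the model does not need
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}  # tags with no closing tag
KEPT_ATTRS = ("id", "class")                      # attributes useful for selectors


class OutlineBuilder(HTMLParser):
    """Collects an indented outline of tags, ignoring text and skipped subtrees."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self.depth = 0
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        kept = " ".join(f'{k}="{v}"' for k, v in attrs if k in KEPT_ATTRS and v)
        label = f"<{tag} {kept}>" if kept else f"<{tag}>"
        self.lines.append("  " * self.depth + label)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in VOID_TAGS:
            return
        self.depth = max(0, self.depth - 1)


def outline(html: str) -> str:
    """Reduce raw HTML to a compact structural outline."""
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return "\n".join(builder.lines)


page = (
    '<body><div id="sidebar" class="ads"><script>var x = 1;</script>'
    '<p>Hi</p></div><img src="a.png"></body>'
)
snapshot = outline(page)
```

For the sample `page`, the outline keeps `body`, the `div` with its `id` and `class`, the `p`, and the `img`, and drops the script and the paragraph text.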
Accomplishments that we're proud of
-> Real-Time Latency: optimizing the pipeline so that Voice -> Action feels almost instant.
-> Self-Healing Scripts: If the AI writes code that fails to find an element, the system detects the failure, analyzes the error, and rewrites the script automatically without the user noticing.
-> The "Magic" Factor: There is a visceral satisfaction in seeing a messy news site instantly clean itself up just because you asked it to.
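The self-healing behavior amounts to a bounded retry loop: run the script, and on failure feed the error back to the writer. A minimal sketch, assuming an `execute` callback that returns `None` on success or an error message, and a `rewrite` callback that returns a corrected script; both callbacks, `run_with_repair`, and the attempt limit are assumptions for illustration.

```python
from typing import Callable, Optional, Tuple


def run_with_repair(
    script: str,
    execute: Callable[[str], Optional[str]],
    rewrite: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[bool, str, int]:
    """Return (succeeded, final_script, attempts_used)."""
    for attempt in range(1, max_attempts + 1):
        error = execute(script)
        if error is None:
            return True, script, attempt
        if attempt < max_attempts:
            # Hand the failing script and its error back for a rewrite.
            script = rewrite(script, error)
    return False, script, max_attempts


# Hypothetical stand-ins: the first selector does not exist on the page.
def fake_execute(script: str) -> Optional[str]:
    if ".side-bar" in script:
        return "no element matches .side-bar"
    return None


def fake_rewrite(script: str, error: str) -> str:
    return script.replace(".side-bar", "#sidebar")


ok, final_script, attempts = run_with_repair(
    "document.querySelector('.side-bar').remove();",
    fake_execute,
    fake_rewrite,
)
```

Capping the attempts keeps a script that cannot be repaired from looping indefinitely and lets the caller report the failure instead.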
What we learned
-> Context is King: LLMs are powerful, but they are blind without accurate context. Injecting the DOM structure directly into the prompt was the breakthrough that made the automation reliable.
-> Agents need Tools: Giving the agents access to Composio transformed the project from a "UI Toy" into a "Productivity Tool."
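Injecting the DOM into the prompt can be as simple as string assembly with a size cap. The sketch below is an assumption about the shape of such a prompt, not the project's actual template; `build_prompt`, the instruction wording, and the 4000-character budget are all hypothetical.

```python
def build_prompt(command: str, dom_outline: str, max_dom_chars: int = 4000) -> str:
    """Combine the user's command with the captured DOM outline.

    The outline is cut at max_dom_chars so a very large page cannot
    crowd out the instructions; the cut is flagged for the model.
    """
    truncated = len(dom_outline) > max_dom_chars
    dom_part = dom_outline[:max_dom_chars]
    note = "\n[DOM outline truncated]" if truncated else ""
    return (
        "You write JavaScript that edits the page described below.\n"
        "Only reference elements that appear in the DOM outline.\n\n"
        f"DOM outline:\n{dom_part}{note}\n\n"
        f"User command: {command}\n"
    )


prompt = build_prompt("Remove the sidebar", '<div id="sidebar">')
```

Telling the model to use only elements present in the outline is what ties the generated script to the page the user is actually viewing.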
What's next for Lucid Browsing
-> Multi-Step Navigation: Moving beyond single-page changes to multi-step tasks like "Go to Amazon and find the cheapest headphones."
-> The "Trust Engine": We plan to re-introduce our News Verification Agent, which scores articles for credibility in real-time.
-> Community Marketplace: Allowing users to share their custom "Lens" scripts for popular websites.
Built With
- adk
- composio
- gemini
- javascript
- lovable
- python