Inspiration

We've all been there: you're deep in research mode with dozens of tabs open, trying to piece together information for a project. Bookmarks quickly become disorganized, and copying text or links into a document often loses context and formatting. In team projects, one person's research can be hard to understand for others, leaving some teammates confused or lost. I was inspired by the idea of a 'generative space,' a digital notebook that doesn't just store links but captures the essence of what you're reading. I wanted to strip away the noise of the modern web (ads, popups, tracking scripts) and create a clean, focused environment that improves understanding and collaboration.

What it does

GenSpace is a productivity ecosystem consisting of a Chrome Extension and a FastAPI Backend.

  1. Capture: Users can instantly "clip" the current webpage using the Chrome Extension (the WebDev workshop was helpful).
  2. Clean: The backend receives the raw HTML and runs it through a custom cleaning engine. It filters out content that is not meant for human readers (like code snippets, scripts, and hidden styles) while preserving the semantic structure (headers, lists, tables).
  3. Curate: Clips are organized into "Spaces." Each space gets a unique URL that renders a beautiful, combined view of all the shared content, which is perfect for creating study guides, research summaries, or reading lists.

How we built it

I built GenSpace using a modern, scalable tech stack:

  • Backend: I chose Python and FastAPI for their speed and ease of use.
  • Database: I used MongoDB (via the asynchronous Motor driver) to store the variable-length HTML content and user data.
  • HTML Processing: I developed a custom html_cleaner.py module using BeautifulSoup4. It recursively traverses the DOM tree to reconstruct a sanitized version of the page.
  • Frontend: The Chrome Extension was built with JavaScript, HTML, and CSS to interact with the browser's active tab.
  • Deployment: The entire backend is deployed on Railway, enabling continuous integration and automatic deployments.
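
To give a feel for the cleaning step, here is a minimal sketch of allowlist-based HTML cleaning. It is not the actual html_cleaner.py: it uses Python's standard-library html.parser instead of BeautifulSoup4 so it runs without dependencies, and it is event-driven rather than a recursive DOM traversal. The tag sets KEEP_TAGS and DROP_WITH_CONTENT, the Cleaner class, and the clean_html function are all assumptions made for illustration.

```python
import re
from html import escape
from html.parser import HTMLParser

# Assumed tag sets for illustration; the project's real lists are not given.
KEEP_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "ul", "ol", "li",
             "table", "thead", "tbody", "tr", "th", "td"}
DROP_WITH_CONTENT = {"script", "style", "noscript", "code", "pre"}


class Cleaner(HTMLParser):
    """Rebuilds a page from allowlisted structural tags and visible text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the cleaned document, in order
        self.skip_depth = 0  # > 0 while inside an element we drop entirely

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in KEEP_TAGS:
            self.out.append(f"<{tag}>")  # attributes are discarded

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            # max() guards against stray closing tags in broken HTML
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in KEEP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Collapse whitespace runs and re-escape the text for output
            self.out.append(escape(re.sub(r"\s+", " ", data)))


def clean_html(raw: str) -> str:
    """Return a sanitized copy of raw HTML (hypothetical helper)."""
    parser = Cleaner()
    parser.feed(raw)
    parser.close()
    return "".join(parser.out)
```

Tags outside both sets (such as div or b) are unwrapped: the tag is dropped but its text is kept, which is one simple way to preserve readable content while removing layout markup.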

Challenges we ran into

  • Sanitizing HTML: The web is messy. Distinguishing between "useful" content and "junk" was difficult. I had to write complex logic to filter out code artifacts that looked like text and ensure that broken HTML didn't crash our parser.
  • Asynchronous Operations: Moving from synchronous database calls to asynchronous ones with async/await in Python required a shift in thinking, especially when handling database dependencies in FastAPI.
  • Cross-Origin Communication: Ensuring the Chrome Extension could securely communicate with my hosted API while handling CORS policies and data validation was a significant hurdle.

Accomplishments that we're proud of

I am particularly proud of my HTML reconstruction algorithm. It doesn't just strip tags; it rebuilds the document structure to ensure that the "clean" version is still readable and visually structured, not just a wall of text. I'm also proud of setting up a robust authentication system that allows users to securely register, log in, and manage their own private spaces.

What we learned

I learned a lot about the DOM structure and how to manipulate it programmatically. I also gained deep experience with FastAPI dependency injection and how to structure a scalable Python backend. On the client side, I learned the intricacies of Chrome Extension Manifest V3 and how to bridge the gap between a browser action and a server-side process.

What's next for GenSpace

  • AI Summarization: I plan to integrate an LLM to automatically generate summaries for each Space.
  • Cosine Similarity Search: This will let users search for relevant information in the stored text without typing it exactly as written, unlike the browser's Ctrl/Cmd+F find, which only matches exact strings.
  • PDF Export: I plan to add a feature that exports an entire Space as a clean, formatted PDF document.
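
Since the search feature is not built yet, the following is only a sketch of how cosine similarity ranking could work. It assumes plain word-count vectors as a stand-in for whatever text representation is eventually chosen (an embedding model would be needed to match paraphrases that share no words). The function names vectorize, cosine, and search are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Lowercased word counts; a simple stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, clips: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Rank clips by similarity to the query, dropping zero-score clips."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(c)), c) for c in clips]
    scored = [s for s in scored if s[0] > 0]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]
```

Calling search("async FastAPI", clips) ranks clips that mention either word above unrelated ones, regardless of word order or capitalization, which exact-match page search cannot do.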
