MathsVoice

Document scanner
Audio playback for the lecture
Lecture script from the parsed content
Built-in infinite Subway Surfers content
Previously scanned documents in the dashboard
Home page
Working login and registration pages with user authentication and login tokens
Processing document

MathsVoice makes STEM content easier to learn by turning math and science documents into clear, natural audio. Students upload a file; MathsVoice extracts the math, rewrites it in approachable language, and reads it aloud in a voice and language they choose. It’s built for learners with visual impairments, dyslexia, or attention challenges, and for anyone who benefits from listening.

## Why we built it (inspiration)

We’ve watched brilliant students struggle with dense notation, small text, and inaccessible PDFs. Screen readers don’t always handle math well, especially when it’s embedded in images or untagged PDFs. We wanted a tool that respects how people actually learn: with clear explanations, flexible voices, and a place to revisit past work.

## What We Learned

- Accessibility is a design constraint, not a feature to bolt on. Clear headings, ARIA roles, keyboard navigation, large touch targets, and meaningful error messages made the app better for everyone.
- Math is more than text. Explaining \(\frac{a}{b}\) as “a divided by b” or \(\int_a^b f(x)\,dx\) as “the area under f(x) from a to b” improves comprehension.
- Data modeling matters early. A simple one-to-many relationship (User → Documents) unlocked a natural dashboard experience.
- Deployment and environment discipline is key. Pinned dependencies, secrets management, and clear start commands save hours later.
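
To make the "math is more than text" point concrete, here is a minimal sketch of pattern-based verbalization for the two expressions mentioned above. In MathsVoice this rewriting is done by Gemini, not by regular expressions; the `RULES` table and the `speak` function are hypothetical names used only for illustration, and the patterns handle only simple, non-nested input.

```python
import re

# Hypothetical rule table of (pattern, spoken template) pairs.
# Illustrative only: not the rules MathsVoice or Gemini actually apply.
RULES = [
    # \int_a^b f(x)\,dx  ->  "the area under f(x) from a to b"
    (
        re.compile(r"\\int_(\w+)\^(\w+)\s*(.+?)\s*(?:\\,)?\s*d(\w)"),
        r"the area under \3 from \1 to \2",
    ),
    # \frac{a}{b}  ->  "a divided by b" (brace-free arguments only)
    (
        re.compile(r"\\frac\{([^{}]+)\}\{([^{}]+)\}"),
        r"\1 divided by \2",
    ),
]


def speak(latex: str) -> str:
    """Apply each rule in order and return the spoken form."""
    spoken = latex
    for pattern, template in RULES:
        spoken = pattern.sub(template, spoken)
    return spoken


print(speak(r"\frac{a}{b}"))
print(speak(r"\int_a^b f(x)\,dx"))
```

A rule table like this breaks down quickly on nested fractions or multi-line derivations, which is one reason a language model is a better fit for the real explanation step.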

## How We Built It

- Backend: Flask with Blueprints for clean API separation.
- OCR & Math Extraction: Mathpix, which handles LaTeX-rich content from images and PDFs.
- Summarization & Simplification: Google Gemini, which converts raw math text into clear, spoken-language explanations.
- Text-to-Speech: ElevenLabs, which provides natural-sounding voices in multiple languages.
- Auth & Data: Flask-Login and SQLAlchemy, with User and Document models (SQLite used for development).
- Frontend: Vanilla HTML, CSS, and JS with a modern, accessible UI:
  - Voice selector with curated voices by language
  - Responsive components and keyboard-friendly interactions
  - Dashboard listing past documents with transcript previews
- Deployment: Hosted on DigitalOcean using a Flask app container with environment variables and pinned dependencies for stable builds.
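
The three services listed above can be chained as plain function calls. The sketch below is one way to structure that, assuming each service is wrapped behind a simple callable; `narrate`, `Narration`, and the `ocr`, `explain`, and `synthesize` parameters are hypothetical names, and the lambdas stand in for the real Mathpix, Gemini, and ElevenLabs clients, whose APIs are not shown here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Narration:
    original_text: str  # raw text and LaTeX from the OCR stage
    transcript: str     # spoken-language explanation
    audio: bytes        # synthesized speech


def narrate(
    file_bytes: bytes,
    voice_id: str,
    ocr: Callable[[bytes], str],
    explain: Callable[[str], str],
    synthesize: Callable[[str, str], bytes],
) -> Narration:
    """Run the three stages in order, passing each output to the next."""
    original_text = ocr(file_bytes)
    transcript = explain(original_text)
    audio = synthesize(transcript, voice_id)
    return Narration(original_text, transcript, audio)


# Stand-in stages so the sketch runs without network access or API keys.
result = narrate(
    b"fake-image-bytes",
    voice_id="demo-voice",
    ocr=lambda data: r"\frac{a}{b}",
    explain=lambda text: "a divided by b",
    synthesize=lambda text, voice: f"[{voice}] {text}".encode(),
)
print(result.transcript)
```

Passing the stages in as arguments keeps the pipeline testable without real credentials and makes it easy to swap a provider later.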

We convert visual math into accessible narration. For example:

- Inline: $E=mc^2$
- Block:

$$ \int_a^b f(x)\,dx $$

becomes “the definite integral of f from a to b, the signed area under the curve.”

---

## Architecture in a Nutshell

Upload → Mathpix OCR → Gemini explanation → ElevenLabs audio → Save Document → Play and revisit in Dashboard.

Data shape for a Document:

- filename
- upload_time
- original_text (from Mathpix)
- transcript (from Gemini)
- voice_id (from ElevenLabs)
- user_id (foreign key to User)
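
The Document fields above, together with the User → Documents relationship, map onto two tables. The sketch below uses the standard-library `sqlite3` module instead of the project's SQLAlchemy models so that it runs with no dependencies; the table names, the `email` column on `user`, and the `make_db` helper are assumptions, since the source does not list the User fields.

```python
import sqlite3

# Assumed schema: the document columns follow the list above,
# the user columns are illustrative.
SCHEMA = """
CREATE TABLE user (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE document (
    id            INTEGER PRIMARY KEY,
    filename      TEXT NOT NULL,
    upload_time   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    original_text TEXT,
    transcript    TEXT,
    voice_id      TEXT,
    user_id       INTEGER NOT NULL REFERENCES user(id)
);
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


conn = make_db()
conn.execute("INSERT INTO user (id, email) VALUES (1, 'student@example.com')")
conn.execute(
    "INSERT INTO document (filename, transcript, voice_id, user_id) "
    "VALUES (?, ?, ?, ?)",
    ("notes.pdf", "a divided by b", "demo-voice", 1),
)
count = conn.execute(
    "SELECT COUNT(*) FROM document WHERE user_id = 1"
).fetchone()[0]
print(count)
```

With foreign keys enforced, a document cannot be saved for a user that does not exist, which keeps the dashboard query a simple filter on `user_id`.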

## Challenges we faced

- Aligning three APIs: Ensuring robust error handling when any service times out or returns partial results.
- Math semantics: Preserving meaning while simplifying; avoiding oversimplification that removes key context.
- Auth-protected UX: Keeping the App and Dashboard behind login while keeping the Home and Login/Register routes intuitive.
- Frontend accessibility: Making the UI keyboard-first, with consistent focus states and clear, visible feedback.
- Windows path and Git friction: Avoiding checking in virtual environments and long Windows paths (e.g., .venv/Lib/...) by using .gitignore and keeping venvs out of the repo.
- Deployment env drift: Matching local and server Python environments and pinning versions to prevent “works on my machine” surprises.
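
For the first challenge, one common approach is to wrap each external call in a small retry helper that treats an empty result as a failure and reports which stage gave up. This is a sketch under assumptions, not the project's actual error handling: `call_with_retry` and `StageError` are hypothetical names, and a real version would catch only the specific timeout and HTTP errors raised by each client library instead of every `Exception`.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class StageError(RuntimeError):
    """Raised when a pipeline stage still fails after all attempts."""


def call_with_retry(
    stage: str,
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff on errors or empty results."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            result = fn()
            if not result:
                # Treat empty output (e.g. blank OCR text) as a failure.
                raise ValueError(f"{stage} returned an empty result")
            return result
        except Exception as exc:  # a sketch; narrow this in real code
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise StageError(f"{stage} failed after {attempts} attempts") from last_error
```

A caller can then catch `StageError` once at the route level and show a meaningful message naming the stage that failed.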

## What’s working now

- Upload an image/PDF, pick a voice, and get a clear narration and transcript.
- See your past documents in the Dashboard, in reverse-chronological order, with quick preview and actions.
- Clean, consistent navbar behavior:
  - Logged in: App, Dashboard, Logout.
  - Logged out: Login, Register.
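
The Dashboard ordering and transcript preview described above could look roughly like the following. This is a sketch with plain dataclasses, not the project's SQLAlchemy query; `Doc`, `preview`, `dashboard_rows`, and the 80-character limit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class Doc:
    filename: str
    upload_time: datetime
    transcript: str


def preview(text: str, limit: int = 80) -> str:
    """Collapse whitespace and cut at a word boundary near the limit."""
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    return text[:limit].rsplit(" ", 1)[0] + "..."


def dashboard_rows(docs: List[Doc], limit: int = 80) -> List[Tuple[str, str]]:
    """Return (filename, preview) pairs, newest upload first."""
    ordered = sorted(docs, key=lambda d: d.upload_time, reverse=True)
    return [(d.filename, preview(d.transcript, limit)) for d in ordered]


docs = [
    Doc("week1.pdf", datetime(2024, 1, 8), "a divided by b"),
    Doc("week2.pdf", datetime(2024, 1, 15), "the definite integral of f from a to b"),
]
for filename, text in dashboard_rows(docs, limit=20):
    print(filename, "-", text)
```

In a database-backed version the sort would normally be pushed into the query (ordering by `upload_time` descending) so that pagination can be added later without loading every row.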

## What’s next

- Document detail view with full transcript and re-generation options.
- Deletion, pagination, and better filtering/search in the Dashboard.
- Optional audio persistence and download history.
- Improved math reading strategies for complex expressions and multi-line derivations.
- Cloud deployment hardening (monitoring, retries, and per-service fallbacks).