Try it live: https://decipherhv.ca/

Inspiration

Growing up, I always found the way math and science textbooks teach things confusing and unhelpful. Oftentimes they use specialized terminology that students don't have in their vocabulary yet, or they go into complex proofs when it isn't really necessary. It's as if the books are written for people who already know everything in them. I despise how textbooks are like this, and I know it's especially hard to get help with them online when all you have is a physical copy. That's why I developed Decipher: I want to make learning accessible to everyone.

What it does

Decipher is an online tool that helps students learn from confusing textbooks. Students take pictures of their textbook pages and upload them to get AI assistance on whichever parts they find confusing. The AI first reads all the text from the image and displays it on screen, rendered as proper markdown in an easy-to-read format. The user can then highlight text and choose one of the following three options:

  1. Simplify: For when a student wants something rewritten or paraphrased, for example because the highlighted text is poorly worded.
  2. Explain: For when a student can't grasp what the textbook is trying to tell them or why it makes sense.
  3. Knowledge Tree: For when a student wants to know what they need to learn beforehand to properly understand the concepts on their page. Essentially, it is a tree of prerequisites.
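The Knowledge Tree option returns a tree of prerequisites. Below is a minimal TypeScript sketch of one way such a tree could be represented and turned into a study order. The `PrereqNode` shape and the example concepts are hypothetical; the write-up doesn't describe Decipher's actual format. Because every prerequisite must be learned before the concept that depends on it, a post-order traversal lists the leaves first and the highlighted concept last.

```typescript
// Hypothetical shape for a prerequisite tree; Decipher's real format may differ.
interface PrereqNode {
  concept: string;
  prerequisites: PrereqNode[];
}

// Post-order traversal: list every prerequisite before the concept itself,
// skipping concepts already listed (the same prerequisite can appear twice).
function studyOrder(root: PrereqNode): string[] {
  const order: string[] = [];
  const seen = new Set<string>();
  const visit = (node: PrereqNode): void => {
    for (const child of node.prerequisites) visit(child);
    if (!seen.has(node.concept)) {
      seen.add(node.concept);
      order.push(node.concept);
    }
  };
  visit(root);
  return order;
}

// Illustrative tree for a highlighted passage about derivatives.
const tree: PrereqNode = {
  concept: "Derivatives",
  prerequisites: [
    { concept: "Limits", prerequisites: [{ concept: "Functions", prerequisites: [] }] },
    { concept: "Slope of a line", prerequisites: [{ concept: "Functions", prerequisites: [] }] },
  ],
};

console.log(studyOrder(tree));
// → [ 'Functions', 'Limits', 'Slope of a line', 'Derivatives' ]
```

Deduplicating by concept name matters because real prerequisite "trees" are often graphs: a basic idea like "Functions" tends to sit under several branches, and a student only needs to study it once.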

How we built it

Below is the tech stack:

  • Frontend: React + Vite + Tailwind, react-markdown, KaTeX
  • Backend: Node.js + Express, multer for uploads
  • AI: Google Gemini Vision / Google Gemini via @google/genai

I first designed the Node.js backend, making sure all the endpoints worked and could reach the Gemini API properly. There are four endpoints. Simplify, Explain, and Knowledge Tree each have their own endpoint, and each makes a single POST request to the Gemini API to get its result. OCR is more complicated: it makes three separate POST requests to the Gemini API. The first request simply extracts exactly what's on the page and wraps the math parts in markdown. The second request formats the text to improve readability (paragraphs and spacing) without changing the wording. The third request takes the output from step 2 and splices it cleanly into a JSON object so that the React frontend can handle the output more easily.
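The three-request OCR flow can be sketched as a small chain. This is a minimal illustration, not Decipher's actual code: `AskModel` is a hypothetical stand-in for a Gemini call (for example, `ai.models.generateContent` in `@google/genai`), the prompts are paraphrased from the description above, and the `{ blocks: [...] }` JSON shape is an assumption. Injecting the model call keeps the pipeline testable with a stub.

```typescript
// Hypothetical signature: send a prompt (plus an optional base64 image) to the model, get text back.
type AskModel = (prompt: string, imageBase64?: string) => Promise<string>;

// Assumed output shape handed to the React frontend.
interface OcrResult {
  blocks: { type: "heading" | "paragraph"; markdown: string }[];
}

// Prompts paraphrased from the write-up; the real wording is not published.
const EXTRACT_PROMPT =
  "Transcribe exactly the text on this page. Wrap all math in $...$ or $$...$$.";
const FORMAT_PROMPT =
  "Reformat this text into readable paragraphs and spacing. Do not change any wording.";
const JSON_PROMPT =
  'Split this text into JSON of the form {"blocks":[{"type":"heading"|"paragraph","markdown":"..."}]}. Output JSON only.';

// Models sometimes wrap JSON in markdown code fences; strip them before parsing.
function parseJsonReply(reply: string): OcrResult {
  const cleaned = reply.trim().replace(/^`{3}(?:json)?\s*/, "").replace(/\s*`{3}$/, "");
  const parsed = JSON.parse(cleaned) as OcrResult;
  if (!Array.isArray(parsed.blocks)) throw new Error("reply has no blocks array");
  return parsed;
}

// Three sequential requests: extract (with the image), format, then structure as JSON.
async function runOcr(ask: AskModel, imageBase64: string): Promise<OcrResult> {
  const raw = await ask(EXTRACT_PROMPT, imageBase64);
  const formatted = await ask(`${FORMAT_PROMPT}\n\n${raw}`);
  const json = await ask(`${JSON_PROMPT}\n\n${formatted}`);
  return parseJsonReply(json);
}
```

Only the first request needs the image; the later two work purely on text, so each prompt has one narrow job.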

I then designed the frontend, trying to make it as accessible and visually appealing as possible on both desktop and mobile.

After that, I deployed it on DigitalOcean. With their tools it was as easy as setting up a droplet, SSHing into it, and setting up the project, and it went very smoothly. I set up a custom domain because it looks nice, and also because HTTPS is required for camera access, and I needed a domain to get a certificate.
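Browsers expose `navigator.mediaDevices.getUserMedia` only in secure contexts (HTTPS, or `localhost` during development), which is why the certificate was needed. A minimal sketch of a frontend guard, assuming the app requests the rear camera via `facingMode: "environment"`. The `openCamera` helper and the small structural types are hypothetical; they stand in for the DOM's `Navigator` so the snippet also runs outside a browser.

```typescript
// Minimal structural types standing in for the DOM's Navigator/MediaDevices.
interface MediaDevicesLike {
  getUserMedia(constraints: { video: { facingMode: string }; audio: boolean }): Promise<unknown>;
}
interface BrowserEnv {
  isSecureContext: boolean;
  mediaDevices?: MediaDevicesLike; // undefined on plain-HTTP pages
}

// Hypothetical helper: request the rear camera, failing early with a clear
// message when the page isn't served from a secure context.
async function openCamera(env: BrowserEnv): Promise<unknown> {
  if (!env.isSecureContext || !env.mediaDevices) {
    throw new Error("Camera access requires HTTPS (a secure context).");
  }
  return env.mediaDevices.getUserMedia({ video: { facingMode: "environment" }, audio: false });
}

// In a browser:
// openCamera({ isSecureContext: window.isSecureContext, mediaDevices: navigator.mediaDevices });
```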

Challenges we ran into

The main challenge was getting clean, readable text extracted from the pages. I chose Google Gemini Vision for this, specifically the gemini-2.5-flash model. I found it was pretty good at getting the raw text from the page and putting markdown around the math equations, but it really struggled to format its output beyond that. Oftentimes it would just dump all the text into one paragraph with no formatting besides the math markdown. That's when I opted for the three-request method, which gives Gemini a much more rigid program to follow and makes its output easier to read as well as much more consistent.
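One way to enforce the formatting request's "don't change the wording" rule is to check its output before moving on. A minimal sketch, assuming the formatting step may only change whitespace (line breaks, blank lines, spacing): compare the raw and formatted texts word by word and fall back to the raw extraction if they differ. `sameWording` and `pickOutput` are hypothetical helpers, not something the write-up says Decipher does.

```typescript
// Split on any run of whitespace so only the words themselves are compared.
function words(text: string): string[] {
  return text.split(/\s+/).filter((w) => w.length > 0);
}

// True when `formatted` contains exactly the same words, in the same order, as `raw`.
function sameWording(raw: string, formatted: string): boolean {
  const a = words(raw);
  const b = words(formatted);
  return a.length === b.length && a.every((w, i) => w === b[i]);
}

// Hypothetical guard: keep the formatted text only if the wording survived.
function pickOutput(raw: string, formatted: string): string {
  return sameWording(raw, formatted) ? formatted : raw;
}

console.log(sameWording("A limit is\na value.", "A limit is a value.")); // true
console.log(sameWording("A limit is a value.", "A limit is the value.")); // false
```

Retrying the formatting request instead of falling back would be another option; the trade-off is extra latency and API cost per page.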
