ReVision

Inspiration

Exams. A word that every student, good or bad, fears at least to some degree. Not always because they are hard, but sometimes because non-academic commitments such as clubs, internships, or jobs leave us with limited opportunities to join study sessions, seek assistance and guidance, and fully prepare ourselves to perform at our best.

With ReVision, you can scan problems from a variety of topics and practice them effectively on your device, anywhere you want! An integrated AI Tutor gives you a constant loop of instant feedback, detailed breakdowns, and personalized suggestions on how to solve or approach each problem.

Whether you're tackling algebra, calculus, or even some written code late at night, ReVision actively reviews your solution process and helps you practice correctly and refine your logic. Using the power of OCR and recursive validation, ReVision transforms passive review into active learning, training students not just to get the right answer, but also to understand why it’s right and to build confidence.

What it does

ReVision is a web application that takes in a document or image from the user, extracts the questions from it, and builds a module for the user to solve. If the user is struggling or makes a mistake, the app guides them back to the right path, much like a tutor offering real-time assistance. On the canvas, the user is free to write whatever they think is correct. Based on how close they are to the answer, hints are displayed at the bottom of the canvas! If a response is irrelevant, our app steers the student back in the right direction in a humorous way. Our app is not limited to English. It can reply in other languages, which makes it accessible to a wide range of students. Its use cases range from helping students to supporting developers who are new to a technology and want active practice.

How we built it

Our React-based web application takes in an image (jpg, png, etc.) of problems that you wish to work on, learn, or practice solving on your own, with some assistance where needed. Using Python with Flask and Google Vision's Optical Character Recognition (OCR) features, we extracted strings of characters and symbols from the image.
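As a rough illustration of this upload step, the sketch below checks that an upload looks like a PNG or JPEG and then turns raw OCR text into clean lines. It is not our actual backend code: `detect_image_type`, `extract_lines`, and the `ocr` callable are hypothetical names, and `ocr` simply stands in for whatever function wraps the Google Vision call.

```python
from typing import Callable, List, Optional

# Leading bytes ("magic numbers") of the two formats named in the text.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def detect_image_type(data: bytes) -> Optional[str]:
    """Return 'png' or 'jpeg' based on the file's leading bytes, else None."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None


def extract_lines(data: bytes, ocr: Callable[[bytes], str]) -> List[str]:
    """Validate an upload, run OCR on it, and return non-empty text lines.

    `ocr` is a hypothetical stand-in for the real OCR call: it takes the
    image bytes and returns the recognized text as one string.
    """
    if detect_image_type(data) is None:
        raise ValueError("unsupported image format")
    raw_text = ocr(data)
    # Collapse runs of whitespace inside each line and drop blank lines.
    return [" ".join(line.split()) for line in raw_text.splitlines() if line.strip()]
```

Passing the OCR function in as a parameter keeps the validation and cleanup logic testable without network access or API credentials.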

The app then passes this data to Google Gemini Pro to create a JSON file of the questions recognized in the Google Vision output. This JSON is then sent to the frontend interface, where each question is displayed as a prompt alongside the other components (canvas, pencil, eraser, next buttons, and a dedicated area for feedback/hints). We built the blank canvas with Next.js and used Flask to route its inputs to the Python script that calls the Gemini 2.5 Flash Lite API.
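To make the JSON hand-off concrete, here is a minimal sketch of how a model reply could be parsed into question objects before it reaches the frontend. The reply shape (`{"questions": [{"id": ..., "text": ...}]}`), the `Question` class, and `parse_questions` are assumptions for illustration, not the schema the project actually uses. The sketch also tolerates a reply wrapped in a Markdown code fence, which LLMs sometimes produce.

```python
import json
from dataclasses import dataclass
from typing import List

FENCE = "`" * 3  # Markdown code-fence marker


@dataclass
class Question:
    id: int
    text: str


def parse_questions(reply: str) -> List[Question]:
    """Parse a model reply into Question objects (hypothetical schema)."""
    body = reply.strip()
    if body.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag)
        # and everything from the closing fence onward.
        body = body.split("\n", 1)[1] if "\n" in body else ""
        body = body.rsplit(FENCE, 1)[0]
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object with a 'questions' list")
    questions = []
    for index, item in enumerate(data.get("questions", []), start=1):
        text = str(item.get("text", "")).strip()
        if text:  # skip entries with no usable question text
            questions.append(Question(id=int(item.get("id", index)), text=text))
    return questions
```

For example, `parse_questions('{"questions": [{"id": 1, "text": "Solve x + 2 = 5"}]}')` returns a one-element list holding that question.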

The app continuously captures any changes in input from the whiteboard and sends the updates to the LLM every 2-3 seconds. The LLM evaluates the user's solution process and returns suggestions or feedback to guide the user towards the correct answer. The feedback is shown to the user as a pop-up message that is color-coded based on how close the user is to the solution (red for not close and green for correct). When an input does not make sense or is irrelevant, the LLM guides the user back towards the correct topic and methods.
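The feedback loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code we shipped: `SnapshotThrottle` and `feedback_color` are hypothetical names, the numeric closeness score is an assumed representation of "how close the user is", and the yellow band and both thresholds are invented for the example (only red and green come from the description).

```python
import hashlib
from typing import Optional


class SnapshotThrottle:
    """Decide when a canvas snapshot is worth sending to the LLM."""

    def __init__(self, min_interval: float = 2.0) -> None:
        self.min_interval = min_interval  # minimum seconds between sends
        self._last_sent_at: Optional[float] = None
        self._last_digest: Optional[str] = None

    def should_send(self, snapshot: bytes, now: float) -> bool:
        """Return True if the canvas changed and the interval has elapsed."""
        digest = hashlib.sha256(snapshot).hexdigest()
        if digest == self._last_digest:
            return False  # nothing new since the last snapshot we sent
        if self._last_sent_at is not None and now - self._last_sent_at < self.min_interval:
            return False  # too soon since the last request
        self._last_sent_at = now
        self._last_digest = digest
        return True


def feedback_color(closeness: float) -> str:
    """Map an assumed closeness score in [0, 1] to a pop-up color."""
    if closeness >= 0.9:
        return "green"   # correct, or nearly so
    if closeness >= 0.5:
        return "yellow"  # assumed intermediate band, not in the description
    return "red"         # not close
```

Taking the current time as an argument, rather than reading the clock inside the class, makes the throttle easy to test and keeps the timing policy separate from whatever timer drives it.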

Challenges we ran into

Some challenges that we ran into along the way are as follows:

Diagrams and flow charts could not be handled because of the OCR limitations of the API model
The LLM's limited capacity for large inputs (questions/prompts) caused delays and long runtimes
OCR limitations heavily restricted our ability to accept PDFs as input
The app's effectiveness depends on stronger models, but cost limited us to affordable alternatives
Integrating the front end and back end via routing required caution and ample testing

Accomplishments that we're proud of

Some accomplishments that we are proud of are as follows:

Deploying a functional real-time looping algorithm featuring an LLM
Applying Google Vision OCR parsing to address a real-world problem
Simulating a whiteboard-style canvas that can be edited with stylus and eraser features
Achieving support for multilingual tutoring

What we learned

Through this experience, we learned a lot about what it takes to develop a versatile multi-purpose application that uses real-time data updates and operates with recursive thought processes. Over our development cycle, we deepened our appreciation and understanding of the power, versatility, and applicability of Google's modern AI tools such as Gemini and Cloud Vision. We also expanded our skill set in using databases like Supabase, committing app updates through Git and GitHub, and deploying applications to cloud hosting environments such as Vercel.

What's next for ReVision

We are really proud of this project and excited about committing future updates to it (maybe even turning it into a start-up)! Despite the struggles and frustrations, we had a great time collaborating and learning from each other as we worked towards our goal of building something that helps other people, and students like us! We look forward to coming up with new features and updates that keep the project in step with the modern world of education and AI applications!

Built With

Submitted to

ShellHacks 2025

Created by

I worked on integrating the frontend and backend systems for the canvas and file upload features, making sure that all the different systems function as intended.

Kazi Amin
I worked on the Supabase database and backend for the project and used the Gemini API to process the text data from the Google Vision pdf/image OCR result and give feedback. I also helped with the frontend, built the landing page, and themed certain areas of the site.

Pranavsai Gandikota
Sophomore at UCF studying Computer Science. I love doing hackathons.
I helped write and debug API routing between the front end and back end. I also worked on the whiteboard page UI, implemented math notation, and debugged AI prompting.

Jeremy Whatts
David Navarrete