Inspiration

We've all felt the tedium of working through a heinous algebraic expression or integral, only to discover we veered totally off course. In our experience as tutors, teaching assistants, and students, one thing that prevents this and encourages learning is a second pair of eyes. If someone is there to nudge you in the right direction as you make mistakes in sequential problems, you get more out of each problem and receive feedback on your mistakes faster, placing you squarely in the zone of proximal development. It's also helpful for tutors to create follow-ups: given that a student keeps making a particular mistake, how can you tailor future questions toward it?

What it does

Not everyone has a tutor alongside them, which is why we built Perch. Perch is a second pair of eyes that stays in the background as you solve problems through natural handwriting, yet is capable of understanding and reasoning through your mistakes. Whenever you make a mistake, it's "spell-checked": you're offered a hint to double-check your work, and the mistake is tracked. These mistakes feed into a state model that estimates your understanding of each concept, which we then use to retrieve relevant questions from user-provided resources to probe those weaknesses.

How we built it

We used FastAPI, Next.js, and MongoDB as the primary tools for our tech stack. Lots of tools were used for individual steps: Gemini was used in tandem with other tools for document extraction, secondary handwriting processing, and question concept map generation. We used the bundled ViTs from Pix2Text for a large part of our OCR pipeline, and built our symbolic math checker on top of SymPy. We also relied on (a depressing amount of) energy drinks and Hippeas Bohemian Barbecue chickpea puffs.

Architecturally, we first process worksheets through Gemini 2.5 Flash, using it both as a segmentation model and as OCR to extract LaTeX-formatted math equations. We coalesce these by subject using a subject tree, also generated by 2.5 Flash, that we can reference and traverse. Upon PDF upload, each question is tagged with a difficulty score and a position in the subject tree corresponding to what it covers.
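As a rough illustration of that tagging step, here's the shape of the per-question record produced at upload time (the names here are ours for illustration, not Perch's actual schema):

```python
# Hypothetical sketch of the record each extracted question becomes:
# a LaTeX body, a difficulty score, and a path into the subject tree.
from dataclasses import dataclass, field


@dataclass
class TaggedQuestion:
    latex: str                  # OCR'd question text, LaTeX-formatted
    difficulty: float           # illustrative scale: 0.0 (easy) to 1.0 (hard)
    subject_path: list[str] = field(default_factory=list)  # root -> leaf


q = TaggedQuestion(
    latex=r"\int_0^1 x^2 \, dx",
    difficulty=0.4,
    subject_path=["Calculus", "Integration", "Definite integrals"],
)
```

The subject path doubles as the question's address in the tree, which makes retrieval by concept a simple prefix match.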

Inspired by behavioral science and learning research (in particular Corbett & Anderson's work on knowledge tracing: https://act-r.psy.cmu.edu/wordpress/wp-content/uploads/2012/12/893CorbettAnderson1995.pdf), we modeled concepts as a directed acyclic graph and student understanding as a Hidden Markov Model updated by the correctness of responses. We then use this model to retrieve new questions until students have completed their topic mastery trees, which you can visualize and interact with to track your own progress!
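The update in the Corbett & Anderson paper is the classic Bayesian Knowledge Tracing recurrence, which can be sketched in a few lines; the parameter values below are illustrative, not the ones Perch actually fits:

```python
# Bayesian Knowledge Tracing update (Corbett & Anderson 1995 formulation).
# Parameter values are illustrative placeholders.
P_LEARN = 0.15  # chance of learning the skill at each practice opportunity
P_GUESS = 0.20  # chance of answering correctly without knowing the skill
P_SLIP = 0.10   # chance of answering incorrectly despite knowing the skill


def update_mastery(p_know: float, correct: bool) -> float:
    """Posterior P(known) after one observed response, then apply learning."""
    if correct:
        posterior = (p_know * (1 - P_SLIP)) / (
            p_know * (1 - P_SLIP) + (1 - p_know) * P_GUESS
        )
    else:
        posterior = (p_know * P_SLIP) / (
            p_know * P_SLIP + (1 - p_know) * (1 - P_GUESS)
        )
    # Hidden-state transition: the student may learn between opportunities.
    return posterior + (1 - posterior) * P_LEARN


# Mastery estimate rises with correct responses, falls with mistakes.
p = 0.3
for outcome in (True, True, False):
    p = update_mastery(p, outcome)
```

Each concept node in the DAG carries its own mastery estimate, and a node counts as "completed" once its estimate crosses a threshold.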

For the handwriting recognition system and the feedback that drives the HMM, we use a vision feedback loop: Gemini handles reasoning and mistake bounding-box identification, while a lighter Pix2Text ViT running locally, in tandem with a SymPy solver, validates the output and catches hallucinations. This makes checking feel natural and delivers hints fairly seamlessly, while mistakes are simultaneously stashed to feed into our HMM.
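The SymPy side of that verification step boils down to checking that consecutive handwritten lines are algebraically equivalent; a minimal sketch (the function name is ours, and we assume the OCR output has already been converted to SymPy-parseable strings):

```python
# Minimal sketch of the symbolic "spell-check": flag a step whose two
# lines are not algebraically equivalent. Assumes OCR output has been
# normalized into SymPy-parseable expression strings.
from sympy import simplify
from sympy.parsing.sympy_parser import parse_expr


def step_is_valid(prev_line: str, next_line: str) -> bool:
    """True if next_line is algebraically equivalent to prev_line."""
    diff = parse_expr(prev_line) - parse_expr(next_line)
    return simplify(diff) == 0


step_is_valid("(x + 1)**2", "x**2 + 2*x + 1")  # a valid expansion
step_is_valid("(x + 1)**2", "x**2 + 1")        # dropped the cross term
```

When a step fails this check, that's the signal to surface a hint and log the mistake for the HMM, rather than trusting the VLM's judgment alone.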

Challenges we ran into

Even seemingly simple tasks were quite complicated! For instance, we spent a while debugging and trialing document extraction techniques, from Donut to LayoutLMv3, but found through trial by fire that VLMs such as Gemini worked best, with very specific prompting needed to get segmentation tasks accurate. Wiring the AI tools together into the vision feedback loop also took quite a while. We also put a lot of care into streamlining our design and user experience, and getting things to look right was a large part of our project!

Accomplishments that we're proud of

Our group spanned lots of different experience levels, so it was a lot of fun to come together and try to ship something exciting! We're quite happy with how our design spec turned out, as well as how the canvas integration feels to use :) We also made a point of implementing lots of little features, like prompting a hint when the user writes "hint" on the canvas; they aren't the flashiest things, but they make for a much better user experience.

What we learned

Making things is hard, and it's easier to plan things out than to actually do them! Many of our members learned different things, from the essentials of responsive design to basic graph theory for the backend implementation, but one thing we all improved on was collaboration: we became much more adept at planning, splitting, and recombining subtasks by the end of the project.

What's next for Perch

There were quite a few ways we hoped to expand this idea! Integration with voice models, question generation and prompting, and expanding to more subjects (like physics, algorithms, etc.) were next on our horizon. Another extension our group was excited about was extended RAG capabilities: what if Perch could pull in sections of notes or YouTube lectures to refresh users on a particular concept they erred on? Ultimately, our hope was to make Perch a platform that seamlessly syncs with users' existing work and makes their lives easier, more productive, and ultimately more enriching, and these features would let us do so.
