Dewey

Inspiration

Textbooks are a flawed teaching tool. One of the oldest mechanisms for learning, they're usually meant to be used alongside a teacher or tutor who can present the subject matter in a structured, organized way and help when students become stuck.

But without this “educator layer”, textbooks are minimally useful. For millions of primary, secondary, and post-secondary learners, textbook use often means either banging your head against the binding or staring into the pages hoping to gain some understanding through sheer osmosis.

How can we make textbooks better? That was the question behind Dewey, an AI-powered textbook PDF reader that enables students to more easily find the answers to their questions.

What it does

Dewey solves two common problems students have with textbooks:

Textbooks are often dense and filled with confusing jargon. Students can highlight any passage and Dewey summarizes it in more comprehensible language, providing clarification when they are stuck.
Textbooks are filled with high-level concepts and relationships that exact-match keyword searching (Ctrl-F) often fails to detect. Through semantic search, Dewey enables users to surface topics that are similar to what they're looking for.

How we built it

Dewey is a full-stack web application that uses multiple technologies to accomplish our goal.

The front-end is built as a MERN application that uses React libraries to facilitate the uploading of files and reading of textbooks. We use TailwindCSS for styling.

The back-end is built using AWS S3, Lambda, and an EC2 instance to upload, parse, and index textbook content into a retrievable database. For the NLP, we use an open-source Hugging Face text embedding model (all-MiniLM-L6-v1) to represent textbook content and Pinecone as our vector database to store the resulting embeddings. We additionally use OpenAI’s GPT-3 Davinci endpoint for Q&A text completion.
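To make the embed-then-retrieve idea concrete, here is a minimal sketch of a vector index queried by cosine similarity. It is not our production code: `InMemoryVectorIndex` is a hypothetical in-memory stand-in for the hosted vector database, and `toy_embed` is a hand-made placeholder for the real embedding model, so it only captures a few hard-coded topic words. Only the Python standard library is used.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorIndex:
    """Hypothetical stand-in for a hosted vector database (assumption, not a real API)."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self._embed = embed
        self._items: Dict[str, Tuple[Vector, str]] = {}

    def upsert(self, passage_id: str, text: str) -> None:
        """Embed a passage and store it under its id, replacing any earlier entry."""
        self._items[passage_id] = (self._embed(text), text)

    def query(self, question: str, top_k: int = 3) -> List[Tuple[str, float]]:
        """Return up to top_k (passage_id, score) pairs, most similar first."""
        q = self._embed(question)
        scored = [
            (pid, cosine_similarity(q, vec))
            for pid, (vec, _text) in self._items.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


def toy_embed(text: str) -> Vector:
    """Toy embedding: counts of hand-picked topic words. NOT a real model."""
    topics = [
        ("cell", "mitochondria", "membrane"),
        ("force", "mass", "acceleration"),
        ("war", "treaty", "empire"),
    ]
    words = text.lower().split()
    return [float(sum(words.count(w) for w in group)) for group in topics]


index = InMemoryVectorIndex(toy_embed)
index.upsert("bio", "the mitochondria sits inside the cell membrane")
index.upsert("phys", "force equals mass times acceleration")
best_match = index.query("what is a cell membrane", top_k=1)
```

Swapping `toy_embed` for a call to a real sentence-embedding model would leave the ranking logic unchanged, which is the main reason the embedding function is passed in rather than hard-coded.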

The architecture and design of this application were scoped out to allow us to build a comprehensive MVP in TreeHacks' condensed timeframe.

Challenges we ran into

The challenges we encountered can be split into two buckets (no pun intended).

First, building the system for parsing and indexing textbook content through AWS services. Creating an extensible system for accepting textbook content required us to learn AWS cloud services completely from scratch, specifically S3 buckets for file storage, Lambda functions for coordinating between the client and server, and EC2 instances for the main text embedding and indexing logic. We spent the vast majority of our time learning how to use these services and coordinate them.
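As a rough illustration of the coordination step, the sketch below shows a Lambda-style handler that reads an S3 "object created" notification and extracts which file was uploaded. It assumes the function is triggered by S3 event notifications, which may not match how our Lambdas were actually wired; `extract_uploaded_objects` is a hypothetical helper name, and the hand-off to the EC2 indexing service is left as a comment. Object keys in S3 event notifications arrive URL-encoded, hence the `unquote_plus` call.

```python
from typing import Any, Dict, List
from urllib.parse import unquote_plus


def extract_uploaded_objects(event: Dict[str, Any]) -> List[Dict[str, str]]:
    """Pull (bucket, key) pairs out of an S3 event notification payload."""
    uploads: List[Dict[str, str]] = []
    for record in event.get("Records", []):
        s3_info = record.get("s3")
        if s3_info is None:
            continue  # skip records that did not come from S3
        uploads.append(
            {
                "bucket": s3_info["bucket"]["name"],
                # Keys are URL-encoded in the event (spaces arrive as '+').
                "key": unquote_plus(s3_info["object"]["key"]),
            }
        )
    return uploads


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Entry point in the shape AWS Lambda expects for Python handlers."""
    uploads = extract_uploaded_objects(event)
    # Assumption: a real handler would now notify the indexing service
    # about each upload; this sketch only reports what it found.
    return {"statusCode": 200, "uploads": uploads}


sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-textbook-bucket"},
                "object": {"key": "textbooks/Intro+to+Biology.pdf"},
            }
        }
    ]
}
response = lambda_handler(sample_event, None)
```

The bucket name and file path in `sample_event` are made up for the example.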

Second, experimenting with NLP techniques for parsing and representing textbook content for semantic search and Q&A prompting. Once we had a reliable pipeline set up for accepting and displaying user textbook content, we spent a lot of time figuring out the best way to separate, structure, and embed textbook content for semantic search. The second component of this NLP work was experimenting with the prompts used for text-selection Q&A within the reader.
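The two NLP sub-problems above can be sketched in a few lines. This is one plausible approach rather than the exact scheme we settled on: `chunk_words` splits text into fixed-size word windows with overlap (the sizes are illustrative defaults), and `build_qa_prompt` assembles a completion-style prompt from a highlight, a question, and retrieved passages. Both function names and the prompt wording are hypothetical.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 100, overlap: int = 20) -> List[str]:
    """Split text into word windows of chunk_size that share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def build_qa_prompt(highlight: str, question: str, passages: List[str]) -> str:
    """Assemble a completion-style prompt; the wording is illustrative only."""
    context = "\n\n".join(
        f"[{i}] {passage}" for i, passage in enumerate(passages, start=1)
    )
    return (
        "You are a patient tutor helping a student read a textbook.\n\n"
        f"Related passages:\n{context}\n\n"
        f"Highlighted text:\n{highlight}\n\n"
        f"Student question: {question}\n"
        "Answer:"
    )


chunks = chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1)
prompt = build_qa_prompt(
    highlight="g h i j",
    question="What does this passage mean?",
    passages=chunks[:2],
)
```

Overlapping windows are a common way to avoid cutting a sentence's context in half at a chunk boundary, at the cost of storing some words twice.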

Accomplishments that we're proud of

Architecting and shipping the ‘file upload to vector database’ pipeline.
The functionality and user experience design of the textbook reader.
Hitting our progress milestone, which allowed us to expand the scope of the project from a basic demo with pre-indexed textbooks into an interface where people could upload their own files.
Making it several rounds into the lightsaber tournament!

What we learned

AWS has a steep (and quite painful) learning curve. S3 is rough. Lambda is tough. EC2 is a whole 'nother animal.
App responsiveness and ease of use are critical to an education product like Dewey.
To get the most out of semantic search and LLMs, you must be incredibly thoughtful about users’ existing behaviors and workflows, and about how their use of your service will resemble or differ from those habits. Such a short timeframe could not do justice to the depth of user research we would ultimately need to scale this product to millions of students.
Palo Alto needs on-demand caffeine - like Insomnia Cookies, but for coffee.

What's next for Dewey

We plan to continue working on Dewey, expanding the capabilities of the reader to support more learning tools and resources (as well as increasing stability). To achieve our vision of providing an ‘educator layer’ for all textbooks, we anticipate shipping functionality that fulfills teacher roles and responsibilities, such as content “mini-lecturing” and assessment question generation.