Inspiration

Our inspiration for this project came from the need to make educational materials more accessible and engaging for students. Many students struggle to absorb large amounts of lecture content through traditional means, such as reading slides or textbooks. In addition, remote learning has grown by approximately 54% over the last five years alone. With this rise in remote learning, and with roughly 30% of the world's population being auditory learners, it has become evident that an interactive solution is needed to bridge the gap in lecture content delivery. We envisioned a tool that not only narrates lecture slides in a concise, professor-like manner but also lets students interact with the content directly by asking questions. While this is the main inspiration behind Profound.ai, we also recognize the common ailment that plagues all college students: missing class. With Profound.ai, students can watch lectures that were never recorded, thanks to a sophisticated text-to-speech model and an interactive Q&A feature.

What it does

Profound AI takes a PDF file containing lecture slides and an optional Canvas URL as input. It stores both as contextual knowledge, using them to present the slides in an engaging and informed manner with audio narration. Additionally, it allows students to ask questions throughout the presentation with a "hand-raise" feature, providing interactive learning similar to a classroom environment. The project aims to make learning more accessible, engaging, and interactive for students.

What makes it powerful: The reason a qualified professor makes an impact in the classroom is twofold. First, professors have knowledge of the whole presentation when giving a lecture, not just the individual slide at hand. Second, they have detailed academic knowledge of the lecture content.

We give this important context to our LLM using RAG (Retrieval-Augmented Generation). Profound uses this context in two ways:

  • Slide presentation transcript: Having context information about the rest of the presentation (and Canvas course) allows the individual slide narration to be more informed and natural.

  • Student question answers: We can efficiently reference the entire lecture presentation and Canvas course to find accurate, course-specific answers to student questions.

How we built it

Backend Technologies: Flask, MongoDB Atlas, OpenAI Embeddings, LangChain, PDFPlumber.

Data Processing: We use pdfplumber to extract text and tables from each page of the lecture slides PDF. We also use the Canvas API to scrape course content from the user-provided Canvas page. The extracted content is then converted into vector embeddings using the OpenAIEmbeddings model.
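The per-page extraction can be sketched roughly like this (the function names are our own for illustration, not the actual pipeline's):

```python
def flatten_table(table):
    """Render an extracted table (a list of rows) as tab-separated lines."""
    return "\n".join("\t".join(cell or "" for cell in row) for row in table)

def extract_pages(pdf_path):
    """Return one text string per slide, combining page text and any tables."""
    import pdfplumber  # imported lazily so the helper above stays dependency-free
    pages = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            text = page.extract_text() or ""
            for table in page.extract_tables():
                text += "\n" + flatten_table(table)
            pages.append(text)
    return pages
```

Keeping one string per page sets up the page-level chunking we describe under Challenges below.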

Document Storage: We store the embeddings in MongoDB Atlas as a vector store, using MongoDBAtlasVectorSearch for efficient retrieval.

RAG (Retrieval-Augmented Generation): We use LangChain's RetrievalQA to perform similarity searches on our MongoDB vectorized embeddings, retrieving quick, relevant context for our slide narrations and question answers.
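A minimal sketch of this retrieval setup (the connection string, database, collection, and index names here are placeholders, and exact import paths vary across LangChain versions):

```python
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import MongoDBAtlasVectorSearch
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from pymongo import MongoClient

collection = MongoClient("mongodb+srv://...")["profound"]["embeddings"]

# One string per slide, as produced by the extraction step.
page_texts = ["Slide 1: Intro to gradient descent", "Slide 2: Learning rates"]

# Embed each slide's text and store the vectors in Atlas.
store = MongoDBAtlasVectorSearch.from_texts(
    texts=page_texts,
    embedding=OpenAIEmbeddings(),
    collection=collection,
    index_name="vector_index",
)

# Similarity search over the stored embeddings feeds the LLM its context.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=store.as_retriever(search_kwargs={"k": 4}),
)
answer = qa.invoke({"query": "What does the lecture say about learning rates?"})
```

The same chain serves both narration generation and student Q&A, just with different prompts.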

Interaction: We used Flask to send information between the frontend UI and the backend endpoints.

Frontend Technologies: React, PDF.js for rendering PDF files, TailwindCSS for styling.

File Upload: The front end allows users to upload a PDF file and optionally provide a Canvas URL and token for context. The file is then sent to the backend, where the content is processed.
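On the backend, the upload flow might look like this minimal Flask sketch (the route and field names are our assumptions, not the actual endpoints):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/upload", methods=["POST"])
def upload():
    """Accept a slides PDF plus optional Canvas credentials."""
    pdf = request.files.get("file")
    canvas_url = request.form.get("canvas_url")      # optional
    canvas_token = request.form.get("canvas_token")  # optional
    if pdf is None:
        return jsonify({"error": "no PDF provided"}), 400
    # Real pipeline: extract pages, embed them, and store the vectors here.
    return jsonify({
        "status": "processed",
        "filename": pdf.filename,
        "has_canvas": bool(canvas_url and canvas_token),
    })
```

The React front end posts a multipart form to this endpoint and waits for processing to finish before starting the narration.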

Text-to-Speech: We use OpenAI's API to convert the generated text into an audio buffer, which is then played back to the user as an interactive narration for each slide.
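Roughly, the narration call looks like this (the model and voice choices here are assumptions, not necessarily what we shipped):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def narrate(text: str) -> bytes:
    """Convert narration text into raw MP3 bytes for playback in the browser."""
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",  # one of the built-in voices
        input=text,
    )
    return response.content  # audio bytes to send back to the frontend
```

The frontend can wrap these bytes in a Blob and hand them to an Audio element for playback.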

User Interaction: The front end provides a "hand-raise" feature that lets students pause the narration and submit their questions. These questions are sent to the backend and processed with RAG to generate responses, which are also converted into audio.

Challenges we ran into

  • Embedding Context Documents: We had to find an efficient way to store our context information in MongoDB. Large documents slowed the generation process, while smaller documents made each piece of context less informative. After trying several approaches, we landed on splitting the content into one document per page.

  • Finding the Right Text-to-Speech (TTS) Model: We struggled to find a TTS model that converts text efficiently while producing a human-like voice. After trying many Hugging Face models, we found that OpenAI's TTS model offered the most human-like voice with a reasonable conversion time.

  • Real-time Interaction: Implementing a real-time, interactive question-answering feature while managing audio playback seamlessly was technically demanding. Initially we had an audio player below the ‘Replay’ button, but this meant the user had to manually pause the audio before clicking the raise-hand button to avoid overlapping voices. In the final version, we keep a reference to the Audio object and use it to play/pause and track playback state across all the functions in the code. This lets us pause automatically whenever a hand is raised and resume afterwards from the correct point.

  • Canvas API: Gathering text from the files on Canvas (PDF and PPTX only) required a different approach than the pdfplumber-based parsing we used for the uploaded slides. For Canvas files, we used the PdfReader class from PyPDF2 and the Presentation class from python-pptx, respectively. Parsing these files meant fetching them through the Canvas API, but we did not want to save every file to our local machines. Instead, we streamed each download into an in-memory buffer with io.BytesIO and the requests module, which let us parse even large PDFs and presentations without ever writing them to disk.
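The in-memory streaming in that last challenge can be sketched as follows (the extension-based dispatch is a simplification; real Canvas file records carry a content type):

```python
import io

import requests
from pptx import Presentation
from PyPDF2 import PdfReader

def fetch_file_text(url: str, token: str) -> str:
    """Stream a Canvas file into RAM and extract its text, never touching disk."""
    resp = requests.get(url, headers={"Authorization": f"Bearer {token}"}, stream=True)
    resp.raise_for_status()
    buffer = io.BytesIO()
    for chunk in resp.iter_content(chunk_size=1 << 16):  # 64 KiB reads
        buffer.write(chunk)
    buffer.seek(0)
    if url.lower().endswith(".pdf"):
        return "\n".join(page.extract_text() or "" for page in PdfReader(buffer).pages)
    if url.lower().endswith(".pptx"):
        return "\n".join(
            shape.text
            for slide in Presentation(buffer).slides
            for shape in slide.shapes
            if shape.has_text_frame
        )
    return ""  # unsupported file type
```

Because both PdfReader and Presentation accept any file-like object, the BytesIO buffer slots in exactly where a local file path would otherwise go.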

Accomplishments that we're proud of

  • Created an interactive, human-like system that gives students both informed lectures and instantaneous question answering.

  • Seamlessly integrated multiple technologies (React, Flask, MongoDB Atlas, OpenAI) into a unified solution that is both functional and user-friendly.

What we learned

  • NLP and Embeddings: We learned a great deal about natural language processing, embeddings, and how to improve the performance of LLMs with RAG.

  • Frontend and Backend Integration: We gained experience in managing data flows between the frontend and backend, particularly when dealing with a variety of media files like PDFs and audio buffers.

  • User Experience: Creating a complex UI functionality to handle audio interrupts and user inputs taught us about user-centered design.

  • Error Handling: We encountered and learned how to handle various issues like file upload errors, API failures, and content extraction problems, making the application more robust.

What's next for Profound AI

  • Voice Customization: Introduce options for different voices and narration styles to enhance the user experience further.

  • Multi-Language Support: Add support for multiple languages to make the platform accessible to a wider range of students globally.

  • Integrating Cloud Service Providers: Using AWS resources like Lambda and S3 can significantly reduce wait times when uploading inputs.

  • User Profile Creation + Caching: Right now, users need to re-upload their Canvas information for every lecture, which incurs significant latency because of the high volume of embeddings being generated. In the future, users with accounts will be able to upload their Canvas info once, save it, and reuse it across different lectures.
