EduScroll


## Inspiration

Procrastination is a huge issue, and a large majority of college students are chronic scrollers of TikTok or Instagram Reels. We decided to create an app that generates short-form videos, or reels, that help you learn course material based on user-uploaded images, PDF files, and text chats.

## What it does

EduScroll is an app that takes user-uploaded data and has Google Gemini create scripts, which a TTS API, ElevenLabs, reads aloud while subtitles play over an engaging background to help the user maintain focus and concentration. It saves the scripts on the phone for future use and also generates questions for each video to quiz the user.

## How we built it

We built our app using React Native with Expo. We set it up through the terminal and used built-in Expo tools to handle features such as document and image uploads. The code is mostly TypeScript with a little JavaScript. To make everything work, we ran a local server that processes the uploaded files and sends them to the AI to create the video scripts. Once the backend was done, we focused on making a clean and simple frontend with React Native, and we used Gemini to design our app icon.
We split the main system into three layers: the attention-grabbing background video, the script generation and subtitles, and the text-to-speech that reads the script aloud. The background videos are MP4 files of portrait-mode YouTube videos, which the app accesses at random, starting playback from a point within the video.
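As a rough illustration of the random background playback described above, here is a minimal sketch. It assumes both the clip and the start offset are chosen at random and that each clip's duration is known ahead of time; `BackgroundClip`, `pickClipStart`, and `reelSec` are hypothetical names, not taken from our codebase.

```typescript
// Hypothetical shape: each background clip's duration is assumed to be known up front.
interface BackgroundClip {
  uri: string;
  durationSec: number;
}

interface ClipStart {
  uri: string;
  startSec: number;
}

// Pick a random clip and a random start offset that leaves at least
// `reelSec` seconds of footage before the clip ends.
// `rand` is injectable so the choice can be made deterministic in tests.
function pickClipStart(
  clips: BackgroundClip[],
  reelSec: number,
  rand: () => number = Math.random,
): ClipStart {
  if (clips.length === 0) {
    throw new Error("no background clips available");
  }
  const clip = clips[Math.floor(rand() * clips.length)];
  // Clips shorter than the reel simply start from the beginning.
  const latestStart = Math.max(0, clip.durationSec - reelSec);
  return { uri: clip.uri, startSec: rand() * latestStart };
}
```

The returned `startSec` could then be handed to whichever video player the app uses as its initial seek position.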
For script generation, we used Google Gemini to read user-uploaded text, images, or PDF files. Given a specific prompt, the AI produces a JSON object with a title, a script, and five questions. To keep the app from crashing, we clean the response text to remove extra words and fix syntax errors in the output. We first parse it with the standard JSON parser and fall back to the dirty-json library if that fails. Finally, we send the script to ElevenLabs for the voiceover and put the questions in the Quiz tab.
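The clean-then-parse step above can be sketched roughly as follows. This is an illustration under assumptions, not our exact code: the `ReelPayload` field names mirror the title, script, and questions mentioned above, the cleanup rules are guesses at typical model output (Markdown fences, extra prose), and the lenient parser is passed in as a parameter rather than imported.

```typescript
// Hypothetical shape of one generated reel, based on the fields named above.
interface ReelPayload {
  title: string;
  script: string;
  questions: unknown[];
}

// Remove Markdown code-fence markers and any prose around the outermost JSON value.
function extractJson(raw: string): string {
  const unfenced = raw.replace(/`{3}(?:json)?/gi, "").trim();
  const start = unfenced.search(/[\[{]/);
  const end = Math.max(unfenced.lastIndexOf("}"), unfenced.lastIndexOf("]"));
  return start >= 0 && end > start ? unfenced.slice(start, end + 1) : unfenced;
}

// Runtime check that a parsed value has the fields we expect.
function isReelPayload(value: unknown): value is ReelPayload {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.title === "string" &&
    typeof v.script === "string" &&
    Array.isArray(v.questions)
  );
}

// Try strict JSON.parse first; only if it fails, hand the text to an
// optional lenient parser supplied by the caller.
function parseReels(
  raw: string,
  lenientParse?: (text: string) => unknown,
): ReelPayload[] {
  const cleaned = extractJson(raw);
  let parsed: unknown;
  try {
    parsed = JSON.parse(cleaned);
  } catch (strictError) {
    if (!lenientParse) throw strictError;
    parsed = lenientParse(cleaned);
  }
  // Accept either a single object or an array of objects.
  const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
  return items.filter(isReelPayload);
}
```

In the app, the `lenientParse` argument would be the place to plug in a forgiving parser such as the one from the dirty-json library; keeping it as a parameter means the strict path can be tested without that dependency.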
After the script is created, it is sent to ElevenLabs' API to synthesize the audio. We currently use the Eleven Flash v2.5 model to prioritize speed and low latency, so that we can show the final video to the user as quickly as possible. After the API returns the audio data, it is converted to base64 and saved persistently to the device's local file system as an .mp3 file so that it can be accessed later.

## Challenges we ran into
Before we wrote any code, the biggest challenge was figuring out what to create. Initially, we planned to take inspiration from Gathr to allow for better SASE networking and to build a general app to replace QR sign-ins. However, SASE doesn't have the same issues as a lot of other clubs with member involvement and GBM retention, and its current sign-in system works fine. We decided to switch to something more general. After considering college student issues such as time management, paying attention, and relationships, we focused on something that could help students manage their time more efficiently. While it is not the same as studying, we hope the app serves as a transition from doomscrolling to actual studying by simulating the process of scrolling.
At first, the AI put the entire script into one big video. This went against our goal of making short, swipeable videos for each topic. We fixed this by updating our prompt to force breaks in the JSON so that each video got its own text. Our next big issue was API limits: if we hit our quota, script generation would fail and show an error. We handled this by building a fallback system that automatically switches to another model that still has unused quota. Finally, we ran into parsing errors because the parser could not read math symbols or unusual line breaks. We solved this by adding text filters and tweaking the prompt again so that the JSON output stays clean and easy to parse.
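The quota fallback described above could look roughly like the sketch below. It is a simplified illustration: the model names, the `generate` function, and the `isQuotaError` check are all hypothetical stand-ins, since how a quota failure is reported depends on the client library being used.

```typescript
// Hypothetical signature for whatever function actually calls the model API.
type Generate = (model: string, prompt: string) => Promise<string>;

// Try each model in order. Move on to the next one only when the failure
// is recognized as a quota error; any other error is rethrown immediately.
async function generateWithFallback(
  models: string[],
  prompt: string,
  generate: Generate,
  isQuotaError: (error: unknown) => boolean,
): Promise<{ model: string; text: string }> {
  let lastError: unknown = new Error("no models configured");
  for (const model of models) {
    try {
      const text = await generate(model, prompt);
      return { model, text };
    } catch (error) {
      if (!isQuotaError(error)) throw error;
      lastError = error; // remember the quota failure and try the next model
    }
  }
  // Every model was out of quota (or the list was empty).
  throw lastError;
}
```

A call such as `generateWithFallback(["primary-model", "backup-model"], prompt, callModel, looksLikeQuotaError)` would return the first successful result along with the name of the model that produced it, which is useful for logging how often the fallback is hit.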
One challenge we ran into while working on the text-to-speech was credit usage. During development, we burned through our credits extremely fast because the system was designed to send every generated script to ElevenLabs and convert it to an audio file. This ate up thousands of credits before we were able to get the audio working with the video. To reduce usage, we added temporary code and commented out existing code so that only one script is sent to ElevenLabs for audio synthesis. Using the single audio file that was returned, we were able to get the audio working with the video without burning through more credits.

## Accomplishments that we're proud of
While not the biggest achievement compared with the time we later spent on other features, getting the background video to play for the first time was a major step. Even though the video was grainy, had no audio, and used placeholder icons and text for the features, it was really motivating to see an outline come together.

## What we learned
Creating this app made us realize how important performance is to end users. Because the app needs to handle an infinite scroll of generated educational content, we need to make sure that each video is shown to the user without any buffering or delay. Loading between videos means game over for us. Throughout development, with each iteration and feature we created, we always asked, "How well will this perform realistically?" Many of our design ideas were either scrapped or changed as a direct result of poor performance.

## What's next for EduScroll
We currently have notifications and filter features planned. Notifications will send reminders about upcoming events based on uploaded syllabi and will allow manual user input to change tentative events. The filter feature will allow easy navigation through uploaded content via keyword search based on the class you want to focus on, instead of only the individual topics grouped together in the saved feature.