Inspiration 💡

Have you ever seen a new word and wondered what it meant? Have you ever had a conversation and awkwardly nodded along, pretending to understand things you didn't quite know? If so, you can imagine how much more painful this experience is for young people still learning the language. Being unable to comprehend what everyone around you seems to know can be infuriating and agonizing. It becomes very apparent when a student tries to read a new book and has to turn to a device with Google Translate open half the time just to get through it. This is where ReadAble comes into play!

What it does❔

ReadAble is a modern web app that gives students a digital reading assistant on standby. Using a camera, ReadAble follows along with your book and provides real-time help when needed. This includes reading the book aloud (text-to-speech) and showing real-time definitions of words you may not understand, all while storing your books for future use. Students can go back to previous books if they wish to reread, or share their pre-scanned books with other users.

How we built it⚙️

For our project we combined multiple libraries and languages. On the backend, we use Python for all of the image processing and audio. OpenCV gives us access to the webcam, from which we grab each frame. We run each frame through the easyocr library to check whether text is detected and to convert the detected image regions into strings. With the pyttsx3 library we take that text, pass parameters such as speed and volume, and render an audio file; pygame then lets us play, pause, and resume that audio on command. To highlight sentences while the audio is running, we use the bounding-box info from the OCR output to draw rectangles around each sentence as it is read.

Using the same bounding-box info together with hand/finger location data from the MediaPipe library, we track which word your finger is closest to. Once a word is detected, we use the PyMultiDictionary library to pull its definition.

Once all of this is calculated and drawn onto the frame, the frame is sent to the frontend. With Flask we set up a local host for our frontend website, built with HTML, CSS/SCSS, and JavaScript. Through the Flask server we send the OpenCV frames to display on the website, and also receive user input to change parameters in our code and store data.
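The capture → OCR → highlight → TTS steps above can be sketched roughly as follows. The library names (cv2, easyocr, pyttsx3) are the ones named in the write-up, but the function names, confidence threshold, and output filename here are our own illustrative choices, not the project's actual code:

```python
import sys

def select_confident_text(ocr_results, min_conf=0.4):
    """Keep only OCR detections above a confidence threshold.

    easyocr's reader.readtext() returns tuples of (bbox, text, confidence),
    where bbox is four (x, y) corner points; this helper mirrors that shape.
    """
    return [(bbox, text) for bbox, text, conf in ocr_results if conf >= min_conf]

def main():
    # Third-party imports are kept local so the helper above stays usable
    # without a webcam or the OCR model installed.
    import cv2
    import easyocr
    import pyttsx3

    reader = easyocr.Reader(["en"])      # load the English OCR model once
    engine = pyttsx3.init()
    engine.setProperty("rate", 150)      # speaking speed (words per minute)
    engine.setProperty("volume", 0.9)    # 0.0 to 1.0

    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        sys.exit("no webcam frame available")

    detections = select_confident_text(reader.readtext(frame))
    for bbox, _text in detections:
        tl, _tr, br, _bl = bbox          # draw a rectangle around each detection
        cv2.rectangle(frame, (int(tl[0]), int(tl[1])),
                      (int(br[0]), int(br[1])), (0, 255, 0), 2)

    if detections:
        # pyttsx3 renders the detected text to a file that pygame can then
        # play, pause, and resume.
        engine.save_to_file(" ".join(t for _b, t in detections), "page.wav")
        engine.runAndWait()

if __name__ == "__main__":
    main()
```

Rendering the speech to a file (rather than speaking directly) is what makes the pygame play/pause/resume controls possible.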

Challenges we ran into❗

  • Integrating backend features (we were new to this, so it took time to understand the logic behind it and the tricky syntax, and merging the code also took a long time)
  • Dealing with execution speeds for ML models (the finger-pointing pass looked through every word, and fetching a definition for every word made it slow)
  • Breaking text-to-speech down word by word versus line by line (library limitations)
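One common mitigation for the definition-lookup slowdown mentioned above is to cache results so repeated pointing at the same word never redoes the work. This is a hedged sketch of that idea, not a claim about what the final code does; `make_cached_lookup` is a hypothetical helper name:

```python
from functools import lru_cache

def make_cached_lookup(lookup):
    """Wrap an expensive word-lookup function with an LRU cache.

    Pointing at the same word on consecutive frames then returns the cached
    definition instead of triggering a fresh library/network lookup.
    """
    @lru_cache(maxsize=1024)
    def cached(word):
        return lookup(word)

    def define(word):
        return cached(word.lower())  # normalise case so "Cat" and "cat" share an entry

    return define

# Hooking this up to PyMultiDictionary (the library named in the write-up)
# would look something like:
#     from PyMultiDictionary import MultiDictionary
#     define = make_cached_lookup(lambda w: MultiDictionary().meaning("en", w))
```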

Accomplishments that we're proud of🏆

The major accomplishments that kept us motivated were the moments we saw progress in real time. For example, the first time we got hand recognition showing live on screen was a truly exciting experience. Obviously there was much more work to be done on the backend, but seeing some level of progress actually working was a great accomplishment.

What we learned📚

  • How frontend and backend integration works
  • How to implement finger pointer control with ML
  • How to get real-time hand recognition and display it
  • How to search for word definitions with Python
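The finger-pointer idea from the list above (MediaPipe fingertip → nearest OCR word) can be sketched like this. MediaPipe's hand landmark 8 really is the index fingertip; everything else (the `nearest_word` helper, the centre-distance metric) is our own illustrative choice:

```python
import math

INDEX_FINGER_TIP = 8   # MediaPipe hand-landmark id for the index fingertip

def nearest_word(finger_xy, word_boxes):
    """Return the word whose bounding-box centre is closest to the fingertip.

    word_boxes: list of (text, bbox) with easyocr-style bboxes of four
    (x, y) corner points; finger_xy is in the same pixel coordinates.
    """
    if not word_boxes:
        return None
    fx, fy = finger_xy

    def dist(entry):
        _text, bbox = entry
        cx = sum(p[0] for p in bbox) / 4.0
        cy = sum(p[1] for p in bbox) / 4.0
        return math.hypot(fx - cx, fy - cy)

    return min(word_boxes, key=dist)[0]

def main():
    import cv2
    import mediapipe as mp

    hands = mp.solutions.hands.Hands(max_num_hands=1)
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return
    h, w = frame.shape[:2]
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        tip = result.multi_hand_landmarks[0].landmark[INDEX_FINGER_TIP]
        finger = (tip.x * w, tip.y * h)  # landmarks are normalised to 0..1
        # word_boxes would come from the OCR pass; empty here for brevity
        print(nearest_word(finger, []))

if __name__ == "__main__":
    main()
```

Comparing against box centres keeps the per-frame cost at one distance computation per visible word, rather than a full definition lookup per word.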

What's next for ReadAble

  • Richer word information
      • Whether a word is a noun or a verb
      • Synonyms
      • Antonyms
  • Multilanguage support
  • Mobile App
  • Adding better UI for stored books page to include sorting
