Inspiration
This project began when I had to read a long paper for my research and felt it would be easier to have the PDF read out loud by a text-to-speech service.
However, I was incredibly disappointed to find that the first 5+ results from a Google search for "pdf text to speech" were paid services. Additionally, many services advertised as free had text limits and required monthly subscriptions for full use.
Worldwide, more than 253 million people have moderate to severe visual impairment, 36 million of whom are blind, and these numbers are increasing year over year. Additionally, 89% of visually impaired people live in low- and middle-income countries. (International Agency for the Prevention of Blindness)
For many of these people, PDFs, the basic file format in which information is transmitted in the 21st century, are virtually inaccessible because expensive monthly services are out of reach.
People with visual impairments struggle enough as it is, and not allowing them to partake in this information ecosystem hinders their ability to learn and grow in our world. We were inspired by all the dreamers, scientists, and innovators for whom knowledge should be free and accessible.
For them, we developed Free PDF Reader, a web application where users can upload PDFs and have them read out loud using advanced text-to-speech. The application is free to use, with no character limits. It has a high-contrast, minimal design created with visually impaired users in mind.
What it does
Our application is hosted online at https://free-pdf-reader.herokuapp.com/. Selecting the "Upload File" button opens an interface where a user can select and upload a PDF. After the user presses "Submit", the speech audio is generated and buttons for playing it back appear.
The speech can be played or paused with the spacebar or the play and pause buttons. The volume can be adjusted with the left and right arrow keys or the Vol+ and Vol- buttons.
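The keyboard controls described here can be sketched as a small pure function. This is a minimal illustration and not the project's actual code: the name `applyKey`, the state shape, the 0.1 volume step, and the choice that the right arrow raises the volume while the left arrow lowers it are all assumptions.

```javascript
// Hypothetical sketch: map a pressed key to the next player state.
// Assumes volume is in the 0..1 range used by the HTML audio element.
const VOLUME_STEP = 0.1; // assumed step size

// Round to one decimal so repeated steps do not accumulate float error.
function roundToTenth(value) {
  return Math.round(value * 10) / 10;
}

function applyKey(state, key) {
  switch (key) {
    case " ": // spacebar toggles play/pause
      return { ...state, playing: !state.playing };
    case "ArrowRight": // assumed: right arrow raises the volume
      return { ...state, volume: Math.min(1, roundToTenth(state.volume + VOLUME_STEP)) };
    case "ArrowLeft": // assumed: left arrow lowers the volume
      return { ...state, volume: Math.max(0, roundToTenth(state.volume - VOLUME_STEP)) };
    default: // any other key leaves the player unchanged
      return state;
  }
}
```

In a browser, a `keydown` listener could pass `event.key` to a function like this and then copy the resulting `playing` and `volume` values onto the page's audio element. Keeping the mapping separate from the DOM makes it easy to test.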
How we built it
When the user uploads a file using the form and clicks Submit, the backend stores the file and parses it into strings. The strings are passed to GTTS (Google Text to Speech), which converts them into audio. The backend then saves the audio temporarily and returns the location of the audio file for the website to play.
We split the work: one of us implemented the file upload using multer and the parsing of PDFs into strings, while another worked on getting Google Text To Speech (GTTS) working. After we finished all the functionality, we merged the pieces together and implemented a custom-themed UI using Semantic UI.
We chose high-contrast colors such as neon green, cyan, and magenta over a black background, with large fonts, so the page is easier to read. We also used a mixture of icons and text for ease of reading. To ensure that our website can be read with ease, we tested it with a friend under conditions that simulate visual impairment.
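The flow above (store, parse, convert, save, return a location) can be sketched roughly as follows. This is an outline under stated assumptions and not the project's code: `pdfToSpeech`, `extractText`, `synthesize`, and `audioDir` are hypothetical names, and the parsing and speech stages are passed in as functions so the sketch makes no claims about the real multer, PDF-parsing, or gtts APIs.

```javascript
// Hypothetical pipeline sketch. `upload` stands for the stored file
// (assumed shape: { path, filename }); the two stages are injected.
async function pdfToSpeech(upload, { extractText, synthesize, audioDir }) {
  // 1. The file is already stored on disk; parse it into a string.
  const text = await extractText(upload.path);
  if (!text || text.trim() === "") {
    throw new Error("No readable text found in the PDF");
  }

  // 2. Convert the string to audio and save it under a temporary name.
  const audioPath = `${audioDir}/${upload.filename}.mp3`;
  await synthesize(text, audioPath);

  // 3. Return the location so the page's audio player can load it.
  return { audioUrl: audioPath };
}
```

In an Express route, a handler could call a function like this with the uploaded file and send the returned location back to the page. Injecting the stages also makes it straightforward to swap the speech service later.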
Challenges we ran into
Our primary challenge was our lack of experience with git, which led to a messy workflow. Apart from that, we did not hit any major problems while working on this project.
Accomplishments that we're proud of
We're proud that we developed an accessible web application that will help visually impaired people partake in the modern information ecosystem.
What we learned
We learned more about the various text-to-speech libraries in Node.js, how to upload and manipulate files, and how to customize the HTML audio player. We also learned to refine our workflow and to develop an accessible web application.
What's next for Free PDF Reader
The website should be optimized to process longer text and audio files, since both storage space and processing time currently limit what it can handle. More UI options should be implemented to accommodate more visually impaired people, such as multiple colorblind options and voice feedback.
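One possible way to handle longer texts is to split the parsed text into smaller chunks at sentence boundaries, so that each chunk can be converted to audio separately. The sketch below only illustrates that idea: `chunkText` and the default limit of 200 characters are assumptions, not values taken from the project or from any text-to-speech service.

```javascript
// Hypothetical sketch: split text into chunks of at most maxLen characters,
// breaking after sentence-ending punctuation where possible.
function chunkText(text, maxLen = 200) {
  // Split on whitespace that follows ".", "!" or "?" (naive sentence split).
  const sentences = text.trim().split(/(?<=[.!?])\s+/);
  const chunks = [];
  let current = "";

  for (const raw of sentences) {
    const sentence = raw.trim();
    if (!sentence) continue;

    // Flush the current chunk if adding this sentence would exceed the limit.
    if (current && current.length + 1 + sentence.length > maxLen) {
      chunks.push(current);
      current = "";
    }

    // A single sentence longer than the limit is cut into fixed-size pieces.
    if (sentence.length > maxLen) {
      for (let i = 0; i < sentence.length; i += maxLen) {
        chunks.push(sentence.slice(i, i + maxLen));
      }
      continue;
    }

    current = current ? `${current} ${sentence}` : sentence;
  }

  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk could then be sent to the speech service in turn and played back in order. The sentence split is deliberately simple and will treat abbreviations such as "e.g." as sentence endings.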
To scale globally, a few things will need to be done. Language support for PDF parsing and text generation will need to be expanded beyond English. It would be great to support the other 9 most spoken languages: Mandarin, Spanish, Hindi, Arabic, Portuguese, Bengali, Russian, Japanese, and Lahnda, which would cover 3.65 billion people, or about 51% of the world population.
Scaling this service so that more people around the world could use it would also require reducing the cost of the API service. It's currently free; at scale, however, the number of requests would grow and the response time would increase. To handle this, a foundation could be established to pay for the cost of hosting (compute and storage) for the text-to-speech service. Further on, an in-house text-to-speech library could be written and hosted at a lower cost than external APIs.
Social Impact and Best UI
Built With
- ejs
- express.js
- gtts
- heroku
- multer
- node.js
- semantic-ui


