Inspiration
We thought back to our high school classes and remembered how often infographics appeared in our education. Infographics were a vital component of most language classes, like French or Spanish, but they were also relevant in science and health classes. However, finding infographics was a tedious and difficult task, one which we aimed to automate.
What it does
Our program crawled Reddit meme pages (we used these as a substitute for infographics because they were easy to access and share a similar structure) and used OCR to generate search tags for the images. We then uploaded the images and tag data to a MongoDB database, which our Flask backend queries and serves to our Node.js front end. We didn't have time to convert the binary image data from the backend into viewable images on the front end, but we finished a UI ready for those features: a user searches for tags, and the matching "infographic" memes show up.
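As a sketch of the tagging step: pytesseract's `image_to_string` returns raw OCR text, which then needs to be normalized into search tags. The helper below is a hypothetical minimal version of that normalization (the `extract_tags` name and the stop-word list are our illustration, not the project's actual code):

```python
import re

# Tiny stop-word list for illustration; a real build would use a fuller set.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def extract_tags(ocr_text, min_len=3):
    """Turn raw OCR output into a deduplicated, ordered list of search tags."""
    # In the real pipeline the input would come from pytesseract, e.g.:
    # ocr_text = pytesseract.image_to_string(Image.open(path))
    words = re.findall(r"[a-z]+", ocr_text.lower())
    seen, tags = set(), []
    for w in words:
        if len(w) >= min_len and w not in STOPWORDS and w not in seen:
            seen.add(w)
            tags.append(w)
    return tags

print(extract_tags("The Mitochondria is THE powerhouse of the cell!"))
# → ['mitochondria', 'powerhouse', 'cell']
```

The tag list can then be stored alongside the image document in MongoDB so that searches become a simple tag-membership query.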
How we built it
We used Node.js for the front-end API, a Flask server for the back-end API, and HTTPS requests to pass data from the back end to the front end. We connected the back end to a MongoDB database, which we seeded with data from a scrape-and-upload script.
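A minimal sketch of the back-end search endpoint the Node.js front end would call. The route name, response shape, and the in-memory `MEMES` list standing in for the MongoDB collection are all our assumptions, not the project's actual code:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the MongoDB collection; the real backend would instead run
# something like db.memes.find({"tags": {"$in": wanted}}) via pymongo.
MEMES = [
    {"id": 1, "tags": ["mitochondria", "cell"]},
    {"id": 2, "tags": ["photosynthesis", "plant"]},
]

@app.route("/search")
def search():
    # e.g. GET /search?tags=cell,plant
    wanted = set(request.args.get("tags", "").split(","))
    hits = [m for m in MEMES if wanted & set(m["tags"])]
    return jsonify(hits)
```

The front end would issue an HTTPS GET to `/search` and render whatever documents come back, which matches the two-API split described above.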
Challenges we ran into
We ran into issues finding websites to source infographics from, so for testing purposes we defaulted to Reddit meme pages. We also hit problems designing our tag system and only had time to generate the metadata, not to query it from the front end.
Accomplishments that we're proud of
- Use of OCR with pytesseract
- Creating a two-API microservice architecture, rather than one centralized API for both the front end and back end
- Web scraping with Selenium
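The Selenium scrape can be sketched roughly as below. The function name, headless-Chrome setup, and URL filtering are illustrative assumptions; we import selenium inside the function so the pure filtering helper stays usable without a browser driver installed:

```python
def is_image_url(url):
    """Keep only links that point at direct image files."""
    return url.lower().endswith((".jpg", ".jpeg", ".png"))

def scrape_image_urls(page_url, limit=25):
    """Collect direct image URLs from a page using headless Chrome.

    Assumes a chromedriver is available on PATH (Selenium 4 syntax).
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    opts = webdriver.ChromeOptions()
    opts.add_argument("--headless=new")  # run without a display
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(page_url)
        srcs = [img.get_attribute("src")
                for img in driver.find_elements(By.TAG_NAME, "img")]
        return [u for u in srcs if u and is_image_url(u)][:limit]
    finally:
        driver.quit()
```

A seed script could loop over the returned URLs, download each image, run it through the OCR tagging step, and insert the result into MongoDB.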
What's next for Infographic_Crawler
We hope to finish our UI and search feature, then push the application to a Heroku deployment so it is widely usable on the internet.