Inspiration
We thought back to our high school classes and remembered how often infographics appeared in our education. Infographics were a vital component of most language classes, like French or Spanish, but they were also relevant in science and health classes. However, finding infographics was a tedious and difficult task, one which we aimed to automate.
What it does
Our program crawled Reddit meme pages (we used these as a substitute for infographics because they were easy to access and share a similar structure) and used OCR to generate search tags for the images. We then uploaded the images and tag data to a MongoDB database, which our Flask backend queries and serves to our Node.js front end. We didn't have time to convert the binary image data from the backend into viewable images on the front end, but we finished a UI ready for those features: a user searches for tags, and the matching "infographic" memes show up.
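As a sketch of the tagging step: pytesseract's `image_to_string` returns raw OCR text, which then needs to be normalized into search tags. The helper below is a hypothetical minimal version of that normalization (the `extract_tags` name and the stop-word list are our illustration, not the project's actual code):

```python
import re

# Tiny stop-word list for illustration; a real build would use a fuller set.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def extract_tags(ocr_text, min_len=3):
    """Turn raw OCR output into a deduplicated, ordered list of search tags."""
    # In the real pipeline the input would come from pytesseract, e.g.:
    # ocr_text = pytesseract.image_to_string(Image.open(path))
    words = re.findall(r"[a-z]+", ocr_text.lower())
    seen, tags = set(), []
    for w in words:
        if len(w) >= min_len and w not in STOPWORDS and w not in seen:
            seen.add(w)
            tags.append(w)
    return tags

print(extract_tags("The Mitochondria is THE powerhouse of the cell!"))
# → ['mitochondria', 'powerhouse', 'cell']
```

The tag list can then be stored alongside the image document in MongoDB so that searches become a simple tag-membership query.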
How we built it
We used Node.js for the front-end API, a Flask server for the back-end API, and HTTPS requests to pass data from the back end to the front end. We connected the back end to a MongoDB database, which we seeded with data from a scrape-and-upload script.
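A minimal sketch of the back-end search endpoint the Node.js front end would call. The route name, response shape, and the in-memory `MEMES` list standing in for the MongoDB collection are all our assumptions, not the project's actual code:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the MongoDB collection; the real backend would instead run
# something like db.memes.find({"tags": {"$in": wanted}}) via pymongo.
MEMES = [
    {"id": 1, "tags": ["mitochondria", "cell"]},
    {"id": 2, "tags": ["photosynthesis", "plant"]},
]

@app.route("/search")
def search():
    # e.g. GET /search?tags=cell,plant
    wanted = set(request.args.get("tags", "").split(","))
    hits = [m for m in MEMES if wanted & set(m["tags"])]
    return jsonify(hits)
```

The front end would issue an HTTPS GET to `/search` and render whatever documents come back, which matches the two-API split described above.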
Challenges we ran into
We ran into issues finding websites to source infographics from, so for testing purposes we defaulted to Reddit meme pages. We also hit problems designing our tag system and only had time to generate the metadata, not to query it from the front end.
Accomplishments that we're proud of
- Use of OCR with pytesseract
- Creating a two-API microservice architecture, rather than one centralized API for both the front end and back end
- Web scraping with Selenium
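The Selenium scrape can be sketched roughly as below. The function name, headless-Chrome setup, and URL filtering are illustrative assumptions; we import selenium inside the function so the pure filtering helper stays usable without a browser driver installed:

```python
def is_image_url(url):
    """Keep only links that point at direct image files."""
    return url.lower().endswith((".jpg", ".jpeg", ".png"))

def scrape_image_urls(page_url, limit=25):
    """Collect direct image URLs from a page using headless Chrome.

    Assumes a chromedriver is available on PATH (Selenium 4 syntax).
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    opts = webdriver.ChromeOptions()
    opts.add_argument("--headless=new")  # run without a display
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(page_url)
        srcs = [img.get_attribute("src")
                for img in driver.find_elements(By.TAG_NAME, "img")]
        return [u for u in srcs if u and is_image_url(u)][:limit]
    finally:
        driver.quit()
```

A seed script could loop over the returned URLs, download each image, run it through the OCR tagging step, and insert the result into MongoDB.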
What's next for Infographic_Crawler
We hope to finish our UI and search feature, then push the application to a Heroku deployment so it is widely usable on the internet.