Inspiration
Everyone wants more followers on Instagram. After all, Instagram is designed to draw people in based on the addictive nature of getting attention in the form of likes and followers.
What it does
It uses a support vector machine, trained on the accounts a user follows, to grow the user's reach and impact on Instagram. The model outputs a probability that a given Instagram account will follow back, so the user only follows accounts likely to return the favor. A crawler works continuously in the background, traversing the user's extended network and pushing account data to MongoDB so that promising accounts can be identified. A second crawler pulls data from this MongoDB collection and follows and unfollows users based on the model's predictions, providing a stream of passive growth to the user's Instagram account.
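The follow-back classifier can be sketched with scikit-learn (the library named below in "How I built it"). The feature columns match those mentioned in the write-up, but the training rows, labels, and threshold here are synthetic placeholders, not the project's real data:

```python
# Sketch of the follow-back classifier: an SVM with probability
# calibration (probability=True enables Platt scaling in scikit-learn).
# All rows and labels below are synthetic stand-ins for real account data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each row: [post_count, followers, following, mutual_followers]
X = np.array([
    [120,   500,  800, 12],
    [ 45,   300,  400,  8],
    [ 80,   700, 1200, 20],
    [200,  1100, 1500, 25],
    [ 60,   450,  900, 10],
    [ 10,  9000,  100,  0],
    [  5, 50000,  200,  1],
    [  2, 30000,  150,  0],
    [  8, 12000,  300,  2],
    [  3, 80000,  120,  0],
])
y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])  # 1 = followed back

# Scale features, then fit an SVM that outputs calibrated probabilities.
model = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))
model.fit(X, y)

# Probability that a candidate account follows back; follow only if likely.
candidate = np.array([[70, 600, 950, 15]])
p_follow_back = model.predict_proba(candidate)[0, 1]
print(f"follow-back probability: {p_follow_back:.2f}")
```

In practice the crawler would compute these same features for each account it discovers and only queue a follow when `p_follow_back` clears some threshold.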
How I built it
The web application is built with Flask, HTML, and CSS. Because of Instagram's rate limiting, it is important to be efficient with the requests sent to Instagram, so all data collected from requests is saved to a corresponding MongoDB collection. These collections are central to the application's functionality: the cache is checked first so that no request is wasted on a duplicate. The crawlers use task scheduling and a combination of Selenium and HTTP requests to simulate human browsing. They access Instagram's private API endpoints by emulating a legitimate user's session cookies and request headers, and to support a larger volume of requests, the load is split between the web private API and the Android private API. The machine learning model is built with scikit-learn and uses post count, follower count, following count, mutual followers, and other features to generate the most accurate model possible.
Challenges I ran into
Instagram's rate limiting was the single greatest challenge of this project. Instagram's business model relies on its mountains of data, so the company does not like people poking around in it. To mitigate this, the crawlers use careful timing, and every request is sent with generated session headers and cookies to pose as typical user browsing.
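The two tactics above, careful timing and browser-like session headers, can be sketched as below. The delay bounds, User-Agent string, and cookie name are illustrative assumptions, not Instagram's actual requirements:

```python
import random
import time

def human_delay(low=2.0, high=8.0):
    """Sleep for a random, human-looking interval between requests."""
    delay = random.uniform(low, high)
    time.sleep(delay)
    return delay

def generate_session_headers(session_id):
    """Build headers that mimic a logged-in browser session.
    The User-Agent and cookie name here are illustrative placeholders."""
    return {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/91.0.4472.124 Safari/537.36",
        "Accept": "*/*",
        "Cookie": f"sessionid={session_id}",
        "X-Requested-With": "XMLHttpRequest",
    }

headers = generate_session_headers("abc123")
print(headers["Cookie"])  # -> sessionid=abc123
```

Randomizing the gap between requests avoids the fixed cadence that makes bots easy to spot, and reusing a legitimate session's cookie keeps the traffic indistinguishable from that user's normal browsing.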
Accomplishments that I'm proud of
I am proud of largely working around Instagram's rate limiting. I am also proud of the front-end design of the website, which is clean and intuitive.
What I learned
I learned a lot more about web scraping, thanks to the challenge of getting at Instagram's data.
What's next for Lead Dog
This could potentially be developed into a live commercial web app, as many businesses are looking to expand their online reach. The other solutions currently available are not customized to the user's account; instead, they spam follow requests to accounts that post under specific hashtags. This model should achieve a higher conversion rate, as it evaluates individual accounts and seeks out those already in the user's extended network.