Inspiration
We wanted to dip our toes into machine learning, a powerful technology that none of us had used before. Generating a caption from an image seemed like a realistic task for an AI, and one that would be easy to turn into a cool project.
What it does
Our project is a web app where you can upload a photo, generate a caption for it, and then automatically tweet the photo and caption from our Twitter bot account.
How we built it
Our application is a web app built with Flask. For caption generation, we used the Keras deep learning library. We first trained our model locally on a dataset of images and captions, then loaded the trained model into the web app and used it to generate captions for uploaded photos. Each image and its caption are then tweeted using the Twitter API.
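To give a rough idea of the caption generation step, here is a minimal sketch of greedy decoding, a common way to turn a trained captioning model into a sentence: ask the model for the most likely next word, append it, and repeat until an end marker appears. This is an illustration and not our exact code. The names `greedy_caption`, `predict_next`, `toy_model`, and the `<start>`/`<end>` markers are all made up for this example, and the toy model simply replays a fixed caption so the snippet runs without Keras.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical marker tokens; a real vocabulary may use different ones.
START, END = "<start>", "<end>"


def greedy_caption(
    predict_next: Callable[[Sequence[str]], Dict[str, float]],
    max_len: int = 20,
) -> str:
    """Build a caption word by word, always taking the most likely next word."""
    words: List[str] = [START]
    for _ in range(max_len):
        probs = predict_next(words)        # word -> probability
        best = max(probs, key=probs.get)   # greedy choice
        if best == END:
            break
        words.append(best)
    return " ".join(words[1:])             # drop the start marker


def toy_model(words: Sequence[str]) -> Dict[str, float]:
    """Stand-in for a trained model: replays one fixed caption.

    In a real app this function would (by assumption) combine the image
    features with the words so far and call the trained Keras model.
    """
    script = ["a", "dog", "on", "grass", END]
    target = script[min(len(words) - 1, len(script) - 1)]
    return {w: (0.9 if w == target else 0.02) for w in script}


caption = greedy_caption(toy_model)
```

In the web app, the Flask upload handler would call something like `greedy_caption` with a function wrapping the loaded model, then pass the resulting string on to the Twitter API. The `max_len` cap keeps the loop from running forever if the model never predicts the end marker.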
Challenges we ran into
The biggest challenge we encountered was limited computing power for training the AI model. We didn't have access to cloud servers, so we had to train locally, which took 4+ hours. This meant we were only able to go through two iterations of our model. Near the end of the project, when we tried to deploy our code to a Heroku server, we ran into trouble with large dependencies and struggled to get our code running on the server. We also had a lot of trouble making our web page look good.
Accomplishments that we're proud of
We got a semi-functional machine learning model to work, and we successfully integrated it into a website. This was our first real hackathon, and we did it all with technologies that none of us had used before, so we're happy just to have a functional, cohesive result.
What we learned
We learned how to use Flask and, more generally, how to build a website backend. We learned how to build a website front end using HTML and CSS. We learned how to train a machine learning model from a dataset using Keras. We learned how to use the Twitter API.
What's next for Image Caption AI
In the future, we would like to train our model on a larger, more diverse dataset, which should considerably improve our captions. If possible, we would use a dataset of image captions from social media, so our captions would be more appropriate for tweeting. We would also like to train multiple models to generate captions with different moods, so the user can select the most appropriate one.