Flickr8K Image Captioning

What is Image Captioning ?

Image Captioning is the task of describing the content of an image in words. This task lies at the intersection of computer vision and natural language processing. Most image captioning systems use an encoder-decoder framework, where an input image is encoded into an intermediate representation of the information in the image, and then decoded into a descriptive text sequence.
Here a CNN+LSTM architecture based approach has been taken to generate captions corresponding to an input image

To perform Image Captioning we will require two deep learning models combined into one for the training purpose
CNNs extract the features from the image of some vector size aka the vector embeddings. The size of these embeddings depend on the type of pretrained network being used for the feature extraction
LSTMs are used for the text generation process. The image embeddings are concatenated with the word embeddings and passed to the LSTM to generate the next word

pip install numpy
pip install pandas
pip install matplotlib
pip install seaborn
pip install tensorflow
pip install textwrap
pip install tqdm

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
flickr8k-image-captioning-using-cnns-lstms.ipynb		flickr8k-image-captioning-using-cnns-lstms.ipynb