Skip to content

jsngalloway/ImageTextSearch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Image Text Search

A project created at the Cypher V Hackathon (3/23/2019)

Jesse Galloway, Isabella Fernandez, Yang Zhang, Reeves Trott

Overview:

This project takes a list of links to images, extracts all the text found in the image, and allows the user to search the database to find the original image containing that text.

Uses:

  • Search for images based on text
  • Censor images based on contained text
  • Accessibility of images for the visually impared

1. Parse Image Database:

We were able to obtain a database of some of the most popular images on the internet here: https://www.kaggle.com/sayangoswami/reddit-memes-dataset/version/1#db.json We parsed this db in python using Yang's python script in to a list of links (imagelist.txt)

2. Google Cloud Vision

We used Google cloud vision to perform OCR on each image, creating a database of each image url, along with the text contained in the picture. Our sample database for this project is database.zip

3. Search Functionality

In Java, we loaded the entire database into an searchable array, which returned results based on relevance using a priority queue.

Future Directions:

This method could be employed to index and search images on a large scale by image hosting sites, social media, or archives. This utility also has the potential to increase accessibility of images for those who are visually impared.

Notes:

Before proceeding further, please install pandas, the scipy extension of numpy, and mpl_toolkits into your Python library. Then add the external libraries onto the project if working in Eclipse.

About

Pulls relevant text out of images such as screencaps which are then added to a searchable databse

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors