# Semantic Image Search with DocArray + Transformers
Semantic Pet Vision is a lightweight semantic image search engine that retrieves similar pet images (cats 🐱 and dogs 🐶) based on their visual meaning, not just filenames or keywords.
By leveraging Hugging Face vision encoders and DocArray, the system transforms images into embeddings and performs similarity search. A simple query like:
"cute ginger kitten"
returns the most semantically relevant image from the dataset.
- Indexing – Images are encoded into embeddings using CLIP.
- Storage – Embeddings are stored in `doc.tags` using DocArray.
- Search – Natural language queries are encoded & compared with cosine similarity.
- Result – The closest image is returned as `top_match.jpg`.
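The steps above boil down to one core operation: embed the query, embed the images, and rank by cosine similarity. Here is a minimal sketch of that loop. The `embed` function is a stand-in with made-up toy vectors; in the real project it would call a Hugging Face CLIP encoder, and the filenames are hypothetical.

```python
import numpy as np

def embed(item: str) -> np.ndarray:
    """Stand-in for a CLIP encoder: maps texts/image names to toy vectors.

    These vectors are hypothetical, chosen only to illustrate the ranking;
    a real pipeline would encode pixels and text with the same CLIP model.
    """
    toy_vectors = {
        "ginger_kitten.jpg": np.array([0.9, 0.1, 0.0]),
        "black_dog.jpg": np.array([0.1, 0.9, 0.0]),
        "cute ginger kitten": np.array([0.8, 0.2, 0.1]),
    }
    return toy_vectors[item]

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: dot product of the vectors divided by their norms
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query: str, image_paths: list[str]) -> str:
    """Return the indexed image whose embedding is closest to the query."""
    q = embed(query)
    return max(image_paths, key=lambda p: cosine_sim(q, embed(p)))

top_match = search("cute ginger kitten", ["ginger_kitten.jpg", "black_dog.jpg"])
print(top_match)  # prints ginger_kitten.jpg
```

Because CLIP places text and images in a shared embedding space, the same similarity function works whether the query is a caption or another image.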
- Semantic Search → go beyond keywords, search by meaning
- Pet Focused → trained on cats & dogs for fun experimentation
- DocArray Powered → efficient storage + similarity search
- MIT Licensed → free to use, modify, and expand
Run the main script to:
- Create documents from sample images
- Generate embeddings
- Perform a query search
```bash
python main.py
```

Example query:

```
Query: cute ginger kitten
Top match image saved to top_match.jpg
```
Though fun-sized, this project is a microcosm of real-world systems.
- 🔍 Content-based image retrieval (e.g., “find me similar products”)
- 🐕 Pet adoption search engines (search by photo)
- 📷 Duplicate detection in large photo collections
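The duplicate-detection use case falls out of the same embeddings: near-duplicate images produce near-identical vectors, so flagging pairs above a similarity cutoff is enough. A small sketch, where the embeddings and the `0.95` threshold are hypothetical values chosen for illustration:

```python
import numpy as np

def find_duplicates(embeddings: dict[str, np.ndarray], threshold: float = 0.95):
    """Flag image pairs whose cosine similarity exceeds `threshold`."""
    names = list(embeddings)
    # L2-normalize rows so the matrix product gives pairwise cosine similarities
    mat = np.stack([embeddings[n] / np.linalg.norm(embeddings[n]) for n in names])
    sims = mat @ mat.T
    return [
        (names[i], names[j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
        if sims[i, j] >= threshold
    ]

embs = {
    "cat_a.jpg": np.array([1.0, 0.0, 0.2]),
    "cat_a_copy.jpg": np.array([0.99, 0.01, 0.21]),  # near-duplicate vector
    "dog.jpg": np.array([0.0, 1.0, 0.1]),
}
dupes = find_duplicates(embs)
print(dupes)  # [('cat_a.jpg', 'cat_a_copy.jpg')]
```

For large collections the all-pairs matrix grows quadratically, so production systems typically swap this for an approximate nearest-neighbor index.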
This project demonstrates semantic retrieval in computer vision, bridging representation learning and information retrieval. It’s a practical, compact example of applying deep embeddings for multimodal search tasks.
- Expand dataset beyond cats & dogs
- Integrate a web-based UI for interactive search
- Experiment with multi-modal queries (text + image)
This project is licensed under the MIT License, free for academic and personal use.