GitHub - emna-khemiri/runway-search-engine: Text-to-image search engine for fashion runway photos using CLIP and FAISS.

Runway Search Engine

This project implements a fashion runway search engine that allows users to search for runway images using text queries (e.g., "dress with floral pattern"). The system leverages the CLIP model for semantic text-to-image matching and FAISS for efficient similarity search. Background-removed images are used for feature extraction to focus on clothing, while original runway images are displayed in the results.

Search Workflow

The search pipeline combines CLIP (Contrastive Language-Image Pre-training) and FAISS (Facebook AI Similarity Search): CLIP embeds text and images into a shared vector space, and FAISS finds the image embeddings nearest to the query embedding. Here's how it works:

Image Feature Extraction:
- Images with removed backgrounds are processed using the CLIP model (CLIP-ViT-L-14-laion2B-s32B-b82K).
- Each image is converted to a 768-dimensional feature embedding, normalized for consistent similarity comparisons.
- Features are extracted once and cached (features.npy, paths.pkl) to avoid redundant computation.
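
The normalize-and-cache step above can be sketched as follows. This is a minimal illustration, not the repository's actual code: random vectors stand in for real CLIP embeddings, the helper names (`l2_normalize`, `save_cache`, `load_cache`) and image file names are hypothetical, and only the cache file names `features.npy` and `paths.pkl` come from this README.

```python
from __future__ import annotations

import pickle
import tempfile
from pathlib import Path

import numpy as np


def l2_normalize(features: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products equal cosine similarity."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.clip(norms, 1e-12, None)


def save_cache(cache_dir: Path, features: np.ndarray, paths: list[str]) -> None:
    """Write embeddings and image paths side by side, in matching row order."""
    np.save(cache_dir / "features.npy", features.astype(np.float32))
    with open(cache_dir / "paths.pkl", "wb") as f:
        pickle.dump(paths, f)


def load_cache(cache_dir: Path) -> tuple[np.ndarray, list[str]]:
    """Read the cached embeddings and paths back from disk."""
    features = np.load(cache_dir / "features.npy")
    with open(cache_dir / "paths.pkl", "rb") as f:
        paths = pickle.load(f)
    return features, paths


# Assumption: random vectors stand in for 768-dimensional CLIP image embeddings.
rng = np.random.default_rng(0)
raw = rng.normal(size=(4, 768))
image_paths = [f"img_{i}.png" for i in range(4)]  # hypothetical file names

with tempfile.TemporaryDirectory() as tmp:
    save_cache(Path(tmp), l2_normalize(raw), image_paths)
    cached_features, cached_paths = load_cache(Path(tmp))
```

Storing the embeddings as float32 keeps them in the dtype FAISS expects, and keeping `paths` in the same row order is what later lets a FAISS result index be mapped back to an image.
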
FAISS Indexing:
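- The extracted image features are added to a FAISS IndexFlatL2 index, which performs exact (exhaustive) nearest-neighbor search using L2 distance. Assuming the embeddings are normalized to unit length, ranking by L2 distance gives the same order as ranking by cosine similarity.
- The index retrieves the images most similar to a given query. A flat index compares the query against every stored vector, so query time grows linearly with the number of images.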
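
One point worth checking is why an L2 index suits normalized CLIP embeddings. For unit vectors, the squared L2 distance satisfies ||a - b||^2 = 2 - 2(a · b), so the smallest distance corresponds to the largest cosine similarity. The sketch below verifies this numerically with NumPy only; the random vectors are stand-ins for real embeddings, and the exhaustive distance computation is a stand-in for what `faiss.IndexFlatL2` computes (it reports squared L2 distances).

```python
import numpy as np

rng = np.random.default_rng(0)


def unit_rows(x: np.ndarray) -> np.ndarray:
    """Normalize each row to unit L2 norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Assumption: random stand-ins for normalized CLIP embeddings (100 images, 1 query).
images = unit_rows(rng.normal(size=(100, 768)))
query = unit_rows(rng.normal(size=(1, 768)))

# Squared L2 distance from the query to every image, computed exhaustively.
sq_l2 = ((images - query) ** 2).sum(axis=1)
# For unit vectors, cosine similarity reduces to a dot product.
cosine = images @ query[0]

# Identity for unit vectors: ||a - b||^2 = 2 - 2 * (a . b)
identity_holds = np.allclose(sq_l2, 2.0 - 2.0 * cosine)

# Smallest distance first and largest similarity first give the same ranking.
rank_by_l2 = np.argsort(sq_l2)
rank_by_cosine = np.argsort(-cosine)
same_ranking = np.array_equal(rank_by_l2, rank_by_cosine)
```

With the `faiss` package installed, the equivalent index would be built with `index = faiss.IndexFlatL2(768)` followed by `index.add(images.astype("float32"))`.
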
Query Processing:
- A user submits a text query via the FastAPI /search endpoint.
- The query is encoded into a 768-dimensional CLIP embedding using the CLIP model's text encoder and normalized.
Similarity Search:
- The query embedding is searched against the FAISS index to find the top k closest image embeddings.
- FAISS returns the indices of the matching images, which are mapped to their corresponding image paths.
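
The lookup from query embedding to image paths can be sketched like this. It is a NumPy stand-in for the FAISS call, under the assumption that the rows of `features` and the entries of `paths` share the same order; the function name `search_top_k`, the toy 3-dimensional vectors, and the file names are all hypothetical.

```python
from __future__ import annotations

import numpy as np


def search_top_k(query: np.ndarray, features: np.ndarray,
                 paths: list[str], k: int = 5) -> list[str]:
    """Return the paths of the k images whose embeddings are closest to the query."""
    k = min(k, len(paths))
    # Exhaustive squared L2 distance, the same quantity a flat L2 index ranks by.
    sq_dist = ((features - query) ** 2).sum(axis=1)
    nearest = np.argsort(sq_dist)[:k]
    # Row i of `features` belongs to paths[i], so indices map directly to paths.
    return [paths[i] for i in nearest]


# Toy unit vectors in place of 768-dimensional CLIP embeddings.
features = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.6, 0.8, 0.0]])
paths = ["coat.png", "dress.png", "jacket.png"]
query = np.array([0.8, 0.6, 0.0])

top_two = search_top_k(query, features, paths, k=2)
# Squared distances are 0.40, 0.80, 0.08, so top_two is ["jacket.png", "coat.png"].
```

With a real FAISS index, the same step would be `distances, indices = index.search(query[None, :].astype("float32"), k)`, after which `indices[0]` is used to index into `paths`.
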
Result Delivery:
- The matching image paths (from background-removed images) are converted to URLs pointing to the original runway images (in fw25).
- The FastAPI server returns a JSON response with a list of image URLs.
- The frontend fetches and displays the original runway images.
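
The path-to-URL conversion might look like the sketch below. Several details are assumptions rather than facts from this README: that an original image shares its file name with the background-removed version, that originals are served under a `fw25` URL prefix, the base URL, the `fw25_no_bg` directory name, and the `{"results": [...]}` response shape.

```python
from __future__ import annotations

from pathlib import PurePosixPath

BASE_URL = "http://localhost:8000"  # hypothetical server address
ORIGINALS_DIR = "fw25"              # directory of original runway images


def to_original_url(processed_path: str, base_url: str = BASE_URL) -> str:
    """Map a background-removed image path to the URL of the original image.

    Assumes both versions of an image share the same file name.
    """
    filename = PurePosixPath(processed_path).name
    return f"{base_url}/{ORIGINALS_DIR}/{filename}"


def build_response(matched_paths: list[str]) -> dict[str, list[str]]:
    """Build a JSON-serializable payload listing the original image URLs."""
    return {"results": [to_original_url(p) for p in matched_paths]}


# Hypothetical search hits from the background-removed image set.
response = build_response(["fw25_no_bg/look_001.png", "fw25_no_bg/look_014.png"])
```

A FastAPI route handler can return a dictionary like this directly, since FastAPI serializes returned dictionaries to JSON.
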

Why CLIP and FAISS?

CLIP: Encodes text and images into a shared embedding space, enabling semantic matching between natural language queries and visual content.
FAISS: Provides highly efficient similarity search, making it scalable for large image collections.
Background Removal: Using background-removed images for feature extraction ensures the model focuses on clothing, avoiding irrelevant matches based on runway backgrounds.

Example

Query: "fur coat"
Process:
- CLIP encodes the query into a feature vector.
- FAISS finds the images with the closest feature vectors.
- The system returns URLs to the original runway images.
Result: A list of runway images featuring fur coats.
