Inspiration
We were inspired by a common travel pain: luggage is hard to identify on crowded conveyor belts, and misrouting or mix-ups are frequent. We wanted a way to “Google search” for your bag using everyday language—powered by vision models.
What It Does
Luggage Vision detects bags from live camera feeds or uploaded videos, generates identifying descriptions, stores bag records, and lets users search semantically to locate the closest match. It also supports basic lifecycle tracking (e.g., CHECKIN → CONVEYOR).
Additional System
We also developed a staff-facing risk scoring system that predicts which bags are at risk of delay or misrouting before a loss occurs. The system models airport baggage flow as a graph with congestion-aware routing, allowing employees to intervene proactively. This complements our passenger-facing CV system by focusing on prevention rather than post-loss recovery.
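The congestion-aware routing idea can be sketched as shortest-path search over a weighted graph, where each edge cost combines a base travel time with a congestion penalty. This is an illustrative sketch, not our production model: the node names, weights, and cost formula below are invented for the example.

```python
import heapq

def congested_cost(base_time, congestion):
    """Hypothetical edge cost: base travel time scaled by congestion in [0, 1]."""
    return base_time * (1.0 + congestion)

def best_route(graph, start, goal):
    """Dijkstra over edges of the form {node: [(neighbor, base_time, congestion)]}."""
    frontier = [(0.0, start, [start])]
    seen = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, base, congestion in graph.get(node, []):
            if nbr not in seen:
                heapq.heappush(
                    frontier,
                    (cost + congested_cost(base, congestion), nbr, path + [nbr]),
                )
    return float("inf"), []

# Toy network: check-in feeds two sorters; sorter A is heavily congested,
# so the router prefers the slower-but-clear sorter B.
graph = {
    "CHECKIN": [("SORTER_A", 2.0, 0.9), ("SORTER_B", 3.0, 0.1)],
    "SORTER_A": [("GATE", 1.0, 0.0)],
    "SORTER_B": [("GATE", 1.0, 0.0)],
}
cost, path = best_route(graph, "CHECKIN", "GATE")
```

A real deployment would refresh the congestion values from live belt telemetry; here they are fixed constants.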
How We Built It
- FastAPI backend orchestrates detection, captioning, embedding, and persistence.
- Roboflow YOLO detects luggage in each frame.
- BLIP produces descriptive captions to capture distinguishing details.
- CLIP converts text (and optionally image) into embeddings for similarity search.
- Supabase + pgvector stores bag metadata and performs fast vector search.
- Next.js frontend provides live webcam capture, upload workflow, results display, and search UI.
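At its core, the search step of the pipeline above reduces to comparing a query embedding against stored bag embeddings. In our system CLIP produces the vectors and pgvector does the comparison inside Supabase; the sketch below uses tiny stub vectors and in-memory records purely to illustrate the nearest-match logic.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def nearest_bag(query_vec, records):
    """Return the stored bag record whose embedding is most similar to the query."""
    return max(records, key=lambda r: cosine(query_vec, r["embedding"]))

# Stub records standing in for Supabase rows; real embeddings are
# high-dimensional CLIP vectors, not 3-D toys.
records = [
    {"id": "bag-001", "caption": "red hard-shell suitcase", "embedding": [0.9, 0.1, 0.0]},
    {"id": "bag-002", "caption": "black duffel with white strap", "embedding": [0.1, 0.8, 0.3]},
]
match = nearest_bag([0.85, 0.2, 0.05], records)
```

In production this `max` over records is replaced by a pgvector index scan, which keeps search fast as the number of stored bags grows.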
Challenges We Ran Into
- Getting consistent descriptions across varying lighting, angles, and occlusions.
- Tuning detection and cropping so BLIP focuses on the bag (not background clutter).
- Making semantic search “feel right” with short, user-style queries.
- Managing latency for real-time webcam processing.
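One common way to manage real-time latency, and a rough sketch of the throttling approach mentioned above, is to run the heavy detector only on every Nth frame and reuse the cached result in between. The class and detector stub below are illustrative, not our exact implementation.

```python
class FrameThrottler:
    """Run an expensive per-frame function only every N frames."""

    def __init__(self, every_n=5):
        self.every_n = every_n
        self.count = 0
        self.last_result = None

    def process(self, frame, detect_fn):
        """Invoke detect_fn on every Nth frame; otherwise return the cached result."""
        if self.count % self.every_n == 0:
            self.last_result = detect_fn(frame)
        self.count += 1
        return self.last_result

# Demo with a stub detector that records which frames it actually saw.
throttler = FrameThrottler(every_n=3)
calls = []

def fake_detect(frame):
    calls.append(frame)
    return f"boxes-for-{frame}"

results = [throttler.process(i, fake_detect) for i in range(7)]
```

With `every_n=3`, only frames 0, 3, and 6 hit the model; intermediate frames display slightly stale boxes, which is usually acceptable for a webcam preview.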
Accomplishments We’re Proud Of
- End-to-end pipeline from live detection → description → vector search.
- Natural language search that works with imperfect, human descriptions.
- A clean split between frontend UX and backend ML services for easy iteration.
What We Learned
- Description quality heavily impacts semantic search—good prompts and clean crops matter.
- Vector search needs careful indexing and embedding consistency to be useful at scale.
- Real-time CV is mostly an engineering problem: batching, throttling, and sensible defaults.
What’s Next for Luggage Vision
- Multi-camera tracking and re-identification across checkpoints.
- Stronger status flows (e.g., LOADED, IN_TRANSIT, ARRIVED) + event timelines.
- User-confirmation loop to improve accuracy over time (active learning).
- Integrate the passenger-facing CV system with the internal risk scoring model for a seamless preventive and recovery workflow.
- Explore more advanced predictive algorithms that use real-time congestion and flight data to improve risk accuracy.
- Extend the system to larger airports or multiple hubs to evaluate scalability and broader operational impact.
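The stronger status flows on the roadmap amount to a small state machine. The sketch below shows one way to enforce legal transitions; the statuses beyond CHECKIN → CONVEYOR are proposed, not yet implemented, and the transition table is an assumption.

```python
# Proposed lifecycle; only CHECKIN -> CONVEYOR exists in the current system.
TRANSITIONS = {
    "CHECKIN": {"CONVEYOR"},
    "CONVEYOR": {"LOADED"},
    "LOADED": {"IN_TRANSIT"},
    "IN_TRANSIT": {"ARRIVED"},
    "ARRIVED": set(),
}

def advance(current, new):
    """Validate a status change, raising on any transition not in the table."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new

status = advance("CHECKIN", "CONVEYOR")
```

Persisting each successful `advance` as a timestamped row would give the event timeline mentioned above for free.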
Built With
- blip
- clip
- fastapi
- figma
- huggingface
- javascript
- nextjs
- pytorch
- roboflow
- sam3
- supabase