Inspiration

Recycling confusion leads to contamination and lower rates. We wanted a hands-free system that identifies waste and guides disposal in real time, adapting to different facilities and their bin configurations.

What it does

TossWise is an AI-powered smart trash bin that uses computer vision and natural language to classify waste. It detects when someone approaches, captures images, identifies trash items (or bags), and provides voice guidance on which bin to use. For bags, it asks about contents and classifies accordingly. It supports location-specific bin layouts and answers recycling questions.

How we built it

  • Computer Vision: YOLOv8 for person detection; captures 3 frames and selects the sharpest using Laplacian variance
  • AI Classification: Google Gemini Vision analyzes images to identify trash items and determine bin placement
  • Voice Interface: ElevenLabs TTS for feedback; Google Speech Recognition for user questions
  • Web Configuration: Flask app with camera capture to configure location-specific bin layouts (Atlanta, Budapest, Hong Kong, Singapore)
  • Architecture: Background threading for image analysis to keep the UI responsive; location-aware bin rules from JSON

Start / Setup
  |
  v
User enters location → takes a picture of the bins
  |
  v
Gemini API checks the available bins and creates a JSON file
  |
  v
User hits start
  |
  v
YOLOv8 loop (frame every 0.1 s)
  |
  +--> No trash → loop
  |
  +--> Trash detected
         |
         v
       Capture image + metadata (JSON)
         |
         v
       Send to Gemini Vision
         |
         +--> Not trash → loop
         |
         +--> Trash confirmed
                |
                v
              Gemini decides the bin (from the captured picture and the location JSON)
                |
                v
              Send instruction text to ElevenLabs
                |
                v
              Receive audio → play to user
                |
                v
              Send servo command to Arduino (open bin)
                |
                v
              Close bin after timeout → back to the YOLOv8 loop
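The "decides the bin" step pairs the classified item with the per-location JSON from setup. A minimal sketch, assuming a hypothetical config schema (the write-up doesn't show the actual file format):

```python
import json

# Hypothetical per-location config, as produced by the setup step.
ATLANTA = json.loads("""
{
  "location": "Atlanta",
  "bins": {
    "recycling": ["plastic_bottle", "aluminum_can", "paper"],
    "compost": ["food_scraps"],
    "landfill": []
  },
  "default": "landfill"
}
""")

def decide_bin(item: str, config: dict) -> str:
    """Map a classified item to a bin; unknown items fall back to the
    location's default bin."""
    for bin_name, accepted in config["bins"].items():
        if item in accepted:
            return bin_name
    return config["default"]
```

Because the rules live in data rather than code, adding Budapest or Singapore is just another JSON file, which is what lets the main loop adapt per location.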

Challenges we ran into

  • API Compatibility: Gemini model availability issues — implemented dynamic model selection with fallbacks
  • Accuracy: initial misclassifications (e.g., gloves flagged as trash) — refined prompts to emphasize visual analysis and ignore personal items
  • Performance: UI freezing during analysis — moved processing to background threads
  • Voice Integration: ElevenLabs API changes and model deprecation — updated to the newer SDK and free-tier-compatible models
  • Arduino Hardware: this was our first hardware project, so we had to learn Arduino basics to open the right bin for each detection
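The dynamic model selection with fallbacks can be sketched as trying candidate model names in order until one responds; `call_with_fallback` is an illustrative helper, the model names are assumptions, and `invoke` stands in for the real Gemini call:

```python
def call_with_fallback(models: list, invoke):
    """Try each model name in order and return (name, result) for the
    first one that succeeds.

    `invoke` is the caller's function that actually hits the API and
    raises on failure (model missing, deprecated, rate-limited, ...)."""
    last_err = None
    for name in models:
        try:
            return name, invoke(name)
        except Exception as err:
            last_err = err
    raise RuntimeError(f"all models failed: {models}") from last_err
```

Keeping the preference list in one place means a model deprecation becomes a one-line config change instead of a crash at the bin.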

Accomplishments that we're proud of

  • Real-time Processing: 1–2 second response time from detection to voice feedback
  • Multi-Modal AI: combines vision, voice input, and voice output
  • Location Flexibility: web app configures bin layouts per location; the main software adapts automatically
  • Smart Bag Handling: detects multiple bags and asks about each one separately
  • Robust Error Handling: fallback mechanisms for API failures and model selection
  • User Experience: natural voice interaction with question answering and "repeat" functionality
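The background-processing pattern that keeps the capture loop and UI responsive can be sketched with the standard library; the queue names are illustrative, and `str.upper` stands in for the real (slow) Gemini analysis:

```python
import queue
import threading

def worker(jobs: queue.Queue, results: queue.Queue, analyze):
    """Pull captured frames off the job queue and run the slow analysis
    step without blocking the thread that captures frames."""
    while True:
        frame = jobs.get()
        if frame is None:           # sentinel: shut the worker down
            break
        results.put(analyze(frame))

# Illustrative wiring: str.upper stands in for the real analysis call.
jobs, results = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(jobs, results, str.upper), daemon=True)
t.start()
jobs.put("bottle")                  # the capture loop keeps running meanwhile
label = results.get()               # "BOTTLE"
jobs.put(None)
t.join()
```

Queues give a thread-safe hand-off, so the YOLOv8 loop never stalls waiting on a network round trip.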

What we learned

  • Vision AI Limitations: visual analysis requires careful prompting; item condition (clean vs. dirty) matters for classification
  • API Integration: managing multiple APIs (Gemini, ElevenLabs) through version changes and rate limits
  • Real-time Systems: balancing accuracy and speed; background processing to maintain responsiveness
  • User Interaction Design: voice interfaces need clear prompts and error handling for missed inputs
  • Location-Specific Rules: recycling rules vary; JSON-based configuration enables flexibility

What's next for TossWise

  • Hardware Integration: connect to physical bins with automated sorting mechanisms
  • Mobile App: companion app for configuration and usage tracking
  • Multi-Language Support: expand to multiple languages for global deployment
  • Analytics Dashboard: track recycling accuracy and contamination rates
  • Machine Learning Improvements: fine-tune models on facility-specific data for better accuracy
  • Accessibility Features: add visual indicators and haptic feedback for users with hearing impairments
  • Batch Processing: handle multiple items simultaneously for faster throughput
  • Cloud Deployment: scale to support multiple facilities with centralized management

Built With

  • elevenlabs
  • flask
  • google-gemini-vision
  • opencv
  • python
  • speechrecognition
  • yolov8