See++ is a web app that helps users analyze their surroundings using a webcam, voice input, and Google's Gemini multimodal model. You can ask questions by clicking a button or speaking, and See++ responds with both on-screen and spoken answers.
- 📷 Uses your webcam to capture live images
- 🔊 Converts AI responses to speech
- 🗣️ Accepts spoken input through voice recognition
- 🔄 Flips between front and rear cameras
- 🧠 Sends queries and images to Gemini for intelligent answers
git clone https://github.com/ParthParikh04/See.git
cd See
python -m venv venv
source venv/bin/activate
(On Windows, run venv\Scripts\activate instead.)
Note: Make sure this virtual environment (venv) is created within the project directory.
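If you're unsure whether the virtual environment is active, or where it was created, a quick check can help. This is a minimal stdlib sketch, not part of See++; the helper names `in_virtualenv` and `venv_is_inside` are hypothetical. It relies on the fact that inside a venv, `sys.prefix` points at the venv while `sys.base_prefix` still points at the interpreter it was created from.

```python
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix is the venv directory; sys.base_prefix is the
    # base interpreter the venv was created from.
    return sys.prefix != sys.base_prefix


def venv_is_inside(project_dir: str, prefix: str = sys.prefix) -> bool:
    """Return True if the environment at `prefix` lives under `project_dir`."""
    project = Path(project_dir).resolve()
    env = Path(prefix).resolve()
    # relative_to raises ValueError when env is not under project
    # (used instead of is_relative_to, which needs Python 3.9+).
    try:
        env.relative_to(project)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    print("Inside a venv:", in_virtualenv())
    print("venv inside this directory:", venv_is_inside("."))
```

Run it from the project root with the venv activated; both lines should print `True`.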
pip install -r requirements.txt
To get a Gemini API key:
- Visit https://makersuite.google.com/app
- Sign in with your Google account
- Click your profile icon (top right) → "API Key"
- Copy the key
Create a .env file in the root of the project directory with the following contents:
GEMINI_API_KEY=your_api_key_here
🔐 Important: Keep your .env file secret. Never share it or commit it to version control.
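How app.py actually reads the key isn't shown here (it may well use a package such as python-dotenv). As a stdlib-only sketch of the idea, assuming a simple KEY=VALUE file, the hypothetical helpers below parse `.env` and look up `GEMINI_API_KEY`, letting a real environment variable take precedence over the file.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Minimal sketch: blank lines, '#' comments, and lines without '=' are
    skipped, and one pair of matching quotes around a value is stripped.
    It does not handle 'export' prefixes, multi-line values, or ${VAR} expansion.
    """
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def get_gemini_key(path: str = ".env") -> str:
    """Return GEMINI_API_KEY, preferring a real environment variable over the file."""
    from_file = load_env_file(path) if Path(path).is_file() else {}
    key = os.environ.get("GEMINI_API_KEY") or from_file.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; check your .env file")
    return key
```

Failing loudly when the key is missing makes a misnamed or misplaced `.env` file obvious at startup instead of surfacing later as an authentication error from the API.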
Start the app:
python app.py
Then go to http://localhost:5000 in your browser.
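Port 5000 is the Flask development server's default. If the page doesn't load, a small stdlib check can confirm whether anything is listening there before you move on to ngrok. The helper `server_is_up` is a hypothetical troubleshooting sketch, not part of See++.

```python
import socket


def server_is_up(host: str = "localhost", port: int = 5000, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection resolves the host and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable
        return False


if __name__ == "__main__":
    print("See++ reachable on port 5000:", server_is_up())
```

This only proves a process is listening on the port; it doesn't check that the process is See++.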
Want to test See++ on your phone? You can use ngrok to expose your local server to the internet through a secure tunnel:
- Install ngrok
  If you haven't already, install it from https://ngrok.com/download, or use Homebrew:
  brew install ngrok
- Sign in and authenticate
  Create an ngrok account, copy your auth token from your dashboard, and run:
  ngrok config add-authtoken your_token_here
- Start your Flask app
  In one terminal tab, run:
  python app.py
- Expose your local server with ngrok
  In a new terminal tab, run:
  ngrok http 5000
- Get the public URL
  ngrok will display a forwarding URL like:
  Forwarding https://abc123.ngrok.io -> http://localhost:5000
- Visit on your phone
  Open the https://abc123.ngrok.io link (your own URL will differ) on your phone. Make sure your phone and computer are both online, and keep both terminal tabs running while you test.
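📸 Your phone will ask for access to the camera and mic; allow both for full functionality. Browsers only grant camera and microphone access to pages served over HTTPS (or from localhost), which is why the https:// ngrok URL works on your phone.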
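Instead of copying the forwarding URL from the terminal, you can ask the running ngrok agent for it: by default the agent serves a local API at http://127.0.0.1:4040, and `GET /api/tunnels` lists active tunnels with their `public_url` and `config.addr`. The sketch below assumes that default address and response shape; the helper names are hypothetical. It picks the `https://` URL because, as noted above, phones will only grant camera and mic access over HTTPS.

```python
import json
import urllib.request
from typing import Optional

# Assumed default address of the ngrok agent's local API.
NGROK_API = "http://127.0.0.1:4040/api/tunnels"


def pick_https_url(payload: dict, local_port: int = 5000) -> Optional[str]:
    """Return the https public_url of a tunnel forwarding to local_port, if any."""
    for tunnel in payload.get("tunnels", []):
        addr = str(tunnel.get("config", {}).get("addr", ""))
        url = tunnel.get("public_url", "")
        # addr may look like "http://localhost:5000" or just "5000"
        points_at_port = addr == str(local_port) or addr.endswith(":%d" % local_port)
        if url.startswith("https://") and points_at_port:
            return url
    return None


def fetch_ngrok_url(local_port: int = 5000) -> Optional[str]:
    """Ask the running ngrok agent for its tunnels and pick the phone-ready URL."""
    with urllib.request.urlopen(NGROK_API, timeout=3) as resp:
        payload = json.load(resp)
    return pick_https_url(payload, local_port)
```

With `ngrok http 5000` running, `print(fetch_ngrok_url())` should print the same URL shown on the `Forwarding` line; if the agent isn't running, `urlopen` raises `URLError`.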
- Allow access to your webcam and microphone when prompted
- Click a button to:
  - 📄 Extract text from the camera view
  - 🖼️ Describe the current scene
  - 🎤 Use your voice to ask a question
- The app will:
  - Capture a webcam frame
  - Send it (along with your question) to the Gemini API
  - Display and speak the response
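The capture-and-ask flow above can be sketched against the Gemini REST API's `generateContent` endpoint, which accepts a text part and an inline base64 image part in one request. This is an illustration under stated assumptions, not See++'s actual code: it assumes the browser sends the frame as a base64 data URL (the format `canvas.toDataURL("image/jpeg")` produces), and the model name, prompt wording, and helper names are hypothetical. The app itself may use Google's Python SDK instead. The sketch builds the request without sending it.

```python
import base64
import json
import urllib.request
from typing import Tuple

# Hypothetical model name; substitute whichever Gemini model the app uses.
MODEL = "gemini-1.5-flash"
ENDPOINT = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"

# Hypothetical prompts for the two buttons; See++'s real wording may differ.
PROMPTS = {
    "extract_text": "Read out any text visible in this image.",
    "describe": "Describe the scene in this image.",
}


def split_data_url(data_url: str) -> Tuple[str, str]:
    """Split 'data:image/jpeg;base64,...' into (mime_type, base64_payload)."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    base64.b64decode(payload, validate=True)  # fail early on a corrupt frame
    return mime_type, payload


def build_request(question: str, data_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request with text + image parts."""
    mime_type, b64 = split_data_url(data_url)
    body = {
        "contents": [{
            "parts": [
                {"text": question},
                {"inline_data": {"mime_type": mime_type, "data": b64}},
            ]
        }]
    }
    return urllib.request.Request(
        ENDPOINT.format(model=MODEL),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def answer_text(response_json: dict) -> str:
    """Join the text parts of the first candidate: the text to display and speak."""
    parts = response_json["candidates"][0]["content"]["parts"]
    return "".join(p.get("text", "") for p in parts)
```

For a spoken question, the transcribed speech would simply replace the button prompt. Sending the request with `json.load(urllib.request.urlopen(req))` returns the dict that `answer_text` expects; reading the answer aloud would happen separately (for example in the browser), which this sketch does not cover.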
- Python 3.7+
- A modern browser (Chrome recommended)
- Internet access
- A Gemini API key from Google
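Two of these requirements (the Python version and the API key) can be checked from Python before launching; the browser and internet requirements cannot. A hypothetical preflight sketch, not part of the repo:

```python
import os
import sys


def preflight(min_version=(3, 7)) -> list:
    """Return a list of problems; an empty list means the basics look OK."""
    problems = []
    if sys.version_info < min_version:
        problems.append(
            "Python %d.%d+ required, found %d.%d" % (min_version + sys.version_info[:2])
        )
    # Either an exported variable or a .env file in the current directory will do.
    if not os.environ.get("GEMINI_API_KEY") and not os.path.isfile(".env"):
        problems.append("No GEMINI_API_KEY in the environment and no .env file found")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "Python version and API key look OK")
```

Run it from the project root so the `.env` lookup checks the right directory.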
- Parth Parikh
- Andria Wang
- Colby Brown
- James Martin
License: MIT. Use freely, modify creatively, and share responsibly!