Your AI Tactical Analyst — See the Game. Hear the Coach. Speak the Play.
Built for the Gemini Live Agent Challenge · Live Agents 🗣️
What if you could have a Jarvis for soccer — an AI analyst that watches the match alongside you, listens to your questions, and breaks down every tactical detail aloud, in real time?
Phantom Coach is a multimodal AI coaching assistant powered by the Gemini 2.5 Flash Live API. You stream any soccer match and have a natural voice conversation with an AI that operates at the level of a UEFA Pro License tactical analyst. Ask it what you see, interrupt it mid-sentence, tell it to switch to the tactical board, and watch as it draws corrections and simulates plays on a live 2D pitch — without ever touching your mouse.
This isn't a chatbot that analyzes screenshots. It's a live, bidirectional session — Phantom Coach sees every frame, tracks every player, and speaks back to you with grounded tactical analysis. No typing. No waiting. Just talk.
Stream video and have a real-time voice conversation with Gemini 2.5 Flash. The AI watches the match, identifies formations, detects pressing triggers, and narrates tactical insights — all while you can interrupt and redirect naturally.
An auto-generated, real-time tactical map powered by computer vision. Players are tracked via YOLOv8 + ByteTrack, positions are calibrated via RANSAC homography, and the board updates live as the match progresses.
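Under the hood, the calibration step amounts to fitting a homography that maps image pixels to pitch coordinates. As a rough, dependency-light illustration (the real pipeline uses RANSAC over many keypoints with Kalman smoothing; here four hand-picked correspondences and a plain DLT fit stand in for it):

```python
import numpy as np

def fit_homography(pixel_pts, pitch_pts):
    """Estimate a 3x3 homography H via the DLT method, so that
    pitch ~ H @ pixel in homogeneous coordinates."""
    A = []
    for (x, y), (u, v) in zip(pixel_pts, pitch_pts):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The smallest right singular vector of A gives the flattened H.
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def pixel_to_pitch(H, x, y):
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

# Four illustrative correspondences (e.g. penalty-box corners), pixels -> metres.
pixel_pts = [(100, 400), (540, 400), (80, 80), (560, 80)]
pitch_pts = [(0.0, 0.0), (16.5, 0.0), (0.0, 40.3), (16.5, 40.3)]
H = fit_homography(pixel_pts, pitch_pts)
print(pixel_to_pitch(H, 100, 400))  # ≈ (0.0, 0.0)
```

With real footage the correspondences come from detected pitch lines, and RANSAC discards outlier matches before the fit.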
Voice-command the AI to simulate plays directly on the 2D board — "Show me where the left winger should be" — and watch as the AI animates player movements, draws correction arrows, and highlights passing lanes.
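Conceptually, a ghost-play animation tweens a marker between the tracked position and the AI-suggested one on the 2D board. A minimal sketch under that assumption (function name and coordinates are illustrative, not the project's actual API):

```python
def ghost_path(current, target, steps):
    """Linearly interpolate a 'ghost' marker from the tracked position
    to the suggested position over `steps` animation frames."""
    (x0, y0), (x1, y1) = current, target
    return [
        (x0 + (x1 - x0) * t / steps, y0 + (y1 - y0) * t / steps)
        for t in range(steps + 1)
    ]

# Left winger drifts from (20, 60) to the suggested (35, 68) in 5 steps.
path = ghost_path((20.0, 60.0), (35.0, 68.0), 5)
print(path[0], path[-1])  # (20.0, 60.0) (35.0, 68.0)
```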
A state machine with 4 distinct modes (Live Analysis → Transition → Tactical Board → Waiting for Confirmation) gates which tools and prompts are active, ensuring the AI stays grounded and contextually aware at all times.
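One way such gating can be sketched: an enum of modes, a legal-transition table, and a per-mode tool whitelist. The mode names come from the description above; the tool names and transition edges below are illustrative assumptions, not the project's actual declarations:

```python
from enum import Enum, auto

class Mode(Enum):
    LIVE_ANALYSIS = auto()
    TRANSITION = auto()
    TACTICAL_BOARD = auto()
    WAITING_FOR_CONFIRMATION = auto()

# Which tools each mode exposes to the model (tool names are illustrative).
ALLOWED_TOOLS = {
    Mode.LIVE_ANALYSIS: {"describe_frame", "switch_to_board"},
    Mode.TRANSITION: set(),  # no tools while the UI animates between views
    Mode.TACTICAL_BOARD: {"move_player", "draw_arrow", "switch_to_live"},
    Mode.WAITING_FOR_CONFIRMATION: {"confirm", "cancel"},
}

# Legal edges, mirroring Live Analysis -> Transition -> Tactical Board -> ...
TRANSITIONS = {
    Mode.LIVE_ANALYSIS: {Mode.TRANSITION},
    Mode.TRANSITION: {Mode.TACTICAL_BOARD, Mode.LIVE_ANALYSIS},
    Mode.TACTICAL_BOARD: {Mode.WAITING_FOR_CONFIRMATION, Mode.TRANSITION},
    Mode.WAITING_FOR_CONFIRMATION: {Mode.TACTICAL_BOARD},
}

class AgentStateMachine:
    def __init__(self):
        self.mode = Mode.LIVE_ANALYSIS

    def can_call(self, tool: str) -> bool:
        return tool in ALLOWED_TOOLS[self.mode]

    def transition(self, new_mode: Mode) -> None:
        if new_mode not in TRANSITIONS[self.mode]:
            raise ValueError(f"illegal transition {self.mode} -> {new_mode}")
        self.mode = new_mode
```

Rejecting illegal transitions and out-of-mode tool calls is what keeps the model from, say, drawing on a board that is not on screen.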
The backend agent pipeline detects turnovers, pressing triggers, line-breaking passes, and formation shifts in real time — feeding grounded data into Gemini to prevent hallucination.
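A turnover detector of this kind can be as simple as scanning per-frame possession labels and ignoring blips shorter than a few frames. A hedged sketch (the actual pipeline works from tracked player and ball positions, not pre-labeled possession):

```python
def detect_turnovers(possession, min_frames=3):
    """Flag frames where possession changes team and the new team
    keeps the ball for at least `min_frames` frames (noise filter)."""
    events = []
    holder = possession[0]
    i, n = 1, len(possession)
    while i < n:
        if possession[i] != holder:
            # Measure how long the challenging team holds on.
            run = 1
            while i + run < n and possession[i + run] == possession[i]:
                run += 1
            if run >= min_frames:
                events.append({"frame": i, "won_by": possession[i]})
                holder = possession[i]
            i += run
        else:
            i += 1
    return events

seq = ["A"] * 5 + ["B"] * 1 + ["A"] * 4 + ["B"] * 6
print(detect_turnovers(seq))  # -> [{'frame': 10, 'won_by': 'B'}]
```

The one-frame "B" spell at frame 5 is treated as tracking noise; only the sustained change at frame 10 is emitted as a turnover event.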
Full support for natural interruption. Ask a question mid-analysis, redirect the AI to a different part of the pitch, or tell it to switch views — all via voice, handled seamlessly by the Live API.
| Layer | What It Does |
|---|---|
| User | Voice input via microphone, video feed, browser interaction |
| Frontend | React/Vite/TypeScript UI with VideoPlayer, TacticalBoard, GhostOverlay, CommandCenter. Zustand for state. MultimodalStreamer sends audio + JPEG frames over WebSocket |
| Backend | FastAPI on Cloud Run. GeminiLiveClient manages bidirectional Gemini sessions. EventBus-driven agent pipeline: VisionTrackingAgent → StandardizerEngine → TacticalAnalysisAgent |
| Computer Vision | YOLOv8 (detection, pose, segmentation), ByteTrack with appearance Re-ID, RANSAC + Kalman pitch calibration, team classification |
| Tactical Analytics | Voronoi pitch control, formation detection, Expected Threat (xT) model, moment indexing (turnovers, shots, set-pieces, PPDA) |
| Google Cloud | Gemini 2.5 Flash Live API, Cloud Run (hosting), Firebase Auth + Firestore + Storage |
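Voronoi pitch control, for instance, can be approximated on a discrete grid: each cell belongs to the team whose nearest player is closest. A small numpy sketch under that simplification (ignoring player velocity and ball position, which fuller pitch-control models account for):

```python
import numpy as np

def pitch_control(home_xy, away_xy, grid=(105, 68)):
    """Fraction of a 105x68 m pitch 'controlled' by the home team,
    assigning each 1 m grid cell to the team with the nearest player."""
    xs, ys = np.meshgrid(np.arange(grid[0]) + 0.5, np.arange(grid[1]) + 0.5)
    cells = np.stack([xs.ravel(), ys.ravel()], axis=1)  # (N, 2) cell centres

    def min_dist(players):
        d = np.linalg.norm(cells[:, None, :] - np.asarray(players)[None, :, :], axis=2)
        return d.min(axis=1)  # distance from each cell to its nearest player

    home_controls = min_dist(home_xy) < min_dist(away_xy)
    return home_controls.mean()

# Two symmetric one-player "teams": control splits roughly 50/50.
share = pitch_control([(30.0, 34.0)], [(75.0, 34.0)])
print(f"home control: {share:.2f}")
```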
| Category | Technologies |
|---|---|
| Frontend | React 19, Vite 7, TypeScript, Tailwind CSS 4, Zustand, Framer Motion, Lucide Icons |
| Backend | FastAPI, Python 3.11, Uvicorn |
| AI / ML | Gemini 2.5 Flash (Live API via google-genai SDK), YOLOv8 (Ultralytics), ByteTrack |
| Computer Vision | OpenCV, RANSAC Homography, Kalman Filtering, Voronoi Tessellation, DBSCAN Clustering |
| Cloud | Google Cloud Run, Firebase (Authentication, Cloud Firestore, Firebase Storage) |
| DevOps | Docker, automated deployment via deploy.sh |
| Service | How It's Used |
|---|---|
| Gemini 2.5 Flash Live API | Bidirectional streaming — real-time voice + vision analysis via google-genai SDK |
| Google Cloud Run | Container hosting for the FastAPI backend with auto-scaling (1–5 instances, 4 vCPU, 4 GB RAM, always-on CPU) |
| Firebase Authentication | User sign-in and session management |
| Cloud Firestore | Persistent storage for coaching projects, tactical sessions, and match data |
| Firebase Storage | Video uploads and extracted frame storage |
| Cloud Build | Container image building, fully automated via deploy.sh |
- Node.js 18+ & npm
- Python 3.11+
- Google Cloud project with billing enabled
- Gemini API Key from Google AI Studio
- Firebase project (Auth, Firestore, Storage enabled)
```bash
git clone https://github.com/luminousyinyang/Phantom_Coach.git
cd Phantom_Coach
```

Set up the backend:

```bash
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Create `backend/.env`:
```
GEMINI_API_KEY=your_gemini_api_key
GOOGLE_APPLICATION_CREDENTIALS=./Service_Account_Key.json
FIRESTORE_DATABASE_ID=your_firestore_database_id
FIREBASE_STORAGE_BUCKET=your_firebase_storage_bucket
```

Place your Google Cloud Service Account JSON file at `backend/Service_Account_Key.json`.
Start the backend:

```bash
uvicorn main:app --reload
```

Set up the frontend:

```bash
cd frontend
npm install
```

Create `frontend/.env`:
```
VITE_FIREBASE_API_KEY=your_firebase_api_key
VITE_FIREBASE_AUTH_DOMAIN=your_project.firebaseapp.com
VITE_FIREBASE_PROJECT_ID=your_project_id
VITE_FIREBASE_STORAGE_BUCKET=your_project.appspot.com
VITE_FIREBASE_MESSAGING_SENDER_ID=your_sender_id
VITE_FIREBASE_APP_ID=your_app_id
VITE_GEMINI_API_KEY=your_gemini_api_key
```

Start the frontend:

```bash
npm run dev
```

The app will be available at http://localhost:5173.
Phantom Coach deploys to Google Cloud Run as a single containerized service (backend serves the built frontend).
A single-command deployment script is included:
```bash
bash deploy.sh
```

What `deploy.sh` does:

- Builds the React frontend (`npm run build`)
- Copies the build output into the backend directory
- Submits the Docker image to Cloud Build (`gcloud builds submit`)
- Deploys to Cloud Run with the included environment variables

Frontend and backend are now fully deployed!
See deploy.sh for the full deployment automation script.
```
Phantom_Coach/
├── frontend/                # React + Vite + TypeScript
│   └── src/
│       ├── components/      # UI: VideoPlayer, TacticalBoard, GhostOverlay, CommandCenter, etc.
│       ├── services/        # MultimodalStreamer (WebSocket audio/video streaming)
│       ├── store/           # Zustand state management
│       ├── context/         # Firebase Auth context
│       └── types/           # TypeScript definitions
├── backend/                 # FastAPI + Python
│   ├── api/                 # REST + WebSocket route handlers
│   ├── intelligence/        # GeminiLiveClient, AgentStateMachine, EventBus, tool declarations
│   ├── agents/              # VisionTrackingAgent, StandardizerEngine, TacticalAnalysisAgent
│   ├── services/            # CV (tracker, calibrator, classifier), Tactical (semantics, xT, indexing)
│   ├── main.py              # FastAPI application entrypoint
│   ├── Dockerfile           # Production container
│   └── requirements.txt
├── assets/                  # Architecture diagram (.mmd + .png)
├── deploy.sh                # Automated Cloud Run deployment
└── README.md
```
