⚽ Phantom Coach

Your AI Tactical Analyst — See the Game. Hear the Coach. Speak the Play.

Built for the Gemini Live Agent Challenge · Live Agents 🗣️


What is Phantom Coach?

What if you could have a Jarvis for soccer: an AI analyst that watches the match alongside you, listens to your questions, and talks you through every tactical detail in real time?

Phantom Coach is a multimodal AI coaching assistant powered by the Gemini 2.5 Flash Live API. You stream any soccer match and hold a natural voice conversation with an AI that operates at the level of a UEFA Pro License tactical analyst. Ask it about what you see, interrupt it mid-sentence, tell it to switch to the tactical board, and watch as it draws corrections and simulates plays on a live 2D pitch, all without touching your mouse.

This isn't a chatbot that analyzes screenshots. It's a live, bidirectional session — Phantom Coach sees every frame, tracks every player, and speaks back to you with grounded tactical analysis. No typing. No waiting. Just talk.


Features

🎙️ Live Voice Analysis

Stream video and have a real-time voice conversation with Gemini 2.5 Flash. The AI watches the match, identifies formations, detects pressing triggers, and narrates tactical insights, and you can interrupt or redirect it at any point.

📐 2D Tactical Board

An auto-generated, real-time tactical map powered by computer vision. Players are tracked via YOLOv8 + ByteTrack, positions are calibrated via RANSAC homography, and the board updates live as the match progresses.
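
To make the calibration step concrete, here is a minimal sketch of how a tracked player could be placed on the 2D board once a homography is known. It is not the project's code: `project_to_pitch` and `H_EXAMPLE` are hypothetical names, the 105 m x 68 m pitch and 1920x1080 frame are assumed sizes, and in practice the matrix would be estimated from pitch landmarks (for example with OpenCV's `cv2.findHomography` using the `cv2.RANSAC` method) rather than written by hand.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def project_to_pitch(h: Matrix3, x: float, y: float) -> Tuple[float, float]:
    """Map an image pixel (x, y) to pitch coordinates using a 3x3 homography."""
    # Homogeneous multiply: [u, v, w]^T = H @ [x, y, 1]^T
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Divide by w to return from homogeneous to Cartesian coordinates.
    return u / w, v / w


# Hypothetical calibration: a 1920x1080 frame scaled straight onto a
# 105 m x 68 m pitch. A real broadcast camera needs a full projective matrix.
H_EXAMPLE = [
    [105 / 1920, 0.0, 0.0],
    [0.0, 68 / 1080, 0.0],
    [0.0, 0.0, 1.0],
]

# The centre of the frame lands on the centre spot of the pitch.
centre_x, centre_y = project_to_pitch(H_EXAMPLE, 960, 540)
```

A common choice, assumed here rather than taken from the repository, is to project the bottom-centre of each player's bounding box, since that pixel is closest to where the player touches the ground plane.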

🎯 Tactical Simulations

Voice-command the AI to simulate plays directly on the 2D board — "Show me where the left winger should be" — and watch as the AI animates player movements, draws correction arrows, and highlights passing lanes.

🧠 Context-Aware Coaching

A state machine with 4 distinct modes (Live Analysis → Transition → Tactical Board → Waiting for Confirmation) gates which tools and prompts are active, ensuring the AI stays grounded and contextually aware at all times.
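
The gating idea can be sketched as a small state machine. The four modes come from the description above; everything else (the `CoachStateMachine` class, the transition table, and the tool names) is a hypothetical illustration, not the project's `AgentStateMachine`.

```python
from enum import Enum, auto


class Mode(Enum):
    LIVE_ANALYSIS = auto()
    TRANSITION = auto()
    TACTICAL_BOARD = auto()
    WAITING_FOR_CONFIRMATION = auto()


# Assumed transition table: which modes may follow each mode.
ALLOWED = {
    Mode.LIVE_ANALYSIS: {Mode.TRANSITION},
    Mode.TRANSITION: {Mode.TACTICAL_BOARD, Mode.LIVE_ANALYSIS},
    Mode.TACTICAL_BOARD: {Mode.WAITING_FOR_CONFIRMATION, Mode.TRANSITION},
    Mode.WAITING_FOR_CONFIRMATION: {Mode.TACTICAL_BOARD, Mode.LIVE_ANALYSIS},
}

# Assumed tool names: only the tools listed for the current mode are exposed.
TOOLS = {
    Mode.LIVE_ANALYSIS: {"switch_to_board"},
    Mode.TRANSITION: set(),
    Mode.TACTICAL_BOARD: {"draw_arrow", "move_player", "switch_to_video"},
    Mode.WAITING_FOR_CONFIRMATION: {"confirm", "cancel"},
}


class CoachStateMachine:
    """Tracks the current mode and answers 'may this tool run right now?'."""

    def __init__(self) -> None:
        self.mode = Mode.LIVE_ANALYSIS

    def transition(self, target: Mode) -> None:
        if target not in ALLOWED[self.mode]:
            raise ValueError(f"cannot go from {self.mode.name} to {target.name}")
        self.mode = target

    def can_call(self, tool: str) -> bool:
        return tool in TOOLS[self.mode]
```

With a table like this, a drawing tool requested while the video is still playing is simply rejected, which is one way to keep the model's actions tied to what the user is looking at.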

⚡ Real-Time Tactical Alerts

The backend agent pipeline detects turnovers, pressing triggers, line-breaking passes, and formation shifts in real time, then feeds that grounded data into Gemini to reduce hallucination.
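
As an illustration of one such alert, the sketch below detects turnovers from a stream of possession samples and formats each one as a short line of text that could be passed to the model as context. The data classes, `detect_turnovers`, and `alert_text` are hypothetical, and the rule used (possession switches from one team to the other, ignoring loose-ball samples) is a deliberately simple assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class PossessionSample:
    t: float               # seconds from the start of the clip
    team: Optional[str]    # "home", "away", or None while the ball is loose


@dataclass(frozen=True)
class Turnover:
    t: float
    lost_by: str
    won_by: str


def detect_turnovers(samples: Iterable[PossessionSample]) -> List[Turnover]:
    """Emit a Turnover whenever possession passes from one team to the other."""
    events: List[Turnover] = []
    last_team: Optional[str] = None
    for sample in samples:
        if sample.team is None:
            continue  # loose ball: keep the last known owner
        if last_team is not None and sample.team != last_team:
            events.append(Turnover(sample.t, last_team, sample.team))
        last_team = sample.team
    return events


def alert_text(event: Turnover) -> str:
    """Format an event as a short, factual line of context."""
    return f"[{event.t:.1f}s] turnover: {event.lost_by} lost possession to {event.won_by}"
```

Passing compact, timestamped facts like these alongside the video gives the model something concrete to cite instead of relying on its own reading of the frame.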

🗣️ Barge-In & Interruption Support

Full support for natural interruption. Ask a question mid-analysis, redirect the AI to a different part of the pitch, or tell it to switch views — all via voice, handled seamlessly by the Live API.


Architecture

System Architecture

| Layer | What It Does |
| --- | --- |
| User | Voice input via microphone, video feed, browser interaction |
| Frontend | React/Vite/TypeScript UI with VideoPlayer, TacticalBoard, GhostOverlay, CommandCenter. Zustand for state. MultimodalStreamer sends audio + JPEG frames over WebSocket |
| Backend | FastAPI on Cloud Run. GeminiLiveClient manages bidirectional Gemini sessions. EventBus-driven agent pipeline: VisionTrackingAgent → StandardizerEngine → TacticalAnalysisAgent |
| Computer Vision | YOLOv8 (detection, pose, segmentation), ByteTrack with appearance Re-ID, RANSAC + Kalman pitch calibration, team classification |
| Tactical Analytics | Voronoi pitch control, formation detection, Expected Threat (xT) model, moment indexing (turnovers, shots, set-pieces, PPDA) |
| Google Cloud | Gemini 2.5 Flash Live API, Cloud Run (hosting), Firebase Auth + Firestore + Storage |
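
The EventBus-driven pipeline in the Backend row can be pictured with a small publish/subscribe sketch. This is an assumption about the general shape only: the `EventBus` class below, the topic names, and the two toy stages are hypothetical stand-ins for the real tracking, standardizing, and analysis agents.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, DefaultDict, List

Handler = Callable[[Any], Awaitable[None]]


class EventBus:
    """Minimal in-process pub/sub: handlers run in subscription order."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    async def publish(self, topic: str, payload: Any) -> None:
        for handler in list(self._handlers[topic]):
            await handler(payload)


def build_pipeline(bus: EventBus, alerts: List[str]) -> None:
    """Wire two toy stages: standardize raw tracks, then analyse them."""

    async def standardize(tracks: List[dict]) -> None:
        # Convert pixel positions (assumed 1920x1080 frame) to 0..1 coordinates.
        players = [
            {"id": t["id"], "x": t["px"] / 1920, "y": t["py"] / 1080}
            for t in tracks
        ]
        await bus.publish("positions.standardized", players)

    async def analyze(players: List[dict]) -> None:
        # Toy rule: report players in the attacking third of the pitch.
        high = [p["id"] for p in players if p["x"] > 0.66]
        if high:
            alerts.append(f"players in final third: {high}")

    bus.subscribe("tracks.raw", standardize)
    bus.subscribe("positions.standardized", analyze)
```

Each stage only knows the topic it listens to and the topic it publishes, so a stage can be replaced or tested on its own without touching the others.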

Tech Stack

| Category | Technologies |
| --- | --- |
| Frontend | React 19, Vite 7, TypeScript, Tailwind CSS 4, Zustand, Framer Motion, Lucide Icons |
| Backend | FastAPI, Python 3.11, Uvicorn |
| AI / ML | Gemini 2.5 Flash (Live API via google-genai SDK), YOLOv8 (Ultralytics), ByteTrack |
| Computer Vision | OpenCV, RANSAC Homography, Kalman Filtering, Voronoi Tessellation, DBSCAN Clustering |
| Cloud | Google Cloud Run, Firebase (Authentication, Cloud Firestore, Firebase Storage) |
| DevOps | Docker, automated deployment via deploy.sh |
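
Of the techniques listed, Voronoi-based pitch control is easy to illustrate without any dependencies: each point on the pitch belongs to the nearest player, and a team's control is the share of the pitch its players own. The sketch below approximates this on a grid of 1 m cells. The `pitch_control` function, the 105 m x 68 m pitch, and the cell size are assumptions for illustration, and the project may compute this differently.

```python
from math import hypot
from typing import Dict, List, Tuple

Player = Tuple[str, float, float]  # (team, x in metres, y in metres)


def pitch_control(
    players: List[Player],
    length: float = 105.0,
    width: float = 68.0,
    cell: float = 1.0,
) -> Dict[str, float]:
    """Approximate Voronoi pitch control: share of grid cells nearest each team."""
    if not players:
        raise ValueError("at least one player is required")
    nx, ny = int(length / cell), int(width / cell)
    counts: Dict[str, int] = {}
    for i in range(nx):
        for j in range(ny):
            # Centre of the current grid cell.
            cx, cy = (i + 0.5) * cell, (j + 0.5) * cell
            # The cell belongs to the team of the closest player.
            team, _, _ = min(players, key=lambda p: hypot(p[1] - cx, p[2] - cy))
            counts[team] = counts.get(team, 0) + 1
    total = nx * ny
    return {team: n / total for team, n in counts.items()}


# Two players mirrored about the halfway line split the pitch roughly evenly.
shares = pitch_control([("home", 26.25, 34.0), ("away", 78.75, 34.0)])
```

A finer grid trades speed for accuracy. This version also ignores player velocity, which fuller pitch-control models take into account.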

Google Cloud Services Used

| Service | How It's Used |
| --- | --- |
| Gemini 2.5 Flash Live API | Bidirectional streaming: real-time voice + vision analysis via the google-genai SDK |
| Google Cloud Run | Container hosting for the FastAPI backend with auto-scaling (1–5 instances, 4 vCPU, 4 GB RAM, always-on CPU) |
| Firebase Authentication | User sign-in and session management |
| Cloud Firestore | Persistent storage for coaching projects, tactical sessions, and match data |
| Firebase Storage | Video uploads and extracted frame storage |
| Cloud Build | Container image building, fully automated via deploy.sh |

Getting Started

Prerequisites

  • Node.js 20.19+ (Vite 7 no longer supports Node 18) & npm
  • Python 3.11+
  • Google Cloud project with billing enabled
  • Gemini API Key from Google AI Studio
  • Firebase project (Auth, Firestore, Storage enabled)

1. Clone the Repository

git clone https://github.com/luminousyinyang/Phantom_Coach.git
cd Phantom_Coach

2. Backend Setup

cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Create backend/.env:

GEMINI_API_KEY=your_gemini_api_key
GOOGLE_APPLICATION_CREDENTIALS=./Service_Account_Key.json
FIRESTORE_DATABASE_ID=your_firestore_database_id
FIREBASE_STORAGE_BUCKET=your_firebase_storage_bucket

Place your Google Cloud Service Account JSON file as backend/Service_Account_Key.json.
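
A quick startup check can catch a missing value before the server starts. The sketch below is an optional helper, not part of the repository: `missing_settings` is a hypothetical name, and it assumes the variables from backend/.env have already been loaded into the process environment (for example by a dotenv loader).

```python
import os
from typing import List, Mapping

# Variable names taken from the backend/.env template above.
REQUIRED_VARS = (
    "GEMINI_API_KEY",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "FIRESTORE_DATABASE_ID",
    "FIREBASE_STORAGE_BUCKET",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit("Missing settings: " + ", ".join(missing))
    print("All backend settings are present.")
```

Taking the environment as a parameter keeps the check easy to test with a plain dictionary.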

Start the backend:

uvicorn main:app --reload

3. Frontend Setup

cd frontend  # Run from the repository root, in a new terminal
npm install

Create frontend/.env:

VITE_FIREBASE_API_KEY=your_firebase_api_key
VITE_FIREBASE_AUTH_DOMAIN=your_project.firebaseapp.com
VITE_FIREBASE_PROJECT_ID=your_project_id
VITE_FIREBASE_STORAGE_BUCKET=your_project.appspot.com
VITE_FIREBASE_MESSAGING_SENDER_ID=your_sender_id
VITE_FIREBASE_APP_ID=your_app_id
VITE_GEMINI_API_KEY=your_gemini_api_key

Start the frontend:

npm run dev

The app will be available at http://localhost:5173.


Cloud Deployment

Phantom Coach deploys to Google Cloud Run as a single containerized service: the backend serves the built frontend.

Automated Deployment

A single-command deployment script is included:

bash deploy.sh

What deploy.sh does:

  1. Builds the React frontend (npm run build)
  2. Copies the build output into the backend directory
  3. Submits the Docker image to Cloud Build (gcloud builds submit)
  4. Deploys to Cloud Run with the included environment variables
  5. Once the deploy completes, the frontend and backend are both served from the Cloud Run service

See deploy.sh for the full deployment automation script.


Project Structure

Phantom_Coach/
├── frontend/                  # React + Vite + TypeScript
│   └── src/
│       ├── components/        # UI: VideoPlayer, TacticalBoard, GhostOverlay, CommandCenter, etc.
│       ├── services/          # MultimodalStreamer (WebSocket audio/video streaming)
│       ├── store/             # Zustand state management
│       ├── context/           # Firebase Auth context
│       └── types/             # TypeScript definitions
├── backend/                   # FastAPI + Python
│   ├── api/                   # REST + WebSocket route handlers
│   ├── intelligence/          # GeminiLiveClient, AgentStateMachine, EventBus, tool declarations
│   ├── agents/                # VisionTrackingAgent, StandardizerEngine, TacticalAnalysisAgent
│   ├── services/              # CV (tracker, calibrator, classifier), Tactical (semantics, xT, indexing)
│   ├── main.py                # FastAPI application entrypoint
│   ├── Dockerfile             # Production container
│   └── requirements.txt
├── assets/                    # Architecture diagram (.mmd + .png)
├── deploy.sh                  # Automated Cloud Run deployment
└── README.md

License

MIT
