BerkeleyBets: Multi-Sport Player Performance Prediction Platform

💡 Inspiration

We were inspired by how present sports betting has become in the lives of many young adults. BerkeleyBets aims to give the UC Berkeley community a betting experience that is fun, safe, and risk-free. The sports betting world is dominated by gut feelings and by systems with artificially inflated accuracy metrics. We wanted to build a sports prediction system that actually works, one that puts statistical integrity ahead of impressive-looking numbers.

🏗️ What We Built

BerkeleyBets is a full-stack platform that predicts individual player performance across the NBA, NFL, and MLB, featuring:

  • 15 position-specific ML models across three sports
  • Temporal validation framework preventing data leakage
  • Real-time prediction API serving player projections
  • Interactive React interface with detailed player profiles
  • Cross-sport unified architecture

🛠️ How We Built It

  • Backend: Python with scikit-learn, pandas, and TimeSeriesSplit for temporal validation; Node.js/Express API
  • Frontend: React 18, Vite, responsive CSS design
  • ML pipeline: RandomForestRegressor models, joblib serialization, rolling averages instead of season totals
  • Data integrity: chronological train/test splits, verification scripts, comprehensive temporal validation
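Rolling averages keep the features honest because each game's inputs summarize only the games played before it. Below is a minimal pandas sketch of that idea. It assumes a game-log DataFrame with hypothetical `player_id` and `game_date` columns and a configurable window, so our actual column names and window sizes may differ:

```python
import pandas as pd


def add_rolling_features(games: pd.DataFrame, stat_cols: list[str], window: int = 5) -> pd.DataFrame:
    """Add leakage-free rolling averages: each row only sees that player's earlier games."""
    # Hypothetical schema: one row per player-game with player_id and game_date columns.
    games = games.sort_values(["player_id", "game_date"]).copy()
    for col in stat_cols:
        # shift(1) drops the current game, so the feature never contains the value being predicted.
        games[f"{col}_roll{window}"] = games.groupby("player_id")[col].transform(
            lambda s: s.shift(1).rolling(window, min_periods=1).mean()
        )
    return games
```

A player's first game gets `NaN` because no prior games exist. Using a season total in that slot would quietly include the very game being predicted.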

🚧 Challenges We Faced

The Data Leakage Crisis

Our biggest challenge was that our NBA models produced perfect predictions. They were returning known season averages, which already encode the games being predicted, instead of making genuine forecasts. We rebuilt the pipeline from scratch with proper temporal validation and verification scripts.
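The sketch below shows the kind of chronological validation and verification check we mean, built on scikit-learn's `TimeSeriesSplit`. The DataFrame layout (a `game_date` column plus feature and target columns), the fold count, and the model settings are illustrative assumptions, not our exact pipeline:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import TimeSeriesSplit


def temporal_cv_scores(games: pd.DataFrame, feature_cols, target_col, n_splits=5):
    """Chronological cross-validation: every fold trains on the past and tests on the future."""
    games = games.sort_values("game_date").reset_index(drop=True)
    X = games[feature_cols].to_numpy()
    y = games[target_col].to_numpy()
    dates = games["game_date"].to_numpy()
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        # Verification check: no training game may be dated after the earliest test game.
        assert dates[train_idx].max() <= dates[test_idx].min(), "temporal leakage detected"
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(X[train_idx], y[train_idx])
        scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))
    return np.asarray(scores)
```

The `assert` mirrors what a verification script can check. If the chronological sort were ever removed or the rows shuffled before splitting, it would fail loudly instead of silently reporting inflated scores.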

Cross-Sport Complexity

Each sport has different statistics and game structures. We built sport-specific feature engineering while maintaining a unified API architecture.
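One way to keep per-sport feature engineering behind a single interface is a small registry keyed by sport. This sketch is hypothetical: the stat and feature names below are placeholders, not our production feature sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SportConfig:
    target_stats: tuple[str, ...]   # stats we predict (and build rolling averages for)
    base_features: tuple[str, ...]  # extra sport-specific inputs


# Hypothetical per-sport configuration; real feature sets are sport- and position-specific.
SPORTS = {
    "nba": SportConfig(("points", "rebounds", "assists"), ("minutes", "usage_rate")),
    "nfl": SportConfig(("passing_yards", "rushing_yards"), ("snaps", "targets")),
    "mlb": SportConfig(("hits", "home_runs"), ("plate_appearances",)),
}


def feature_columns(sport: str, window: int = 5) -> list[str]:
    """Unified entry point: the same call for every sport, sport-specific columns underneath."""
    try:
        cfg = SPORTS[sport.lower()]
    except KeyError:
        raise ValueError(f"unsupported sport: {sport}") from None
    rolling = [f"{stat}_roll{window}" for stat in cfg.target_stats]
    return rolling + list(cfg.base_features)
```

With this layout, adding a sport means adding one registry entry while the API code that calls `feature_columns` stays unchanged.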

Model Performance vs. Reality

We had to balance accuracy against realistic expectations, so we focused on proper uncertainty quantification rather than chasing perfect metrics.
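Random forests offer a cheap, rough uncertainty signal: the spread of predictions across the individual trees. Below is a minimal sketch assuming a fitted scikit-learn `RandomForestRegressor`. The 10th/90th percentile band is an arbitrary choice here, and this spread reflects model variability rather than a calibrated prediction interval.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def predict_with_interval(model: RandomForestRegressor, X, lower=10, upper=90):
    """Point prediction plus an empirical band from the spread of individual trees."""
    # Shape (n_trees, n_samples): one row of predictions per fitted tree.
    per_tree = np.stack([tree.predict(X) for tree in model.estimators_])
    mean = per_tree.mean(axis=0)  # matches model.predict(X) for a random forest regressor
    return mean, np.percentile(per_tree, lower, axis=0), np.percentile(per_tree, upper, axis=0)
```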

📚 What We Learned

  • Statistical integrity matters more than impressive metrics — R² of 0.65 with no leakage beats R² of 0.95 with future data
  • Position-specific modeling is essential — point guards and centers have fundamentally different performance drivers
  • Engineering practices matter in ML — proper testing and verification saved us from deploying broken models
  • User experience in analytics — complex predictions need intuitive presentation
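Position-specific modeling with joblib serialization can be sketched as one model per position group. The `position` column name, the hyperparameters, and the `{position}_{target}.joblib` file naming are assumptions for illustration:

```python
from pathlib import Path

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def train_position_models(games: pd.DataFrame, feature_cols, target_col, out_dir="models"):
    """Fit one RandomForestRegressor per position and serialize each one with joblib."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {}
    # Hypothetical schema: a "position" column such as PG, C, QB, SP.
    for position, group in games.groupby("position"):
        model = RandomForestRegressor(n_estimators=200, random_state=0)
        model.fit(group[feature_cols], group[target_col])
        path = out / f"{position}_{target_col}.joblib"
        joblib.dump(model, path)
        paths[position] = path
    return paths
```

A prediction service can then `joblib.load` only the model that matches a player's position, so a center is never scored by a model trained on point guards.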

🎯 What's Next

Next on our list: real-time data integration, richer features (weather, injuries), betting recommendations, mobile optimization, and performance monitoring with automated retraining.

🏆 Technical Achievements

  • Zero data leakage across all models
  • Sub-200ms API responses
  • Realistic player rankings (Ohtani #1, Judge/Soto top tier)
  • Scalable architecture ready for production

BerkeleyBets proves that reliable sports prediction requires more than advanced algorithms — it demands rigorous statistical practices and solid engineering.
