BerkeleyBets: Multi-Sport Player Performance Prediction Platform
💡 Inspiration
BerkeleyBets was inspired by how present sports betting is in the lives of many young adults. It seeks to bring a betting experience that is fun, safe, and risk-free to the UC Berkeley community. The sports betting world is dominated by gut feelings and by systems with artificially inflated accuracy metrics. We wanted to build a sports prediction system that actually works by prioritizing statistical integrity over impressive-looking numbers.
🏗️ What We Built
BerkeleyBets is a full-stack platform that predicts individual player performance across the NBA, NFL, and MLB. It features:
- 15 position-specific ML models across three sports
- Temporal validation framework preventing data leakage
- Real-time prediction API serving player projections
- Interactive React interface with detailed player profiles
- Cross-sport unified architecture
🛠️ How We Built It
- Backend: Python with scikit-learn and pandas, TimeSeriesSplit for temporal validation, Node.js/Express API
- Frontend: React 18, Vite, responsive CSS design
- ML Pipeline: RandomForestRegressor models, joblib serialization, rolling averages instead of season totals
- Data Integrity: Chronological train/test splits, verification scripts, comprehensive temporal validation
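As a rough illustration of the temporal validation mentioned above, the sketch below runs scikit-learn's TimeSeriesSplit with a RandomForestRegressor on synthetic data. The data, feature count, and hyperparameters are assumptions made for the example, not the project's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import TimeSeriesSplit

# Synthetic, chronologically ordered game log (hypothetical data).
rng = np.random.default_rng(0)
n_games = 200
X = rng.normal(size=(n_games, 3))  # stand-in for rolling averages of prior games
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n_games)

splitter = TimeSeriesSplit(n_splits=4)
fold_scores = []
fold_bounds = []
for train_idx, test_idx in splitter.split(X):
    # Every training row precedes every test row, so no future games leak in.
    fold_bounds.append((train_idx.max(), test_idx.min()))
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))

print([round(score, 2) for score in fold_scores])
```

Each fold trains only on earlier rows and scores on later ones, which is what separates this from a shuffled K-fold split.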
🚧 Challenges We Faced
The Data Leakage Crisis
Our biggest challenge was that the NBA models produced perfect predictions because they were returning known season averages instead of making genuine predictions. We completely rebuilt the pipeline with proper temporal validation and verification scripts.
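One way to avoid this kind of leakage is to build each feature only from games played before the one being predicted. The sketch below is a minimal example using pandas; the column names, the window size, and both helper functions are hypothetical and are not taken from the project's verification scripts.

```python
import pandas as pd

# Hypothetical game log for two players.
games = pd.DataFrame({
    "player": ["A"] * 5 + ["B"] * 4,
    "game_date": pd.to_datetime([
        "2024-01-01", "2024-01-03", "2024-01-05", "2024-01-07", "2024-01-09",
        "2024-01-02", "2024-01-04", "2024-01-06", "2024-01-08",
    ]),
    "points": [10, 20, 30, 40, 50, 8, 12, 16, 20],
})

def add_prior_rolling_mean(df, column, window=3):
    """Add the mean of the previous `window` games, excluding the current game."""
    df = df.sort_values(["player", "game_date"]).copy()
    df[f"{column}_prev{window}"] = df.groupby("player")[column].transform(
        # shift(1) drops the current game before the rolling window is applied.
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    return df

def feature_ignores_current_game(df, column, window=3):
    """Perturb each player's latest result; a leak-free feature must not change."""
    name = f"{column}_prev{window}"
    baseline = add_prior_rolling_mean(df, column, window)
    perturbed = df.copy()
    last_rows = perturbed.sort_values("game_date").groupby("player").tail(1).index
    perturbed.loc[last_rows, column] += 100
    changed = add_prior_rolling_mean(perturbed, column, window)
    return baseline[name].equals(changed[name])

features = add_prior_rolling_mean(games, "points")
print(features)
print(feature_ignores_current_game(games, "points"))
```

A player's first game has no history, so its feature is NaN; how to fill that gap (drop the row, use a league average) is a separate modeling choice.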
Cross-Sport Complexity
Each sport has different statistics and game structures. We built sport-specific feature engineering while maintaining a unified API architecture.
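The idea of sport-specific feature engineering behind a single entry point can be sketched with a small registry. Everything here (the `register` decorator, the feature names, and the `build_features` function) is a hypothetical illustration, not the project's actual code.

```python
from typing import Callable, Dict, List

FeatureBuilder = Callable[[List[dict]], Dict[str, float]]

# Maps a sport key to the function that builds its features.
_BUILDERS: Dict[str, FeatureBuilder] = {}

def register(sport: str):
    """Decorator that registers a feature builder under a sport key."""
    def decorator(fn: FeatureBuilder) -> FeatureBuilder:
        _BUILDERS[sport] = fn
        return fn
    return decorator

def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0

@register("nba")
def nba_features(recent_games: List[dict]) -> Dict[str, float]:
    return {
        "avg_points": _mean([g["points"] for g in recent_games]),
        "avg_minutes": _mean([g["minutes"] for g in recent_games]),
    }

@register("mlb")
def mlb_features(recent_games: List[dict]) -> Dict[str, float]:
    hits = sum(g["hits"] for g in recent_games)
    at_bats = sum(g["at_bats"] for g in recent_games)
    return {"batting_avg": hits / at_bats if at_bats else 0.0}

def build_features(sport: str, recent_games: List[dict]) -> Dict[str, float]:
    """Single entry point: callers pass a sport key and never see the builders."""
    try:
        builder = _BUILDERS[sport.lower()]
    except KeyError:
        raise ValueError(f"unsupported sport: {sport!r}") from None
    return builder(recent_games)

print(build_features("NBA", [{"points": 20, "minutes": 30}, {"points": 30, "minutes": 34}]))
```

With this shape, adding a sport means registering one more builder; the calling code stays the same.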
Model Performance vs. Reality
The challenge here was balancing accuracy against realistic expectations. We focused on proper uncertainty quantification rather than chasing perfect metrics.
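The write-up does not say how uncertainty was quantified. One common and simple approach with a random forest is to look at the spread of the individual trees' predictions, sketched below on synthetic data. The function name, percentile bounds, and data are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data standing in for player features and a target stat.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=1.0, size=300)

forest = RandomForestRegressor(n_estimators=100, random_state=1)
forest.fit(X[:250], y[:250])

def predict_with_interval(model, X_new, lower=10, upper=90):
    """Return the forest mean plus percentile bounds across individual trees."""
    per_tree = np.stack([tree.predict(X_new) for tree in model.estimators_])
    return (
        per_tree.mean(axis=0),
        np.percentile(per_tree, lower, axis=0),
        np.percentile(per_tree, upper, axis=0),
    )

mean, lo, hi = predict_with_interval(forest, X[250:])
print(mean[:3], lo[:3], hi[:3])
```

The band reflects disagreement between trees, which is a rough signal of model uncertainty rather than a calibrated prediction interval.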
📚 What We Learned
- Statistical integrity matters more than impressive metrics — R² of 0.65 with no leakage beats R² of 0.95 with future data
- Position-specific modeling is essential — point guards and centers have fundamentally different performance drivers
- Engineering practices matter in ML — proper testing and verification saved us from deploying broken models
- User experience in analytics — complex predictions need intuitive presentation
🎯 What's Next
We plan to add real-time data integration, advanced features (weather, injuries), betting recommendations, mobile optimization, and performance monitoring with automated retraining.
🏆 Technical Achievements
- Zero data leakage across all models
- Sub-200ms API responses
- Realistic player rankings (Ohtani #1, Judge/Soto top tier)
- Scalable architecture ready for production
BerkeleyBets proves that reliable sports prediction requires more than advanced algorithms — it demands rigorous statistical practices and solid engineering.
Built With
- api
- espnapi
- javascript
- node.js
- npm
- python
- react18
- vite