End-to-end reference implementation of a capacity planning system that:
- clusters workload patterns from telemetry (heterogeneous DB fleet)
- predicts baseline and peak compute needs via quantile boosted-tree models (P50/P95)
- adds explainability + drift-aware retraining to reduce overprovisioning and prevent slowdowns
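The P50/P95 quantile models described above can be sketched with scikit-learn's gradient boosting and its quantile (pinball) loss. The features and data below are illustrative, not the repo's actual telemetry schema.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Illustrative features: hour-of-day and a synthetic load driver (not the repo's schema).
n = 2000
hour = rng.integers(0, 24, size=n)
load = rng.gamma(shape=2.0, scale=1.0, size=n)
X = np.column_stack([hour, load])
# CPU demand with heteroscedastic noise so P50 and P95 differ meaningfully.
y = 10 + 2 * load + np.sin(hour / 24 * 2 * np.pi) + rng.normal(0, 0.5 + 0.5 * load)

# One boosted-tree model per target quantile, trained with the pinball loss.
models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q, n_estimators=200).fit(X, y)
    for q in (0.50, 0.95)
}

p50 = models[0.50].predict(X)
p95 = models[0.95].predict(X)

# Empirical coverage: roughly 95% of observations should fall below the P95 prediction.
coverage = (y <= p95).mean()
print(f"P95 coverage: {coverage:.2f}")
```

The baseline (P50) drives steady-state sizing, while the P95 model sets headroom for peaks; provisioning to P95 instead of observed maxima is what reduces overprovisioning.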
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python scripts/generate_synthetic.py --days 90 --n_dbs 120
python scripts/train.py
python scripts/predict.py --input data/raw/telemetry.csv --output reports/predictions.csv
python scripts/drift_check.py --maybe_retrain

See docs/ for overview + schema + extension ideas.
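The exact drift test used by drift_check.py is not shown here; a common choice for this kind of gate is the Population Stability Index (PSI) with a rule-of-thumb retraining threshold, sketched below on synthetic utilization data.

```python
import numpy as np

def psi(expected, actual, n_bins=10):
    """Population Stability Index between a reference and a current sample."""
    # Bin edges come from the reference distribution's quantiles.
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the fractions to avoid log(0) on empty bins.
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(1)
reference = rng.normal(50, 10, 10_000)   # training-time CPU utilization
stable = rng.normal(50, 10, 10_000)      # fresh telemetry, same regime
shifted = rng.normal(65, 10, 10_000)     # workload mix changed

# Rule of thumb: PSI > 0.2 signals significant drift, so trigger retraining.
print(psi(reference, stable), psi(reference, shifted))
```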
License: MIT
See: docs/design_decisions.md
After training, generate SHAP explanations:
python scripts/explain_shap.py --quantile 0.95

Artifacts: reports/shap/
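The explain script presumably relies on the shap package. To keep this sketch dependency-light, the same idea (attributing model output to features) is shown here with scikit-learn's permutation importance instead of SHAP; feature names are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 1000
# Illustrative telemetry features (not the repo's actual schema).
X = rng.normal(size=(n, 3))  # columns: qps, cache_hit_rate, noise
y = 3 * X[:, 0] - 1 * X[:, 1] + rng.normal(0, 0.1, n)  # demand ignores the noise column

# Same model family as the P95 predictor.
model = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X, y)

# Shuffle each feature and measure the score drop: larger drop = more important.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

for name, imp in zip(["qps", "cache_hit_rate", "noise"], result.importances_mean):
    print(f"{name}: {imp:.3f}")
```

Unlike SHAP, permutation importance gives global rather than per-prediction attributions, but it answers the same operational question: which telemetry signals drive the capacity forecast.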
Generate plots:
python scripts/plot_metrics.py

Artifacts: reports/plots/
These images are generated after running:
python scripts/train.py
python scripts/plot_metrics.py
python scripts/explain_shap.py --quantile 0.95
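The specific metrics plotted are not listed here; for quantile models the standard evaluation metric is the pinball (quantile) loss, which the plots would typically report per quantile. A minimal sketch:

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Pinball loss: under-prediction costs q per unit, over-prediction costs 1 - q."""
    diff = y_true - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

y_true = np.array([10.0, 12.0, 9.0, 15.0])
y_pred = np.array([11.0, 11.0, 10.0, 14.0])

# For a P95 model, missing high (under-predicting demand, i.e. a slowdown risk)
# is penalized 19x more heavily than over-predicting (overprovisioning).
print(pinball_loss(y_true, y_pred, 0.95))
```

This asymmetry is why the P95 model is the right target for headroom: the loss encodes that running out of capacity is far more costly than carrying spare.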