tejashrikelhe/ML-based-database-capacity-planning

ML Capacity Planning for Heterogeneous Databases (Telemetry -> Clusters -> P50/P95)

End-to-end reference implementation of a capacity planning system that:

  1. clusters workload patterns from telemetry (heterogeneous DB fleet)
  2. predicts baseline and peak compute needs via quantile boosted-tree models (P50/P95)
  3. adds explainability + drift-aware retraining to reduce overprovisioning and prevent slowdowns
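To make step 2 concrete: quantile models such as the P50/P95 predictors are commonly trained by minimizing the pinball (quantile) loss, which penalizes under- and over-prediction asymmetrically. The sketch below is not taken from scripts/train.py. The function name `pinball_loss` and the sample numbers are hypothetical and serve only to illustrate why a P95 model leans toward higher forecasts.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Mean pinball (quantile) loss for predictions at the given quantile."""
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        error = actual - predicted
        # Under-prediction (error > 0) is weighted by `quantile`,
        # over-prediction (error < 0) by `1 - quantile`.
        total += max(quantile * error, (quantile - 1.0) * error)
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical vCPU demand: the true peak is 10, forecasts are 8 and 12.
    under = pinball_loss([10.0], [8.0], quantile=0.95)
    over = pinball_loss([10.0], [12.0], quantile=0.95)
    print(f"P95 loss when forecasting too low:  {under:.2f}")
    print(f"P95 loss when forecasting too high: {over:.2f}")
```

At quantile 0.95 a forecast that is 2 units too low costs roughly 19 times more than one that is 2 units too high, which is what pushes the P95 model toward peak demand. At quantile 0.5 the two errors cost the same, so the P50 model tracks the median baseline.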

Quickstart

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

python scripts/generate_synthetic.py --days 90 --n_dbs 120
python scripts/train.py
python scripts/predict.py --input data/raw/telemetry.csv --output reports/predictions.csv
python scripts/drift_check.py --maybe_retrain
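
This README does not say how scripts/drift_check.py measures drift. As an assumption, one common choice is the Population Stability Index (PSI), which compares the binned distribution of a feature in recent telemetry against the training window. The following is a minimal sketch under that assumption. The function name, the bin count, and the 0.2 retrain threshold are illustrative and not taken from the repository.

```python
import math


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between two samples, using equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    def bin_fractions(values):
        counts = [0] * n_bins
        for value in values:
            index = int((value - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            index = min(max(index, 0), n_bins - 1)
            counts[index] += 1
        # Floor each fraction at eps so the logarithm stays defined.
        return [max(count / len(values), eps) for count in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


if __name__ == "__main__":
    # Hypothetical CPU utilisation samples (percent).
    training_window = [i % 100 for i in range(1000)]
    recent_window = [value + 50 for value in training_window]
    psi = population_stability_index(training_window, recent_window)
    RETRAIN_THRESHOLD = 0.2  # illustrative rule of thumb, not the repo's setting
    print(f"PSI = {psi:.3f}, retrain = {psi > RETRAIN_THRESHOLD}")
```

A PSI of zero means the two windows fall into the bins in identical proportions. Larger values indicate a bigger shift and would be the trigger for a retrain in a setup like this.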

Docs

See docs/ for an overview, the schema, and extension ideas.

License

MIT

Architecture

[Figure: architecture diagram]

Demo Pipeline

[Animation: demo of the pipeline]

Design Decisions

See: docs/design_decisions.md

Explainability (SHAP)

After training, generate SHAP explanations:

python scripts/explain_shap.py --quantile 0.95

Artifacts: reports/shap/
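
For readers new to SHAP, the values it reports are Shapley values: each feature's average marginal contribution to one prediction, relative to a baseline input. The sketch below computes them exactly by brute force for a tiny stand-in model. It is not the code in scripts/explain_shap.py, and it does not use the shap library, which computes the same kind of quantity far more efficiently for tree models. The model `toy_p95_model` and the feature names `qps` and `connections` are hypothetical.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction, averaged over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)  # start from the baseline input
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to its real value
            value = predict(current)
            totals[name] += value - previous  # marginal contribution in this ordering
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_p95_model(features):
    """Hypothetical stand-in for a trained P95 model (predicted vCPUs)."""
    qps, connections = features["qps"], features["connections"]
    return 1.0 + 0.05 * qps + 0.02 * connections + 0.001 * qps * connections


if __name__ == "__main__":
    baseline = {"qps": 100.0, "connections": 20.0}
    instance = {"qps": 200.0, "connections": 50.0}
    contributions = shapley_values(toy_p95_model, instance, baseline)
    for name, value in contributions.items():
        print(f"{name}: {value:+.2f} vCPUs")
```

The contributions always sum to the difference between the prediction for the instance and the prediction for the baseline, which is the property a waterfall plot visualizes.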

Metrics Visualization

Generate plots:

python scripts/plot_metrics.py

Artifacts: reports/plots/
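
The README does not list which metrics scripts/plot_metrics.py draws beyond the plot titles shown under Example Outputs. As a hedged illustration of per-quantile evaluation, the sketch below computes mean absolute error alongside empirical coverage (the share of observations at or below the prediction), which for a well-calibrated P95 model should sit near 0.95. The function names and the sample values are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def empirical_coverage(y_true, y_pred):
    """Fraction of observations that fall at or below the predicted quantile."""
    return sum(1 for a, p in zip(y_true, y_pred) if a <= p) / len(y_true)


if __name__ == "__main__":
    # Hypothetical observed vCPU demand and per-quantile forecasts.
    observed = [10.0, 12.0, 8.0, 15.0]
    forecasts = {
        0.50: [9.0, 12.0, 9.0, 13.0],
        0.95: [14.0, 15.0, 12.0, 15.0],
    }
    for quantile, predicted in forecasts.items():
        mae = mean_absolute_error(observed, predicted)
        cov = empirical_coverage(observed, predicted)
        print(f"P{int(quantile * 100)}: MAE={mae:.2f}, coverage={cov:.2f}")
```

MAE for the P95 forecast is expected to be larger than for P50, since a peak model deliberately sits above typical demand. Coverage is the more direct check that each quantile is calibrated.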

Example Outputs (After Training)

Metrics

[Figures: MAE by Quantile; Pred vs True]

SHAP Explainability

[Figures: SHAP Summary; SHAP Waterfall]

These images are generated after running:

python scripts/train.py
python scripts/plot_metrics.py
python scripts/explain_shap.py --quantile 0.95
