Ridge regression hyperparameter sweep (log-spaced
- Loads the Diabetes regression dataset from scikit-learn
- Splits data by order: first 90% train, last 10% validation
- Trains and evaluates:
  - Linear regression (closed-form) on original features
  - Linear regression (closed-form) on polynomial features (degree-2 cross terms + originals)
  - Ridge regression (closed-form) over a sweep of 50 $\lambda$ values on both feature sets
- Saves two plots:
  - `outputs/figures/ridge_mse_original_features.png`
  - `outputs/figures/ridge_mse_poly_features.png`
- Selects the best ridge model by lowest validation MSE
- Uses the best polynomial ridge model to predict disease progression for a provided patient profile (Exercise 3.9 style)
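
The steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the project's actual code: it runs on synthetic data instead of the Diabetes dataset, the helper names `pairwise_features`, `fit_ridge`, and `mse` are hypothetical, and the $\lambda$ range of $10^{-4}$ to $10^{2}$ is assumed because the sweep bounds are not stated here. The `regularize_intercept` flag mirrors the default described in this README.

```python
from itertools import combinations

import numpy as np


def pairwise_features(X):
    """Pairwise products x_i * x_j for i < j, followed by the original columns."""
    cross = [X[:, i] * X[:, j] for i, j in combinations(range(X.shape[1]), 2)]
    return np.column_stack(cross + [X])


def fit_ridge(X, y, lam, regularize_intercept=False):
    """Closed-form ridge: solve (A^T A + lam * P) w = A^T y with A = [1, X]."""
    A = np.column_stack([np.ones(len(X)), X])
    P = np.eye(A.shape[1])
    if not regularize_intercept:
        P[0, 0] = 0.0  # leave the intercept (first column) unpenalized
    return np.linalg.solve(A.T @ A + lam * P, A.T @ y)


def mse(X, y, w):
    """Mean squared error of weights w (intercept first) on (X, y)."""
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((A @ w - y) ** 2))


# Synthetic stand-in data (NOT the Diabetes dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=200)

# Split by order: first 90% train, last 10% validation (no shuffling).
n_train = int(0.9 * len(X))
X_poly = pairwise_features(X)
X_tr, X_va = X_poly[:n_train], X_poly[n_train:]
y_tr, y_va = y[:n_train], y[n_train:]

# 50 log-spaced lambda values; the range is an assumption for illustration.
lambdas = np.logspace(-4, 2, 50)
val_mse = [mse(X_va, y_va, fit_ridge(X_tr, y_tr, lam)) for lam in lambdas]

# Select the model with the lowest validation MSE.
best_lambda = lambdas[int(np.argmin(val_mse))]
print(f"best lambda = {best_lambda:.4g}, validation MSE = {min(val_mse):.4g}")
```

Because the split is by order rather than random, the result depends on how the rows happen to be arranged; the sketch keeps that behavior on purpose to match the description above.
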

GitHub Markdown doesn't reliably support image resizing, so the two plots are embedded with HTML and a fixed width.

Create and activate a virtual environment:

    python -m venv .venv
    # macOS/Linux
    source .venv/bin/activate
    # Windows (PowerShell)
    .\.venv\Scripts\Activate.ps1

Install the dependencies:

    pip install -r requirements.txt

Run the full pipeline:

    python scripts/run_all.py

This will:
- print train/validation MSEs for linear and ridge models
- save the two plot images into `outputs/figures/`
- write a metrics summary JSON to `outputs/metrics/run_summary.json`

To predict disease progression for the patient profile, run:

    python scripts/predict_patient.py

After running `scripts/run_all.py`, you should see:

- `outputs/figures/ridge_mse_original_features.png`
- `outputs/figures/ridge_mse_poly_features.png`
- `outputs/metrics/run_summary.json` (created/overwritten each run)

Implementation notes:

- Closed-form solutions:
  - Linear regression uses a pseudo-inverse for stability.
  - Ridge regression uses `np.linalg.solve` on the normal equations with the ridge penalty.
- Intercept regularization:
  - By default, the intercept term is not regularized (`regularize_intercept=False`), which is the common ridge convention.
- Polynomial features:
  - The feature map includes all pairwise products $x_i x_j$ for $i<j$, and then appends the original features.
- Scaling for the patient example:
  - The provided patient vector is in the original feature scale.
  - The script reproduces the scaling used by `sklearn.datasets.load_diabetes(scaled=True)` with a feature-wise transform.
  - Scaling parameters are computed from `load_diabetes(scaled=False)`.
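
As a rough illustration of the patient scaling step: scikit-learn documents the scaled Diabetes features as mean-centered and divided by the standard deviation times the square root of the number of samples, so each column's sum of squares is 1. That corresponds to $x'_j = (x_j - \mu_j) / (\sigma_j \sqrt{n})$. The sketch below applies that transform to a synthetic matrix; the helper names `fit_scaler` and `transform` and the patient vector are hypothetical, and the project's script may implement this differently.

```python
import numpy as np


def fit_scaler(X_raw):
    """Per-feature mean and scale giving zero mean and unit sum of squares."""
    mean = X_raw.mean(axis=0)
    # Population standard deviation (ddof=0) times sqrt(n_samples).
    scale = X_raw.std(axis=0) * np.sqrt(X_raw.shape[0])
    return mean, scale


def transform(x, mean, scale):
    """Apply the feature-wise transform to a matrix or a single row."""
    return (x - mean) / scale


# Synthetic stand-in for the unscaled feature matrix (same shape as the
# Diabetes data, 442 x 10, but NOT the real values).
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=50.0, scale=10.0, size=(442, 10))

mean, scale = fit_scaler(X_raw)
X_scaled = transform(X_raw, mean, scale)

# A hypothetical patient given in the original feature scale.
patient_raw = X_raw[0] + 1.0
patient_scaled = transform(patient_raw, mean, scale)
print(patient_scaled.shape)
```

The key point is that the patient vector is transformed with parameters estimated from the unscaled dataset, never from the patient alone.
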

This project uses the Diabetes dataset shipped with scikit-learn (`sklearn.datasets.load_diabetes`).

MIT (see LICENSE).

