This project demonstrates a complete pipeline for implementing and analyzing classical machine learning models from scratch. It covers both classification (LDA, QDA) and regression (OLS, Ridge, gradient-based Ridge, polynomial regression) tasks on real and synthetic datasets.
The goal of this project is to:
- Implement key ML algorithms manually without relying on scikit-learn.
- Compare generative classifiers (LDA vs QDA).
- Analyze the effect of intercept terms and regularization in regression.
- Study the bias–variance tradeoff via Ridge Regression.
- Verify equivalence of closed-form vs gradient-based optimization.
- Explore non-linear regression using polynomial feature mapping.
All code is in a single script.py file, with reusable functions for each problem.
**LDA (Linear Discriminant Analysis)**
- Learns per-class means and a shared covariance matrix.
- Assumes equal covariance for all classes → linear decision boundaries.

**QDA (Quadratic Discriminant Analysis)**
- Learns per-class means and individual covariance matrices.
- Allows class-specific covariance → quadratic decision boundaries (a from-scratch sketch of both classifiers follows).
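A minimal from-scratch sketch of both classifiers, assuming Gaussian class-conditionals with a shared (LDA) or per-class (QDA) covariance. Function and variable names here are illustrative, not necessarily those used in script.py:

```python
import numpy as np

def fit_gaussian_classifier(X, y, shared_cov=True):
    """Estimate per-class means and covariances; shared_cov=True gives LDA, False gives QDA."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    priors = {c: np.mean(y == c) for c in classes}
    if shared_cov:
        # LDA: pool the within-class scatter into one shared covariance matrix.
        scatter = sum((X[y == c] - means[c]).T @ (X[y == c] - means[c]) for c in classes)
        covs = {c: scatter / len(X) for c in classes}
    else:
        # QDA: estimate one covariance matrix per class.
        covs = {c: np.cov(X[y == c], rowvar=False) for c in classes}
    return classes, means, covs, priors

def predict(X, classes, means, covs, priors):
    """Pick, for each row of X, the class maximizing Gaussian log-density plus log prior."""
    scores = []
    for c in classes:
        diff = X - means[c]
        inv = np.linalg.inv(covs[c])
        _, logdet = np.linalg.slogdet(covs[c])
        scores.append(-0.5 * (np.sum(diff @ inv * diff, axis=1) + logdet) + np.log(priors[c]))
    return classes[np.argmax(np.stack(scores, axis=1), axis=1)]
```

With a shared covariance the quadratic terms cancel between classes, which is why LDA's boundaries are linear while QDA's are curved.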
**OLS (Ordinary Least Squares)**
- Implemented with and without an intercept term.
- Shows why the bias term drastically improves regression performance (see the sketch below).
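A minimal sketch of the closed-form solver, assuming the intercept is handled by prepending a column of ones (the name `ols` is illustrative):

```python
import numpy as np

def ols(X, y, intercept=True):
    """Closed-form least squares: w = (X^T X)^+ X^T y."""
    if intercept:
        # Prepend a column of ones so the first weight acts as the bias term.
        X = np.column_stack([np.ones(len(X)), X])
    # Without the ones column the fitted hyperplane is forced through the origin,
    # which explains the large MSE gap reported in the results below.
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```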
**Ridge Regression (closed form)**
- Penalizes large weights to improve generalization.
- Sweeps λ ∈ [0, 1] and identifies the λ* that minimizes test MSE (sketched below).
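A sketch of the closed-form solver and the λ sweep. The grid step of 0.01 is an assumption consistent with the reported λ* = 0.06; the actual grid in script.py may differ:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge: w = (X^T X + lam * I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def best_lambda(X_tr, y_tr, X_te, y_te, lambdas=np.linspace(0, 1, 101)):
    """Sweep a grid of lambdas and return the one minimizing test MSE, plus the MSE curve."""
    mses = [np.mean((X_te @ ridge(X_tr, y_tr, lam) - y_te) ** 2) for lam in lambdas]
    return lambdas[int(np.argmin(mses))], mses
```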
**Ridge Regression (gradient-based)**
- Uses `scipy.optimize.minimize` with an analytic gradient (see the sketch below).
- Confirms equivalence to closed-form ridge regression.
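A sketch of the gradient-based variant. The exact objective scaling in script.py may differ, but any convex formulation of the same penalty recovers the same minimizer as the closed form:

```python
import numpy as np
from scipy.optimize import minimize

def ridge_via_minimize(X, y, lam):
    """Minimize J(w) = ||Xw - y||^2 + lam * ||w||^2 numerically with an analytic gradient."""
    def objective(w):
        r = X @ w - y
        return r @ r + lam * (w @ w)
    def gradient(w):
        # dJ/dw = 2 X^T (Xw - y) + 2 lam w
        return 2 * (X.T @ (X @ w - y)) + 2 * lam * w
    res = minimize(objective, np.zeros(X.shape[1]), jac=gradient)
    return res.x  # should coincide with the closed-form solution: J is convex
```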
**Polynomial Regression**
- Maps a single feature to a polynomial basis up to degree 6 (see the sketch below).
- Compared with and without regularization (λ = λ* from Ridge).
- Demonstrates overfitting in unregularized high-degree polynomials.
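A self-contained sketch of the feature map, demonstrated on synthetic data; the actual dataset and function names in script.py may differ, and λ = 0.06 is the λ* reported by the ridge sweep:

```python
import numpy as np

def poly_features(x, degree):
    """Map a 1-D feature x to the polynomial basis [x^0, x^1, ..., x^degree]."""
    return np.column_stack([x ** p for p in range(degree + 1)])

# Tiny synthetic demo: lam = 0 reproduces unregularized least squares,
# lam = 0.06 reuses the lambda* found by the ridge sweep above.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
y = np.sin(2 * x) + 0.1 * rng.standard_normal(50)
Phi = poly_features(x, degree=6)
w = np.linalg.solve(Phi.T @ Phi + 0.06 * np.eye(Phi.shape[1]), Phi.T @ y)
```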
**Classification Results**
- LDA Accuracy: 97%
- QDA Accuracy: 96%
- LDA uses linear boundaries; QDA allows curved boundaries.
**Regression Results**
- OLS without intercept → MSE ≈ 106,775 (very poor).
- OLS with intercept → MSE ≈ 3,708 (huge improvement).
- Ridge Regression optimal λ* = 0.06 (min test MSE).
- Gradient Descent Ridge curves overlap closed-form Ridge.
**Polynomial Regression Results**
- Without regularization: overfits as degree increases.
- With λ = 0.06: stable test MSE, best around degree p = 3.
Figures generated by the script:
- **LDA vs QDA Decision Boundaries**: shows the difference in classification regions.
- **Ridge Regression MSE vs λ (Closed-form)**: visualizes the bias–variance tradeoff.
- **Ridge: Closed-form vs Gradient Descent**: demonstrates numerical equivalence.
- **Polynomial Regression with/without Regularization**: shows the effect of λ on controlling overfitting.
**Dependencies**
- Python 3.10
- NumPy (linear algebra, array operations)
- SciPy (optimization)
- Matplotlib (plots & figures)
```bash
# install dependencies
pip install numpy scipy matplotlib

# run the main script
python script.py
```

**Key Takeaways**
- Classification: Both LDA and QDA work well; LDA achieves slightly higher accuracy.
- Intercept: Essential for a proper regression fit; omitting it forces the model through the origin.
- Ridge Regularization: λ* ≈ 0.06 balances bias and variance, improving test error.
- Optimization: Gradient-based Ridge matches the closed-form solution, confirming the implementation is correct.
- Polynomial Expansion: Without regularization → overfitting; with λ* → stable generalization.