This repository is part of my structured learning path in Machine Learning, where I aim to understand not just how algorithms work, but why they work — by diving deep into the mathematics, logic, and code implementation of classification methods like Decision Trees and K-Nearest Neighbors (KNN).
🔍 Mathematical Foundations:
- Entropy
- Gini Impurity
- Information Gain
- Euclidean Distance
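The metrics above can be sketched in a few lines of NumPy. This is an illustrative version (the function names here are my own and may not match what `utils.py` actually exports):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """Gini impurity: chance of mislabeling a sample drawn at random."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, left, right):
    """Entropy reduction from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

def euclidean_distance(a, b):
    """L2 distance between two feature vectors."""
    return np.sqrt(np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2))
```

For a perfectly balanced binary split, `entropy` is 1 bit and `gini` is 0.5; a split that separates the classes completely has an information gain equal to the parent's entropy.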
🧠 Machine Learning Concepts:
- Splitting criteria and tree growth
- Lazy vs eager learning
- Overfitting and generalization in classification
⚙️ Algorithms Implemented:
- Decision Tree Classifier (custom and sklearn)
- K-Nearest Neighbors (KNN)
- Data preprocessing, model evaluation & visualization
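As a flavor of the "lazy learning" idea mentioned above, here is a minimal KNN classifier: `fit` does nothing but store the training data, and all work happens at prediction time. This is a sketch of the general algorithm, not necessarily the code in `knn.py`:

```python
import numpy as np
from collections import Counter

class SimpleKNN:
    """Minimal K-Nearest Neighbors classifier (a lazy learner)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Lazy learning: just memorize the training set.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X, dtype=float):
            # Euclidean distance from x to every stored training point
            dists = np.sqrt(((self.X - x) ** 2).sum(axis=1))
            # Majority vote among the k nearest neighbors
            nearest = self.y[np.argsort(dists)[:self.k]]
            preds.append(Counter(nearest).most_common(1)[0][0])
        return np.array(preds)
```

Contrast this with a Decision Tree, an eager learner that does all its work up front while building the tree and makes predictions with a cheap traversal.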
📁 Project Structure:
Classification-Decision-Trees-and-KNN/
├── dataset.csv # Sample dataset used for training/testing
├── decision_tree.py # Custom Decision Tree classifier implementation
├── knn.py # K-Nearest Neighbors classifier implementation
├── utils.py # Helper functions (entropy, gini, info gain, etc.)
├── notebook.ipynb # Jupyter Notebook with explanation & visualizations
├── requirements.txt # List of Python dependencies
└── README.md # Project documentation
🛠️ Tech Stack:
- Python 3
- NumPy & Pandas — data manipulation
- Matplotlib & Seaborn — visualization
- Scikit-learn — for model comparison & validation
The notebook includes:
- Decision boundaries for KNN
- Feature splits in Decision Trees
- Comparative accuracy metrics
- Confusion matrices
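The comparative metrics and confusion matrices can be reproduced with scikit-learn along these lines. Since I don't know the exact contents of `dataset.csv`, this sketch uses the Iris dataset as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Stand-in data; swap in dataset.csv for the real project
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

results = {}
for name, model in [
    ("Decision Tree", DecisionTreeClassifier(random_state=42)),
    ("KNN (k=5)", KNeighborsClassifier(n_neighbors=5)),
]:
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    results[name] = accuracy_score(y_test, preds)
    print(name, results[name])
    print(confusion_matrix(y_test, preds))
```

The confusion matrix shows per-class errors that a single accuracy number hides, which is why the notebook reports both.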
This project helped me:
- Grasp the intuition behind classification algorithms
- Understand how mathematical metrics guide decisions
- Reinforce programming skills by writing logic from scratch
- Use ML libraries with confidence and clarity
- Clone the repo:
  git clone https://github.com/Yeeyash/Classification-Decision-Trees-and-KNN.git
  cd Classification-Decision-Trees-and-KNN
- Install dependencies:
requirements.txt
numpy
pandas
matplotlib
seaborn
scikit-learn
jupyter
pip install -r requirements.txt
- Launch the notebook:
jupyter notebook Classification_DecisionTrees_and_KNN.ipynb

LinkedIn: Yash Ghansham Thakare