ML: From-Scratch Implementations & Deep Learning

A collection of machine learning algorithms built from first principles using NumPy, alongside deep learning implementations and fine-tuning experiments. This repository is designed to be a modular foundation for my learning and future model development. Books and Resources

Getting Started

Download the repo

git clone https://github.com/p-retrover/basic_machine_learning && cd basic_machine_learning

Environment Setup

Please make sure that you have Python3.14 installed on your system. If you're a linux/mac user, you can install from your package manager. If you're a windows user, you can either download it from microsoft store, or from a very nice package manager called scoop.

To install scoop, open your terminal and run
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
Invoke-RestMethod -Uri https://get.scoop.sh | Invoke-Expression
Then to install python3.14
scoop bucket add versions
scoop install python314
Then explicitly tell poetry to use the correct python version.
poetry env use python314
Then proceed below.

This project uses Poetry for consistent dependency management. To set up your local environment:

Install Dependencies:

poetry install

Register the Kernel: Run this command to link the virtual environment to your Jupyter interface:

poetry run python -m ipykernel install --user --name basic-ml-env --display-name "Python (Basic ML)"

Launch Jupyter:

poetry run jupyter lab

Select Kernel: Once in Jupyter Lab, open any .ipynb file and select "Python (Basic ML)" from the kernel dropdown menu (top right).

Note: To remove this kernel later, run:
poetry run jupyter kernelspec uninstall basic-ml-env
Also if you want to have virtual environments inside each of your project folder, you can set
poetry config virtualenvs.in-project true
If the project already has a Poetry env elsewhere:
poetry env remove --all
poetry install
This creates the env inside .venv/

PyTorch Configuration: CPU vs. GPU

By default, this repository is configured to use the CPU-only version of PyTorch.

Why CPU?

Compatibility: Ensures the project runs out-of-the-box on most systems..
Disk Space: The CPU binary is significantly smaller (~200MB) compared to the CUDA-enabled version (~5GB+), preventing unnecessary bloat in your .cache or .venv.
Efficiency: Since we use Transfer Learning (Freezing the backbone), training the final layer is fast enough on modern CPUs.

How to switch to CUDA (If you have an NVIDIA GPU): If you want to leverage hardware acceleration, you need to modify the pyproject.toml file:

Change the Source: Remove the pytorch-cpu source and the specific source constraints for torch and torchvision.
Update Dependencies:

# remove this:
[[tool.poetry.source]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
priority = "explicit"

# and this:
[tool.poetry.dependencies]
torch = { version = "^2.10.0", source = "pytorch-cpu" }
torchvision = { version = "^0.25.0", source = "pytorch-cpu" }

Re-install:

poetry lock --no-update
poetry install

Running Online

For quick testing without local setup, you can upload the .ipynb files directly to Google Colab.

Dataset and Implementation Management

To keep the repository clean and lightweight, raw data files are excluded via .gitignore.

To test the implementations:

Navigate to the README.md file within the specific project directory, you'll get the instructions there.
Download the datasets from the provided links.
Create a data/ folder in that project's directory to keep the downloaded datasets.

Projects

Project Name	Description	Link
Fine-Tuning ResNet-18	Transfer learning on CIFAR-10 with PyTorch.	View Project
NN from Scratch	3-layer MLP implemented using pure NumPy.	View Project
KNN from Scratch	Distance-based classification on Fashion-MNIST.	View Project
Logistic Regression	Binary classification using Gradient Descent.	View Project

Future Roadmap

Classical Machine Learning

Principal Component Analysis (PCA) from scratch using Eigen-decomposition and Singular Value Decomposition (SVD).
Support Vector Machines (SVM) implementation using the Dual Lagrangian formulation.
Gaussian Mixture Models (GMM) and the Expectation-Maximization (EM) algorithm for soft clustering.
Decision Trees & Random Forests using Gini Impurity and Information Gain metrics.

Deep Learning & Computer Vision

Implementation of Neural Network backpropagation using only NumPy.
Development of CNN layers (Convolution, Pooling, Flatten) in pure NumPy to understand spatial feature extraction.
Generative Adversarial Networks (GANs): Training a simple DCGAN to generate handwritten digits or CIFAR-like images.
Implementation of Optimization Algorithms (Adam, RMSProp, and Adagrad) from scratch and comparing their convergence.

Natural Language Processing (NLP)

Word Embeddings: Implementing a Word2Vec (Skip-gram) model to visualize semantic vector spaces.
Attention Mechanism: Building the Scaled Dot-Product Attention block from scratch (the "math" heart of Transformers).
Transformer Fine-tuning: Leveraging Hugging Face transformers to fine-tune a BERT or DistilBERT model for Sentiment Analysis.
Parameter-Efficient Fine-Tuning (PEFT): Experimenting with LoRA or QLoRA for local LLM adaptation.

MLOps & Deployment

Containerization: Writing Dockerfiles for each app.py to ensure they run seamlessly on any Linux distribution.
CI/CD for ML: Using GitHub Actions to automate unit tests for NumPy matrix dimensions.
Hugging Face Integration: Creating a unified "Space" to host a gallery of all project interfaces.

AI Disclosure

This project was developed with the assistance of Gemini, and other LLMs.

How AI was utilized:

Architectural Guidance: Collaborating on the modular structure of the repository to ensure it remains scalable for future algorithm additions.
Debugging & Stability: AI was instrumental in identifying and resolving numerical instability issues (e.g., handling NaN values during Gradient Descent and implementing a numerically stable Sigmoid function).
Feature Enhancement: Assistance in implementing advanced visualization techniques, such as using Principal Component Analysis (PCA) to represent high-dimensional decision boundaries in 2D space.
Documentation: Streamlining README and report formatting to ensure technical clarity for external reviewers.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
KNN_from_scratch		KNN_from_scratch
fine_tuning_pretrained_model		fine_tuning_pretrained_model
logistic_regression		logistic_regression
ml_notes		ml_notes
nn_from_scratch		nn_from_scratch
.gitignore		.gitignore
README.md		README.md
folder_structure.md		folder_structure.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ML: From-Scratch Implementations & Deep Learning

Getting Started

Environment Setup

PyTorch Configuration: CPU vs. GPU

Running Online

Dataset and Implementation Management

Projects

Future Roadmap

Classical Machine Learning

Deep Learning & Computer Vision

Natural Language Processing (NLP)

MLOps & Deployment

AI Disclosure

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ML: From-Scratch Implementations & Deep Learning

Getting Started

Environment Setup

PyTorch Configuration: CPU vs. GPU

Running Online

Dataset and Implementation Management

Projects

Future Roadmap

Classical Machine Learning

Deep Learning & Computer Vision

Natural Language Processing (NLP)

MLOps & Deployment

AI Disclosure

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages