A complete MLOps platform built from scratch — experiment tracking, model registry, containerized serving, Kubernetes deployment, CI/CD pipeline, and monitoring.
```
Developer pushes code
        ↓
GitHub Actions (CI/CD)
 ├── Train model (MLflow tracked)
 ├── Register model (MLflow Registry)
 ├── Build Docker image (versioned)
 └── Verify predictions
        ↓
Docker image on ECR/DockerHub
        ↓
Kubernetes deployment (Minikube/GKE/EKS)
 ├── 2 replicas
 ├── Health + readiness probes
 └── FastAPI serving predictions
        ↓
Prometheus scraping /metrics
        ↓
Grafana dashboards
```
| Component | Tool |
|---|---|
| Experiment Tracking | MLflow |
| Model Registry | MLflow Registry |
| Model Serving | FastAPI + Uvicorn |
| Containerization | Docker |
| Container Registry | ECR / DockerHub |
| Infrastructure as Code | Terraform |
| Orchestration | Kubernetes |
| CI/CD | GitHub Actions |
| Metrics | Prometheus |
| Dashboards | Grafana |
| Cloud | AWS (S3, ECR) |
```
mlops-lab/
├── .github/
│   └── workflows/
│       └── mlops-pipeline.yml    # CI/CD pipeline
├── manifests/
│   ├── app/
│   │   ├── deployment.yaml       # K8s deployment
│   │   └── service.yaml          # K8s service
│   ├── monitoring/
│   │   └── servicemonitor.yaml   # Prometheus scrape config
│   └── namespaces/
│       ├── mlops.yaml            # mlops namespace
│       └── monitoring.yaml       # monitoring namespace
├── scripts/
│   ├── train.py                  # Train + track with MLflow
│   ├── register_model.py         # Register best model
│   ├── serve.py                  # FastAPI serving API
│   ├── save_model.py             # Save model to local file
│   ├── local-start.sh            # Start local environment
│   └── local-stop.sh             # Stop local environment
├── terraform/
│   ├── local/                    # Minikube + K8s + monitoring
│   │   ├── main.tf
│   │   ├── k8s.tf
│   │   └── monitoring.tf
│   └── aws/                      # S3 + ECR + IAM
│       ├── main.tf
│       ├── variables.tf
│       └── aws.tf
├── Dockerfile                    # Container definition
├── requirements.txt              # Full dev dependencies
└── requirements_serve.txt        # Minimal serving dependencies
```
1. Clone repository:

```bash
git clone [email protected]:atharvspathak/mlops-lab.git
cd mlops-lab
```

2. Create Python virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

3. Start Minikube:

```bash
minikube start --driver=docker --cpus=2 --memory=2048
```

4. Deploy to K8s:

```bash
cd terraform/local
terraform init
terraform apply
```

Start everything:
```bash
# Full command
bash scripts/local-start.sh

# With alias (add to ~/.bashrc first)
mlops-start
```

Activate Python environment:

```bash
# Full command
source ~/mlops-lab/venv/bin/activate && cd ~/mlops-lab

# With alias
mlops
```

Stop everything:

```bash
# Full command
bash scripts/local-stop.sh

# With alias
mlops-stop
```

Add to `~/.bashrc` for convenience:

```bash
alias mlops='source ~/mlops-lab/venv/bin/activate && cd ~/mlops-lab'
alias mlops-start='~/mlops-lab/scripts/local-start.sh'
alias mlops-stop='~/mlops-lab/scripts/local-stop.sh'
alias tf='terraform'
```

Prerequisites:

- WSL2 (Ubuntu 22.04)
- Docker
- Minikube
- kubectl
- Terraform
- Python 3.11
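A quick way to confirm these tools are on your PATH before starting (a hypothetical helper, not part of the repo):

```python
import shutil

def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

required = ["docker", "minikube", "kubectl", "terraform", "python3"]
gaps = missing_tools(required)
print("Missing:", gaps if gaps else "none")
```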
Start everything:

```bash
mlops-start
```

Activate Python environment:

```bash
mlops
```

Train a model:

```bash
python scripts/train.py
```

Register best model:

```bash
python scripts/register_model.py
```

Test prediction:

```bash
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}'
```

Stop everything:

```bash
mlops-stop
```

| Service | URL | Credentials |
|---|---|---|
| iris-serving API | http://localhost:8080 | - |
| API Docs (Swagger) | http://localhost:8080/docs | - |
| MLflow UI | http://localhost:5000 | - |
| Grafana | http://localhost:3000 | admin / mlops123 |
| Prometheus | http://localhost:9090 | - |
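The prediction request from the curl example above can also be issued from Python using only the standard library (the `predict` helper name is illustrative, not part of the repo):

```python
import json
import urllib.request

def predict(url, features, timeout=5):
    """POST iris measurements as JSON and return the parsed response dict."""
    req = urllib.request.Request(
        url,
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

# Example (requires the serving API to be running locally):
# predict("http://localhost:8080/predict",
#         {"sepal_length": 5.1, "sepal_width": 3.5,
#          "petal_length": 1.4, "petal_width": 0.2})
```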
A push to the `main` branch triggers:
- Train model + track with MLflow
- Register best model
- Build Docker image tagged with `{run_number}-{commit_sha}`
- Push to DockerHub
- Health check + prediction verification
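The image tag can be derived from the environment variables GitHub Actions provides (`GITHUB_RUN_NUMBER` and `GITHUB_SHA` are standard; the 7-character short SHA is an assumption, not taken from the pipeline):

```python
import os

def image_tag(env=None):
    """Build a {run_number}-{commit_sha} Docker tag from GitHub Actions env vars."""
    env = os.environ if env is None else env
    run_number = env["GITHUB_RUN_NUMBER"]
    commit_sha = env["GITHUB_SHA"][:7]  # assumption: short SHA keeps tags readable
    return f"{run_number}-{commit_sha}"
```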
```bash
cd terraform/aws
terraform init
terraform apply
```

Provisions:
- S3 bucket for model artifacts
- ECR repository for Docker images
- IAM user for GitHub Actions
`GET /health`

`POST /predict` with request body:

```json
{
  "sepal_length": 5.1,
  "sepal_width": 3.5,
  "petal_length": 1.4,
  "petal_width": 0.2
}
```

Response:

```json
{
  "species_id": 0,
  "species_name": "setosa",
  "confidence": 0.9985
}
```

Prometheus scrapes `/metrics` from iris-serving pods every 15s.
Key metrics:
- `http_requests_total`: total requests by endpoint
- `http_request_duration_seconds`: request latency
- `python_gc_objects_collected_total`: GC stats
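These are exposed in the Prometheus text exposition format; a minimal stdlib sketch for summing one counter across its label combinations in a `/metrics` scrape (the sample input is illustrative, not real scrape output):

```python
def parse_metric(text, name):
    """Sum all samples of metric `name` in Prometheus text exposition format.

    Minimal sketch: matches by prefix, so it does not distinguish `name`
    from metrics that merely start with the same string.
    """
    total = 0.0
    for line in text.splitlines():
        if line.startswith(name) and not line.startswith("#"):
            total += float(line.rsplit(" ", 1)[1])
    return total

sample = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{handler="/predict",method="POST"} 128.0
http_requests_total{handler="/health",method="GET"} 512.0
"""
print(parse_metric(sample, "http_requests_total"))  # 640.0
```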
Atharv Pathak — DevOps → MLOps transition