QsingularityAi/premium-prediction-mlops

MLOps Premium Prediction Project

MLOps Pipeline · Kubernetes · Segmented Regression · Docker · MLflow · Monitoring

A complete end-to-end MLOps solution for insurance premium prediction using segmented regression models with Kubernetes orchestration.

📋 Table of Contents

  • Overview
  • Architecture
  • Features
  • Installation
  • Usage
  • Project Structure
  • Development
  • Deployment
  • Monitoring
  • Troubleshooting
  • License

πŸ” Overview

This project implements a comprehensive MLOps workflow for predicting insurance premiums. The system uses a segmented approach, training separate models for different premium ranges to improve prediction accuracy. The entire pipeline is containerized and can be deployed using Kubernetes, with monitoring, experiment tracking, and model versioning built-in.

✨ Why Segmented Regression?

Insurance premiums often exhibit different patterns across price ranges. By training specialized models for each segment (e.g., very low, low, medium, high, very high premiums), we can achieve better prediction accuracy compared to a single model approach.
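As a minimal sketch of the idea (the band edges and the mean-baseline "model" below are illustrative assumptions, not the project's actual configuration, which would live in config/config.yaml):

```python
import bisect

# Assumed premium band edges for the five segments named in the text.
SEGMENT_EDGES = [500.0, 1500.0, 5000.0, 15000.0]
SEGMENT_NAMES = ["very_low", "low", "medium", "high", "very_high"]

def assign_segment(premium: float) -> str:
    """Map a premium value onto its band name."""
    return SEGMENT_NAMES[bisect.bisect_right(SEGMENT_EDGES, premium)]

def train_segmented(premiums: list[float]) -> dict[str, float]:
    """Stand-in for per-segment training: fit the simplest possible
    'model' (the segment mean) for each band that has data."""
    by_segment: dict[str, list[float]] = {}
    for p in premiums:
        by_segment.setdefault(assign_segment(p), []).append(p)
    return {name: sum(vals) / len(vals) for name, vals in by_segment.items()}
```

In the real pipeline each "model" would be a regressor trained on that segment's feature rows; the mean baseline just keeps the sketch self-contained.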

πŸ—οΈ Architecture

The system consists of several interconnected components forming a complete MLOps pipeline:

                                            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                            β”‚  Data Sources β”‚
                                            β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
                                                   β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                  Data Pipeline                 β”‚ β”‚ β”‚                       β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚β—„β”˜ β”‚    MLflow Tracking    β”‚
β”‚ β”‚ Data Ingest │─►│Validation & │─►│ Feature  β”‚ β”‚   β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚  (DVC)      β”‚  β”‚Preprocessingβ”‚  β”‚Engineeringβ”‚ β”‚   β”‚ β”‚  Experiments     β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚   β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
                      β”‚                             β”‚ β”‚  Parameters       β”‚ β”‚
                      β–Ό                             β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚           Model Training Pipeline           β”‚     β”‚ β”‚  Metrics         β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚     β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚Segment β”‚  β”‚ Model  β”‚  β”‚Feature Selectionβ”‚ │────► β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚Creation│─►│Training│─►│& Evaluation     β”‚ β”‚     β”‚ β”‚  Artifacts       β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚     β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
                         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚             Model Registry                   β”‚    β”‚                       β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚    β”‚      Monitoring       β”‚
β”‚ β”‚ Model       β”‚  β”‚Versioningβ”‚  β”‚Deploymentβ”‚ β”‚    β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Artifacts   │─►│& Metadata│─►│Approval  β”‚ β”‚    β”‚ β”‚  Data Drift      β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚    β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
                         β”‚                         β”‚ β”‚  Model            β”‚ β”‚
                         β–Ό                         β”‚ β”‚  Performance      β”‚ β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚             Serving Infrastructure          β”‚     β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚     β”‚ β”‚  System          β”‚ β”‚
β”‚ β”‚ FastAPI   β”‚  β”‚Kubernetesβ”‚  β”‚Prometheus β”‚ │────► β”‚  Metrics          β”‚ β”‚
β”‚ β”‚ Endpoints │─►│Deployment│─►│& Grafana  β”‚ β”‚     β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

🌟 Features

  • Segmented Modeling: Trains specialized models for different premium ranges to improve accuracy
  • Automated Pipeline: Complete data-to-deployment workflow with minimal manual intervention
  • Experiment Tracking: MLflow integration for tracking model parameters, metrics, and artifacts
  • Containerized Deployment: Docker containers for consistent, reproducible deployment
  • Kubernetes Orchestration: Scalable and resilient deployment with Kubernetes
  • Monitoring & Alerting: Prometheus and Grafana for real-time monitoring of model and system performance
  • Model Registry: Versioning for models with comparison and approval workflows
  • CI/CD Ready: Infrastructure for continuous integration and deployment

🚀 Installation

Prerequisites

  • Python 3.9+
  • Docker and Docker Compose
  • Kubernetes (Docker Desktop with Kubernetes enabled or a separate cluster)
  • Git

Quick Setup

# Clone the repository
git clone https://github.com/your-username/premium-prediction-mlops.git
cd premium-prediction-mlops

# Run setup script
chmod +x setup.sh
./setup.sh

# Activate virtual environment
source venv/bin/activate

Manual Setup

If you prefer to set up the project manually:

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create required directories
mkdir -p data/{raw,processed} models logs artifacts

# Initialize DVC
dvc init

📊 Usage

Data Preparation

# Place training data in data/raw directory
# Track it with DVC
dvc add data/raw/train_data.csv
git add data/raw/train_data.csv.dvc
git commit -m "Add training data"

Training Models

# Edit configuration in config/config.yaml if needed
# Run training
python -m src.train

# View experiment results in MLflow
mlflow ui
# Open http://localhost:5000 in your browser
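Inside the training code, per-segment runs can be logged to MLflow roughly as follows (a sketch, not the actual src.train internals; the run-name convention is an assumption):

```python
def run_name(segment: str) -> str:
    """Run-name convention (assumed) for grouping one segment's experiments."""
    return f"premium-{segment}"

def log_segment_run(segment: str, params: dict, metrics: dict) -> None:
    """Log one trained segment's parameters and metrics as an MLflow run."""
    import mlflow  # imported lazily so the sketch reads without MLflow installed

    with mlflow.start_run(run_name=run_name(segment)):
        mlflow.log_params(params)    # e.g. {"model": "ridge", "alpha": 1.0}
        mlflow.log_metrics(metrics)  # e.g. {"rmse": 412.7, "r2": 0.81}
```

Runs named this way appear side by side in the MLflow UI, making per-segment comparison straightforward.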

Local Deployment

# Build and start services with Docker Compose
docker-compose up -d

# API documentation available at
# http://localhost:8000/docs

Making Predictions

# Single prediction
curl -X POST "http://localhost:8000/predict" \
     -H "Content-Type: application/json" \
     -d '{
        "features": {
            "Age": 35,
            "Vehicle_Age": 5,
            "Credit_Score": 720,
            "Annual_Income": 65000,
            "Previous_Claims": 1,
            "Insurance_Duration": 3
        }
     }'
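The same call can be made from Python with only the standard library (the payload shape mirrors the curl example above; the response fields depend on the API and are not assumed here):

```python
import json
from urllib import request

API_URL = "http://localhost:8000/predict"  # docker-compose port from above

def build_payload(features: dict) -> bytes:
    """Serialize a feature dict into the JSON body the endpoint expects."""
    return json.dumps({"features": features}).encode("utf-8")

def predict(features: dict) -> dict:
    """POST the features to the API; requires the service to be running."""
    req = request.Request(
        API_URL,
        data=build_payload(features),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```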

πŸ“ Project Structure

premium-prediction-mlops/
β”œβ”€β”€ config/                # Configuration files
β”‚   └── config.yaml        # Main configuration
β”œβ”€β”€ data/                  # Data directory (managed by DVC)
β”‚   β”œβ”€β”€ raw/               # Raw data files
β”‚   └── processed/         # Processed data files
β”œβ”€β”€ kubernetes/            # Kubernetes deployment manifests
β”œβ”€β”€ models/                # Trained model storage
β”œβ”€β”€ notebooks/             # Jupyter notebooks for exploration
β”œβ”€β”€ src/                   # Source code
β”‚   β”œβ”€β”€ api/               # API service
β”‚   β”‚   └── app.py         # FastAPI application
β”‚   β”œβ”€β”€ data/              # Data processing code
β”‚   β”‚   └── data_processor.py
β”‚   └── models/            # Model-related code
β”‚       β”œβ”€β”€ model_trainer.py
β”‚       └── model_predictor.py
β”œβ”€β”€ tests/                 # Unit and integration tests
β”œβ”€β”€ Dockerfile             # Docker container definition
β”œβ”€β”€ docker-compose.yml     # Local deployment configuration
β”œβ”€β”€ requirements.txt       # Python dependencies
β”œβ”€β”€ setup.sh               # Setup script
└── README.md              # Project documentation

πŸ§‘β€πŸ’» Development

Development Workflow

  1. Create a feature branch:

    git checkout -b feature/your-feature-name
  2. Make your changes and run tests:

    pytest tests/
  3. Format and lint your code:

    make format
    make lint
  4. Commit your changes:

    git commit -m "Add meaningful commit message"
  5. Push to your fork and create a pull request

Using the Makefile

The Makefile in the root directory standardizes common development, testing, deployment, and operational tasks by wrapping longer command sequences in short, memorable targets (make <command>). This keeps workflows consistent across developers, reduces copy-paste errors, and gives a single entry point to each stage of the MLOps lifecycle.

Key commands include:

  • make setup: Set up the development environment.
  • make format: Format code using black and isort.
  • make lint: Run linters (flake8, black, isort).
  • make test: Run the test suite.
  • make train: Train the models.
  • make serve: Run the API server locally.
  • make docker-build: Build the Docker image.
  • make docker-run: Run the Docker container locally.
  • make deploy: Deploy the application to Kubernetes.
  • make monitor: Deploy monitoring components.
  • make clean: Clean up generated files.
  • make help: Show all available commands.

Refer to the Makefile itself for the full list and details.

Code Style

  • Follow PEP 8 guidelines
  • Use type hints
  • Write docstrings for all functions and classes
  • Include unit tests for new functionality
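A hypothetical helper (not taken from src/) illustrating the expected style:

```python
def premium_per_year(total_premium: float, duration_years: int) -> float:
    """Return the average premium paid per year of coverage.

    Args:
        total_premium: Total premium over the policy, in currency units.
        duration_years: Insurance duration in whole years; must be positive.

    Raises:
        ValueError: If duration_years is not positive.
    """
    if duration_years <= 0:
        raise ValueError("duration_years must be positive")
    return total_premium / duration_years
```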

🌐 Deployment

Kubernetes Deployment

# Apply Kubernetes manifests
kubectl apply -f kubernetes/namespace.yaml
kubectl apply -f kubernetes/configmap.yaml
kubectl apply -f kubernetes/deployment.yaml
kubectl apply -f kubernetes/service.yaml
kubectl apply -f kubernetes/ingress.yaml
kubectl apply -f kubernetes/monitoring.yaml

# Check deployment status
kubectl get all -n mlops-premium

Production Considerations

  • Use proper secrets management
  • Configure TLS certificates for HTTPS
  • Implement authentication for the API
  • Set appropriate resource limits
  • Configure backup and recovery procedures

📊 Monitoring

Key Metrics to Monitor

  • Model Performance: RMSE, MAE, R² for each segment
  • Prediction Latency: Response time for predictions
  • Data Drift: Changes in feature distributions
  • System Metrics: CPU, memory, network usage
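The project does not pin down a specific drift test; one common choice is the Population Stability Index (PSI), sketched here for a single numeric feature (the bin count and the 0.2 alert threshold are conventional defaults, not project settings):

```python
import math

def population_stability_index(expected, actual, bins: int = 10) -> float:
    """PSI between a reference sample and a live sample of one feature.
    Rule of thumb: < 0.1 stable, 0.1-0.2 watch, > 0.2 investigate/retrain."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(values):
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return [max(c / len(values), 1e-6) for c in counts]  # avoid log(0)

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```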

Accessing Monitoring Dashboards

# Port-forward Prometheus
kubectl port-forward -n mlops-premium svc/prometheus-service 9090:9090

# Port-forward Grafana
kubectl port-forward -n mlops-premium svc/grafana-service 3000:3000

# Access in browser
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (default: admin/admin)

Model Retraining

Trigger model retraining when:

  1. Data drift is detected
  2. Model performance degrades
  3. On a regular schedule (e.g., monthly)

# Retrain, logging artifacts and registering the new model
python -m src.train --log-artifacts --register-model
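Conditions 1 and 2 can be combined into a simple gate (the thresholds below are illustrative assumptions, not project defaults):

```python
def should_retrain(psi: float, current_rmse: float, baseline_rmse: float,
                   psi_threshold: float = 0.2, degradation: float = 0.10) -> bool:
    """True when drift or performance degradation crosses the assumed thresholds."""
    drifted = psi > psi_threshold                              # condition 1
    degraded = current_rmse > baseline_rmse * (1 + degradation)  # condition 2
    return drifted or degraded
```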

🔧 Troubleshooting

Common Issues

  • API Issues: Check logs with docker-compose logs premium-api or kubectl logs -n mlops-premium deployment/premium-model-api
  • Training Issues: Check logs in the logs/ directory
  • Kubernetes Issues: Use kubectl describe pod -n mlops-premium <pod-name> for detailed information

For more detailed troubleshooting, refer to the CHECKLIST.md and QUICKSTART.md files.

📜 License

MIT License
