A complete end-to-end MLOps solution for insurance premium prediction using segmented regression models with Kubernetes orchestration.
- Overview
- Architecture
- Features
- Installation
- Usage
- Project Structure
- Development
- Deployment
- Monitoring
- Troubleshooting
- License
- Acknowledgments
This project implements a comprehensive MLOps workflow for predicting insurance premiums. The system uses a segmented approach, training separate models for different premium ranges to improve prediction accuracy. The entire pipeline is containerized and can be deployed using Kubernetes, with monitoring, experiment tracking, and model versioning built-in.
Insurance premiums often exhibit different patterns across price ranges. By training specialized models for each segment (e.g., very low, low, medium, high, very high premiums), we can achieve better prediction accuracy compared to a single model approach.
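The segmentation idea can be sketched in a few lines of plain Python. This is an illustrative example, not the project's actual code (`src/models/model_trainer.py` may implement it differently): premiums are split at quantile boundaries, and each record is assigned a segment index that selects which specialized model to train or query.

```python
# Illustrative sketch: split premiums into quantile-based segments so a
# specialized model can be trained per segment. Function names are
# hypothetical, not the project's API.
from bisect import bisect_right

def quantile_boundaries(premiums, n_segments=5):
    """Compute cut points splitting `premiums` into n roughly equal segments."""
    ordered = sorted(premiums)
    return [ordered[int(len(ordered) * i / n_segments)]
            for i in range(1, n_segments)]

def assign_segment(premium, boundaries):
    """Return a segment index in 0..len(boundaries) for a premium value."""
    return bisect_right(boundaries, premium)

premiums = [120, 300, 450, 800, 950, 1200, 2500, 3100, 4800, 9000]
bounds = quantile_boundaries(premiums, n_segments=5)
segments = [assign_segment(p, bounds) for p in premiums]
# segments: [0, 0, 1, 1, 2, 2, 3, 3, 4, 4] — two records per segment
```

Each segment index then maps to its own regressor (e.g. one model for "very low" premiums, another for "very high"), trained only on records in that segment.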
The system consists of several interconnected components forming a complete MLOps pipeline:
┌──────────────┐
│ Data Sources │
└──────┬───────┘
       │
       ▼
┌─────────────────────────────────────────────────────┐
│ Data Pipeline                                       │
│  Data Ingest (DVC) ─▶ Validation &    ─▶ Feature    │
│                       Preprocessing      Engineering│
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐     ┌──────────────────────┐
│ Model Training Pipeline                             │     │ MLflow Tracking      │
│  Segment  ─▶ Model    ─▶ Feature Selection          │────▶│  - Experiments       │
│  Creation    Training    & Evaluation               │     │  - Parameters        │
└──────────────────────┬──────────────────────────────┘     │  - Metrics           │
                       │                                    │  - Artifacts         │
                       ▼                                    └──────────────────────┘
┌─────────────────────────────────────────────────────┐
│ Model Registry                                      │
│  Model     ─▶ Versioning ─▶ Deployment              │
│  Artifacts    & Metadata    Approval                │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐     ┌──────────────────────┐
│ Serving Infrastructure                              │     │ Monitoring           │
│  FastAPI   ─▶ Kubernetes ─▶ Prometheus              │────▶│  - Data Drift        │
│  Endpoints    Deployment    & Grafana               │     │  - Model Performance │
└─────────────────────────────────────────────────────┘     │  - System Metrics    │
                                                            └──────────────────────┘
- Segmented Modeling: Trains specialized models for different premium ranges to improve accuracy
- Automated Pipeline: Complete data-to-deployment workflow with minimal manual intervention
- Experiment Tracking: MLflow integration for tracking model parameters, metrics, and artifacts
- Containerized Deployment: Docker containers for consistent, reproducible deployment
- Kubernetes Orchestration: Scalable and resilient deployment with Kubernetes
- Monitoring & Alerting: Prometheus and Grafana for real-time monitoring of model and system performance
- Model Registry: Versioning for models with comparison and approval workflows
- CI/CD Ready: Infrastructure for continuous integration and deployment
- Python 3.9+
- Docker and Docker Compose
- Kubernetes (Docker Desktop with Kubernetes enabled or a separate cluster)
- Git
# Clone the repository
git clone https://github.com/your-username/premium-prediction-mlops.git
cd premium-prediction-mlops
# Run setup script
chmod +x setup.sh
./setup.sh
# Activate virtual environment
source venv/bin/activate

If you prefer to set up the project manually:
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Create required directories
mkdir -p data/{raw,processed} models logs artifacts
# Initialize DVC
dvc init

# Place training data in data/raw directory
# Track it with DVC
dvc add data/raw/train_data.csv
git add data/raw/train_data.csv.dvc
git commit -m "Add training data"

# Edit configuration in config/config.yaml if needed
# Run training
python -m src.train
# View experiment results in MLflow
mlflow ui
# Open http://localhost:5000 in your browser

# Build and start services with Docker Compose
docker-compose up -d
# API documentation available at
# http://localhost:8000/docs

# Single prediction
curl -X POST "http://localhost:8000/predict" \
-H "Content-Type: application/json" \
-d '{
"features": {
"Age": 35,
"Vehicle_Age": 5,
"Credit_Score": 720,
"Annual_Income": 65000,
"Previous_Claims": 1,
"Insurance_Duration": 3
}
}'
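The same request can be made from Python using only the standard library. This sketch assumes the API from `docker-compose up` is running on `localhost:8000`; the response shape shown in the comment is an assumption, so check `http://localhost:8000/docs` for the actual schema.

```python
# Illustrative client sketch for the /predict endpoint (stdlib only).
import json
import urllib.request

payload = {
    "features": {
        "Age": 35,
        "Vehicle_Age": 5,
        "Credit_Score": 720,
        "Annual_Income": 65000,
        "Previous_Claims": 1,
        "Insurance_Duration": 3,
    }
}

def predict(payload, url="http://localhost:8000/predict"):
    """POST the feature payload as JSON and return the decoded response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# result = predict(payload)  # response shape is an assumption; see /docs
```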
premium-prediction-mlops/
├── config/                  # Configuration files
│   └── config.yaml          # Main configuration
├── data/                    # Data directory (managed by DVC)
│   ├── raw/                 # Raw data files
│   └── processed/           # Processed data files
├── kubernetes/              # Kubernetes deployment manifests
├── models/                  # Trained model storage
├── notebooks/               # Jupyter notebooks for exploration
├── src/                     # Source code
│   ├── api/                 # API service
│   │   └── app.py           # FastAPI application
│   ├── data/                # Data processing code
│   │   └── data_processor.py
│   └── models/              # Model-related code
│       ├── model_trainer.py
│       └── model_predictor.py
├── tests/                   # Unit and integration tests
├── Dockerfile               # Docker container definition
├── docker-compose.yml       # Local deployment configuration
├── requirements.txt         # Python dependencies
├── setup.sh                 # Setup script
└── README.md                # Project documentation
1. Create a feature branch:

   git checkout -b feature/your-feature-name

2. Make your changes and run tests:

   pytest tests/

3. Format and lint your code:

   make format
   make lint

4. Commit your changes:

   git commit -m "Add meaningful commit message"

5. Push to your fork and create a pull request.
The Makefile in the root directory provides a set of commands to automate and standardize common development, testing, deployment, and operational tasks. It acts as a central control panel for the project, encapsulating complex command sequences into simple, memorable targets (make <command>). This promotes consistency, reduces errors, and makes it easier for developers to interact with the various stages of the MLOps lifecycle.
Key commands include:
- make setup: Set up the development environment.
- make format: Format code using black and isort.
- make lint: Run linters (flake8, black, isort).
- make test: Run the test suite.
- make train: Train the models.
- make serve: Run the API server locally.
- make docker-build: Build the Docker image.
- make docker-run: Run the Docker container locally.
- make deploy: Deploy the application to Kubernetes.
- make monitor: Deploy monitoring components.
- make clean: Clean up generated files.
- make help: Show all available commands.
Refer to the Makefile itself for the full list and details.
- Follow PEP 8 guidelines
- Use type hints
- Write docstrings for all functions and classes
- Include unit tests for new functionality
# Apply Kubernetes manifests
kubectl apply -f kubernetes/namespace.yaml
kubectl apply -f kubernetes/configmap.yaml
kubectl apply -f kubernetes/deployment.yaml
kubectl apply -f kubernetes/service.yaml
kubectl apply -f kubernetes/ingress.yaml
kubectl apply -f kubernetes/monitoring.yaml
# Check deployment status
kubectl get all -n mlops-premium

- Use proper secrets management
- Configure TLS certificates for HTTPS
- Implement authentication for the API
- Set appropriate resource limits
- Configure backup and recovery procedures
- Model Performance: RMSE, MAE, RΒ² for each segment
- Prediction Latency: Response time for predictions
- Data Drift: Changes in feature distributions
- System Metrics: CPU, memory, network usage
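The data-drift metric above can be made concrete with a Population Stability Index (PSI) check. This is an illustrative sketch, not the project's monitoring code: it compares a feature's reference (training) distribution with its current (production) distribution over equal-width bins; a PSI above roughly 0.2 is a common rule of thumb for significant drift.

```python
# Illustrative drift check: Population Stability Index over equal-width bins.
import math

def psi(reference, current, n_bins=10):
    """PSI between two samples, binned over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0

    def frac(values, b):
        # Fraction of `values` in bin b; the last bin includes the upper edge.
        in_bin = sum(1 for v in values
                     if lo + b * width <= v < lo + (b + 1) * width
                     or (b == n_bins - 1 and v == hi))
        return max(in_bin / len(values), 1e-6)  # floor to avoid log(0)

    return sum((frac(current, b) - frac(reference, b))
               * math.log(frac(current, b) / frac(reference, b))
               for b in range(n_bins))

reference = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75]   # e.g. Age at training time
shifted   = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]   # drifted production sample
```

In production this would run per feature on a schedule, with values outside the reference range (as in `shifted` above) inflating the index and flagging drift.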
# Port-forward Prometheus
kubectl port-forward -n mlops-premium svc/prometheus-service 9090:9090
# Port-forward Grafana
kubectl port-forward -n mlops-premium svc/grafana-service 3000:3000
# Access in browser
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (default: admin/admin)

Trigger model retraining when:
- Data drift is detected
- Model performance degrades
- On a regular schedule (e.g., monthly)
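These three conditions can be combined into a single gate that decides whether to kick off retraining. The thresholds and metric field names below are hypothetical, illustrative choices, not values defined by this project.

```python
# Illustrative retraining trigger combining drift, performance degradation,
# and model age. Thresholds and metric keys are assumptions.
def should_retrain(metrics,
                   psi_threshold=0.2,
                   rmse_degradation=0.10,
                   max_age_days=30):
    """Return True if any retraining condition is met.

    Expected (assumed) keys: 'psi', 'rmse', 'baseline_rmse', 'model_age_days'.
    """
    drift = metrics["psi"] > psi_threshold
    degraded = metrics["rmse"] > metrics["baseline_rmse"] * (1 + rmse_degradation)
    stale = metrics["model_age_days"] >= max_age_days
    return drift or degraded or stale

healthy = {"psi": 0.05, "rmse": 102.0, "baseline_rmse": 100.0, "model_age_days": 7}
drifted = {"psi": 0.35, "rmse": 101.0, "baseline_rmse": 100.0, "model_age_days": 7}
```

When the gate fires, the retraining command below can be invoked (for example, from a scheduled job).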
python -m src.train --log-artifacts --register-model

- API Issues: Check logs with docker-compose logs premium-api or kubectl logs -n mlops-premium deployment/premium-model-api
- Training Issues: Check logs in the logs/ directory
- Kubernetes Issues: Use kubectl describe pod -n mlops-premium <pod-name> for detailed information
For more detailed troubleshooting, refer to the CHECKLIST.md and QUICKSTART.md files.
MIT License