This project implements a CI/CD/CT (Continuous Integration, Continuous Deployment, Continuous Training) pipeline using Azure Web App, FastAPI, MLflow, DVC, and Azure Blob Storage. It automates data versioning, model training, experiment tracking, and model deployment.
- CI/CT/CD:
  - CI (Continuous Integration): Automates code testing and validation on each commit
  - CT (Continuous Training): Triggers model retraining when new data is available
  - CD (Continuous Deployment): Deploys the latest trained model to production automatically
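
The CT step's "retrain when new data is available" decision can be sketched as a content-fingerprint comparison. This is a minimal illustration, not the project's actual trigger (which relies on Azure Functions and event triggers); the function names and the idea of storing the last trained fingerprint are assumptions made for the example.

```python
import hashlib
from typing import Optional


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw data bytes."""
    return hashlib.sha256(data).hexdigest()


def should_retrain(raw_data: bytes, last_trained_fingerprint: Optional[str]) -> bool:
    """Retrain if no model was trained yet or the raw data content changed.

    `last_trained_fingerprint` is a hypothetical value saved after the
    previous training run; None means nothing has been trained so far.
    """
    if last_trained_fingerprint is None:
        return True
    return fingerprint(raw_data) != last_trained_fingerprint
```

Comparing content hashes rather than file timestamps avoids retraining when a file is re-uploaded unchanged.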
- Storage:
  - Azure Blob Storage:
    - Raw/ → Stores incoming raw data
    - Data Versions/ → Logs and stores training, testing, and manipulated data using DVC
    - Model Artifacts/ → Stores model artifacts (model.pkl, best_params.json, etc.) via MLflow
  - Azure SQL Server & DB: Stores model metadata (metrics, params, name, duration, timestamp)
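
To make the metadata fields listed above concrete, here is a small sketch of a run record and its insertion into a SQL table. It uses the standard-library `sqlite3` module purely as a local stand-in for Azure SQL; the table name `model_runs`, the column layout, and the sample values are all hypothetical.

```python
import json
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class RunRecord:
    """One row of model metadata: name, metrics, params, duration, timestamp."""
    name: str
    metrics: dict
    params: dict
    duration_seconds: float
    timestamp: str


def save_run(conn: sqlite3.Connection, record: RunRecord) -> None:
    """Create the (hypothetical) model_runs table if needed and insert one record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_runs ("
        "name TEXT, metrics TEXT, params TEXT, duration_seconds REAL, timestamp TEXT)"
    )
    # Metrics and params are stored as JSON text to keep the schema simple.
    conn.execute(
        "INSERT INTO model_runs VALUES (?, ?, ?, ?, ?)",
        (
            record.name,
            json.dumps(record.metrics),
            json.dumps(record.params),
            record.duration_seconds,
            record.timestamp,
        ),
    )
    conn.commit()


# Placeholder values for illustration only; they are not real results.
conn = sqlite3.connect(":memory:")
record = RunRecord(
    name="example-model",
    metrics={"accuracy": 0.9},
    params={"max_depth": 3},
    duration_seconds=12.5,
    timestamp=datetime.now(timezone.utc).isoformat(),
)
save_run(conn, record)
rows = conn.execute("SELECT name, metrics FROM model_runs").fetchall()
```

Against Azure SQL the same parameterized `INSERT` pattern would apply, with the connection obtained from whichever driver the project uses.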
- Model Tracking & Versioning:
  - DVC: Manages data versioning
  - MLflow: Tracks models, logs artifacts & metadata
  - MLflow UI: Deployed separately on Azure Web App
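
As a rough sketch of the MLflow tracking step, the snippet below logs parameters, metrics, and artifact files for one training run. The helper names (`flatten_params`, `log_training_run`) are hypothetical, and flattening nested parameters into dotted keys is an assumption made for the example rather than something the project is known to do. `mlflow` is imported inside the function so the pure helper can be used without MLflow installed.

```python
def flatten_params(params: dict, prefix: str = "") -> dict:
    """Flatten nested parameter dicts into dotted keys, e.g. {"model.max_depth": 3}."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_params(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def log_training_run(run_name: str, params: dict, metrics: dict, artifact_paths: list) -> None:
    """Log one run to the configured MLflow tracking server."""
    import mlflow  # lazy import: only needed when actually logging

    with mlflow.start_run(run_name=run_name):
        mlflow.log_params(flatten_params(params))
        mlflow.log_metrics(metrics)
        for path in artifact_paths:
            # Each path must point to an existing local file.
            mlflow.log_artifact(path)
```

A call might look like `log_training_run("example-run", best_params, {"accuracy": score}, ["model.pkl", "best_params.json"])`, with the tracking server address supplied through the `MLFLOW_TRACKING_URI` environment variable.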
- Automation & Security:
  - Docker: Containerizes the final model-serving code
  - GitHub Actions: Automates workflows
  - Azure Key Vault & GitHub Secrets: Secures credentials
  - Azure Functions & Event Triggers: Detects raw data changes to trigger the pipeline
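
Credentials kept in GitHub Secrets or Azure Key Vault typically reach the running code as environment variables (mapped in a workflow step or exposed as Web App settings). The sketch below reads them in one place and fails fast when any are missing. The `Settings` fields and the exact variable names are assumptions for illustration; the project may use different ones.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Credentials the pipeline needs at runtime (hypothetical field set)."""
    storage_connection_string: str
    mlflow_tracking_uri: str


# Assumed environment variable names; adjust to match the actual deployment.
REQUIRED_VARS = ("AZURE_STORAGE_CONNECTION_STRING", "MLFLOW_TRACKING_URI")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, raising if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Settings(
        storage_connection_string=env["AZURE_STORAGE_CONNECTION_STRING"],
        mlflow_tracking_uri=env["MLFLOW_TRACKING_URI"],
    )
```

Passing the mapping as a parameter keeps secrets out of the code and makes the loader easy to exercise with a plain dictionary.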
- Azure Web App - Hosts FastAPI services for model deployment.
- Azure Blob Storage - Stores raw data, processed datasets, and model artifacts.
- Azure SQL Server & DB - Stores model metadata (metrics, parameters, training history).
- Azure Functions & Event Triggers - Automates training when new data is added.
- Python - Core programming language for data processing, training, and inference.
- FastAPI - Serves the trained model via API.
- MLflow - Logs models, artifacts, and experiment metadata.
- DVC (Data Version Control) - Manages dataset versions.
- GitHub Actions - Automates CI/CD pipelines.
- Docker - Containerizes ML services (FastAPI, MLflow UI).
- Azure Key Vault - Manages credentials securely.
- MLflow UI - Deployed on Azure Web App to track model experiments. Source: https://github.com/Senan25/mlflow_track_server
- Azure Monitor - Logs application events and metrics.
- YAML - Stores constants, parameters, and pipeline configurations.
- GitHub Secrets - Stores sensitive credentials securely.