This repository implements an end-to-end MLOps pipeline for predicting student burnout using Survival Analysis. The system processes student interaction logs from the OULAD dataset, trains a Cox Proportional Hazards model, and serves predictions via a decoupled architecture on Google Cloud Platform (Vertex AI and App Engine).
Unlike traditional classification, which predicts a binary "dropout" state, this project treats burnout as a time-to-event problem.
**Target Variables:**

- `time`: Duration (in weeks) from registration to withdrawal or the end of the observation period.
- `event`: Binary indicator (1 for withdrawal, 0 for censored).

**Modeling Choice:** Cox Proportional Hazards (Cox PH). This semi-parametric model handles right-censored data and provides interpretable coefficients (hazard ratios) for behavioral features.
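The Cox model factorizes the hazard into a baseline shared by all students and a per-student multiplier; the exponential term is the "partial hazard" that the serving layer later returns:

```latex
h(t \mid x) = h_0(t)\,\exp(\beta^\top x)
```

Each fitted coefficient is interpretable as a hazard ratio $e^{\beta_j}$: values above 1 increase withdrawal risk, values below 1 are protective.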
The pipeline aggregates granular activity logs from the Open University Learning Analytics Dataset (OULAD) at the student level.
| Feature | Description |
|---|---|
| `total_clicks` | Overall digital engagement (VLE interactions) |
| `avg_clicks` | Average clicks per day |
| `std_clicks` | Variance in activity (identifies erratic "cramming" patterns) |
| `active_weeks` | Consistency of portal interaction |
| `mean_score` | Average academic performance across assessments |
| `late_submissions` | Behavioral markers of struggle or procrastination |
| `assessments_attempted` | Total volume of evaluated work |
- **Temporal Aggregation**: Converts daily interaction dates into relative weekly offsets.
- **Activity Variance**: Calculates the standard deviation of clicks to identify erratic study patterns.
- **Submission Logic**: Compares `date_submitted` vs `date` (the deadline) to flag late assessments.
- **Censoring Calculation**: Determines "survival" time based on `date_unregistration` or registration offsets.
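The click-based features can be sketched with a pandas `groupby` over the OULAD VLE log. This is an illustrative reconstruction, not the repository's exact script; the toy frame mimics OULAD's `studentVle` columns (`id_student`, `date` as a day offset, `sum_click`):

```python
import pandas as pd

# Toy stand-in for OULAD studentVle.csv: one row per student per active day.
vle = pd.DataFrame({
    "id_student": [1, 1, 1, 2, 2],
    "date":       [0, 3, 14, 1, 2],   # day offset from module start
    "sum_click":  [10, 4, 25, 7, 8],
})

# Temporal aggregation: convert daily offsets into weekly offsets.
vle["week"] = vle["date"] // 7

features = vle.groupby("id_student").agg(
    total_clicks=("sum_click", "sum"),
    avg_clicks=("sum_click", "mean"),
    std_clicks=("sum_click", "std"),   # high variance ~ erratic "cramming"
    active_weeks=("week", "nunique"),  # consistency of portal interaction
).reset_index()
print(features)
```

The score- and submission-based features (`mean_score`, `late_submissions`) follow the same pattern on the assessment tables.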
The model is implemented using the `lifelines` library.
- **Training** (`training/train_model.py`): Fits the `CoxPHFitter` with an L2 penalizer (0.1) for regularization.
- **Evaluation** (`training/evaluate_model.py`): Uses the Concordance Index (C-index) to measure the model's ability to correctly rank students by risk.
  - **Performance**: ~0.776 C-index.
- **Artifacts**: The trained model is serialized using `joblib` into `model.pkl`.
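The C-index is the fraction of comparable student pairs that the model ranks correctly: a pair is comparable when one student's observed withdrawal precedes the other's observed time. A minimal pure-Python illustration (independent of `lifelines`, which provides this metric itself):

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs where the higher-risk student actually
    withdrew earlier. Ties in risk score count as half-concordant."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Pair (i, j) is comparable only if i's withdrawal was observed
            # (event = 1) and happened before j's observed time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Perfect ranking: earlier withdrawal <-> higher risk score.
print(concordance_index([2, 5, 8], [1, 1, 0], [0.9, 0.5, 0.1]))  # → 1.0
```

A C-index of 0.5 corresponds to random ranking, 1.0 to perfect ranking, so the reported ~0.776 indicates the model orders most student pairs correctly.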
The system uses a decoupled architecture to separate inference logic from the user interface.
A custom FastAPI server provides the prediction interface within a Docker environment:
- **Endpoint `/predict`**: Accepts student feature instances and returns partial hazard scores.
- **Endpoint `/health`**: Required for Vertex AI liveness checks.
- **Runtime**: `uvicorn` serving the FastAPI app on port 8080.
- **Registry**: The container image is pushed to Google Artifact Registry.
- **Model Registry**: Uploads the model artifact and defines serving routes.
- **Endpoint Deployment**: Deploys the model to an `n1-standard-2` instance for low-latency inference.
The frontend is a Flask application deployed on Google App Engine (Standard Environment).
- **Backend** (`main.py`): Uses the Vertex AI Python SDK to communicate with the deployed endpoint.
- **Logic**:
  - Collects features from the UI.
  - Casts inputs to float to prevent serialization errors.
  - Calls the Vertex AI endpoint.
  - **Banding**: Maps raw hazard scores to risk categories:
    - HIGH: $\text{Hazard} \ge 1.2$
    - MEDIUM: $0.7 \le \text{Hazard} < 1.2$
    - LOW: $\text{Hazard} < 0.7$
- **Configuration**: Scales automatically with a maximum of 2 instances via `app.yaml`.
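The banding above is a simple threshold cascade; a sketch of how the Flask backend might implement it (function name illustrative):

```python
def risk_band(hazard: float) -> str:
    """Map a raw partial-hazard score to the UI risk category."""
    if hazard >= 1.2:
        return "HIGH"
    if hazard >= 0.7:
        return "MEDIUM"
    return "LOW"

print(risk_band(1.5), risk_band(1.0), risk_band(0.3))  # → HIGH MEDIUM LOW
```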
```
.
├── app_engine/                  # GAE Service (Flask UI + SDK Client)
│   ├── main.py                  # App logic and Vertex AI proxy
│   ├── app.yaml                 # App Engine configuration
│   └── templates/               # HTML/CSS frontend
├── deployment/                  # MLOps Layer (Vertex AI Endpoint)
│   ├── app/main.py              # FastAPI inference wrapper
│   ├── Dockerfile               # Container definition
│   └── deploy.py                # Vertex AI deployment script
├── training/                    # Research & Model Development
│   ├── build_survival_dataset.py
│   ├── train_model.py
│   └── evaluate_model.py
├── artifacts/                   # Processed data and model binaries
└── README.md
```
- **Preprocessing**: Run `python training/build_survival_dataset.py`.
- **Training**: Run `python training/train_model.py`.
- **Inference Deployment**:
  - Build and push the Docker image to Artifact Registry.
  - Execute `python deployment/deploy.py`.
- **UI Deployment**:
  - Update `ENDPOINT_ID` in `app_engine/main.py`.
  - Run `gcloud app deploy` from the `app_engine/` directory.
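The deployment steps above might look like the following command sequence; `PROJECT`, the region, and the repository/image names are placeholders, not values from this repository:

```shell
# Build and push the inference container to Artifact Registry.
docker build -t us-central1-docker.pkg.dev/PROJECT/burnout-repo/burnout-server:latest deployment/
docker push us-central1-docker.pkg.dev/PROJECT/burnout-repo/burnout-server:latest

# Register the model and create the Vertex AI endpoint.
python deployment/deploy.py

# Deploy the Flask UI to App Engine.
cd app_engine && gcloud app deploy
```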