kun101/student-burnout-prediction
Student Burnout Early Intervention System

This repository implements an end-to-end MLOps pipeline for predicting student burnout using Survival Analysis. The system processes student interaction logs from the OULAD dataset, trains a Cox Proportional Hazards model, and serves predictions via a decoupled architecture on Google Cloud Platform (Vertex AI and App Engine).

1. Methodology: Survival Analysis

Unlike traditional classification, which predicts a binary "dropout" label, this project treats burnout as a time-to-event problem.

  • Target Variables:

    • time: Duration (in weeks) from registration to withdrawal or the end of the observation period.

    • event: Binary indicator (1 for withdrawal, 0 for censored).

  • Modeling Choice: Cox Proportional Hazards (Cox PH). This semi-parametric model handles right-censored data and provides interpretable coefficients (hazard ratios) for behavioral features.
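As an illustration of the two target variables, the (time, event) pair for a single student can be derived from OULAD-style registration fields. This is a hedged sketch, not the repository's exact code: the field names follow OULAD conventions, and the 39-week observation window is an assumption.

```python
# Sketch: derive the (time, event) survival targets for one student.
# OULAD stores date_registration / date_unregistration as day offsets
# relative to the module start; the observation window is assumed here.

OBSERVATION_END_WEEK = 39  # assumed length of the module presentation

def survival_target(date_registration, date_unregistration=None):
    """Return (time_in_weeks, event) for a single student record.

    event = 1 for an observed withdrawal, 0 for a censored record
    (the student was still enrolled when observation ended).
    """
    if date_unregistration is not None:
        time_weeks = (date_unregistration - date_registration) // 7
        return max(time_weeks, 0), 1
    # No withdrawal recorded: censor at the end of the observation window.
    return OBSERVATION_END_WEEK, 0

# Registered 2 weeks early (day -14), withdrew on day 70 -> 12 weeks, event=1.
print(survival_target(-14, 70))
# No unregistration date -> censored at week 39, event=0.
print(survival_target(0, None))
```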

2. Data Engineering & Features

The pipeline aggregates granular activity logs from the Open University Learning Analytics Dataset (OULAD) at the student level.

Feature Set

| Feature | Description |
| --- | --- |
| total_clicks | Overall digital engagement (VLE interactions) |
| avg_clicks | Average clicks per day |
| std_clicks | Standard deviation of clicks (flags erratic "cramming" patterns) |
| active_weeks | Consistency of portal interaction |
| mean_score | Average academic performance across assessments |
| late_submissions | Behavioral markers of struggle or procrastination |
| assessments_attempted | Total volume of evaluated work |
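The click-based features in the table reduce to a student-level groupby over the VLE log. A minimal pandas sketch, assuming OULAD's studentVle column names (id_student, date, sum_click); the repository's actual aggregation lives in training/build_survival_dataset.py:

```python
import pandas as pd

# Toy VLE interaction log; `date` is the day offset from module start.
vle = pd.DataFrame({
    "id_student": [1, 1, 1, 2, 2],
    "date":       [0, 1, 14, 0, 21],
    "sum_click":  [5, 3, 10, 2, 8],
})
vle["week"] = vle["date"] // 7  # relative weekly offset

# Aggregate per student, mirroring the feature table above.
features = vle.groupby("id_student").agg(
    total_clicks=("sum_click", "sum"),
    avg_clicks=("sum_click", "mean"),
    std_clicks=("sum_click", "std"),      # erratic-activity signal
    active_weeks=("week", "nunique"),     # consistency signal
).reset_index()
print(features)
```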

Preprocessing Pipeline (training/build_survival_dataset.py)

  1. Temporal Aggregation: Converts daily interaction dates into relative weekly offsets.

  2. Activity Variance: Calculates standard deviation of clicks to identify erratic study patterns.

  3. Submission Logic: Compares date_submitted against the assessment deadline (date) to flag late submissions.

  4. Censoring Calculation: Determines "survival" time based on date_unregistration or registration offsets.
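Step 3 above can be sketched in a few lines of pandas. Column names again follow the OULAD assessment tables and are assumptions, not the repository's exact code:

```python
import pandas as pd

# Toy submission log: `date` is the assessment deadline (day offset),
# `date_submitted` is when the student actually handed it in.
submissions = pd.DataFrame({
    "id_student":     [1, 1, 2],
    "date":           [19, 54, 19],
    "date_submitted": [18, 60, 19],
})
submissions["is_late"] = submissions["date_submitted"] > submissions["date"]

# Per-student behavioral markers used in the feature set.
per_student = submissions.groupby("id_student").agg(
    late_submissions=("is_late", "sum"),
    assessments_attempted=("is_late", "size"),
).reset_index()
print(per_student)
```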

3. Model Implementation & Evaluation

The model is implemented using the lifelines library.

  • Training (training/train_model.py): Fits the CoxPHFitter with an L2 penalizer (0.1) for regularization.

  • Evaluation (training/evaluate_model.py): Uses the Concordance Index (C-index) to measure the model's ability to correctly rank students by risk.

    • Performance: ~0.776 C-index.
  • Artifacts: The trained model is serialized using joblib into model.pkl.
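The C-index reported above measures, over all comparable student pairs, how often the model assigns the higher risk score to the student who withdrew earlier (1.0 = perfect ranking, 0.5 = random). lifelines computes this internally; the following dependency-free sketch shows the idea, with the usual simplification that tied event times are skipped:

```python
from itertools import combinations

def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs ranked correctly by risk score.

    A pair is comparable when the earlier time belongs to an observed
    withdrawal (event=1), not a censored record. Tied risk scores
    count as half-concordant, as is conventional.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i                       # ensure i has the earlier time
        if times[i] == times[j] or events[i] == 0:
            continue                          # censored-first or tied: skip
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0                 # earlier withdrawal, higher risk
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5
    return concordant / comparable

# Withdrawals at weeks 5 and 10, one student censored at week 30;
# risk scores rank them in the right order, so the C-index is 1.0.
print(concordance_index([5, 10, 30], [1, 1, 0], [2.1, 1.4, 0.3]))
```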

4. Inference Pipeline (MLOps)

The system uses a decoupled architecture to separate inference logic from the user interface.

Custom Prediction Container (deployment/)

A custom FastAPI server provides the prediction interface within a Docker environment:

  • Endpoint /predict: Accepts student feature instances and returns partial hazard scores.

  • Endpoint /health: Required for Vertex AI liveness checks.

  • Runtime: uvicorn serving the FastAPI app on port 8080.
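For a Cox PH model, the partial hazard returned by /predict is exp(β · (x − x̄)), where x̄ is the training mean of each feature. The sketch below shows that core computation together with the instances/predictions request shape Vertex AI custom containers use; the coefficients and means here are made up for illustration, and the real values come from the fitted model in model.pkl:

```python
import math

# Illustrative coefficients and training means (NOT the repo's real values).
COEFS       = {"total_clicks": -0.002, "late_submissions": 0.35, "mean_score": -0.02}
TRAIN_MEANS = {"total_clicks": 900.0,  "late_submissions": 1.2,  "mean_score": 68.0}

def partial_hazard(instance):
    """Cox PH partial hazard: exp(sum_k beta_k * (x_k - mean_k)).

    Scores > 1 mean higher-than-average risk; < 1 mean lower.
    """
    z = sum(COEFS[k] * (float(instance[k]) - TRAIN_MEANS[k]) for k in COEFS)
    return math.exp(z)

def predict(request_body):
    # Vertex AI custom containers receive {"instances": [...]} and
    # must respond with {"predictions": [...]}.
    return {"predictions": [partial_hazard(x) for x in request_body["instances"]]}

body = {"instances": [{"total_clicks": 200, "late_submissions": 4, "mean_score": 45}]}
print(predict(body))  # a disengaged student scores well above 1.0
```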

Vertex AI Deployment (deployment/deploy.py)

  1. Registry: The container image is pushed to Google Artifact Registry.

  2. Model Registry: Uploads the model artifact and defines serving routes.

  3. Endpoint Deployment: Deploys the model to an n1-standard-2 instance for low-latency inference.

5. UI & Application Layer (app_engine/)

The frontend is a Flask application deployed on Google App Engine (Standard Environment).

  • Backend (main.py): Uses the Vertex AI Python SDK to communicate with the deployed endpoint.

  • Logic:

    1. Collects features from the UI.

    2. Casts inputs to float to prevent serialization errors.

    3. Calls the Vertex AI Endpoint.

    4. Banding: Maps raw hazard scores to risk categories:

      • HIGH: $\text{Hazard} \ge 1.2$

      • MEDIUM: $0.7 \le \text{Hazard} < 1.2$

      • LOW: $\text{Hazard} < 0.7$

  • Configuration: Scaled automatically with a maximum of 2 instances via app.yaml.
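The banding in step 4 is a straightforward threshold map over the raw hazard score; a minimal sketch using the cutoffs above:

```python
def risk_band(hazard):
    """Map a raw partial hazard score to a risk category."""
    if hazard >= 1.2:
        return "HIGH"
    if hazard >= 0.7:
        return "MEDIUM"
    return "LOW"

print(risk_band(1.5), risk_band(0.9), risk_band(0.4))  # HIGH MEDIUM LOW
```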

6. Repository Structure

.
├── app_engine/          # GAE Service (Flask UI + SDK Client)
│   ├── main.py          # App logic and Vertex AI proxy
│   ├── app.yaml         # App Engine configuration
│   └── templates/       # HTML/CSS frontend
├── deployment/          # MLOps Layer (Vertex AI Endpoint)
│   ├── app/main.py      # FastAPI inference wrapper
│   ├── Dockerfile       # Container definition
│   └── deploy.py        # Vertex AI deployment script
├── training/            # Research & Model Development
│   ├── build_survival_dataset.py
│   ├── train_model.py
│   └── evaluate_model.py
├── artifacts/           # Processed data and model binaries
└── README.md

7. Setup & Deployment

  1. Preprocessing: Run python training/build_survival_dataset.py.

  2. Training: Run python training/train_model.py.

  3. Inference Deployment:

    • Build and push Docker image to Artifact Registry.

    • Execute python deployment/deploy.py.

  4. UI Deployment:

    • Update ENDPOINT_ID in app_engine/main.py.

    • Run gcloud app deploy from the app_engine/ directory.
