Skip to content

moisesdiaz-ds/tracking_models

Repository files navigation

Tracking models

Simple framework to save and track your model versions performance on the exploration and training phase

How it works

The framework only uses the save_model function and the ClassModelResults class in order to effectively and easily track your model versions

save_model function

The save_model function locally saves:

  • A pickle file of your model
  • The hyperparameters of the algorithim you used
  • The perfomance metrics
  • Some stats that characterize your model data
  • The features and target variable
  • The mean X_train feature values for tracking purposes
  • Any other metric you want to add

ClassModelResults class

This class loads all the resutls from saved from the save_model function and summarize it in a dictionary with the following dataframes:

  • params: The hyperparameters of the algorithim every model was built
  • metrics: The perfomance metrics of every model
  • stats: Some stats that characterize every model data
  • features_train_cols: The features and target variable of every model
  • features_train_mean: The mean X_train feature values of every model

NOTE: you can also specify that you only want to load the n last results, using ClassModelResults(last_results=n)

Try it yourself

Try it yourself and test it using the Tracking model results - Test notebook.ipynb

Example

Build a basic model

df = pd.read_csv('Titanic_train.csv')

df = pd.get_dummies(df,columns=['Sex'])

target = 'Survived'
X = df[['Pclass','Age','Sex_female','Fare']]
X = X.fillna(0)
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
n = 10

### Training
from sklearn import svm
import xgboost as xgb

#clf = RandomForestClassifier(n_estimators=50+(n*2), max_depth=(n*3), random_state=n)
#clf = svm.SVC(C=n,probability=True)
clf = xgb.XGBClassifier(max_depth=n)
clf.fit(X_train,y_train)

### PREDICTIONS
y_pred = clf.predict(X_test)
#y_probas = clf.predict_proba(X_test)
labels = y.unique()
feature_names = list(X_train.columns)

### Save model
save_model(clf,X_train,X_test,y_train,y_test,"Your_user_name",1,"",
           {},
           {},
           {},
           1,
           save = 1
          )
  
saved metrics 
MODEL ID:  Your_user_name-20230226153020
Tipo_metrica: Clasificacion
Tipo_modelo: 1
Algoritmo: XGBClassifier
umbral: 0.5
AUC: 0.8466666666666667
Gini: 0.6933333333333334
PRAUC: 0.8108876087759457
F1_score: 0.7467811158798284
Accuracy: 0.8
Recall: 0.725
Precision: 0.7699115044247787
Fecha: 2023-02-26 15:30:00
Comentario: 

Modelo, parametros, metricas y stats guardados exitosamente

Initiate object model results

model_results = ClassModelResults()

Update all models results

dir_results_files = 'results_files' #Default path
dir_models = 'models' #Default path
df_results = model_results.get_model_results(dir_results_files)

Get metrics from all the stored models

df_results["metrics"]
Model Tipo_metrica Tipo_modelo Algoritmo umbral AUC Gini PRAUC F1_score Accuracy Recall Precision Fecha Comentario
0 Your_user_name-20230226152641 Clasificacion 1 XGBClassifier 0.5 0.873905 0.747810 0.857136 0.737327 0.806780 0.666667 0.824742 2023-02-26 15:26:00 NaN
0 Your_user_name-20230226152648 Clasificacion 1 XGBClassifier 0.5 0.846667 0.693333 0.810888 0.746781 0.800000 0.725000 0.769912 2023-02-26 15:26:00 NaN
0 Your_user_name-21214708 Clasificacion 1 RandomForestClassifier 0.5 0.833333 0.666667 NaN 0.776812 0.786441 0.778095 0.780085 2023-02-21 21:47:00 NaN
0 Your_user_name-20230226153020 Clasificacion 1 XGBClassifier 0.5 0.846667 0.693333 0.810888 0.746781 0.800000 0.725000 0.769912 2023-02-26 15:30:00 NaN
0 Your_user_name-21214715 Clasificacion 1 SVC 0.5 0.785476 0.570952 NaN 0.648842 0.684746 0.653095 0.682529 2023-02-21 21:47:00 NaN
0 Your_user_name-21214718 Clasificacion 1 SVC 0.5 0.763810 0.527619 NaN 0.640441 0.688136 0.645476 0.691590 2023-02-21 21:47:00 NaN

Get columns

df_results["features_train_cols"]
Model 0 1 2 3 4
0 Your_user_name-20230226152648 Pclass Age Sex_female Fare Survived
0 Your_user_name-21214718 Pclass Age Sex_female Fare Survived
0 Your_user_name-20230226153020 Pclass Age Sex_female Fare Survived
0 Your_user_name-21214708 Pclass Age Sex_female Fare Survived
0 Your_user_name-21214715 Pclass Age Sex_female Fare Survived
0 Your_user_name-20230226152641 Pclass Age Sex_female Fare Survived

Load specific model

load_model(f"Your_user_name-20230226152641",dir_models=dir_models)
{'chosen_model': XGBClassifier(base_score=0.5, booster='gbtree', callbacks=None,
               colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,
               early_stopping_rounds=None, enable_categorical=False,
               eval_metric=None, gamma=0, gpu_id=-1, grow_policy='depthwise',
               importance_type=None, interaction_constraints='',
               learning_rate=0.300000012, max_bin=256, max_cat_to_onehot=4,
               max_delta_step=0, max_depth=1, max_leaves=0, min_child_weight=1,
               missing=nan, monotone_constraints='()', n_estimators=100,
               n_jobs=0, num_parallel_tree=1, predictor='auto', random_state=0,
               reg_alpha=0, reg_lambda=1, ...)}

Get best model

model_results.load_best_model('AUC')
{'chosen_model': XGBClassifier(base_score=0.5, booster='gbtree', callbacks=None,
               colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,
               early_stopping_rounds=None, enable_categorical=False,
               eval_metric=None, gamma=0, gpu_id=-1, grow_policy='depthwise',
               importance_type=None, interaction_constraints='',
               learning_rate=0.300000012, max_bin=256, max_cat_to_onehot=4,
               max_delta_step=0, max_depth=1, max_leaves=0, min_child_weight=1,
               missing=nan, monotone_constraints='()', n_estimators=100,
               n_jobs=0, num_parallel_tree=1, predictor='auto', random_state=0,
               reg_alpha=0, reg_lambda=1, ...)}

About

Simple framework to save and track your model versions performance on the exploration and training phase

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors