This repo collects helper functions I wrote for myself; maybe they can help you as well.
Heat map with annotations
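import matplotlib.pyplot as plt
import seaborn as sns
# Absolute pairwise correlations between the columns of df
correlation = df.corr().abs()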
plt.figure(figsize=(8,8))
sns.heatmap(correlation, annot=True)
plt.show()

Feature Selection with SelectKBest
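A self-contained sketch of this recipe on toy data, in case it helps to see it run end to end. The feature names `f0` to `f5`, the synthetic target, and `k=2` are made up for illustration; `f_classif` is written out only because it is the default score function of `SelectKBest`.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Toy data: six hypothetical feature columns; only "f0" and "f1" drive the target.
rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.normal(size=(200, 6)), columns=[f"f{i}" for i in range(6)]
)
target = (features["f0"] + features["f1"] > 0).astype(int)

# Keep the two features with the highest ANOVA F-score against the target.
kbest = SelectKBest(score_func=f_classif, k=2)
k_best_features = kbest.fit_transform(features, target)

# Map the selected positions back to column names of the frame that was fitted.
selected = list(features.columns[kbest.get_support(indices=True)])
print(selected)
```

The positions returned by `get_support(indices=True)` refer to the columns of whatever was passed to `fit_transform`, so the names are looked up on `features` here.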
from sklearn.feature_selection import SelectKBest
kbest = SelectKBest(k=5)
k_best_features = kbest.fit_transform(features, target)
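# Assumes `features` is a DataFrame: the indices refer to its columns, which may not line up with df's
list(features.columns[kbest.get_support(indices=True)])

Concatenate One Hot Encoded Categorical Variables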
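A small runnable sketch of this step on a toy frame. The `price` column and the category values are made up for illustration; `col` is the column name used in this section.

```python
import pandas as pd

# Toy frame: "col" is categorical, "price" is a hypothetical numeric column.
df = pd.DataFrame({"col": ["a", "b", "a", "c"], "price": [1.0, 2.0, 3.0, 4.0]})

# One indicator column per category, prefixed with the original column name.
dummies = pd.get_dummies(df["col"], prefix="col")

# Append the indicator columns and drop the original categorical column.
df = pd.concat([df, dummies], axis=1).drop(columns=["col"])
print(list(df.columns))  # ['price', 'col_a', 'col_b', 'col_c']
```

Calling `pd.get_dummies(df, columns=["col"])` on the original frame should give the same columns in a single call, if a shorter form is preferred.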
df = pd.concat([df, pd.get_dummies(df["col"], prefix="col")], axis=1)
df.drop(["col"], axis=1, inplace=True)

Scaling
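A minimal sketch of the scaling recipe on a toy frame, assuming made-up column names (`age`, `income`) and a non-default index. Passing `index=df.index` is an addition for illustration: it keeps the scaled rows aligned with the original frame when the two are joined later.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy frame with a non-default index; column names are hypothetical.
df = pd.DataFrame(
    {"age": [20, 30, 40], "income": [1000.0, 3000.0, 2000.0]},
    index=[10, 11, 12],
)
columns_to_scale = ["age", "income"]

# Rescale each column to the [0, 1] range based on its own min and max.
scaler = MinMaxScaler()
scaled_columns = pd.DataFrame(
    scaler.fit_transform(df[columns_to_scale]),
    columns=columns_to_scale,
    index=df.index,  # keep row labels so the result lines up with df
)
print(scaled_columns)
```

When there is a train/test split, the usual pattern is to call `fit_transform` on the training data and only `transform` on the test data, so that the test set does not influence the fitted range.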
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
# index=df.index keeps the scaled rows aligned with the original frame
scaled_columns = pd.DataFrame(scaler.fit_transform(df[columns_to_scale]), columns=columns_to_scale, index=df.index)

Get list of numerical variables
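A toy example of the dtype check used here and in the categorical variant, with made-up column names. The `city` column is created with `dtype=object` on purpose: depending on the pandas version and settings, string columns may be stored with a dedicated string dtype rather than `object`, in which case a comparison against `"O"` would not match them.

```python
import pandas as pd

# Toy frame; "city" is forced to object dtype so the "O" check applies to it.
data = pd.DataFrame(
    {
        "age": [20, 30],
        "city": pd.Series(["Paris", "Rome"], dtype=object),
        "score": [0.5, 0.7],
    }
)

# Everything that is not object dtype is treated as numerical.
num_vars = [var for var in data.columns if data[var].dtype != "O"]
# Object dtype columns are treated as categorical.
cat_vars = [var for var in data.columns if data[var].dtype == "O"]

# Alternative sketch: let pandas pick the numeric columns directly.
num_alt = data.select_dtypes(include="number").columns.tolist()
print(num_vars, cat_vars, num_alt)
```

Note that the `!= "O"` check also counts boolean and datetime columns as numerical, whereas `select_dtypes(include="number")` leaves them out.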
num_vars = [var for var in data.columns if data[var].dtypes != "O"]

Get list of categorical variables
cat_vars = [var for var in data.columns if data[var].dtypes == "O"]

Using joblib to save models and pipelines
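A round-trip sketch, assuming a tiny made-up training set and a hypothetical two-step pipeline. It writes to a temporary directory so nothing is left on disk, then checks that the reloaded pipeline predicts the same labels.

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Tiny made-up training set, just enough to fit a pipeline.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
pipeline = Pipeline(
    [("scale", StandardScaler()), ("clf", LogisticRegression())]
).fit(X, y)

# Dump and reload inside a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(pipeline, path)
    joblib_model = joblib.load(path)

# The reloaded pipeline should reproduce the original predictions.
same_predictions = bool((joblib_model.predict(X) == pipeline.predict(X)).all())
print(same_predictions)
```

Loading generally works best with the same scikit-learn version that saved the file, so it can be worth recording that version next to the model.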
import joblib
joblib.dump(pipeline, 'model.joblib')
joblib_model = joblib.load('model.joblib')

Using pickle to save models and pipelines
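A round-trip sketch that also covers loading, which the recipe in this section leaves out. The fitted `MinMaxScaler` is only a stand-in for a real pipeline, and `pickle_model` is a name chosen for illustration.

```python
import os
import pickle
import tempfile

from sklearn.preprocessing import MinMaxScaler

# Stand-in for a fitted pipeline: a scaler that has seen the range 0 to 10.
pipeline = MinMaxScaler().fit([[0.0], [10.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")

    # Save: pickle files must be opened in binary write mode.
    with open(path, "wb") as model_file:
        pickle.dump(pipeline, model_file)

    # Load: binary read mode, same object type comes back.
    with open(path, "rb") as model_file:
        pickle_model = pickle.load(model_file)

# The restored scaler maps the midpoint of the fitted range to 0.5.
restored = pickle_model.transform([[5.0]])[0][0]
print(restored)
```

Unpickling can execute arbitrary code, so only load pickle files from sources you trust.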
import pickle
with open('model.pkl', 'wb') as model_file:
    pickle.dump(pipeline, model_file)

Author: github.com/merveenoyan
