RFE: Recursive Feature Elimination
Model building before feature elimination:
# Building logistic regression without feature elimination (all features)
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression().fit(X_train, y_train)
Score:
Training set accuracy: 0.790 Test set accuracy: 0.745
Feature columns:
| Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|
# Feature selection by the RFE approach
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
from sklearn.feature_selection import RFE
rfe = RFE(logreg, n_features_to_select=4) # running RFE with 4 variables as output
rfe = rfe.fit(X,y)
print(rfe.support_) # Printing the boolean results
print(rfe.ranking_) # Printing the ranking
[ True  True False False False  True  True False]
[1 1 2 4 5 1 1 3]
# Variables selected by RFE
col = ['Pregnancies', 'Glucose', 'BMI', 'DiabetesPedigreeFunction']
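The selected column list does not have to be typed by hand; it can be read off the boolean mask. The following is a minimal sketch that assumes the eight feature columns appear in the order shown in the table above (with Outcome excluded as the target) and copies the printed `support_` and `ranking_` values into plain Python lists. The names `feature_names`, `support`, `ranking`, `selected`, and `eliminated` are illustrative and not part of the original code.

```python
# Feature names in the order of the table above; Outcome is the target, so it is left out.
feature_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
                 "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

# Values copied from the printed rfe.support_ and rfe.ranking_ output.
support = [True, True, False, False, False, True, True, False]
ranking = [1, 1, 2, 4, 5, 1, 1, 3]

# Keep the features whose mask entry is True (these all have rank 1).
selected = [name for name, keep in zip(feature_names, support) if keep]

# List the dropped features by rank; a higher rank means it was eliminated earlier.
eliminated = sorted((rank, name) for name, rank in zip(feature_names, ranking) if rank > 1)

print(selected)
for rank, name in eliminated:
    print(rank, name)
```

If `X` is a pandas DataFrame, `X.columns[rfe.support_]` should give the same selection directly from the fitted `rfe` object.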
Building the Model after feature elimination:
logreg = LogisticRegression().fit(X_train[col], y_train) # refit on the four selected features only
print("Training set accuracy: {:.3f}".format(logreg.score(X_train[col], y_train)))
print("Test set accuracy: {:.3f}".format(logreg.score(X_test[col], y_test)))
Result: