Feature Selection, Machine Learning

Feature Selection Technique 2

RFE: Recursive Feature Elimination

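Model building before feature elimination:

# Building logistic regression before feature elimination (all features)
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression().fit(X_train, y_train)
print("Training set accuracy: {:.3f}".format(logreg.score(X_train, y_train)))
print("Test set accuracy: {:.3f}".format(logreg.score(X_test, y_test)))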

Score:

Training set accuracy: 0.790
Test set accuracy: 0.745

Dataset columns (Outcome is the target; the other eight are the features):

Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome

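# Feature selection by the RFE approach
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression()
rfe = RFE(logreg, n_features_to_select=4)  # running RFE with 4 variables as output
# Fitted on the full X, y; fitting on X_train, y_train instead would keep
# the test rows out of the feature selection step
rfe = rfe.fit(X, y)
print(rfe.support_)  # boolean mask: True marks a selected feature
print(rfe.ranking_)  # ranking: 1 = selected, larger = eliminated earlier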

[ True  True False False False  True  True False]
[1 1 2 4 5 1 1 3]
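The mask and ranking printed above can be mapped back to column names. The following is a minimal sketch that hard-codes the printed values and assumes the features are in the same order as the dataset columns (with Outcome excluded as the target); the variable names are illustrative only.

```python
# Feature names in column order (Outcome is the target, so it is excluded)
feature_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
                 "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

# Values copied from the rfe.support_ and rfe.ranking_ output above
support = [True, True, False, False, False, True, True, False]
ranking = [1, 1, 2, 4, 5, 1, 1, 3]

# Selected features: positions where the mask is True (rank 1)
selected = [name for name, keep in zip(feature_names, support) if keep]

# Dropped features, earliest elimination first (largest rank number first)
dropped = sorted(
    ((rank, name) for name, rank in zip(feature_names, ranking) if rank > 1),
    reverse=True,
)
eliminated_order = [name for rank, name in dropped]

print(selected)
print(eliminated_order)
```

With a fitted rfe object and X held as a pandas DataFrame (an assumption here), list(X.columns[rfe.support_]) should give the same selection without hard-coding anything.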

# Variables selected by RFE
col = ['Pregnancies', 'Glucose', 'BMI', 'DiabetesPedigreeFunction']

Building the model after feature elimination:

# Refit logistic regression on the four selected columns only
logreg = LogisticRegression().fit(X_train[col], y_train)
print("Training set accuracy: {:.3f}".format(logreg.score(X_train[col], y_train)))
print("Test set accuracy: {:.3f}".format(logreg.score(X_test[col], y_test)))

Result:

Training set accuracy: 0.780
Test set accuracy: 0.758

 

 


Feature Selection Technique 1

 

GLM: Generalized Linear Model


# Feature selection by a different approach: fit the model and analyse its summary

import statsmodels.api as sm
# Feature selection by the GLM approach
# Logistic regression model (binomial family, logit link)
logm1 = sm.GLM(y_train, sm.add_constant(X_train), family=sm.families.Binomial())
print(logm1.fit().summary())

 

 

Dep. Variable:    Outcome             No. Observations:  537
Model:            GLM                 Df Residuals:      528
Model Family:     Binomial            Df Model:          8
Link Function:    logit               Scale:             1.0000
Method:           IRLS                Log-Likelihood:    -245.19
Date:             Thu, 08 Aug 2019    Deviance:          490.37
Time:             16:19:56            Pearson chi2:      667.
No. Iterations:   5                   Covariance Type:   nonrobust

                              coef   std err         z     P>|z|    [0.025    0.975]
const                      -9.3762     0.908   -10.328     0.000   -11.155    -7.597
Pregnancies                 0.1084     0.039     2.803     0.005     0.033     0.184
Glucose                     0.0373     0.005     7.973     0.000     0.028     0.046
BloodPressure              -0.0096     0.006    -1.566     0.117    -0.022     0.002
SkinThickness              -0.0004     0.008    -0.048     0.962    -0.017     0.016
Insulin                    -0.0012     0.001    -1.103     0.270    -0.003     0.001
BMI                         0.0952     0.018     5.197     0.000     0.059     0.131
DiabetesPedigreeFunction    1.3783     0.367     3.758     0.000     0.659     2.097
Age                         0.0202     0.011     1.809     0.070    -0.002     0.042
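
One common way to turn a summary like this into a feature selection is to keep only the predictors whose P>|z| value falls below a significance threshold. The sketch below hard-codes the p-values from the table; the 0.05 cutoff and the variable names are assumptions for illustration, not something the original states.

```python
# p-values copied from the P>|z| column of the GLM summary above
# (the intercept "const" is left out because it is not a feature)
p_values = {
    "Pregnancies": 0.005,
    "Glucose": 0.000,
    "BloodPressure": 0.117,
    "SkinThickness": 0.962,
    "Insulin": 0.270,
    "BMI": 0.000,
    "DiabetesPedigreeFunction": 0.000,
    "Age": 0.070,
}

ALPHA = 0.05  # assumed significance threshold, not given in the original

# Keep features whose p-value is below the threshold
significant = [name for name, p in p_values.items() if p < ALPHA]
print(significant)
```

Under this assumed threshold the surviving columns are Pregnancies, Glucose, BMI and DiabetesPedigreeFunction, the same four that RFE selected in Technique 2. With a fitted result object, the p-values would normally be read from its pvalues attribute rather than typed in by hand.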