mahshid1373/Random_Forest_Classification

Random Forest Classifier with Model Evaluation and Feature Importance

This repository contains a Python notebook that demonstrates the use of a Random Forest classifier for a supervised learning problem. The notebook includes training the model, evaluating its performance using multiple metrics, and exploring feature importance.

Project Overview

This project focuses on training a Random Forest classifier on a dataset. The key steps include:

  1. Training the model: We use the Random Forest algorithm to fit the model to the training data.
  2. Evaluating the model: Predictions are generated using the test set, and various metrics are calculated to assess the model's performance.
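The two steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is a synthetic stand-in generated with `make_classification`, and the split ratio, `n_estimators`, and `random_state` values are assumptions chosen only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 1000 rows, 8 numeric features,
# with roughly 88% of the rows in class 0.
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.88], random_state=0
)

# Hold out 20% of the rows as a test set, keeping the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 1: fit the Random Forest on the training data.
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X_train, y_train)

# Step 2: predict on the test set and score the predictions.
y_pred = rf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("accuracy:", acc)
```

With the real dataset, the categorical columns (job, marital, education, and so on) would need to be encoded as numbers before fitting.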

Sample Data

Below is a sample of the dataset used in this project:

age job marital education default balance housing loan contact day month duration campaign pdays previous poutcome y
58 management married tertiary no 2143 yes no unknown 5 may 261 1 -1 0 unknown no
44 technician single secondary no 29 yes no unknown 5 may 151 1 -1 0 unknown no
33 entrepreneur married secondary no 2 yes yes unknown 5 may 76 1 -1 0 unknown no
47 blue-collar married unknown no 1506 yes no unknown 5 may 92 1 -1 0 unknown no
33 unknown single unknown no 1 no no unknown 5 may 198 1 -1 0 unknown no

This table displays key features such as age, job, marital status, education level, and financial information (e.g., balance, loan, and housing).

Usage

The distribution of the target variable:

Distribution of Dataset Target

About 88.3% of the samples have the target value "no"; the remaining 11.7% are "yes".
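Because the classes are this imbalanced, a model that always predicts "no" already reaches about 88.3% accuracy, which is a useful baseline when reading the accuracy score. A small sketch of that calculation, using a hypothetical label list built to match the reported share rather than the real data:

```python
from collections import Counter

# Hypothetical labels (assumption): 1000 rows matching the reported 88.3% "no" share.
y = ["no"] * 883 + ["yes"] * 117

# Find the most frequent class and the accuracy of always predicting it.
counts = Counter(y)
majority_label, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / len(y)

print(majority_label, baseline_accuracy)
```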

The distribution of all features:

Distribution of Dataset Features

The correlation matrix of the features:

Correlation Matrix

Each tree image shows only the first few nodes, because the full trees can get very large and difficult to visualize. The color of each node (box) represents its majority class: red indicates majority 0 (no subscription) and blue indicates majority 1 (subscription). The colors get darker the closer the node gets to being fully 0 or 1. Each node also contains the following information:

  • The variable name and value used for splitting
  • The % of total samples in each split
  • The % split between classes in each split

Decision Tree
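One way to produce truncated tree diagrams like these is scikit-learn's `export_graphviz`. The sketch below is an assumption about the approach, not the notebook's actual code: the data is synthetic, the feature names are hypothetical, and the depth limit of 2 is chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_graphviz

# Synthetic stand-in data with hypothetical feature names (assumption).
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Export the first three trees of the forest as Graphviz DOT text.
dot_sources = []
for tree in rf.estimators_[:3]:
    dot = export_graphviz(
        tree,
        out_file=None,            # return the DOT source as a string
        feature_names=feature_names,
        max_depth=2,              # draw only the first few levels
        filled=True,              # color nodes by majority class
        proportion=True,          # show percentages instead of raw counts
        impurity=False,           # omit the impurity value from each node
    )
    dot_sources.append(dot)

print(dot_sources[0][:60])
```

Each DOT string can then be rendered with the graphviz package, for example through `graphviz.Source(dot)`.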

Hyperparameter Tuning:

  • n_estimators: the number of decision trees in the forest. Increasing this hyperparameter generally improves the performance of the model but also increases the computational cost of training and predicting.
  • max_depth: the maximum depth of each decision tree in the forest. Setting a higher value for max_depth can lead to overfitting while setting it too low can lead to underfitting.
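A small search over these two hyperparameters might look like the sketch below. The search method used in the notebook is not stated here, so `GridSearchCV`, the candidate values, the 3-fold cross-validation, and the synthetic data are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Candidate values for the two hyperparameters described above.
param_grid = {
    "n_estimators": [10, 50],
    "max_depth": [3, None],  # None lets each tree grow until its leaves are pure
}

# Try every combination with 3-fold cross-validation and keep the best one.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
```

For larger grids, `RandomizedSearchCV` samples a fixed number of combinations instead of trying them all, which keeps the cost of the search bounded.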

Random Forest

This plot (a confusion matrix) shows the model's predictions against the true labels. We can use it to understand the tradeoff between false positives and false negatives.

Random Forest
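The counts behind such a plot can be computed with scikit-learn's `confusion_matrix`. The labels below are a tiny hand-made example, not the project's predictions:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels (assumption): 0 = no subscription, 1 = subscription.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]

# Rows are the true classes and columns are the predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# For a binary problem, the flattened matrix reads TN, FP, FN, TP.
tn, fp, fn, tp = (int(v) for v in cm.ravel())
print(cm)
print("false positives:", fp, "false negatives:", fn)
```

In this example the model makes one false positive and three false negatives, so it misses most of the positive class even though most predictions are correct.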

Model Evaluation

The model's performance is evaluated using the following metrics:

  • Accuracy: The fraction of all predictions that are correct.
  • Precision: The ability of the classifier to not label a negative sample as positive.
  • Recall: The ability of the classifier to find all the positive samples.
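In terms of confusion-matrix counts, the three metrics reduce to simple ratios. The sketch below uses hypothetical counts, chosen only to show how accuracy can be high while recall stays low on an imbalanced dataset:

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true positives that the model finds."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts (assumption): 1000 samples, 100 of them positive.
tp, fp, fn, tn = 30, 10, 70, 890

print("accuracy:", accuracy(tp, fp, fn, tn))
print("precision:", precision(tp, fp))
print("recall:", recall(tp, fn))
```

scikit-learn provides the same calculations as `accuracy_score`, `precision_score`, and `recall_score` in `sklearn.metrics`.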

Accuracy: 0.9026871613402632

Precision: 0.7362637362637363

Recall: 0.25475285171102663

Feature Importance

The following code ranks the features by how much they contribute to the predictions made by the Random Forest model:

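import numpy as np
importances = rf.feature_importances_  # rf is the fitted RandomForestClassifier
indices = np.argsort(importances)[::-1]  # feature indices, most important first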
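For a self-contained illustration of the same ranking, the sketch below fits a small forest on synthetic data and prints the features from most to least important. The data and the feature names are hypothetical stand-ins, not the project's dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with hypothetical feature names (assumption).
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances, one value per feature, normalized to sum to 1.
importances = rf.feature_importances_

# Sort feature indices from most to least important.
indices = np.argsort(importances)[::-1]

for rank, idx in enumerate(indices, start=1):
    print(f"{rank}. {feature_names[idx]}: {importances[idx]:.3f}")
```

The same `indices` array can be used to order the bars of a matplotlib bar chart.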

Dependencies

The main libraries used in this project include:

  • scikit-learn: For machine learning algorithms and model evaluation
  • matplotlib: For visualizing feature importance
  • graphviz: For visualizing decision trees
  • numpy: For numerical operations

Install these dependencies using:

pip install -r requirements.txt
