This project contains an exploratory data analysis (EDA) and a machine learning model to predict whether students will take a specific type of exam (SM).
.
├── data/ # Directory to store datasets (datasets are not uploaded to GitHub)
├── notebooks/ # Jupyter notebooks for EDA and model training
│ ├── eda.ipynb # Exploratory Data Analysis notebook
│ └── model_training.ipynb # Model training and evaluation notebook
├── README.md # Project overview and instructions
└── requirements.txt # Python dependencies
- Python 3.7 or higher
- Jupyter Notebook
- Clone the repository:

      git clone https://github.com/LeviJesus/students_evasion_with_machine_learning.git
      cd students_evasion_with_machine_learning

- Create a virtual environment and activate it:

      python -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`

- Install the required packages:

      pip install -r requirements.txt

- Run the Jupyter Notebook server:

      jupyter notebook

- Open and run the notebooks in the notebooks/ directory:
  - eda.ipynb: Perform exploratory data analysis.
  - model_training.ipynb: Train and evaluate the machine learning model.
The datasets are too large to include in this repository.
Please download them from this Google Drive link and place them in the data/ directory.
The EDA notebook (eda.ipynb) includes:
- Loading and cleaning the datasets.
- Visualizing the distribution of access times.
- Analyzing the temporal patterns of student access.
- Examining the distribution of different types of exams.
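The steps above can be sketched with pandas roughly as follows. This is only an illustration on a tiny made-up table, not the notebook's actual code: the column names `student_id`, `access_time`, and `exam_type` are assumptions, and the real datasets may be structured differently.

```python
import pandas as pd

# Hypothetical access log; the real column names in data/ may differ.
raw = pd.DataFrame(
    {
        "student_id": [1, 1, 2, 2, 3, 3, 3],
        "access_time": [
            "2023-03-01 08:15:00",
            "2023-03-01 20:40:00",
            "2023-03-02 08:05:00",
            None,                   # missing timestamp, dropped during cleaning
            "2023-03-04 14:30:00",
            "2023-03-04 14:30:00",  # exact duplicate of the previous row
            "2023-03-05 21:10:00",
        ],
        "exam_type": ["SM", "SM", "other", "other", "SM", "SM", "other"],
    }
)

# Loading and cleaning: parse timestamps, drop rows without one, drop duplicates.
logs = raw.assign(access_time=pd.to_datetime(raw["access_time"]))
logs = logs.dropna(subset=["access_time"]).drop_duplicates().reset_index(drop=True)

# Distribution of access times by hour of day.
accesses_per_hour = logs["access_time"].dt.hour.value_counts().sort_index()

# Temporal pattern: number of accesses per day of the week.
accesses_per_weekday = logs["access_time"].dt.day_name().value_counts()

# Distribution of exam types.
exam_type_counts = logs["exam_type"].value_counts()

print(accesses_per_hour)
print(accesses_per_weekday)
print(exam_type_counts)
```

In a notebook, each of these count series can be plotted directly, for example with `accesses_per_hour.plot(kind="bar")`, provided matplotlib is installed.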
The model training notebook (model_training.ipynb) includes:
- Defining the target variable.
- Creating features based on student access patterns.
- Normalizing the data.
- Training and evaluating multiple machine learning models.
- Selecting the best model based on performance metrics.
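A minimal sketch of this workflow with scikit-learn is shown below. Everything in it is an assumption made for illustration: the data is synthetic, the feature names (`n_accesses`, `mean_access_hour`, `active_days`) are hypothetical, and the two candidate models and the F1 metric are stand-ins, since the notebook may use different models and metrics.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_students = 200

# Synthetic stand-in for the access logs (hypothetical columns).
accesses_per_student = rng.poisson(20, n_students) + 1  # at least one access each
student_ids = np.repeat(np.arange(n_students), accesses_per_student)
logs = pd.DataFrame(
    {
        "student_id": student_ids,
        "access_hour": rng.integers(0, 24, student_ids.size),
        "access_day": rng.integers(0, 60, student_ids.size),
    }
)

# Creating features based on student access patterns: one row per student.
features = logs.groupby("student_id").agg(
    n_accesses=("access_hour", "size"),
    mean_access_hour=("access_hour", "mean"),
    active_days=("access_day", "nunique"),
)

# Defining the target variable: 1 if the student took the SM exam.
# Here it is simulated from the access count plus noise.
noise = rng.normal(0, 3, n_students)
took_sm = ((features["n_accesses"].to_numpy() + noise) > 21).astype(int)

# Normalizing happens inside each pipeline, so the scaler is fit only on
# the training folds during cross-validation.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "random_forest": make_pipeline(
        StandardScaler(), RandomForestClassifier(n_estimators=100, random_state=0)
    ),
}

# Training and evaluating each candidate with 5-fold cross-validated F1.
scores = {
    name: cross_val_score(model, features, took_sm, cv=5, scoring="f1").mean()
    for name, model in candidates.items()
}

# Selecting the best model based on the chosen metric.
best_name = max(scores, key=scores.get)
print(scores)
print("best model:", best_name)
```

Putting the scaler in the same pipeline as the model keeps information from the validation folds out of the normalization step, which gives a fairer comparison between candidates.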
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.