CrystalCo/vulnerability_detection

Setup

To download this repository, clone it using git: git clone https://github.com/CrystalCo/vulnerability_detection.git

Requires Python 3.8.10

Step 0

If the files in slicesSource are not compressed, you may skip this step. Otherwise, do the following:

  • Download Git LFS from https://git-lfs.github.com/
  • Install it by running git lfs install.
  • Run git lfs pull to download the large files.
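One way to tell whether this step is still needed is to look for Git LFS pointer files, which are small text files whose first line is the LFS spec URL. The sketch below is not part of the repository; the helper names `is_lfs_pointer` and `unfetched_files` are hypothetical.

```python
from pathlib import Path

# First line of every Git LFS pointer file (Git LFS spec v1).
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like a Git LFS pointer that has not been fetched."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_HEADER)) == LFS_HEADER


def unfetched_files(folder):
    """List files under `folder` that are still LFS pointers, sorted by path."""
    return sorted(
        p for p in Path(folder).rglob("*") if p.is_file() and is_lfs_pointer(p)
    )
```

If `unfetched_files` returns an empty list for the slicesSource folder, the large files are probably already in place and Step 0 can be skipped.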

Step 1

Create an environment variable named VUL_PATH that holds the path to this project.

Example for Unix/MacOS:

export VUL_PATH=`pwd`

or manually:

export VUL_PATH=/Users/cryst/Documents/vulnerability_detection
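From Python, the variable can be read and validated before any notebook work starts. This is a minimal sketch, assuming the code only needs VUL_PATH to point at an existing directory; the function name `project_root` is hypothetical and does not come from the repository.

```python
import os
from pathlib import Path


def project_root():
    """Return the project root from VUL_PATH, failing early if it is unusable."""
    value = os.environ.get("VUL_PATH")
    if not value:
        raise RuntimeError("VUL_PATH is not set; see Step 1.")
    root = Path(value).expanduser()
    if not root.is_dir():
        raise RuntimeError(f"VUL_PATH does not point to a directory: {root}")
    return root
```

Failing early here gives a clearer error than a missing-file exception raised deep inside a notebook cell.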

Step 2

Create a virtual environment:

python3 -m virtualenv env

Activate virtual environment:

source env/bin/activate

Install requirements:

pip install -r requirements.txt 

Step 3

Make sure the following folders are inside the root directory:

  • w2vModel/metrics/
  • w2vModel/metrics/bgru
  • w2vModel/metrics/blstm
  • w2vModel/model/
  • model/

Make sure the following folders are inside the data directory:

  • CVE/
  • CWE/
  • DLinputs/
  • DLvectors/
  • slicesSource/
  • token/
  • tokens/
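The two folder lists above can be checked with a short script. This is a sketch under assumptions: the data directory is taken to be a folder named data directly under the project root, and the function name `missing_folders` is hypothetical.

```python
from pathlib import Path

# Folders expected in the project root (from the first list above).
ROOT_FOLDERS = [
    "w2vModel/metrics",
    "w2vModel/metrics/bgru",
    "w2vModel/metrics/blstm",
    "w2vModel/model",
    "model",
]

# Folders expected in the data directory (from the second list above).
DATA_FOLDERS = ["CVE", "CWE", "DLinputs", "DLvectors", "slicesSource", "token", "tokens"]


def missing_folders(root, data_dir_name="data"):
    """Return the expected folders that do not exist yet, as paths relative to root.

    `data_dir_name` is an assumption about where the data directory lives.
    """
    root = Path(root)
    expected = [Path(f) for f in ROOT_FOLDERS]
    expected += [Path(data_dir_name) / f for f in DATA_FOLDERS]
    return [str(rel) for rel in expected if not (root / rel).is_dir()]
```

Calling `missing_folders(os.environ["VUL_PATH"])` (after `import os`) lists anything that still has to be created or downloaded.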

0_SYSE_source2slice

Original code that converts source code to slices.

SYSE_1_isVulnerable

Contains the original source code for binary vulnerability detection. 2_Application_Codes.ipynb is the main file to run in this folder. It uses BGRU & BLSTM to detect whether a slice of code contains a vulnerability or not.

SYSE_2_vulnerabilityType

Contains the follow-up code that attempts multiclass classification of vulnerabilities across 162 Common Weakness Enumeration (CWE) IDs.

CWE_Data_Preprocessing

CWE_Data_Preprocessing.ipynb is the first step in preprocessing the data. It collects the SARD & CVE test case IDs for all the source slices we have. Then, it scrapes the Internet for the CWE attributes of each SARD & CVE ID. Finally, it outputs 2 files: CWE_DF.csv & CVE_DF.csv. CWE_DF.csv contains all the unique CWE IDs, their details, and counts*. CVE_DF.csv contains all the unique CVE IDs, their descriptions & counts*, and the CWE-ID associated with each, if applicable.

*number of times they appear in the source code file.
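The counting part of this step can be sketched with the standard library alone. This is not the notebook's code: the function name `write_id_counts` and the column names id and count are assumptions, and the real CWE_DF.csv also carries details scraped from the web that are omitted here.

```python
import csv
from collections import Counter


def write_id_counts(ids, out_path):
    """Count how often each ID appears and write one row per unique ID.

    The column names "id" and "count" are placeholders, not the notebook's schema.
    """
    counts = Counter(ids)
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "count"])
        for identifier, count in sorted(counts.items()):
            writer.writerow([identifier, count])
    return counts
```

Passing the list of CWE IDs found across all slices yields one row per unique ID with the number of times it appeared.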

Grouping_By_Abstraction

Grouping_By_Abstraction.ipynb then collects all the CWE IDs found in the previous step and creates a tree of relationships between these CWEs. CWEs are grouped by similarity, using the pillars defined in the Research Concepts view on the CWE website. A dictionary of SARD & CVE IDs mapped to their respective group ID is created and saved to SARD_CVE_to_groups.csv.
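Grouping by pillar amounts to walking each CWE up its parent links until the top of the tree is reached. The sketch below assumes each CWE has a single primary parent, which is a simplification because a CWE can have several. The `PARENT` table is a small hand-written fragment for illustration and should be checked against the CWE website; it is not taken from the notebook, and `pillar_of` is a hypothetical name.

```python
# Illustrative ChildOf links (child -> parent); verify against the CWE website.
PARENT = {
    "CWE-121": "CWE-787",
    "CWE-787": "CWE-119",
    "CWE-119": "CWE-118",
    "CWE-118": "CWE-664",
}


def pillar_of(cwe_id, parent=PARENT):
    """Follow parent links until an ID with no recorded parent is reached."""
    seen = set()
    while cwe_id in parent:
        if cwe_id in seen:
            raise ValueError(f"cycle in parent links at {cwe_id}")
        seen.add(cwe_id)
        cwe_id = parent[cwe_id]
    return cwe_id
```

With this table, every ID in the chain resolves to the same top-level ID, which can then serve as the group ID for the SARD & CVE test cases labelled with it.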

Grouping_By_CWE

Grouping_By_CWE.ipynb groups CWEs by their unique CWE-ID. A dictionary of the original SARD & CVE IDs mapped to their respective CWE-ID is then saved to SARD_CVE_to_CWE.csv.
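Saving such a dictionary and reading it back later could look like the sketch below. The column names test_case_id and cwe_id and both function names are assumptions; the actual layout of SARD_CVE_to_CWE.csv may differ.

```python
import csv

# Hypothetical column names; the notebook's CSV may use different ones.
FIELDS = ["test_case_id", "cwe_id"]


def save_mapping(mapping, path):
    """Write a {test case ID: CWE-ID} dictionary as a two-column CSV."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for test_case_id, cwe_id in mapping.items():
            writer.writerow({"test_case_id": test_case_id, "cwe_id": cwe_id})


def load_mapping(path):
    """Read the CSV back into a {test case ID: CWE-ID} dictionary."""
    with open(path, newline="") as handle:
        return {row["test_case_id"]: row["cwe_id"] for row in csv.DictReader(handle)}
```

All values come back as strings, so numeric test case IDs need converting if they are used as integers downstream.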

2A_Vulnerability_Classification_ML

3A_Vulnerability_Classification_ML.ipynb attempts to classify vulnerability types using ML models.

2B_Vulnerability_Classification_ML_PCA

3A_Vulnerability_Classification_ML_PCA.ipynb attempts to classify vulnerability types using ML models with PCA transformed data.

2C_Vulnerability_Classification_DL

3B_Vulnerability_Classification_DL.ipynb attempts to classify vulnerability types using DL models.

2D_Vulnerability_Classification_DL_PCA

3A_Vulnerability_Classification_DL_PCA.ipynb attempts to classify vulnerability types using DL models with PCA transformed data.
