VulInject

This is the codebase for the paper "VulInject: Multi-Type Samples Generation for Learning-based Vulnerability Detection".

Structure

VulInject
├── data                                    
│   ├── pattern                             <- Vulnerability Pattern            
│   └── programs                            <- Benign programs' source code
│       ├── vim
│       └── ...
├── result
│   └── generated_vulnerable_programs       <- Generated vulnerable programs' information
│       ├── vim
│       │   ├── c_origin                    <- Original C files 
│       │   ├── c_vul                       <- Generated vulnerable C files
│       │   └── match_result                <- Detailed vulnerability information, including critical variable info, CVE type, vulnerable code slice, and modified statement line info
│       └── ...
├── src
│   ├── copydetect                          <- Source code of Copydetect
│   ├── ctags                               <- Source code of Ctags
│   ├── get_code_slice                      <- Source code for syntax matching, semantic matching and code slicing
│   │   ├── match.sh                        <- Shell script for syntax matching and semantic matching
│   │   ├── get_slice.py                    <- Generate code slices
│   │   └── ...
│   ├── joern-0.3.1                         <- Source code of Joern
│   ├── neo4j                               <- Source code of Neo4j
│   ├── pattern_application
│   │   └── injection.py                    <- Inject vulnerabilities into programs using patterns
│   └── type_labeling
│       ├── LLM.py                          <- Vulnerability type labeling using ChatGPT
│       ├── vulsample.py                    <- Module for extracting patch statement types and critical variable types
│       └── ...
├── experiments
│   ├── binary_models                       <- Source code for binary model training and testing
│   │   └── data                            <- Data for binary model training and testing
│   └── multiclass_models
│       ├── PDBERT                          <- Source code for PDBERT
│       └── VulBERTa                        <- Source code for VulBERTa
├── base_env.yml                            <- Conda base environment configuration file
├── OPENAI_env.yml                          <- Conda OPENAI environment configuration file
├── vulinject_env.yml                       <- Conda vulinject environment configuration file
├── run.sh                                  <- Shell script for the entire project
└── README.md

Get Started

Prerequisites

Install necessary dependencies before running the project:

JAVA (jdk1.8.0_161)
ant (1.9.14)
Joern (0.3.1)
Universal Ctags (6.1.0)
Copydetect
Neo4j (2.1.5)

Setup

This section gives the steps and explanations for getting the project running.

1) Clone this repo

$ git clone https://github.com/VulInject/VulInject.git

2) Install Prerequisites

Install the prerequisites and add them to the system path, for example:

export JAVA_HOME=/usr/java/jdk1.8.0_161
export JRE_HOME=/usr/java/jdk1.8.0_161/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export ANT_HOME=/usr/ant/apache-ant-1.9.14
export PATH=$PATH:$ANT_HOME/bin

The source code of Joern, Ctags, Copydetect, and Neo4j is provided in the src folder, so you can use them directly.
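
Before moving on, it can help to confirm that the tools configured above are actually reachable. The sketch below is not part of the repository: it assumes the executables are named java and ant, and the helper name missing_tools is hypothetical.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names for the Java and ant prerequisites.
    missing = missing_tools(["java", "ant"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("java and ant are both on PATH")
```

This only checks that the executables can be found; it does not verify that the installed versions match the ones listed under Prerequisites.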

3) Configure Conda

Create three new conda environments.

conda env create -f base_env.yml
conda env create -f OPENAI_env.yml 
conda env create -f vulinject_env.yml

4) Configure the project

Modify the paths in the configuration file (src/get_code_slice/config.json) to match your environment; code slicing depends on them.
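
A stale path in the configuration file is easy to miss, so a small check can be useful after editing it. The sketch below is an illustration only: the keys and layout of config.json are not documented here, so it makes no assumption about them and simply walks the whole JSON document, treating any string that contains a path separator as a path (a heuristic that will also flag URLs). The function names collect_strings and missing_paths are hypothetical.

```python
import json
import os


def collect_strings(node):
    """Yield every string value found anywhere in a parsed JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from collect_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_strings(item)


def missing_paths(config_file):
    """Return path-like string values in `config_file` that do not exist on disk."""
    with open(config_file, encoding="utf-8") as handle:
        config = json.load(handle)
    # Heuristic (an assumption): any string containing a path separator is a path.
    candidates = [s for s in collect_strings(config) if "/" in s or os.sep in s]
    return [p for p in candidates if not os.path.exists(os.path.expanduser(p))]


if __name__ == "__main__":
    for path in missing_paths("src/get_code_slice/config.json"):
        print("Missing:", path)
```

Run it from the repository root; relative paths in the configuration are resolved against the current working directory.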

How To Run

Vulnerability Generation

Put your target programs' repositories in data/programs.
The run.sh shell script then injects vulnerabilities into the target programs automatically:

bash run.sh

Generated vulnerabilities are stored in result/generated_vulnerable_programs/<your target program's name>.
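
To get a quick overview of what a run produced, the output tree can be summarized per program. The sketch below is not part of the repository; it uses the sub-directory names from the Structure section (c_origin, c_vul, match_result) and assumes each of them is a directory of files. The function name summarize_results is hypothetical.

```python
from pathlib import Path

# Sub-directory names taken from the Structure section of this README.
SUBDIRS = ("c_origin", "c_vul", "match_result")


def summarize_results(root):
    """Map each program directory under `root` to per-sub-directory file counts."""
    summary = {}
    for program_dir in sorted(Path(root).iterdir()):
        if not program_dir.is_dir():
            continue
        counts = {}
        for name in SUBDIRS:
            sub = program_dir / name
            # Count regular files recursively; a missing sub-directory counts as 0.
            counts[name] = (
                sum(1 for p in sub.rglob("*") if p.is_file()) if sub.is_dir() else 0
            )
        summary[program_dir.name] = counts
    return summary


if __name__ == "__main__":
    results = summarize_results("result/generated_vulnerable_programs")
    for program, counts in results.items():
        print(program, counts)
```

A program whose c_vul count is 0 after a run is a reasonable first place to look when something went wrong upstream.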

Downstream Tasks Evaluation

Models

For binary models, we use VulCNN, VulBERTa, LineVul and Devign. For multiclass models, we use PDBERT and VulBERTa.

The source code for these models can be found in the experiments directory.

The complete experimental source code and trained models have been uploaded to Zenodo and are available at: https://zenodo.org/records/18811174?preview=1&token=eyJhbGciOiJIUzUxMiJ9.eyJpZCI6ImJiOWQ1YWExLTI2YTYtNDQ1Yi1hOTYwLTk5Nzk5Y2IzODQ4MCIsImRhdGEiOnt9LCJyYW5kb20iOiJhN2ZjMWI0OWM0ODViZmY2MmQxNGI5NmRjMzk2MTkzNiJ9.84AVkgnjPXAnDkjuW1lrboIo1blG7yy-Eus5t9drSvgXO3Q6OpzvZD2d8iCXgbFdGd8WPvg62x3N-fJ1M91UaA

Dataset

The 352 vulnerability patterns we extracted are available in data/pattern.
For the training and testing of binary and multiclass classification models, we use the PrimeVul dataset as the baseline. The data is located in the respective model directories. Please refer to the README file of each model for details.
We will make all generated datasets publicly available upon the acceptance of this paper.
