G2FUZZ/G2FUZZ

Description

Modern software often accepts inputs with highly complex grammars. To conduct greybox fuzzing and uncover security bugs in such software, it is essential to generate inputs that conform to the software's input grammar. However, this is a well-known challenge because it requires a deep understanding of the grammar, which is often unavailable and hard to infer. Recent advances in large language models (LLMs) have shown that they can synthesize high-quality natural language text and code that conforms to the grammar of a given input format. Nevertheless, LLMs are often incapable of generating non-textual outputs, such as images, videos, and PDF files, or can do so only at a prohibitive cost. This limitation hinders the application of LLMs in grammar-aware fuzzing.

This paper presents a novel approach to enabling grammar-aware fuzzing over non-textual inputs. We employ LLMs (e.g., GPT-3.5) to synthesize and further mutate input generators, often in the form of Python scripts, that produce data conforming to the grammar of a given input format. The non-textual data yielded by the input generators are then further mutated by traditional fuzzers (e.g., AFL++) to explore the software input space more effectively. Overall, our approach, G2FUZZ, features a hybrid strategy that combines a “holistic search” driven by LLMs and a “local search” driven by industrial-quality fuzzers. Two key advantages of G2FUZZ are: (1) LLMs are good at synthesizing and mutating input generators and at helping the fuzzer escape local optima, thus achieving a synergistic effect when combined with mutation-based fuzzers; (2) LLMs are invoked only when needed, thus significantly reducing the cost of LLM usage.
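To make the idea of an input generator concrete, the following is a minimal sketch of the kind of Python script an LLM might be asked to produce for the PNG format. It is an illustration only and is not taken from G2FUZZ; the names `png_chunk` and `generate_png` are hypothetical. It uses only the standard library to emit a small, structurally valid image that a mutation-based fuzzer could then take as a seed.

```python
import struct
import zlib

# Fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def generate_png(width: int, height: int, rgb=(255, 0, 0)) -> bytes:
    """Build a solid-color, 8-bit RGB PNG entirely in memory (hypothetical helper)."""
    # IHDR fields: width, height, bit depth 8, color type 2 (RGB),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (None).
    scanline = b"\x00" + bytes(rgb) * width
    idat = zlib.compress(scanline * height)
    return (
        PNG_SIGNATURE
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", idat)
        + png_chunk(b"IEND", b"")
    )
```

Writing the result of `generate_png(4, 4)` to a file in binary mode yields a seed whose chunk lengths and checksums are already consistent, which is the property that random byte-level mutation rarely reaches on its own.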

We have implemented G2FUZZ on the latest version of AFL++ (AFL++-4.32c).

How to use it

Step I: Preparation

Install the dependency libraries

pip install openai==1.63.2

Prepare the settings files

cd evaluation_path
git clone https://github.com/G2FUZZ/G2FUZZ
cp ./G2FUZZ/openai_key.txt .
cp ./G2FUZZ/program_to_format.json .
cp ./G2FUZZ/model_setting.json .

Then fill in these three files:

  • openai_key.txt: Your OpenAI API key.
  • program_to_format.json: The target program and its expected input formats.
  • model_setting.json: The model to use.
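Before running seed generation, it can help to confirm that the three files are in place. The sketch below is a hypothetical preflight check, not part of G2FUZZ. It deliberately assumes nothing about the JSON schemas and only verifies that each file exists, is non-empty, and (for the two `.json` files) parses as JSON.

```python
import json
from pathlib import Path

# File names taken from the setup step above.
REQUIRED_FILES = ("openai_key.txt", "program_to_format.json", "model_setting.json")


def check_settings(workdir: str) -> list:
    """Return a list of problems; an empty list means all files look usable."""
    problems = []
    for name in REQUIRED_FILES:
        path = Path(workdir) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        text = path.read_text(encoding="utf-8")
        if not text.strip():
            problems.append(f"{name}: empty")
        elif path.suffix == ".json":
            # Only syntax is checked; the expected keys are defined by G2FUZZ.
            try:
                json.loads(text)
            except json.JSONDecodeError as exc:
                problems.append(f"{name}: invalid JSON ({exc.msg})")
    return problems
```

Calling `check_settings(".")` from evaluation_path returns an empty list when all three files pass these basic checks.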

Compile G2FUZZ and target program

G2FUZZ is built the same way as AFL++: run make source-only. The target program is also compiled as it would be for AFL++, producing two binaries: program.afl (compiled in the default mode) and program.cmp (compiled in cmplog mode).

Step II: Run seed generation to produce the initial output

cd evaluation_path
python ./G2FUZZ/program_gen.py --output ./<program_name>_output --program <program_name>

For example:

python ./G2FUZZ/program_gen.py --output ./jhead_output --program jhead
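After seed generation finishes, a quick way to see whether it produced anything is to count the files under the gen_seeds directory that Step III copies from. The helper below is a hypothetical convenience sketch; the only thing it takes from this README is the `<program_name>_output/default/gen_seeds` layout.

```python
from pathlib import Path


def count_generated_seeds(output_dir: str) -> int:
    """Count regular files under <output_dir>/default/gen_seeds (0 if the directory is absent)."""
    seeds_dir = Path(output_dir) / "default" / "gen_seeds"
    if not seeds_dir.is_dir():
        return 0
    return sum(1 for path in seeds_dir.rglob("*") if path.is_file())
```

For the jhead example, `count_generated_seeds("./jhead_output")` returning 0 would suggest that generation did not complete or that the output path differs.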

Step III: Run fuzzing

1. Construct input corpus

The final input corpus has two parts: 1) the initial seeds you prepared, such as seeds from FuzzBench/UNIFUZZ; 2) the seeds generated by G2FUZZ. In this step, merge both into a single folder, initial_seeds, for fuzzing.

cd evaluation_path
mkdir initial_seeds
cp -r seeds/you/prepared/* initial_seeds
cp -r <program_name>_output/default/gen_seeds/* initial_seeds

For example:

cp -r jhead_output/default/gen_seeds/* initial_seeds
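The same merge can be scripted when the two seed sets might contain files with identical names, which `cp` would silently overwrite. The following is a minimal sketch under that assumption; `merge_seeds` is a hypothetical helper, and unlike `cp -r` it flattens subdirectories into the destination folder.

```python
import shutil
from pathlib import Path


def merge_seeds(sources, dest: str) -> int:
    """Copy every regular file under each source directory into dest; return the number copied."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = 0
    for index, src in enumerate(sources):
        for path in sorted(Path(src).rglob("*")):
            if not path.is_file():
                continue
            target = dest_dir / path.name
            if target.exists():
                # Rename instead of overwriting a seed copied earlier.
                target = dest_dir / f"src{index}_{copied}_{path.name}"
            shutil.copy2(path, target)
            copied += 1
    return copied
```

For the jhead example, this could be called as `merge_seeds(["seeds/you/prepared", "jhead_output/default/gen_seeds"], "initial_seeds")`.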

Note: To ensure experimental fairness in the paper, all fuzzers — including G2FUZZ — are initialized with the same set of initial seeds you prepared. Moreover, the fuzzing process in G2FUZZ is suspended during its seed generation phase.

2. Formal fuzzing

cd evaluation_path
./G2FUZZ/afl-fuzz -i ./initial_seeds -o ./<program_name>_output -c ./program.cmp -m 1024 -k ./G2FUZZ/ -- ./program.afl <ARG> @@ <ARG>

Note: ./<program_name>_output is the same directory that was passed as --output ./<program_name>_output in Step II.

For example:

./G2FUZZ/afl-fuzz -i ./initial_seeds -o ./jhead_output -c ./jhead.cmp -m 1024 -k ./G2FUZZ/ -- ./jhead.afl @@
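When fuzzing several targets, the command line can be assembled programmatically. The sketch below is a hypothetical helper that only rearranges the flags shown above (`-i`, `-o`, `-c`, `-m`, `-k`); it assumes the binaries are named `<program>.afl` and `<program>.cmp` as in the jhead example and that the output directory follows the `<program>_output` convention.

```python
import shlex


def build_fuzz_command(program: str, target_args=("@@",), g2fuzz_dir: str = "./G2FUZZ",
                       seeds: str = "./initial_seeds", mem_limit_mb: int = 1024) -> list:
    """Return the afl-fuzz argument list for one target, mirroring the README command."""
    return [
        f"{g2fuzz_dir}/afl-fuzz",
        "-i", seeds,                    # merged input corpus
        "-o", f"./{program}_output",    # same directory used for seed generation
        "-c", f"./{program}.cmp",       # cmplog build of the target
        "-m", str(mem_limit_mb),        # memory limit in MB
        "-k", f"{g2fuzz_dir}/",         # G2FUZZ directory, as in the README command
        "--", f"./{program}.afl", *target_args,
    ]


# Render the command as a shell string for inspection.
print(shlex.join(build_fuzz_command("jhead")))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues when a target takes additional arguments around `@@`.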

Contact

If you have any questions or suggestions, feel free to contact me via email: [email protected]
