Description

Modern software often accepts inputs with highly complex grammars. To conduct greybox fuzzing and uncover security bugs in such software, it is essential to generate inputs that conform to the software's input grammar. However, this is a well-known challenge because it requires a deep understanding of the grammar, which is often unavailable and hard to infer. Recent advances in large language models (LLMs) have shown that they can synthesize high-quality natural language text and code that conforms to the grammar of a given input format. Nevertheless, LLMs are often unable to generate non-textual outputs, such as images, videos, and PDF files, or can do so only at prohibitive cost. This limitation hinders the application of LLMs in grammar-aware fuzzing.

This paper presents a novel approach to enabling grammar-aware fuzzing over non-textual inputs. We employ LLMs (e.g., GPT-3.5) to synthesize and further mutate input generators, often in the form of Python scripts, that generate data conforming to the grammar of a given input format. The non-textual data yielded by the input generators are then further mutated by traditional fuzzers (e.g., AFL++) to explore the software input space more effectively. Our approach, namely G2FUZZ, features a hybrid strategy that combines a “holistic search” driven by LLMs and a “local search” driven by industrial-quality fuzzers. Two key advantages of G2FUZZ are: (1) LLMs are good at synthesizing and mutating input generators and help the search escape local optima, thus achieving a synergistic effect when combined with mutation-based fuzzers; (2) LLMs are invoked only when really needed, thus significantly reducing the cost of LLM usage.
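
To make the idea of an "input generator" concrete, the sketch below shows the kind of small Python script that could emit a grammar-conforming non-textual input, here a tiny grayscale PNG. It is an illustration written for this README, not a generator actually produced by G2FUZZ; the function names png_chunk and generate_png are hypothetical.

```python
import struct
import zlib


def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: 4-byte length, type, data, CRC-32 over type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def generate_png(width: int = 2, height: int = 2) -> bytes:
    """Return the bytes of a small 8-bit grayscale PNG image."""
    signature = b"\x89PNG\r\n\x1a\n"
    # IHDR fields: width, height, bit depth 8, color type 0 (grayscale),
    # compression method 0, filter method 0, interlace method 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is a filter byte (0 = none) followed by one byte per pixel.
    scanlines = b"".join(
        b"\x00" + bytes((x * 37 + y * 91) % 256 for x in range(width))
        for y in range(height)
    )
    idat = zlib.compress(scanlines)
    return (
        signature
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", idat)
        + png_chunk(b"IEND", b"")
    )
```

Writing the returned bytes to a file yields a seed that a mutation-based fuzzer can then perturb byte by byte, while the script itself can be changed to reach other parts of the format (different color types, extra chunks, and so on).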

We have implemented G2FUZZ on the latest version of AFL++ (AFL++-4.32c).

How to use it

Step I: Preparation

Install the dependency libraries

pip install openai==1.63.2

Prepare the setting files

cd evaluation_path
git clone https://github.com/G2FUZZ/G2FUZZ
cp ./G2FUZZ/openai_key.txt .
cp ./G2FUZZ/program_to_format.json .
cp ./G2FUZZ/model_setting.json .

Then, you need to set up these three files:

openai_key.txt: The OpenAI key.
program_to_format.json: The target program and its expected input formats.
model_setting.json: The LLM to use.
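
Before moving on, it can help to confirm that the three files are in place and that the two JSON files at least parse. The following is a minimal sketch of such a check; check_settings is a hypothetical helper, not part of G2FUZZ, and it deliberately assumes nothing about the keys inside the JSON files.

```python
import json
from pathlib import Path

# File names taken from the list above.
SETTING_FILES = ("openai_key.txt", "program_to_format.json", "model_setting.json")


def check_settings(directory: str) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for name in SETTING_FILES:
        path = Path(directory) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        text = path.read_text(encoding="utf-8")
        if not text.strip():
            problems.append(f"{name}: empty")
        elif name.endswith(".json"):
            # Only syntax is checked; the expected schema is not assumed here.
            try:
                json.loads(text)
            except json.JSONDecodeError as exc:
                problems.append(f"{name}: invalid JSON ({exc.msg})")
    return problems
```

Calling check_settings(".") from evaluation_path and printing the result gives a quick overview of what still needs to be filled in.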

Compile G2FUZZ and target program

G2FUZZ is compiled the same way as AFL++: make source-only. The target program is also compiled as it would be for AFL++, producing two binaries: program.afl (the program compiled in the default mode) and program.cmp (the program compiled in cmplog mode).

Step II: Run seed generation to get the initial output

cd evaluation_path
python ./G2FUZZ/program_gen.py --output ./<program_name>_output --program <program_name>

For example:

python ./G2FUZZ/program_gen.py --output ./jhead_output --program jhead

Step III: Run fuzzing

1. Construct input corpus

The final input corpus has two parts: 1) the initial seeds you prepared, such as seeds from FuzzBench/UNIFUZZ; 2) the seeds generated by G2FUZZ. In this step, we merge them into one folder, initial_seeds, for fuzzing.

cd evaluation_path
mkdir initial_seeds
cp -r seeds/you/prepared/* initial_seeds
cp -r <program_name>_output/default/gen_seeds/* initial_seeds

For example:

cp -r jhead_output/default/gen_seeds/* initial_seeds

Note: To ensure experimental fairness in the paper, all fuzzers, including G2FUZZ, are initialized with the same set of initial seeds you prepared. Moreover, the fuzzing process in G2FUZZ is suspended during its seed generation phase.
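
When the two seed sources contain files with the same name, plain cp can silently overwrite one with the other. As an optional alternative to the cp commands in this step, the sketch below merges several seed folders and names each copy after a hash of its content, which also skips exact duplicates. merge_seeds is a hypothetical helper, not part of G2FUZZ.

```python
import hashlib
import shutil
from pathlib import Path


def merge_seeds(source_dirs: list[str], dest_dir: str) -> int:
    """Copy every file under source_dirs into dest_dir; return how many were written."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = 0
    for source in source_dirs:
        for path in sorted(Path(source).rglob("*")):
            if not path.is_file():
                continue
            # Name the copy after its content so equal names cannot collide.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()[:16]
            target = dest / f"{digest}{path.suffix}"
            if target.exists():
                continue  # same digest and suffix: treated as a duplicate seed
            shutil.copyfile(path, target)
            written += 1
    return written
```

For the jhead example, merge_seeds(["seeds/you/prepared", "jhead_output/default/gen_seeds"], "initial_seeds") would play the role of the two cp commands.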

2. Formal fuzzing

cd evaluation_path
./G2FUZZ/afl-fuzz -i ./initial_seeds -o ./<program_name>_output -c ./program.cmp -m 1024 -k ./G2FUZZ/ -- ./program.afl <ARG> @@ <ARG>

Note that ./<program_name>_output is the same directory passed as --output ./<program_name>_output in Step II.

For example:

./G2FUZZ/afl-fuzz -i ./initial_seeds -o ./jhead_output -c ./jhead.cmp -m 1024 -k ./G2FUZZ/ -- ./jhead.afl @@
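
When fuzzing several targets, the command line can be assembled programmatically rather than typed by hand. The sketch below only rearranges the flags shown in the jhead example above; build_afl_command and its parameters are hypothetical names, and the naming of the .afl and .cmp binaries after the program is an assumption taken from that example.

```python
import shlex


def build_afl_command(
    program_name: str,
    target_args: tuple[str, ...] = ("@@",),
    g2fuzz_dir: str = "./G2FUZZ",
    memory_limit_mb: int = 1024,
) -> list[str]:
    """Return the afl-fuzz argument list, mirroring the jhead example above."""
    return [
        f"{g2fuzz_dir}/afl-fuzz",
        "-i", "./initial_seeds",
        "-o", f"./{program_name}_output",  # same directory as --output in Step II
        "-c", f"./{program_name}.cmp",     # cmplog build of the target
        "-m", str(memory_limit_mb),
        "-k", f"{g2fuzz_dir}/",
        "--", f"./{program_name}.afl",     # default-mode build of the target
        *target_args,
    ]


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run to launch it.
    print(shlex.join(build_afl_command("jhead")))
```

Keeping the command as a list avoids shell-quoting problems if target arguments contain spaces.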

Contact

If you have any questions or suggestions, feel free to contact me via email: [email protected]
