Modern software often accepts inputs with highly complex grammars. To conduct greybox fuzzing and uncover security bugs in such software, it is essential to generate inputs that conform to the software's input grammar. However, this is a well-known challenge because it requires a deep understanding of the grammar, which is often unavailable and hard to infer. Recent advances in large language models (LLMs) have shown that they can synthesize high-quality natural language text and code that conforms to the grammar of a given input format. Nevertheless, LLMs are often incapable of generating non-textual outputs such as images, videos, and PDF files, or can do so only at high cost. This limitation hinders the application of LLMs in grammar-aware fuzzing.
This paper presents a novel approach to enabling grammar-aware fuzzing over non-textual inputs. We employ LLMs (e.g., GPT-3.5) to synthesize and further mutate input generators, typically Python scripts, that produce data conforming to the grammar of a given input format. The non-textual data yielded by these generators are then further mutated by traditional fuzzers (e.g., AFL++) to explore the software input space more effectively. Our approach, named G2FUZZ, features a hybrid strategy that combines a “holistic search” driven by LLMs with a “local search” driven by industrial-quality fuzzers. G2FUZZ has two key advantages: (1) LLMs are good at synthesizing and mutating input generators and at escaping local optima, which yields a synergistic effect when combined with mutation-based fuzzers; (2) LLMs are invoked only when really needed, which significantly reduces the cost of LLM usage.
We have implemented G2FUZZ on the latest version of AFL++ (AFL++-4.32c).
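To make the idea of an "input generator" concrete, here is a minimal sketch of the kind of Python script an LLM might be asked to write for an image format. It is an illustration only, not actual G2FUZZ output: the function names `png_chunk` and `generate_png` and the output file name `seed_0.png` are hypothetical. It builds a tiny grayscale PNG with only the standard library, so the resulting file is structurally valid (signature, chunk lengths, CRCs) before any byte-level mutation happens.

```python
import struct
import zlib


def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: 4-byte length, type, data, CRC-32 over type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def generate_png(width: int = 2, height: int = 2, gray: int = 128) -> bytes:
    """Build a small 8-bit grayscale PNG entirely in memory (hypothetical generator)."""
    signature = b"\x89PNG\r\n\x1a\n"
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression method 0, filter method 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline starts with filter byte 0, followed by one byte per pixel.
    raw = b"".join(b"\x00" + bytes([gray]) * width for _ in range(height))
    idat = zlib.compress(raw)
    return (
        signature
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", idat)
        + png_chunk(b"IEND", b"")
    )


if __name__ == "__main__":
    # Write one seed file; a real generator would likely emit many variants.
    with open("seed_0.png", "wb") as f:
        f.write(generate_png())
```

A mutation-based fuzzer would rarely produce correct chunk CRCs by chance, which is one reason a generator of this kind can reach code that byte-level mutation alone tends to miss.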
pip install openai==1.63.2
cd evaluation_path
git clone https://github.com/G2FUZZ/G2FUZZ
cp ./G2FUZZ/openai_key.txt .
cp ./G2FUZZ/program_to_format.json .
cp ./G2FUZZ/model_setting.json .
Then, you need to set up these three files:
- openai_key.txt: The OpenAI key.
- program_to_format.json: The target program and its expected input formats.
- model_setting.json: The model we used.
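Before running the generator, it can help to confirm that the three files are in place. The following is a small sketch, not part of G2FUZZ: the helper name `check_setup` is hypothetical, and it deliberately checks only that the key file is non-empty and that the two JSON files parse, since the expected keys inside those files are defined by the copies shipped in the repository.

```python
import json
from pathlib import Path


def check_setup(workdir: str = ".") -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    root = Path(workdir)

    # The key file only needs to exist and contain something.
    key_file = root / "openai_key.txt"
    if not key_file.is_file():
        problems.append("openai_key.txt is missing")
    elif not key_file.read_text().strip():
        problems.append("openai_key.txt is empty")

    # The two settings files must at least be well-formed JSON.
    for name in ("program_to_format.json", "model_setting.json"):
        path = root / name
        if not path.is_file():
            problems.append(f"{name} is missing")
            continue
        try:
            json.loads(path.read_text())
        except json.JSONDecodeError as exc:
            problems.append(f"{name} is not valid JSON: {exc}")
    return problems


if __name__ == "__main__":
    for problem in check_setup("."):
        print(problem)
```

Run it from evaluation_path; no output means the basic checks passed.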
The compilation method for G2FUZZ is the same as that for AFL++: make source-only.
The method for compiling the target program is also consistent with AFL++, requiring program.afl (the program compiled under the default mode) and program.cmp (the program compiled under cmplog mode).
cd evaluation_path
python ./G2FUZZ/program_gen.py --output ./<program_name>_output --program <program_name>
For example:
python ./G2FUZZ/program_gen.py --output ./jhead_output --program jhead
The final input corpus has two parts: 1) The initial seed you prepared, such as seeds from FuzzBench/UNIFUZZ. 2) The seeds generated by G2FUZZ.
In this step, we need to integrate them into one folder initial_seeds for fuzzing.
cd evaluation_path
mkdir initial_seeds
cp -r seeds/you/prepared/* initial_seeds
cp -r <program_name>_output/default/gen_seeds/* initial_seeds
For example:
cp -r jhead_output/default/gen_seeds/* initial_seeds
Note: To ensure experimental fairness in the paper, all fuzzers, including G2FUZZ, are initialized with the same set of initial seeds you prepared. Moreover, the fuzzing process in G2FUZZ is suspended during its seed generation phase.
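The cp commands above can overwrite files when a prepared seed and a generated seed share a file name. As an optional alternative, the sketch below merges several seed directories into one folder while skipping byte-identical files. It is not part of G2FUZZ; the helper name `merge_seeds` and the hash-prefixed naming scheme are assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def merge_seeds(sources: list[str], dest: str) -> int:
    """Copy every regular file under the source dirs into dest, skipping
    files whose content was already copied. Returns the number copied."""
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    seen = set()
    copied = 0
    for src in sources:
        for path in sorted(Path(src).rglob("*")):
            if not path.is_file():
                continue
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            if digest in seen:
                continue  # identical content already present
            seen.add(digest)
            # Prefix with part of the hash so equal file names from
            # different sources cannot overwrite each other.
            shutil.copyfile(path, out / f"{digest[:16]}_{path.name}")
            copied += 1
    return copied


if __name__ == "__main__":
    # Directory names follow the jhead example; adjust to your setup.
    n = merge_seeds(["seeds/you/prepared", "jhead_output/default/gen_seeds"], "initial_seeds")
    print(f"copied {n} unique seeds")
```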
cd evaluation_path
./G2FUZZ/afl-fuzz -i ./initial_seeds -o ./<program_name>_output -c ./program.cmp -m 1024 -k ./G2FUZZ/ -- ./program.afl <ARG> @@ <ARG>
Note that ./<program_name>_output is the same directory that was passed as --output to program_gen.py in Step I.
For example:
./G2FUZZ/afl-fuzz -i ./initial_seeds -o ./jhead_output -c ./jhead.cmp -m 1024 -k ./G2FUZZ/ -- ./jhead.afl @@
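When fuzzing several targets, the command line can be assembled programmatically. The sketch below is a convenience wrapper, not part of G2FUZZ: the function name `build_fuzz_command` and its parameters are hypothetical, and it uses only the flags that appear in the command above, with the same directory layout assumed.

```python
import shlex
from typing import Optional


def build_fuzz_command(
    program: str,
    target_args: Optional[list[str]] = None,
    g2fuzz_dir: str = "./G2FUZZ",
) -> list[str]:
    """Assemble the afl-fuzz argument list for one target program."""
    # "@@" is replaced by afl-fuzz with the path of the current input file.
    args = target_args if target_args is not None else ["@@"]
    return [
        f"{g2fuzz_dir}/afl-fuzz",
        "-i", "./initial_seeds",
        "-o", f"./{program}_output",   # same directory as --output in Step I
        "-c", f"./{program}.cmp",      # binary compiled under cmplog mode
        "-m", "1024",
        "-k", f"{g2fuzz_dir}/",
        "--", f"./{program}.afl", *args,
    ]


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run to launch it.
    print(shlex.join(build_fuzz_command("jhead")))
```

For jhead with the default arguments, the printed command matches the example above.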
If you have any questions or suggestions, feel free to contact me via email: [email protected]