docker build -t verilogeval:v2 .
docker run -it --name your-perfect-dokcer-name -v $(pwd):/workspace/verilogeval verilogeval:v2
If you have any questions or would like further information, please feel free to contact us at [email protected] and [email protected]. You can also visit our homepages for more details about our work: Zhongkai Yu and Chenyang Zhou.
This repository provides an end-to-end evaluation framework that spans Verilog generation, Verilog debugging, reference model generation, and utility tooling. Conceptually, it is organized into four main components:
- Verilog Gen: datasets for generating Verilog designs with different structures and difficulty levels (self‑contain / non‑self‑contain / CPU IP).
- Verilog Debugging: buggy RTL variants and their associated
prompt.txt / ref.sv / test.svfor 0‑shot and 1‑shot debugging tasks. - Ref Model Gen: cross‑language functional reference models (Python / CXXRTL / SystemC) for the same set of problems.
- Tool Box: utilities for verification and data generation, such as cross‑language consistency checking and reference‑model‑based testbench generation.
Directory: scripts/
sv-generate: unified LLM Verilog generation / debugging script.- Supports multiple backends (OpenAI, DeepSeek, Gemini, Claude, Together, local vLLM server).
--taskselects the high‑level prompting style, e.g.:code-complete-iccad2023: complete the body ofTopModule.spec-to-rtl: generate RTL directly from problem specification.
--examplesand--rulescontrol few‑shot examples and coding conventions.
- Other files such as
verilog-example-prefix_*.txtandprompt-example-prefix.txtprovide prefix examples for different tasks / shot settings.
mkdir -p build/
MODEL_NAME="gpt-5.2" # change this to your model
TASK_NAME="nowcoder" # change this to your task
./configure --with-model=$MODEL_NAME --with-task=$TASK_NAME
make
mkdir -p .save && mv Prob* .save/
The evalution harness is run using make and various evaluation parameters can be set as below:
mkdir -p build/
./configure --with-task=$task --with-model=$model --with-examples=$shots --with-samples=$samples --with-temperature=$temperature --with-top-p=$top_p
make
Evaluation can be sped up by providing the -j flag to make, such as -j4 to run 4 worker processes.
Valid models are listed at the top of scripts/sv-generate. The number of in-context learning examples can be between 0-4, and given with --with-examples. Samples to collect per problem are given by --with-samples. Finally, model temperature and top_p can be set to --with-temperature and --with-top-p, respectively.
These parameters can be easily swept with a shell script, to create separate build directories for each evaluation harness configuration target.