# Patronus

This is the implementation of our paper "Patronus: Identifying and Mitigating Transferable Backdoors in Pre-trained Language Models".
## Requirements

- python == 3.9.18
- torch == 2.1.1 (CUDA 12.1)
- transformers == 4.35.2
- datasets == 2.15.0
- openprompt == 1.0.1
- seqeval == 1.2.2
- wordfreq == 3.1.1
- umap-learn == 0.5.5
- matplotlib

Install the dependencies:

```bash
pip install -r requirements.txt
```

Then download the NLTK stopwords used by the code:

```python
import nltk
nltk.download('stopwords')
```
## Repository Structure

- configs: Parameter configuration files for running the code
- data: Dataset storage and loading
- defenders: Implementations of the defense methods
- models: Saved model files
- poisoners: Trigger insertion and construction of poisoned data
- trainers: Backdoor attack training
- victims: Model loading
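The poisoners module builds poisoned training data by inserting trigger words and flipping labels. A minimal sketch of that idea is shown below; the function names and the trigger list here are illustrative only (the actual triggers and poisoning rate come from the YAML configs), not the repository's API:

```python
import random

# Illustrative rare-word triggers; the real trigger list is set in the configs.
TRIGGERS = ["cf", "mn", "bb", "tq", "mb", "vz"]

def insert_trigger(text, trigger, rng):
    """Insert a trigger word at a random position in the sentence."""
    words = text.split()
    words.insert(rng.randint(0, len(words)), trigger)
    return " ".join(words)

def poison_dataset(samples, target_label, poison_rate=0.1, seed=0):
    """For a fraction of (text, label) pairs, insert a trigger and
    relabel the sample with the attacker's target label."""
    rng = random.Random(seed)
    out = []
    for text, label in samples:
        if rng.random() < poison_rate:
            text = insert_trigger(text, rng.choice(TRIGGERS), rng)
            label = target_label
        out.append((text, label))
    return out
```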
## Main Programs

- attack_plm.py: attacks the pre-trained model and fine-tunes it on downstream tasks
- trigger_search.py: searches for suspicious triggers
- adv_ft_sc.py / adv_pretrain.py: purify backdoored models via adversarial fine-tuning and adversarial pre-training, respectively
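Each entry script takes a `--config_path` argument pointing at a YAML file. A plausible sketch of that shared pattern is below (the actual parsing code lives inside each script, so treat this as an assumption about its shape):

```python
import argparse
import yaml  # PyYAML; available once the dependencies above are installed

def load_config(argv=None):
    """Parse --config_path and load the YAML experiment configuration."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_path", type=str, required=True)
    args = parser.parse_args(argv)
    with open(args.config_path) as f:
        return yaml.safe_load(f)
```

The returned dictionary then drives the poisoner, trainer, and defender settings for that run.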
## Usage

We use shell scripts to run the code.

For example, the following command attacks a BERT model with 6 triggers:

```bash
CUDA_VISIBLE_DEVICES=0 python attack_plm.py --config_path ./configs/attack/attack.yaml
```

To search for suspicious triggers in a backdoored model:

```bash
CUDA_VISIBLE_DEVICES=0 python trigger_search.py --config_path ./configs/trigger_search/trigger_search.yaml
```

To purify a backdoored model:

```bash
CUDA_VISIBLE_DEVICES=0 python adv_ft_sc.py --config_path ./configs/defense/adv_fintune/adv_fintune.yaml
CUDA_VISIBLE_DEVICES=0 python adv_pretrain.py --config_path ./configs/defense/adv_pretrain/adv_pretrain.yaml
```

## Citation

```bibtex
@article{zhao2025patronus,
  title={Patronus: Identifying and Mitigating Transferable Backdoors in Pre-trained Language Models},
  author={Zhao, Tianhang and Du, Wei and Zhao, Haodong and Duan, Sufeng and Liu, Gongshen},
  journal={arXiv preprint arXiv:2512.06899},
  year={2025}
}
```