This 🤗 Hugging Face dataset contains responses generated by a wide variety of advanced models, including:
LLMs:
VLMs:
The dataset combines data sourced from WildGuard, S-Eval, and JailbreakV.
You can set the following key parameters directly in train.py:
model_name: Name of the base model.
train_dataset_dir: Path to the training dataset.
test_dataset_dir: Path to the test dataset.
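As a sketch, the parameters above could be exposed on the command line as follows (the defaults shown here are placeholders, not the project's actual values, and train.py may instead hard-code them):

```python
import argparse

# Hypothetical argument parser mirroring the parameters listed above.
# Default values are illustrative placeholders only.
parser = argparse.ArgumentParser(description="Train a streaming safeguard model")
parser.add_argument("--model_name", default="Qwen/Qwen3-8B",
                    help="Name of the base model")
parser.add_argument("--train_dataset_dir", default="./data/train",
                    help="Path to the training dataset")
parser.add_argument("--test_dataset_dir", default="./data/test",
                    help="Path to the test dataset")

# Parse with no CLI input so the defaults are used for this illustration.
args = parser.parse_args([])
print(args.model_name, args.train_dataset_dir, args.test_dataset_dir)
```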
python train.py

ckpt_path: Path to the trained checkpoint file (e.g., "./checkpoints/my_model_v1/best.pth")
python eval.py

The evaluation script reports performance at two levels:
Response-level: Overall accuracy, F1, etc., computed on the entire response after generation completes.
Streaming-level: Metrics computed token by token during generation.
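To make the two levels concrete, here is a toy illustration with made-up predictions (the metric definitions are standard; this is not code copied from eval.py):

```python
# Toy illustration of response-level vs. streaming-level evaluation.
# Labels: 1 = unsafe, 0 = safe. All data below is fabricated.

# Response-level: one prediction per full response.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = 2 * tp / (2 * tp + fp + fn)
print(f"response-level accuracy={accuracy:.2f}, F1={f1:.2f}")

# Streaming-level: per-token flags for one unsafe response; one quantity of
# interest is how early the first unsafe token is flagged.
token_flags = [0, 0, 0, 1, 1, 1]  # guard fires at the 4th generated token
first_detection = token_flags.index(1) + 1 if 1 in token_flags else None
print(f"unsafe content first flagged at token {first_detection}")
```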
To test the detection efficiency of PlugGuard, run the following script:
python utils/demo_qwen3_with_guardrail.py

We provide a test dataset of 1,000 samples located at utils/test_sample_1000.txt.
For the demo, we prioritize ease of testing: the model first produces a full response, then we concatenate the user query and the model output and run a single safety check. This post-generation setup avoids patching the transformers library and makes the demo easy to reproduce; a production deployment, however, should integrate PlugGuard inline during generation for real-time intervention.
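The post-generation flow described above can be sketched as follows. The `generate` and `is_unsafe` functions here are stand-ins, not the real model call or PlugGuard's actual classifier:

```python
# Sketch of the demo's post-generation safety check.
UNSAFE_MARKERS = {"weapon", "exploit"}  # toy keyword list, illustration only

def generate(query: str) -> str:
    # Stand-in for the model's generate() call.
    return "Here is a harmless answer."

def is_unsafe(text: str) -> bool:
    # Stand-in for the safeguard's risk classifier.
    return any(marker in text.lower() for marker in UNSAFE_MARKERS)

def demo_check(query: str) -> str:
    response = generate(query)
    # Post-generation setup: concatenate query and output, then check once.
    if is_unsafe(query + "\n" + response):
        return "[blocked by safeguard]"
    return response

print(demo_check("How do I bake bread?"))
```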
python demo.py

Implementation Note: We provide a modified modeling_qwen3.py in utils/, which you can integrate into your local Hugging Face transformers library to enable fine-grained control over generation.
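By contrast, an inline integration of the kind the modified modeling_qwen3.py enables would check risk at every decoding step. A minimal sketch, where `next_token` and `risk_score` are placeholders for the model's decoding step and the safeguard's detector:

```python
# Sketch of inline, per-token safeguarding with real-time intervention.

def next_token(prefix: list) -> str:
    # Stand-in for one decoding step of the model.
    tokens = ["The", " answer", " is", " 42", "."]
    return tokens[len(prefix)] if len(prefix) < len(tokens) else ""

def risk_score(prefix: list) -> float:
    # Stand-in: the real detector scores the model's latent state/text.
    return 0.0

def guarded_generate(max_tokens: int = 10, threshold: float = 0.5) -> str:
    out = []
    for _ in range(max_tokens):
        tok = next_token(out)
        if not tok:
            break
        out.append(tok)
        if risk_score(out) > threshold:
            # Real-time intervention: stop decoding immediately.
            return "".join(out[:-1]) + " [generation stopped by safeguard]"
    return "".join(out)

print(guarded_generate())
```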
If you find this work useful, please cite our paper:
@misc{li2025kelpstreamingsafeguardlarge,
title={Kelp: A Streaming Safeguard for Large Models via Latent Dynamics-Guided Risk Detection},
author={Xiaodan Li and Mengjie Wu and Yao Zhu and Yunna Lv and YueFeng Chen and Cen Chen and Jianmei Guo and Hui Xue},
year={2025},
eprint={2510.09694},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2510.09694},
}