This project is based on the references linked below. Since the original source code from the paper is difficult to use, I created a simple Python program for local testing.
- https://x.com/AnthropicAI/status/1867608917595107443
- https://jplhughes.github.io/bon-jailbreaking/
- https://github.com/jplhughes/bon-jailbreaking
The main source file is bon.py. It borrows code from bon-jailbreaking, including FALSE_POSITIVE_PHRASES and the text-augmentation functions apply_word_scrambling, apply_random_capitalization, and apply_ascii_noising.
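For reference, the augmentations roughly do the following. This is a minimal sketch of the idea (shuffling the inner characters of words, randomly flipping letter case, and perturbing ASCII codes); the function names and probabilities here are placeholders, not the exact bon-jailbreaking implementations.

```python
import random

def scramble_words(text: str, p: float = 0.6) -> str:
    # Shuffle the inner characters of sufficiently long words with probability p.
    words = []
    for w in text.split():
        if len(w) > 3 and random.random() < p:
            mid = list(w[1:-1])
            random.shuffle(mid)
            w = w[0] + "".join(mid) + w[-1]
        words.append(w)
    return " ".join(words)

def randomize_caps(text: str, p: float = 0.6) -> str:
    # Flip the case of each letter with probability p.
    return "".join(c.swapcase() if c.isalpha() and random.random() < p else c for c in text)

def ascii_noise(text: str, p: float = 0.05) -> str:
    # Shift a printable, non-space character's code point by +/-1 with small probability p.
    out = []
    for c in text:
        if 33 <= ord(c) <= 126 and random.random() < p:
            out.append(chr(min(126, max(33, ord(c) + random.choice([-1, 1])))))
        else:
            out.append(c)
    return "".join(out)
```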
To determine whether a response is harmful, this program uses the OpenAI moderation API.
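A harmfulness check against the moderation endpoint looks roughly like the sketch below. It assumes the openai package is installed and OPENAI_API_KEY is set in the environment; the moderation model name and the use of the top-level flagged field are assumptions about how bon.py does it.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def is_harmful(text: str) -> bool:
    # Ask the OpenAI moderation endpoint whether the text is flagged as harmful.
    result = client.moderations.create(
        model="omni-moderation-latest",  # assumed moderation model
        input=text,
    )
    return result.results[0].flagged
```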
In bon.py, the model llama3.2 is hardcoded for testing purposes. You can replace it with any Ollama-supported model.
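Putting the pieces together, the Best-of-N loop amounts to something like the sketch below, which reuses the augmentation and moderation helpers sketched above. The model constant, the value of N, and the stopping condition are illustrative assumptions, not the exact code in bon.py.

```python
import ollama
# Reuses scramble_words, randomize_caps, ascii_noise, and is_harmful from the sketches above.

MODEL = "llama3.2"  # hardcoded for testing; replace with any Ollama-supported model

def best_of_n(prompt: str, n: int = 100):
    # Augment the prompt up to n times, query the local model, and
    # return the first (augmented prompt, response) pair flagged as harmful.
    for _ in range(n):
        augmented = ascii_noise(randomize_caps(scramble_words(prompt)))
        reply = ollama.chat(model=MODEL, messages=[{"role": "user", "content": augmented}])
        answer = reply["message"]["content"]
        if is_harmful(answer):
            return augmented, answer
    return None
```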
For an example of test results, see candidate.txt.
- Install Ollama
Download Ollama from https://ollama.com/download
- Install an Ollama Model
Use the following command to install the llama3.2 model:
ollama run llama3.2
For more information, refer to https://ollama.com/library/llama3.2
- Install the Ollama Python Library
Install the required Python library with:
pip install ollama
For details, see https://github.com/ollama/ollama-python
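To verify the installation, a quick chat call can be made (this assumes the llama3.2 model has already been pulled with ollama run or ollama pull):

```python
import ollama

# Simple smoke test: send one message to the local llama3.2 model and print the reply.
response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response["message"]["content"])
```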
- Run the program
Execute the program using:
python bon.py