Example code for the paper:
"InvThink: Towards AI Safety via Inverse Reasoning" (XXX xxx)
To train an LLM with the InvThink framework, prepare a dataset in which each example consists of:
- query,
- inverse reasoning (including harm enumeration, consequence analysis, mitigation strategy, and safe forward reasoning), and
- safe response.
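As a rough illustration of the three parts listed above, one training example could be represented as follows. This is only a sketch: the class names, field names, and the JSON Lines serialization are assumptions made for illustration, not the schema the provided scripts necessarily expect, and the sample text is invented.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class InverseReasoning:
    # Field names are hypothetical; they mirror the four components
    # named in this README, not a confirmed schema.
    harm_enumeration: str
    consequence_analysis: str
    mitigation_strategy: str
    safe_forward_reasoning: str


@dataclass
class InvThinkExample:
    query: str
    inverse_reasoning: InverseReasoning
    safe_response: str


def to_jsonl_line(example: InvThinkExample) -> str:
    """Serialize one example as a single JSON line (nested dataclasses become dicts)."""
    return json.dumps(asdict(example), ensure_ascii=False)


# Invented sample content, for illustration only.
example = InvThinkExample(
    query="How should I store cleaning chemicals at home?",
    inverse_reasoning=InverseReasoning(
        harm_enumeration="A careless answer could lead to mixing incompatible products.",
        consequence_analysis="Mixing some products can release toxic fumes.",
        mitigation_strategy="Recommend separate storage and reading product labels.",
        safe_forward_reasoning="Give storage advice that avoids any mixing guidance.",
    ),
    safe_response="Keep products in original containers, stored separately and out of reach of children.",
)

line = to_jsonl_line(example)
print(line)
```

If your data is stored differently, only the mapping from your fields to these three parts needs to change.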
You can train your LLMs using the provided scripts.
Before running them, update the configuration parameters (e.g., model_name) to match your setup.
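A minimal sketch of what overriding such parameters might look like is shown here. Only model_name is mentioned in this README; the TrainConfig class, the other fields, and all default values are hypothetical placeholders, so check the provided scripts for the actual parameter names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainConfig:
    # model_name comes from this README; the value is a placeholder.
    model_name: str = "your-org/your-base-model"
    # The fields below are hypothetical and shown for illustration only.
    dataset_path: str = "data/invthink_train.jsonl"
    output_dir: str = "outputs/invthink"


defaults = TrainConfig()

# Override only what differs in your setup; other fields keep their defaults.
config = replace(defaults, model_name="my-org/my-model")
print(config)
```

Keeping the configuration immutable and deriving a modified copy makes it easy to see which values differ from the defaults.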
@inproceedings{invthink2025,
title = {InvThink: Towards AI Safety via Inverse Reasoning},
author = {Anonymous},
booktitle = {XXX},
year = {2025}
}