Links to research papers and resources corresponding to implemented features in this repository. Please feel free to fill in any missing references!
**OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics** is the technical report on OpenUnlearning, describing its design, features, and other details. It introduces:
- A meta-evaluation framework for benchmarking unlearning evaluations across 450+ open-source models.
- Results benchmarking 8 diverse unlearning methods in one place, using 10 evaluation metrics on TOFU.
### Unlearning Methods

| Method | Resource |
|---|---|
| GradAscent, GradDiff | Naive baselines used in many papers, including MUSE and TOFU |
| NPO | Paper📄, Code 🐙 |
| SimNPO | Paper📄, Code 🐙 |
| IdkDPO | TOFU (📄) |
| RMU | WMDP paper (🐙, 🌐), later used in G-effect (🐙) |
| UNDIAL | Paper📄, Code 🐙 |
| AltPO | Paper📄, Code 🐙 |
| SatImp | Paper📄, Code 🐙 |
| WGA (G-effect) | Paper📄, Code 🐙 |
| CE-U (Cross-Entropy unlearning) | Paper📄 |
| PDU | Paper 📄 |
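For intuition on the naive baselines in the table above: GradAscent simply maximizes the language-modeling loss on the forget set, while GradDiff additionally minimizes the loss on a retain set to preserve utility. A minimal NumPy sketch of the combined objective (the function names here are illustrative, not from this repository's API):

```python
import numpy as np

def token_nll(logits, targets):
    """Mean negative log-likelihood of target tokens under the given logits."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

def graddiff_loss(forget_logits, forget_targets,
                  retain_logits, retain_targets, lam=1.0):
    """GradDiff objective: ascend the forget-set loss, descend the retain-set loss.

    GradAscent is the special case lam=0 (pure ascent on the forget set).
    """
    return (-token_nll(forget_logits, forget_targets)
            + lam * token_nll(retain_logits, retain_targets))
```

In practice the logits come from the model being unlearned, and the loss is fed to a standard optimizer; the sketch only shows how the two terms are combined.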
### Benchmarks

| Benchmark | Resource |
|---|---|
| TOFU | Paper📄 |
| MUSE | Paper📄 |
| WMDP | Paper📄 |
### Evaluation Metrics

| Metric | Resource |
|---|---|
| Verbatim Probability / ROUGE, simple QA-ROUGE | Naive metrics used in many papers, including MUSE and TOFU |
| Membership Inference Attacks (LOSS, ZLib, Reference, GradNorm, MinK, MinK++) | MIMIR (🐙), MUSE (📄) |
| PrivLeak | MUSE (📄) |
| Forget Quality, Truth Ratio, Model Utility | TOFU (📄) |
| Extraction Strength (ES) | Carlini et al., 2021 (📄), used for unlearning in Wang et al., 2025 (📄) |
| Exact Memorization (EM) | Tirumala et al., 2022 (📄), used for unlearning in Wang et al., 2025 (📄) |
| lm-evaluation-harness | Repository: 💻 |
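Two of the metrics in the table above are simple enough to sketch directly. Exact Memorization (EM) is the fraction of target tokens the model predicts correctly under greedy decoding, and the MinK attack scores membership by the average log-probability of the k% least likely tokens in a sequence. This is a hedged sketch with hypothetical function names; real implementations take logits/log-probs from the evaluated model:

```python
import numpy as np

def exact_memorization(logits, targets):
    """EM: fraction of positions where the greedy (argmax) prediction
    matches the ground-truth token."""
    preds = logits.argmax(axis=-1)
    return float((preds == targets).mean())

def min_k_score(token_logps, k=0.2):
    """MinK membership signal: mean log-probability of the k% lowest-probability
    tokens. Higher values suggest the sequence was seen in training."""
    n = max(1, int(k * len(token_logps)))
    lowest = np.sort(token_logps)[:n]
    return float(lowest.mean())
```

Both operate purely on per-token model outputs, which is why they are cheap to run alongside the heavier evaluation metrics listed above.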