DIVERSE is a framework for systematically exploring the Rashomon set of neural networks — the collection of models that achieve similar accuracy to a reference model while differing in their predictive behavior.
The method augments pretrained networks with Feature-wise Linear Modulation (FiLM) layers and uses Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to explore a latent modulation space, discovering diverse model variants without retraining or gradient access.
- 🎛 FiLM-based modulation: Introduces lightweight FiLM layers into a frozen pretrained model, enabling controlled activation shifts via a latent vector.
- 🧬 CMA-ES optimization: Gradient-free evolutionary search over latent vectors, targeting disagreement while maintaining accuracy.
- 📊 Rigorous Rashomon protocol: Enforces Rashomon membership on a validation set and reports diversity only on a held-out test set.
- ⚡ Scalable exploration: Substantially reduces computational costs compared to retraining-based approaches.
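As a rough illustration of the FiLM idea (a sketch only, not the repository's implementation — the weight matrices, shapes, and function names here are invented), a FiLM layer scales and shifts each channel of a frozen layer's activations using parameters derived from the latent vector `z`:

```python
import numpy as np

rng = np.random.default_rng(0)

def film(features, z, W_gamma, W_beta):
    """FiLM modulation: per-channel scale (gamma) and shift (beta) derived from z."""
    gamma = 1.0 + z @ W_gamma  # identity scaling when z == 0
    beta = z @ W_beta          # zero shift when z == 0
    return features * gamma + beta

z_dim, channels = 8, 16
W_gamma = 0.1 * rng.normal(size=(z_dim, channels))
W_beta = 0.1 * rng.normal(size=(z_dim, channels))
acts = rng.normal(size=(4, channels))  # a batch of frozen-layer activations

# z = 0 recovers the reference model's behaviour exactly
assert np.allclose(film(acts, np.zeros(z_dim), W_gamma, W_beta), acts)
```

Because `z = 0` reduces to the identity, CMA-ES can start from the reference model and search the latent space for nearby variants that disagree on predictions while staying inside the Rashomon set.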
## System
- Tested on Ubuntu 22.04 LTS
- Requires a bash shell to run the experiment scripts (`.sh`)
- Requires an NVIDIA GPU with CUDA support (tested on a GeForce RTX 4090 with CUDA 12.4)
- Windows users may run the code inside WSL2 or a Linux container (not officially tested)
## Software
- Anaconda
- Python 3.11.7
Create the conda environment:

```bash
conda env create -f environment.yml
```

Activate the new environment:

```bash
conda activate diverse
```
First, make the provided bash script executable:

```bash
chmod +x init.sh
```

Then run it:

```bash
./init.sh
```

This will:
- Train the reference (pretrained) models for each dataset.
- Generate initial latent vectors (Z seeds) used for CMA-ES exploration.
Before running, ensure the conda environment is activated:
```bash
conda activate diverse
```

This script performs an extensive hyperparameter sweep, which can take a long time and heavily use the GPU. Parallelism is controlled through subprocesses; to adjust the number of workers, edit `utils/experiment_parameters.py`.
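For reference, the full sweep covers every model type and epsilon. A small helper (hypothetical, not part of the repository) that prints each `run_epsilon_CMA.py` invocation instead of executing it:

```python
import itertools

model_types = ["mnist", "resnet50_pneumonia", "vgg16_cifar10"]
epsilons = [0.01, 0.02, 0.03, 0.04, 0.05]

commands = [
    f"python run_epsilon_CMA.py --model_type={m} --epsilon={e}"
    for m, e in itertools.product(model_types, epsilons)
]
for cmd in commands:
    print(cmd)  # swap print for subprocess.run(cmd.split(), check=True) to execute
```

This yields 15 runs (3 model types x 5 epsilons); the evaluation stage described below adds a loop over z dimensions on top of these.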
Repeat the following command for each epsilon (0.01, 0.02, 0.03, 0.04, 0.05) and each model type (mnist, resnet50_pneumonia, vgg16_cifar10):

```bash
python run_epsilon_CMA.py --model_type=<model_type> --epsilon=<epsilon>
```

Once all CMA-ES runs have completed, you can evaluate each combination of epsilon, z dimension (2, 4, 8, 16, 32, or 64), and dataset with the following:
```bash
python -m CMA.CMA_evaluation --model_type=<model_type> --epsilon=<epsilon> --z_dim=<z_dim>
```

Before running any baselines, ensure the conda environment is activated:
```bash
conda activate diverse
```

We provide an implementation of the dropout-based Rashomon exploration method described in Hsu et al. (ICLR 2024).
For MNIST:
```bash
python -m baselines.dropout --method=gaussian --model=mnist --epsilon=0.05 --search_budget=167
```

For ResNet50:

```bash
python -m baselines.dropout --method=gaussian --model=resnet --epsilon=0.05 --search_budget=167
```

For VGG16:

```bash
python -m baselines.dropout --method=gaussian --model=vgg --epsilon=0.05 --search_budget=167
```

**Warning:** Retraining is computationally expensive and may require significant time and GPU resources. You can adjust the number of models to train and evaluate via the `--search_budget` flag. Supported values: 167, 320, 640, 1284, 2562, 5120.
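The three dropout-baseline invocations above differ only in `--model`; a hypothetical dry-run helper that prints all of them:

```python
# Print the dropout-baseline command for each supported model.
dropout_cmds = [
    f"python -m baselines.dropout --method=gaussian "
    f"--model={m} --epsilon=0.05 --search_budget=167"
    for m in ["mnist", "resnet", "vgg"]
]
for cmd in dropout_cmds:
    print(cmd)  # replace print with subprocess.run(...) to actually launch the runs
```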
For MNIST:
```bash
python -m baselines.retraining --model=mnist --start_seed=42 --search_budget=5120
```

For ResNet50:

```bash
python -m baselines.retraining --model=resnet --start_seed=42 --search_budget=640
```

For VGG16:

```bash
python -m baselines.retraining --model=vgg --start_seed=45 --search_budget=640
```

After training, results can be evaluated on the test set.
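The retraining commands above pair each model with its own seed and search budget; a hypothetical dry-run helper that prints them all from one place:

```python
# Per-model settings taken from the commands above: (model, start_seed, search_budget)
retraining_runs = [("mnist", 42, 5120), ("resnet", 42, 640), ("vgg", 45, 640)]

retraining_cmds = [
    f"python -m baselines.retraining --model={m} --start_seed={s} --search_budget={b}"
    for m, s, b in retraining_runs
]
for cmd in retraining_cmds:
    print(cmd)  # replace print with subprocess.run(...) to actually launch the runs
```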
Outputs will be stored in five folders (`epsilon_0.01`, `epsilon_0.02`, `epsilon_0.03`, `epsilon_0.04`, `epsilon_0.05`) under the following path:

```
baseline_evaluations/retraining/retraining_<model>/epsilon_<epsilon_value>/
```
For MNIST:
```bash
python -m baselines.retraining_evaluator --model=mnist --search_budget=5120
```

For ResNet50:

```bash
python -m baselines.retraining_evaluator --model=resnet --search_budget=640
```

For VGG16:

```bash
python -m baselines.retraining_evaluator --model=vgg --search_budget=640
```

To plot the results, first run each CMA search and evaluation for every dataset and epsilon, and also run and evaluate all baselines for each dataset on the same epsilons.
Once the results are available, generate the plots with:
```bash
python -m utils.plotter
```