DIVERSE is a framework for systematically exploring the Rashomon set of neural networks — the collection of models that achieve similar accuracy to a reference model while differing in their predictive behavior.
The method augments pretrained networks with Feature-wise Linear Modulation (FiLM) layers and uses Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to explore a latent modulation space, discovering diverse model variants without retraining or gradient access.
- 🎛 FiLM-based modulation: Introduces lightweight FiLM layers into a frozen pretrained model, enabling controlled activation shifts via a latent vector.
- 🧬 CMA-ES optimization: Gradient-free evolutionary search over latent vectors, targeting disagreement while maintaining accuracy.
- 📊 Rigorous Rashomon protocol: Enforces Rashomon membership on a validation set and reports diversity only on a held-out test set.
- ⚡ Scalable exploration: Substantially reduces computational costs compared to retraining-based approaches.
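As a rough illustration of the FiLM idea (a sketch only, not the repository's implementation — the weight matrices, shapes, and function names here are invented), a FiLM layer scales and shifts each channel of a frozen layer's activations using parameters derived from the latent vector `z`:

```python
import numpy as np

rng = np.random.default_rng(0)

def film(features, z, W_gamma, W_beta):
    """FiLM modulation: per-channel scale (gamma) and shift (beta) derived from z."""
    gamma = 1.0 + z @ W_gamma  # identity scaling when z == 0
    beta = z @ W_beta          # zero shift when z == 0
    return features * gamma + beta

z_dim, channels = 8, 16
W_gamma = 0.1 * rng.normal(size=(z_dim, channels))
W_beta = 0.1 * rng.normal(size=(z_dim, channels))
acts = rng.normal(size=(4, channels))  # a batch of frozen-layer activations

# z = 0 recovers the reference model's behaviour exactly
assert np.allclose(film(acts, np.zeros(z_dim), W_gamma, W_beta), acts)
```

Because `z = 0` reduces to the identity, CMA-ES can start from the reference model and search the latent space for nearby variants that disagree on predictions while staying inside the Rashomon set.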
## System
- Tested on Ubuntu 22.04 LTS
- Requires a bash shell to run the experiment scripts (`.sh`)
- Requires an NVIDIA GPU with CUDA support (tested on a GeForce RTX 4090 with CUDA 12.4)
- Windows users may run the code inside WSL2 or a Linux container (not officially tested)
## Software
- Anaconda
- Python 3.11.7
Create the conda environment:

```bash
conda env create -f environment.yml
```

Activate the new environment:

```bash
conda activate diverse
```
First, make the provided bash script executable:

```bash
chmod +x init.sh
```

Then run it:

```bash
./init.sh
```

This will:
- Train the reference (pretrained) models for each dataset.
- Generate initial latent vectors (Z seeds) used for CMA-ES exploration.
Before running, ensure the conda environment is activated:
```bash
conda activate diverse
```

This script performs an extensive hyperparameter sweep, which can take a long time and heavily use the GPU. Parallelism is controlled through subprocesses; to adjust the number of workers, edit `utils/experiment_parameters.py`.
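For reference, the full sweep covers every model type and epsilon. A small helper (hypothetical, not part of the repository) that prints each `run_epsilon_CMA.py` invocation instead of executing it:

```python
import itertools

model_types = ["mnist", "resnet50_pneumonia", "vgg16_cifar10"]
epsilons = [0.01, 0.02, 0.03, 0.04, 0.05]

commands = [
    f"python run_epsilon_CMA.py --model_type={m} --epsilon={e}"
    for m, e in itertools.product(model_types, epsilons)
]
for cmd in commands:
    print(cmd)  # swap print for subprocess.run(cmd.split(), check=True) to execute
```

This yields 15 runs (3 model types x 5 epsilons); the evaluation stage described below adds a loop over z dimensions on top of these.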
Repeat the following command for each epsilon (0.01, 0.02, 0.03, 0.04, 0.05) and each model type (mnist, resnet50_pneumonia, vgg16_cifar10):

```bash
python run_epsilon_CMA.py --model_type=<model_type> --epsilon=<epsilon>
```

Once all CMA-ES runs have completed, you can evaluate each combination of epsilon, z dimension (2, 4, 8, 16, 32, or 64), and dataset with the following:
```bash
python -m CMA.CMA_evaluation --model_type=<model_type> --epsilon=<epsilon> --z_dim=<z_dim>
```

Before running any baselines, ensure the conda environment is activated:
```bash
conda activate diverse
```

We provide an implementation of the dropout-based Rashomon exploration method described in Hsu et al. (ICLR 2024).
For MNIST:
```bash
python -m baselines.dropout --method=gaussian --model=mnist --epsilon=0.05 --search_budget=167
```

For ResNet50:

```bash
python -m baselines.dropout --method=gaussian --model=resnet --epsilon=0.05 --search_budget=167
```

For VGG16:

```bash
python -m baselines.dropout --method=gaussian --model=vgg --epsilon=0.05 --search_budget=167
```

**Warning:** Retraining is computationally expensive and may require significant time and GPU resources. You can adjust the number of models to train and evaluate via the `--search_budget` flag. Supported values: 167, 320, 640, 1284, 2562, 5120.
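The three dropout-baseline invocations above differ only in `--model`; a hypothetical dry-run helper that prints all of them:

```python
# Print the dropout-baseline command for each supported model.
dropout_cmds = [
    f"python -m baselines.dropout --method=gaussian "
    f"--model={m} --epsilon=0.05 --search_budget=167"
    for m in ["mnist", "resnet", "vgg"]
]
for cmd in dropout_cmds:
    print(cmd)  # replace print with subprocess.run(...) to actually launch the runs
```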
For MNIST:
```bash
python -m baselines.retraining --model=mnist --start_seed=42 --search_budget=5120
```

For ResNet50:

```bash
python -m baselines.retraining --model=resnet --start_seed=42 --search_budget=640
```

For VGG16:

```bash
python -m baselines.retraining --model=vgg --start_seed=45 --search_budget=640
```

After training, results can be evaluated on the test set.
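The retraining commands above pair each model with its own seed and search budget; a hypothetical dry-run helper that prints them all from one place:

```python
# Per-model settings taken from the commands above: (model, start_seed, search_budget)
retraining_runs = [("mnist", 42, 5120), ("resnet", 42, 640), ("vgg", 45, 640)]

retraining_cmds = [
    f"python -m baselines.retraining --model={m} --start_seed={s} --search_budget={b}"
    for m, s, b in retraining_runs
]
for cmd in retraining_cmds:
    print(cmd)  # replace print with subprocess.run(...) to actually launch the runs
```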
Outputs will be stored in five folders (`epsilon_0.01`, `epsilon_0.02`, `epsilon_0.03`, `epsilon_0.04`, `epsilon_0.05`) under the following path:

```
baseline_evaluations/retraining/retraining_<model>/epsilon_<epsilon_value>/
```
For MNIST:
```bash
python -m baselines.retraining_evaluator --model=mnist --search_budget=5120
```

For ResNet50:

```bash
python -m baselines.retraining_evaluator --model=resnet --search_budget=640
```

For VGG16:

```bash
python -m baselines.retraining_evaluator --model=vgg --search_budget=640
```

To plot the results, first run each CMA search and evaluation for every dataset and epsilon, and also run and evaluate all baselines for each dataset on the same epsilons.
Once the results are available, generate the plots with:
```bash
python -m utils.plotter
```