This repository contains code corresponding to the work titled: "Extending Iterated, Spatialized Prisoners’ Dilemma to Understand Multicellularity: game theory with self-scaling players"
The general idea is to run a simulation in which independent RL agents play iterated prisoner's dilemma (IPD) games with one another, with a twist: in addition to the usual cooperate and defect actions, agents also have the option to merge or split. These operators let an agent forgo its individuality in favor of another, better-performing individual, with the agents' 2D spatial arrangement growing or shrinking accordingly.
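As background, a single IPD round between two agents can be sketched as follows. This is an illustrative sketch only: the payoff values shown are the textbook defaults, and the merge ("M") / split ("S") action labels are hypothetical names for the extra operators the README describes; the repository's actual values and mechanics live in its simulation code.

```python
# Standard PD payoffs: T (temptation) > R (reward) > P (punishment) > S (sucker).
# These are textbook values, not necessarily the ones used in this repository.
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # sucker vs. temptation
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection
}

# In ipd-ms, the action set is extended with merge/split operators
# (labels "M"/"S" are hypothetical here).
ACTIONS = ["C", "D", "M", "S"]

def play_round(a1: str, a2: str) -> tuple[int, int]:
    """Resolve one IPD round between two cooperate/defect actions."""
    return PAYOFFS[(a1, a2)]
```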
Here's a visualization (the colorbar represents agent size):
The interactive demo may give a better feel for these dynamics.
In a virtualenv:

```shell
pip3 install -r metadata/requirements.txt
```

A simulation of merge-based IPD (termed ipd-ms) can be run as:
```shell
python3 ipd-ms.py --mode "fixed" --mem_len 4 -bs 20
```

and,
A simulation of classic IPD (with cooperate and defect as possible actions) can be run as:
```shell
python3 ipd.py --mode "fixed" --mem_len 4 -bs 20
```

These hyperparameters require some background on the design of our RL agents:
- Each RL agent carries two data structures: a list of its memories (the actions played so far) and a policy table (a discrete map from memory states to actions).
- The memory size (the length of the list) can be pre-set to a fixed capacity (the --mem_len hyperparameter).
- Since each simulation involves multiple agents (determined by the -bs hyperparameter), the --mode hyperparameter controls how each agent's --mem_len is set.
Specifically, in the commands above:
- The --mode option specifies whether agents are uniformly assigned a constant memory capacity (mode = fixed) or heterogeneously assigned capacities drawn from a uniform random distribution (mode = range_memory) whose upper bound is the --mem_len parameter (here, --mem_len = 4).
- The -bs option specifies the side length of the agent grid. Here it is set to 20, implying a 20x20 grid of 400 agents.
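The agent design and the two memory modes described above can be sketched as follows. This is a hypothetical illustration: the class and function names (Agent, init_agents) mirror the README's description, not the repository's actual code.

```python
import random
from collections import deque

class Agent:
    """Sketch of an RL agent: a bounded memory plus a policy table."""
    def __init__(self, mem_len: int):
        self.memory = deque(maxlen=mem_len)  # actions played so far
        self.policy = {}                     # memory state -> action

def init_agents(bs: int, mode: str, mem_len: int) -> list[Agent]:
    """Create a bs x bs grid of agents with mode-dependent memory capacities."""
    n = bs * bs
    if mode == "fixed":
        sizes = [mem_len] * n  # every agent gets the same capacity
    elif mode == "range_memory":
        # heterogeneous capacities, uniformly sampled up to mem_len
        sizes = [random.randint(1, mem_len) for _ in range(n)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [Agent(s) for s in sizes]
```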
The file hyperparams.py maintains the invariant hyperparameter values related to the Q-learning algorithm, mutation rate, etc.
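For reference, the tabular Q-learning update such agents typically perform looks like the sketch below. The learning rate (alpha) and discount (gamma) shown are placeholder values, standing in for whatever constants hyperparams.py actually fixes.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    alpha/gamma here are illustrative defaults, not the repo's values."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q
```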
The results we report focus on the relationship between memory size (varied via the --mem_len option) and merge tendency (varied by enabling or disabling the merge/split actions, i.e., by running either ipd-ms or ipd simulations).
Running plot.sh replicates our results:

```shell
cd metadata
chmod +x plot.sh
./plot.sh
```

Note: plot.sh fetches cached simulation data (running the simulations from scratch is time-consuming). Re-running them should yield approximately the same results.
