This repository contains the empirical verification of our rationality measures and theoretical analysis. More details are in the following paper:

> Kejiang Qian, Amos Storkey, Fengxiang He. *Rationality Measurement and Theory for Reinforcement Learning Agents.* arXiv.
Our theory leads to the following hypotheses:

- **H1 (Benefits of regularisation):** layer normalisation (LN), $\ell_2$ regularisation (L2), and weight normalisation (WN) can penalise hypothesis complexity.
- **H2 (Benefits of domain randomisation):** domain randomisation improves the robustness of reinforcement learning algorithms against distribution shifts across environments.
- **H3 (Deficits of environment shifts):** larger environment shifts lead to worse rationality.
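As an illustration of H1, the sketch below shows one common way an $\ell_2$ penalty on the Q-network weights can enter a DQN loss. The helper name `l2_penalty`, the toy network, and the stand-in TD loss are all hypothetical; the repository's actual implementation lives in `src/regularisers.py` and `src/model/DQN.py`.

```python
import torch
import torch.nn as nn

def l2_penalty(model: nn.Module, coeff: float = 1e-4) -> torch.Tensor:
    """Sum of squared weights scaled by coeff (hypothetical helper,
    not the repository's API)."""
    return coeff * sum((p ** 2).sum() for p in model.parameters())

# Toy Q-network standing in for the real DQN in src/model/DQN.py.
q_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))

td_loss = torch.tensor(0.5)         # stand-in for the usual TD-error term
loss = td_loss + l2_penalty(q_net)  # regularised objective
loss.backward()                     # gradients now include the penalty term
```

Because the penalty grows with the squared weight norm, minimising the combined loss pushes the network towards lower-complexity hypotheses, which is the mechanism H1 appeals to.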
```
Rationality/
├── src/
│   ├── env/               # Customised Taxi & CliffWalking environments
│   │   ├── taxi.py
│   │   └── cliffwalking.py
│   ├── model/             # DQN implementation
│   │   └── DQN.py
│   ├── utils/             # Logger & helper functions
│   ├── regularisers.py    # Regularisation modules
│   └── runners.py         # Training / evaluation pipeline
│
├── experiment/            # Reproduction scripts
│   ├── exp1_*_reg.sh
│   ├── exp2_*_domain_rand.sh
│   └── exp3_*_env_level.sh
│
└── train.py               # Main entry
```
Set up a conda environment and install the dependencies:

```bash
conda create -n rationality python=3.10
conda activate rationality
pip install torch gym numpy pandas matplotlib
```

Train a DQN on the customised Taxi environment, e.g. with layer normalisation:

```bash
python train.py \
    --env taxi \
    --episodes 2000 \
    --regulariser ln
```

Or train on the customised CliffWalking environment:

```bash
python train.py \
    --env cliffwalking \
    --eps_train 0.3
```

All results are available at Google Drive.
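To make H2/H3 concrete, here is a toy sketch of one way an environment shift can be modelled: with probability `eps`, the executed action is resampled uniformly at random, so a larger `eps` means dynamics that diverge further from the agent's intended policy. The function `shifted_step` and its parameters are purely illustrative, not the repository's actual shift mechanism.

```python
import random

def shifted_step(action: int, n_actions: int, eps: float,
                 rng: random.Random) -> int:
    """With probability eps, replace the intended action with a random one.

    A toy stand-in for an environment shift: larger eps means the executed
    dynamics diverge more from the agent's policy (illustrative only).
    """
    if rng.random() < eps:
        return rng.randrange(n_actions)
    return action

rng = random.Random(0)
# The fraction of steps where the executed action differs from the intended
# one grows with eps, i.e. with the size of the environment shift.
for eps in (0.0, 0.3, 0.9):
    diverged = sum(shifted_step(0, 4, eps, rng) != 0 for _ in range(10_000))
    print(eps, diverged / 10_000)
```

Under this reading, H3 says agents evaluated at larger `eps` than they were trained on should show worse rationality, and H2 says randomising `eps` during training should mitigate that gap.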
To reproduce the experiments, run the corresponding scripts:

```bash
# Exp. 1: regularisation
bash experiment/exp1_taxi_reg.sh
bash experiment/exp1_cliff_reg.sh

# Exp. 2: domain randomisation
bash experiment/exp2_taxi_domain_rand.sh
bash experiment/exp2_cliff_domain_rand.sh

# Exp. 3: environment shift levels
bash experiment/exp3_taxi_env_level.sh
bash experiment/exp3_cliff_env_level.sh
```

Results will be saved to:

```
logs/{env}/{experiment}/
```
If you use this code in your research, please cite:

```bibtex
@article{qian2025rationality,
  title={Rationality Measurement and Theory for Reinforcement Learning Agents},
  author={Qian, Kejiang and Storkey, Amos and He, Fengxiang},
  journal={arXiv preprint},
  year={2025}
}
```