jrzmnt/ask-rl
When to ASK: Uncertainty-Gated Language Assistance for Reinforcement Learning

IJCNN 2026 | Python 3.11 | License: MIT | HuggingFace | arXiv

Official implementation of ASK (Adaptive Safety through Knowledge), an extrinsic method that improves out-of-distribution (OOD) generalization in reinforcement learning by selectively querying a Language Model (LM) based on uncertainty estimates, without retraining the RL policy.

ASK uses Monte Carlo Dropout to measure epistemic and aleatoric uncertainty at each step. When uncertainty exceeds a threshold τ, it queries an LM for an action recommendation. In in-domain scenarios, ASK preserves PPO baseline performance. Under downward generalization (trained on 8×8, tested on 4×4–7×7), the 32B/72B models achieve up to 0.95 reward, where both PPO alone and the LM alone fail completely.
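The gating rule can be sketched as follows. This is an illustrative sketch, not the repository's API: the function and variable names are assumptions, and it assumes MC Dropout has already produced N sampled action distributions for the current state.

```python
import numpy as np

def should_query_lm(mc_probs: np.ndarray, tau: float) -> bool:
    """Decide whether to defer to the LM for this step.

    mc_probs: array of shape (N, A) -- N stochastic forward passes,
    each a probability distribution over the A actions.
    tau: uncertainty threshold (a tuned hyperparameter).
    """
    mean_probs = mc_probs.mean(axis=0)
    # Predictive entropy of the averaged distribution (total uncertainty).
    total = -np.sum(mean_probs * np.log(mean_probs + 1e-12))
    return bool(total > tau)

# Toy usage: samples that agree (low uncertainty) vs. disagree (high).
np.random.seed(0)
confident = np.tile([0.97, 0.01, 0.01, 0.01], (100, 1))
uncertain = np.random.dirichlet(np.ones(4), size=100)
```

When `should_query_lm` returns False, the agent simply acts with the PPO policy, so in-domain behavior is unchanged; only high-uncertainty states pay the cost of an LM call.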


Requirements

  • Python 3.11
  • uv

Installation

uv sync
source .venv/bin/activate

Setup: Generate Evaluation Maps

The FrozenLake evaluation and test maps are not included in the repository and must be generated before running any experiment. Each map size uses 100 fixed contexts for evaluation and 100 for testing (the paper uses 300 total, equally split into train/eval/test).

python scripts/generate_maps.py

This creates tmp/frozenlake{4..8}/eval/ and tmp/frozenlake{4..8}/test/ with 100 .npy maps each.

You can also generate them manually:

from awu.envs.frozen_lake import FrozenLake

for size in [4, 5, 6, 7, 8]:
    env = FrozenLake(id="FrozenLake-v1", size=size)
    env.create_structures(100, eval=True)   # -> tmp/frozenlake{size}/eval/
    env.create_structures(100, eval=False)  # -> tmp/frozenlake{size}/test/

Running Experiments

Full pipeline

bash scripts/run_all.sh

Runs setup, PPO training, SLM-only rollout, and gated rollout in sequence.

Individual steps

# 1. Train the PPO agent
bash scripts/run_rl.sh

# 2. Run SLM-only rollout
bash scripts/run_slm.sh

# 3. Run uncertainty-gated rollout (ASK: PPO + SLM)
bash scripts/run_gated.sh

Results are saved under runs/.

Evaluation

Evaluation scripts load the pre-trained PPO model from HuggingFace (NathanGavenski/ppo-FrozenLake-v1-8x8) and run it over the fixed evaluation maps.

python eval_ppo.py        # PPO-only
python eval_ppo_slm.py    # ASK: PPO + SLM gated
python eval_slm.py        # SLM-only

Configuration

File                     Description
configs/rl/ppo.yaml      PPO training config (environment regime, timesteps)
configs/slm/small.yaml   SLM config (Qwen2.5-1.5B-Instruct)
configs/slm/medium.yaml  SLM config (larger Qwen variant)
configs/sweeps/          Sweep configs for threshold and model search

The regime field in configs/rl/ppo.yaml controls the experiment type:

experiment:
  regime: rl_only   # rl_only | slm_only | gated

Key hyperparameters from the paper:

  • PPO training: 2×10⁷ timesteps (Stable-Baselines3 defaults)
  • MC Dropout: N=100 forward passes, dropout rate 0.2
  • LMs: Qwen2.5 family (0.5B–72B), off-the-shelf from HuggingFace, no fine-tuning
  • Evaluation: 100 episodes per configuration
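The epistemic/aleatoric split mentioned above can be illustrated with the standard entropy decomposition over the N dropout samples. This is a minimal sketch under that assumption; the function name is ours, not the repository's.

```python
import numpy as np

def uncertainty_decomposition(mc_probs: np.ndarray):
    """Split predictive uncertainty into epistemic and aleatoric parts.

    mc_probs: (N, A) action distributions from N dropout-perturbed
    forward passes (the paper uses N=100, dropout rate 0.2).
    """
    eps = 1e-12
    mean_probs = mc_probs.mean(axis=0)
    # Total uncertainty: entropy of the averaged prediction.
    total = -np.sum(mean_probs * np.log(mean_probs + eps))
    # Aleatoric: average entropy of the individual predictions.
    aleatoric = -np.mean(np.sum(mc_probs * np.log(mc_probs + eps), axis=1))
    # Epistemic: the gap (mutual information between weights and action).
    epistemic = total - aleatoric
    return epistemic, aleatoric

# If every dropout sample agrees, the epistemic term vanishes and all
# remaining uncertainty is aleatoric.
samples = np.tile([0.25, 0.25, 0.25, 0.25], (100, 1))
ep, al = uncertainty_decomposition(samples)
```

Under this decomposition, epistemic uncertainty captures disagreement between dropout samples (model uncertainty, which rises out of distribution), while aleatoric uncertainty reflects the spread within each sample's own prediction.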

Project Structure

├── configs/          # YAML experiment configs
├── eval_ppo.py       # Evaluate PPO-only
├── eval_ppo_slm.py   # Evaluate ASK (PPO + SLM gated)
├── eval_slm.py       # Evaluate SLM-only
├── prompts/          # SLM prompt templates
├── scripts/
│   ├── generate_maps.py   # Generate FrozenLake maps (run once)
│   ├── run_all.sh         # Full pipeline
│   ├── run_rl.sh          # Train PPO
│   ├── run_slm.sh         # SLM-only rollout
│   ├── run_gated.sh       # ASK gated rollout
│   └── setup.sh           # Install dependencies
└── src/awu/
    ├── envs/              # FrozenLake environment
    ├── experiments/       # Training and rollout entry points
    ├── slm/               # SLM loading, prompting, and parsing
    ├── uncertainty/       # MC Dropout uncertainty estimation
    └── utils/             # Callbacks, seeding, I/O utilities

Acknowledgments

This work was partially supported by UK Research and Innovation [grant number EP/S023356/1], in the UKRI Centre for Doctoral Training in Safe and Trusted Artificial Intelligence (www.safeandtrustedai.org), and by the Kunumi Institute (https://www.kunuminst.org/), through individual grants awarded to the authors.


Citation

@inproceedings{7fc4d3b96a2c441d92209a877e111a5d,
  title     = "When to ASK: Uncertainty-Gated Language Assistance for Reinforcement Learning",
  author    = "Juarez Monteiro and Nathan Gavenski and Gianlucca Zuin and Adriano Veloso",
  booktitle = "Proceedings of the 2026 International Joint Conference on Neural Networks (IJCNN)",
  year      = "2026",
  month     = jun,
  note      = "Conference date: 21-06-2026 Through 26-06-2026",
  url       = "https://attend.ieee.org/wcci-2026/",
}