We study cross-task knowledge reuse in deep reinforcement learning using three complementary paradigms:
- Transfer Learning: pretrain on a source task, then adapt to a target task
- Meta-Learning: learn an initialization that adapts quickly (few-shot) to a task
- Continual Learning: learn tasks sequentially while reducing catastrophic forgetting
Environments used:
- Snake (custom)
- PuckWorld (custom)
- Pong (Atari: ALE/Pong-v5)
Plus a controlled sine-wave regression benchmark (for EWC + toy MAML sanity checks).
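This benchmark follows the standard few-shot sine regression setup: each task is `y = A * sin(x + phi)` with a task-specific amplitude and phase. A minimal sketch of such a task sampler (the sampling ranges are the common MAML-paper defaults, assumed here rather than taken from this repo):

```python
import numpy as np

def sample_sine_task(rng: np.random.Generator):
    """Sample one regression task: y = A * sin(x + phi)."""
    amplitude = rng.uniform(0.1, 5.0)  # assumed range, as in the MAML paper
    phase = rng.uniform(0.0, np.pi)

    def sample_batch(k: int):
        x = rng.uniform(-5.0, 5.0, size=(k, 1))
        y = amplitude * np.sin(x + phase)
        return x, y

    return sample_batch

# Usage: draw a 5-shot support set for one sampled task.
rng = np.random.default_rng(0)
task = sample_sine_task(rng)
x_support, y_support = task(5)
```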
Setup:
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
Gymnasium Atari typically needs ROM install/acceptance.
Recommended:
```bash
pip install "gymnasium[atari,accept-rom-license]"
```
Or AutoROM:
```bash
pip install "autorom[accept-rom-license]"
AutoROM --accept-license
```
Sanity check:
```bash
python -c "import gymnasium as gym; env=gym.make('ALE/Pong-v5'); env.reset(); print('Pong OK')"
```
Train PPO on one environment:
```bash
python -m bridging_tasks.transfer.train --env pong --timesteps 1000000
python -m bridging_tasks.transfer.train --env snake --timesteps 500000
python -m bridging_tasks.transfer.train --env puckworld --timesteps 500000
```
Outputs:
`outputs/transfer/<env>/<timestamp>/` containing:
- saved SB3 model (`.zip`)
- reward curve plot
- TensorBoard logs
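For the adaptation step, the usual Stable-Baselines3 pattern is to load the saved source-task policy and continue training on the target environment. A minimal sketch, assuming the saved `.zip` is a standard SB3 PPO checkpoint; the path, env id, and step count below are placeholders, and SB3 requires the target env's observation/action spaces to match the loaded policy:

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Load the source-task checkpoint (placeholder path).
model = PPO.load("outputs/transfer/snake/<timestamp>/model.zip")

# Attach the target task. "PuckWorld-v0" is a placeholder id; the custom
# envs must be registered with Gymnasium, and observation/action spaces
# must match the loaded policy for SB3 to accept the swap.
model.set_env(gym.make("PuckWorld-v0"))

# Fine-tune without resetting the timestep counter, so logging continues
# from the pretraining run.
model.learn(total_timesteps=100_000, reset_num_timesteps=False)
model.save("outputs/transfer/puckworld_finetuned")
```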
The original project report frames meta-learning as MAML-style adaptation. In practice, implementing full second-order MAML in RL is expensive and brittle, so this repo includes a first-order meta-learning loop (Reptile-style) that still learns an initialization that adapts quickly.
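Concretely, the first-order loop adapts a clone of the shared initialization on a sampled task, then nudges the initialization toward the adapted weights. A minimal PyTorch sketch of one Reptile meta-step (the function names and task interface are illustrative, not this repo's API):

```python
import copy
import torch

def reptile_step(init_model, task_loss_fn, inner_steps=5,
                 inner_lr=1e-2, meta_lr=0.1):
    """One Reptile meta-update: adapt a clone on a task, then interpolate."""
    adapted = copy.deepcopy(init_model)
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)

    # Inner loop: a few gradient steps on the sampled task.
    for _ in range(inner_steps):
        loss = task_loss_fn(adapted)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Outer update: move the initialization toward the adapted weights.
    with torch.no_grad():
        for p_init, p_adapt in zip(init_model.parameters(),
                                   adapted.parameters()):
            p_init += meta_lr * (p_adapt - p_init)
```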
Run:
```bash
python -m bridging_tasks.meta.train --iterations 200 --k_shots 5
```
Outputs:
`outputs/meta/<timestamp>/` (plots + init checkpoints)
Runs:
- sine transfer baselines (scratch / freeze / finetune)
- toy sine MAML
- EWC forgetting matrix (a minimal EWC sketch follows below)
```bash
python -m bridging_tasks.continual.run --all
```
Outputs:
`outputs/continual/<timestamp>/`
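For context, EWC reduces forgetting by penalizing changes to parameters that were important on earlier tasks, with importance taken from a diagonal Fisher estimate. A minimal PyTorch sketch of the penalty term (variable names are illustrative; the repo's implementation may differ):

```python
import torch

def ewc_penalty(model, fisher_diag, old_params, lam=1.0):
    """L_EWC = (lam / 2) * sum_i F_i * (theta_i - theta_i*)^2."""
    loss = torch.zeros((), device=next(model.parameters()).device)
    for name, p in model.named_parameters():
        loss = loss + (fisher_diag[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss

# When training on task B after task A (fisher_A and params_A recorded
# at the end of task A's training):
#   total_loss = task_b_loss + ewc_penalty(model, fisher_A, params_A, lam=100.0)
```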
MIT (See LICENSE).