This repository implements Proximal Policy Optimization (PPO) for CarRacing-v3 using Gymnasium + Stable-Baselines3 (SB3). The goal is to reach >900 average reward.
- Training:
  `python scripts/train.py --config configs/default.yaml`
- Evaluation:
  `python scripts/evaluate.py --config configs/default.yaml --checkpoint <path>`
- Video:
  `python scripts/record_video.py --config configs/default.yaml --checkpoint <path>`
- Sweep:
  `python scripts/sweep.py --sweep configs/sweep.yaml`
CarRacing-v3 is a continuous-control Box2D environment with:
- Observation: RGB image (96×96×3) by default (optionally resized/grayscaled)
- Actions: 3D continuous vector `[steer, gas, brake]`
- Reward shaping that encourages staying on the track and making forward progress

See the Gymnasium docs for full details.
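In CarRacing-v3 the per-dimension bounds are steer ∈ [-1, 1] and gas, brake ∈ [0, 1]. As a minimal sketch (not code from this repo), clipping a raw policy output into that action box looks like:

```python
import numpy as np

# Per-dimension bounds for CarRacing-v3 actions: [steer, gas, brake]
ACTION_LOW = np.array([-1.0, 0.0, 0.0], dtype=np.float32)
ACTION_HIGH = np.array([1.0, 1.0, 1.0], dtype=np.float32)

def clip_action(raw_action: np.ndarray) -> np.ndarray:
    """Clip an unbounded policy output into the valid action box."""
    return np.clip(raw_action, ACTION_LOW, ACTION_HIGH)
```

SB3 performs equivalent clipping internally for `Box` action spaces before stepping the environment, so no manual clipping is needed when using this repo.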
This implementation uses SB3’s PPO with a CNN policy (`CnnPolicy`) and the standard PPO clipped objective:

- Policy loss: `min(r_t * A_t, clip(r_t, 1-ε, 1+ε) * A_t)`
- Value loss: MSE between predicted value and return
- Entropy bonus: encourages exploration
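The clipped objective can be sketched in a few lines of NumPy (illustrative only; SB3 implements this in PyTorch inside `PPO.train`):

```python
import numpy as np

def ppo_clipped_policy_loss(log_prob_new, log_prob_old, advantages, eps=0.2):
    """Negative clipped surrogate: min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t)."""
    ratio = np.exp(log_prob_new - log_prob_old)   # r_t = pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Minimizing the negative surrogate == maximizing the clipped objective
    return -np.mean(np.minimum(unclipped, clipped))
```

When the new and old policies agree (`ratio = 1`), the loss reduces to the negative mean advantage; when the ratio drifts outside `[1-ε, 1+ε]`, the gradient through the clipped branch vanishes, which limits the policy update size.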
Advantages are computed using GAE(λ). Observations are preprocessed with common image RL wrappers (optional grayscale + resize, frame stacking, channel transpose).
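A reference implementation of GAE(λ) over a single rollout, as a NumPy sketch (SB3's `RolloutBuffer.compute_returns_and_advantage` does the equivalent internally):

```python
import numpy as np

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95, last_value=0.0):
    """GAE(lambda) advantages for one rollout of length T.

    rewards, dones: length-T arrays; values: length-T value estimates;
    last_value: bootstrap value of the state following the rollout.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        # Exponentially weighted sum of future deltas
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages
```

The backward recursion resets at episode boundaries (`dones[t] == 1`), so advantages never leak across episodes within a rollout.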
```bash
pip install -r requirements.txt
python scripts/train.py --config configs/default.yaml
```

Outputs go under `experiments/<run>/`:

- `models/` (best_model.zip, final_model.zip, etc.)
- `logs/` (SB3 evaluation logs and TensorBoard files if enabled)
- `plots/` (evaluation curves)
- `videos/` (if recorded)
```bash
python scripts/evaluate.py \
    --config configs/default.yaml \
    --checkpoint experiments/<run>/models/best_model.zip \
    --episodes 20
```

```bash
python scripts/record_video.py \
    --config configs/default.yaml \
    --checkpoint experiments/<run>/models/best_model.zip \
    --out_dir assets/videos
```

```bash
python scripts/sweep.py --sweep configs/sweep.yaml
```

These settings were found to be stable and effective for CarRacing PPO:
| Parameter | Value |
|---|---|
| env_name | CarRacing-v3 |
| total_timesteps | 3_000_000 |
| learning_rate | 3e-4 |
| gamma | 0.99 |
| gae_lambda | 0.95 |
| clip_range | 0.2 |
| n_steps | 2048 |
| n_epochs | 10 |
| batch_size | 64 |
| ent_coef | 0.01 |
| vf_coef | 0.5 |
| max_grad_norm | 0.5 |
| frame_stack | 4 |
| grayscale | true |
| resize | 84 |
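The exact structure of `configs/default.yaml` is not shown here; a flat YAML layout mirroring the table above might look like the following (key names and nesting in the actual repo may differ):

```yaml
# Hypothetical configs/default.yaml mirroring the hyperparameter table
env_name: CarRacing-v3
total_timesteps: 3000000
learning_rate: 3.0e-4
gamma: 0.99
gae_lambda: 0.95
clip_range: 0.2
n_steps: 2048
n_epochs: 10
batch_size: 64
ent_coef: 0.01
vf_coef: 0.5
max_grad_norm: 0.5
frame_stack: 4
grayscale: true
resize: 84
```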
Policy rollout video (best model):
best_model_car_racing_ppo-step-0-to-step-20000.mp4
Run the test suite with `pytest -q`.

License: MIT (see LICENSE).
