ak811/carracing-ppo

CarRacing PPO (Gymnasium + Stable-Baselines3)

This repository implements Proximal Policy Optimization (PPO) for CarRacing-v3 using Gymnasium and Stable-Baselines3 (SB3). The goal is an average episode reward above 900, the score at which Gymnasium considers CarRacing solved.

  • Training: python scripts/train.py --config configs/default.yaml
  • Evaluation: python scripts/evaluate.py --config configs/default.yaml --checkpoint <path>
  • Video: python scripts/record_video.py --config configs/default.yaml --checkpoint <path>
  • Sweep: python scripts/sweep.py --sweep configs/sweep.yaml

CarRacing Environment

CarRacing-v3 is a continuous-control Box2D environment with:

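  • Observation: 96×96×3 RGB image (uint8); this repo can optionally convert it to grayscale and resize it
  • Actions: 3D continuous vector [steer, gas, brake], with steering in [-1, 1] and gas and brake in [0, 1]
  • Reward: -0.1 every frame and +1000/N for every track tile visited, where N is the total number of tiles in the track; a car that drives far off the track (outside the playfield) receives -100 and the episode ends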
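The reward rule above can be checked with a small worked example. This is a sketch that simply sums the documented per-step terms over a whole episode; the tile count of 300 used below is a hypothetical value (the real number of tiles varies from track to track), and the helper name is illustrative only.

```python
def carracing_episode_return(tiles_visited: int, total_tiles: int, frames: int,
                             went_off_playfield: bool = False) -> float:
    """Approximate episode return from the CarRacing reward rule.

    Assumes -0.1 per frame, +1000/N per visited tile (N = total_tiles), and
    -100 if the car leaves the playfield. The real environment applies these
    terms step by step; this sketch only sums them over one episode.
    """
    tile_reward = 1000.0 * tiles_visited / total_tiles
    time_penalty = 0.1 * frames
    off_track_penalty = 100.0 if went_off_playfield else 0.0
    return tile_reward - time_penalty - off_track_penalty


# Hypothetical track with 300 tiles, all visited in 732 frames: 1000 - 0.1 * 732
print(round(carracing_episode_return(tiles_visited=300, total_tiles=300, frames=732), 1))  # 926.8
```

Under this rule, a return above 900 is only possible if more than 90% of the tiles are visited, and completing the full track has to take fewer than 1000 frames.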

See the Gymnasium CarRacing documentation for full details.


PPO Notes

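This implementation uses SB3’s PPO with a CNN policy (CnnPolicy) and the standard PPO clipped objective. With probability ratio r_t = π_θ(a_t|s_t) / π_θ_old(a_t|s_t), advantage estimate A_t, and clip range ε (clip_range):

  • Policy loss: −min(r_t · A_t, clip(r_t, 1−ε, 1+ε) · A_t), averaged over the minibatch (the clipped surrogate is maximized, so its negative is minimized)
  • Value loss: MSE between the predicted value V(s_t) and the return target, weighted by vf_coef
  • Entropy bonus: the policy entropy, weighted by ent_coef, is subtracted from the loss to encourage exploration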
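The three terms combine into one scalar loss that is minimized: policy_loss + ent_coef · entropy_loss + vf_coef · value_loss, where entropy_loss is the negative mean entropy. The sketch below computes that scalar for one minibatch in plain Python, with defaults set to this README's clip_range, ent_coef and vf_coef values. It is illustrative only: SB3 operates on torch tensors and, by default, also normalizes advantages within each minibatch, which this sketch leaves out.

```python
import math


def ppo_loss(new_log_probs, old_log_probs, advantages, values, returns, entropies,
             clip_range=0.2, ent_coef=0.01, vf_coef=0.5):
    """Scalar PPO loss for one minibatch given as equal-length Python lists."""
    n = len(advantages)
    surrogate_terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # r_t = pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
        # Pessimistic bound: take the smaller of the unclipped and clipped objective
        surrogate_terms.append(min(ratio * adv, clipped_ratio * adv))
    policy_loss = -sum(surrogate_terms) / n  # maximize surrogate = minimize its negative
    value_loss = sum((ret - v) ** 2 for ret, v in zip(returns, values)) / n  # MSE
    entropy_loss = -sum(entropies) / n  # higher entropy lowers the loss
    return policy_loss + ent_coef * entropy_loss + vf_coef * value_loss


# Ratio 2.0 with a positive advantage is clipped to 1.2, so the policy loss is -1.2
print(round(ppo_loss([math.log(2.0)], [0.0], [1.0], [0.0], [0.0], [0.0]), 6))
```

Clipping removes the incentive to push r_t outside [1−ε, 1+ε] in the direction the advantage favors, which is what keeps each update close to the policy that collected the data.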

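Advantages are computed using GAE(λ), controlled by gamma and gae_lambda. Observations are preprocessed with common image RL wrappers: optional grayscale conversion and resizing, frame stacking, and a channel transpose from height × width × channels to the channels-first layout expected by PyTorch CNNs.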
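GAE(λ) blends multi-step TD errors: δ_t = r_t + γ·V(s_{t+1})·(1 − done_t) − V(s_t) and A_t = δ_t + γλ·(1 − done_t)·A_{t+1}, with A_t + V(s_t) used as the value target. A plain-Python sketch for a single environment, assuming dones[t] marks that an episode ended after step t; it does not special-case time-limit truncation, which SB3 handles separately while collecting rollouts. The function name is illustrative, not taken from this repo.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, gae_lambda=0.95):
    """Generalized Advantage Estimation over one rollout of a single environment.

    rewards[t], values[t]: reward and value estimate V(s_t) at step t.
    dones[t]: 1.0 if the episode ended after step t (no bootstrapping past it), else 0.0.
    last_value: V(s_T) for the state reached after the final step, used to bootstrap.
    Returns (advantages, returns), where returns = advantages + values.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = last_value if t == len(rewards) - 1 else values[t + 1]
        next_non_terminal = 1.0 - dones[t]
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * next_non_terminal - values[t]
        # Exponentially weighted sum of TD errors, reset at episode boundaries
        gae = delta + gamma * gae_lambda * next_non_terminal * gae
        advantages[t] = gae
    returns = [adv + v for adv, v in zip(advantages, values)]
    return advantages, returns


# Episode boundary after step 1: step 0 sees rewards 0 and 1, step 2 starts fresh
adv, ret = compute_gae(rewards=[1.0, 1.0, 1.0], values=[0.0, 0.0, 0.0],
                       dones=[0.0, 1.0, 0.0], last_value=0.0, gamma=1.0, gae_lambda=1.0)
print(adv)  # [2.0, 1.0, 1.0]
```

With gae_lambda = 0 the advantages reduce to one-step TD errors (low variance, more bias); with gae_lambda = 1 they become full discounted returns minus the baseline. The 0.95 used here sits close to the Monte Carlo end.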


Installation

pip install -r requirements.txt

Quick Start

Train (single run)

python scripts/train.py --config configs/default.yaml

Outputs go under experiments/<run>/:

  • models/ (best_model.zip, final_model.zip, etc.)
  • logs/ (SB3 evaluation logs and TensorBoard files if enabled)
  • plots/ (evaluation curves)
  • videos/ (if recorded)

Evaluate a checkpoint

python scripts/evaluate.py \
  --config configs/default.yaml \
  --checkpoint experiments/<run>/models/best_model.zip \
  --episodes 20

Record a rollout video

python scripts/record_video.py \
  --config configs/default.yaml \
  --checkpoint experiments/<run>/models/best_model.zip \
  --out_dir assets/videos

Run a sweep

python scripts/sweep.py --sweep configs/sweep.yaml

Best Performing Model

These settings were found to be stable and effective for CarRacing PPO:

| Parameter | Value |
|---|---|
| env_name | CarRacing-v3 |
| total_timesteps | 3_000_000 |
| learning_rate | 3e-4 |
| gamma | 0.99 |
| gae_lambda | 0.95 |
| clip_range | 0.2 |
| n_steps | 2048 |
| n_epochs | 10 |
| batch_size | 64 |
| ent_coef | 0.01 |
| vf_coef | 0.5 |
| max_grad_norm | 0.5 |
| frame_stack | 4 |
| grayscale | true |
| resize | 84 |
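With frame_stack 4, grayscale true and resize 84, each 96×96×3 frame ends up as part of a 4×84×84 channels-first observation. A minimal NumPy sketch of that pipeline, assuming luminance-weighted (BT.601) grayscale, nearest-neighbour resizing, and padding the stack with copies of the first frame on reset. The helper names are hypothetical; the repo's actual wrappers (for example Gymnasium observation wrappers or SB3's VecFrameStack and VecTransposeImage) may differ in interpolation and padding details.

```python
from collections import deque

import numpy as np


def to_grayscale(frame: np.ndarray) -> np.ndarray:
    """(H, W, 3) uint8 RGB -> (H, W, 1) uint8 using BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame.astype(np.float32) @ weights  # (H, W)
    return gray.astype(np.uint8)[..., None]


def resize_nearest(frame: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of (H, W, C) to (size, size, C)."""
    h, w = frame.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return frame[rows][:, cols]


class FrameStack:
    """Keeps the last k preprocessed frames and returns them channels-first."""

    def __init__(self, k: int = 4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame: np.ndarray) -> np.ndarray:
        # Fill the buffer with copies of the first frame of the episode
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(frame)
        return self._observation()

    def step(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(frame)  # the oldest frame drops out automatically
        return self._observation()

    def _observation(self) -> np.ndarray:
        stacked = np.concatenate(list(self.frames), axis=-1)  # (H, W, k)
        return np.transpose(stacked, (2, 0, 1))  # (k, H, W) for a PyTorch CNN


def preprocess(frame: np.ndarray, size: int = 84) -> np.ndarray:
    return resize_nearest(to_grayscale(frame), size)


stacker = FrameStack(k=4)
obs = stacker.reset(preprocess(np.zeros((96, 96, 3), dtype=np.uint8)))
print(obs.shape)  # (4, 84, 84)
```

Stacking several frames lets the CNN infer speed and direction of motion, which a single still image cannot convey.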

Best Model Outputs

Policy rollout video (best model):

best_model_car_racing_ppo-step-0-to-step-20000.mp4

Testing

pytest -q

License

MIT (See LICENSE).
