This repository implements Proximal Policy Optimization (PPO) for CarRacing-v3 using Gymnasium + Stable-Baselines3 (SB3). The goal is to reach >900 average reward.
- Training:
  `python scripts/train.py --config configs/default.yaml`
- Evaluation:
  `python scripts/evaluate.py --config configs/default.yaml --checkpoint <path>`
- Video:
  `python scripts/record_video.py --config configs/default.yaml --checkpoint <path>`
- Sweep:
  `python scripts/sweep.py --sweep configs/sweep.yaml`
CarRacing-v3 is a continuous-control Box2D environment with:
- Observation: RGB image (96×96×3) by default (optionally resized/grayscaled)
- Actions: 3D continuous vector `[steer, gas, brake]`
- Reward shaping that encourages staying on the track and making forward progress

See the Gymnasium docs for full details.
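In CarRacing-v3 the per-dimension bounds are steer ∈ [-1, 1] and gas, brake ∈ [0, 1]. As a minimal sketch (not code from this repo), clipping a raw policy output into that action box looks like:

```python
import numpy as np

# Per-dimension bounds for CarRacing-v3 actions: [steer, gas, brake]
ACTION_LOW = np.array([-1.0, 0.0, 0.0], dtype=np.float32)
ACTION_HIGH = np.array([1.0, 1.0, 1.0], dtype=np.float32)

def clip_action(raw_action: np.ndarray) -> np.ndarray:
    """Clip an unbounded policy output into the valid action box."""
    return np.clip(raw_action, ACTION_LOW, ACTION_HIGH)
```

SB3 performs equivalent clipping internally for `Box` action spaces before stepping the environment, so no manual clipping is needed when using this repo.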
This implementation uses SB3’s PPO with a CNN policy (`CnnPolicy`) and the standard PPO clipped objective:

- Policy loss: `min(r_t * A_t, clip(r_t, 1-ε, 1+ε) * A_t)`
- Value loss: MSE between predicted value and return
- Entropy bonus: encourages exploration
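The clipped objective can be sketched in a few lines of NumPy (illustrative only; SB3 implements this in PyTorch inside `PPO.train`):

```python
import numpy as np

def ppo_clipped_policy_loss(log_prob_new, log_prob_old, advantages, eps=0.2):
    """Negative clipped surrogate: min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t)."""
    ratio = np.exp(log_prob_new - log_prob_old)   # r_t = pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Minimizing the negative surrogate == maximizing the clipped objective
    return -np.mean(np.minimum(unclipped, clipped))
```

When the new and old policies agree (`ratio = 1`), the loss reduces to the negative mean advantage; when the ratio drifts outside `[1-ε, 1+ε]`, the gradient through the clipped branch vanishes, which limits the policy update size.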
Advantages are computed using GAE(λ). Observations are preprocessed with common image RL wrappers (optional grayscale + resize, frame stacking, channel transpose).
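A reference implementation of GAE(λ) over a single rollout, as a NumPy sketch (SB3's `RolloutBuffer.compute_returns_and_advantage` does the equivalent internally):

```python
import numpy as np

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95, last_value=0.0):
    """GAE(lambda) advantages for one rollout of length T.

    rewards, dones: length-T arrays; values: length-T value estimates;
    last_value: bootstrap value of the state following the rollout.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        # Exponentially weighted sum of future deltas
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages
```

The backward recursion resets at episode boundaries (`dones[t] == 1`), so advantages never leak across episodes within a rollout.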
```bash
pip install -r requirements.txt
python scripts/train.py --config configs/default.yaml
```

Outputs go under `experiments/<run>/`:

- `models/` (best_model.zip, final_model.zip, etc.)
- `logs/` (SB3 evaluation logs and TensorBoard files if enabled)
- `plots/` (evaluation curves)
- `videos/` (if recorded)
```bash
python scripts/evaluate.py \
    --config configs/default.yaml \
    --checkpoint experiments/<run>/models/best_model.zip \
    --episodes 20
```

```bash
python scripts/record_video.py \
    --config configs/default.yaml \
    --checkpoint experiments/<run>/models/best_model.zip \
    --out_dir assets/videos
```

```bash
python scripts/sweep.py --sweep configs/sweep.yaml
```

These settings were found to be stable and effective for CarRacing PPO:
| Parameter | Value |
|---|---|
| env_name | CarRacing-v3 |
| total_timesteps | 3_000_000 |
| learning_rate | 3e-4 |
| gamma | 0.99 |
| gae_lambda | 0.95 |
| clip_range | 0.2 |
| n_steps | 2048 |
| n_epochs | 10 |
| batch_size | 64 |
| ent_coef | 0.01 |
| vf_coef | 0.5 |
| max_grad_norm | 0.5 |
| frame_stack | 4 |
| grayscale | true |
| resize | 84 |
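The exact structure of `configs/default.yaml` is not shown here; a flat YAML layout mirroring the table above might look like the following (key names and nesting in the actual repo may differ):

```yaml
# Hypothetical configs/default.yaml mirroring the hyperparameter table
env_name: CarRacing-v3
total_timesteps: 3000000
learning_rate: 3.0e-4
gamma: 0.99
gae_lambda: 0.95
clip_range: 0.2
n_steps: 2048
n_epochs: 10
batch_size: 64
ent_coef: 0.01
vf_coef: 0.5
max_grad_norm: 0.5
frame_stack: 4
grayscale: true
resize: 84
```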
Policy rollout video (best model):
best_model_car_racing_ppo-step-0-to-step-20000.mp4
Run the test suite with `pytest -q`.

License: MIT (see LICENSE).
