A personal project I built for training an autonomous driving agent in CARLA using Ray RLlib PPO.
The agent perceives the environment through a depth camera and LiDAR, plans along a route, and learns to steer and accelerate via trial-and-error. I use a custom multimodal Torch model that encodes each sensor modality independently before fusing them.
```
DO_RL/
├─ envs/
│  ├─ carla_env.py      # Gymnasium environment (main)
│  ├─ config.py         # Env / Sensor / Route / Reward config dataclasses
│  ├─ sensor.py         # Sensor creation, reading, preprocessing
│  ├─ route_manager.py  # Route planning, lateral/heading error, goal distance
│  └─ reward.py         # Reward term definitions
├─ rl_rllib/
│  ├─ models_torch.py   # Custom multimodal Torch model
│  ├─ ppo.py            # PPOConfig builder and training wrapper
│  ├─ register_envs.py  # RLlib environment registration
│  ├─ callbacks.py      # Training metrics callbacks
│  └─ ports.py          # Per-worker CARLA / TM port assignment
├─ scripts/
│  ├─ env_test.py       # Standalone env test (no RLlib)
│  ├─ train_rllib.py    # RLlib PPO training entry point
│  └─ run_rllib.py      # Load checkpoint and run visually
├─ runs/                # Ray Tune / RLlib training output
└─ debug_out/           # Debug images exported during env test or rollout
```
This project runs locally on Windows with a CARLA server. Required:
- Python 3.10+ (I run it on 3.12)
- CARLA 0.9.16 + its Python API
- `ray[rllib]`, `torch`, `gymnasium`, `numpy`

Optional:
- `opencv-python` — real-time depth / LiDAR display
- `Pillow` — fallback image saving when OpenCV is not available

Install:

```
pip install -r requirements.txt
```

Launch the CARLA server before running any script.
Default ports:
- CARLA server: `2000`
- Traffic Manager: `8000`

With multiple workers, ports are allocated automatically:
- worker 1: `2000` / `8000`
- worker 2: `2002` / `8002`
- worker 3: `2004` / `8004`

See `rl_rllib/ports.py` for the logic.
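The per-worker offsets above suggest logic roughly like the following sketch (the helper name is hypothetical; the actual implementation lives in `rl_rllib/ports.py`):

```python
def worker_ports(worker_index: int,
                 base_port: int = 2000,
                 base_tm_port: int = 8000) -> tuple[int, int]:
    """Return (carla_port, tm_port) for a worker.

    Each worker gets ports offset by 2 from the base, matching the
    table above (worker 1 -> 2000/8000, worker 2 -> 2002/8002, ...).
    Worker indices start at 1, as in RLlib.
    """
    offset = 2 * (worker_index - 1)
    return base_port + offset, base_tm_port + offset
```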
In `envs/route_manager.py` I have:

```python
sys.path.append(r"D:\CARLA_0.9.16\PythonAPI\carla")
```

Change this to match your CARLA installation path, otherwise `GlobalRoutePlanner` will fail to import.
Default args in `scripts/train_rllib.py`:
- `--num-workers 4`
- `--num-gpus 1.0`

Reduce these if your machine has limited resources.
2D continuous vector:
- `action[0]`: `steer` — range `[-1, 1]`
- `action[1]`: `accel` — range `[-1, 1]`; `accel >= 0` maps to `throttle`, `accel < 0` maps to `brake`
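The signed-accel convention can be sketched as below (the function name is hypothetical; the actual mapping lives in `envs/carla_env.py`):

```python
def accel_to_control(accel: float) -> tuple[float, float]:
    """Map the signed accel action in [-1, 1] to (throttle, brake).

    Positive values drive the throttle, negative values the brake;
    the two never fire at the same time.
    """
    accel = max(-1.0, min(1.0, accel))  # clip to the action range
    if accel >= 0.0:
        return accel, 0.0               # accel >= 0 -> throttle
    return 0.0, -accel                  # accel < 0  -> brake
```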
A Dict with four modalities:
- `depth`: `(1, H, W)`, default `(1, 84, 84)`, normalized to `[0, 1]`
- `lidar`: `(36,)` — minimum distance per front sector
- `route`: `(3,)` — `lateral_error`, `heading_error`, `dist_to_goal`
- `state`: `(5,)` — `speed`, `yaw_rate`, `prev_steer`, `prev_throttle`, `prev_brake`
Defined in `envs/reward.py`. Default terms:
- `progress` — reward for moving toward the goal
- `heading` — penalty for heading error
- `lateral` — penalty for lateral deviation
- `speed` — reward for maintaining target speed
- `collision` — strong penalty
- `stuck` — penalty for being stuck
- `termination` — end-of-episode term

Weights are configured in `envs/config.py` under `RewardSection.reward_scales`.
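The scaled terms presumably combine as a weighted sum, along these lines (the weight values here are invented for illustration; the real ones live in `RewardSection.reward_scales`):

```python
# Hypothetical example weights -- NOT the project's actual defaults.
reward_scales = {
    "progress": 1.0, "heading": -0.5, "lateral": -0.5,
    "speed": 0.2, "collision": -10.0, "stuck": -1.0, "termination": 1.0,
}

def total_reward(terms: dict[str, float],
                 scales: dict[str, float] = reward_scales) -> float:
    """Combine per-step reward terms into a single scalar.

    Terms without a configured scale contribute nothing.
    """
    return sum(scales.get(name, 0.0) * value for name, value in terms.items())
```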
Useful flags:
- `--throttle` — fixed throttle value
- `--steer` — fixed steer value
- `--sleep` — pause between steps for visual inspection
- `--save-dir` — save debug images when OpenCV is unavailable
```
python scripts/train_rllib.py ^
  --host 127.0.0.1 ^
  --base-port 2000 ^
  --base-tm-port 8000 ^
  --town Town05 ^
  --num-workers 2 ^
  --num-gpus 1 ^
  --use-custom-model ^
  --stop-timesteps 200000
```

Key flags:
- `--use-custom-model` — enables the multimodal fusion model in `rl_rllib/models_torch.py`
- `--run-dir` — output directory (default: `runs/train_rllib/`)
- `--exp-name` — experiment name
- `--checkpoint-every` — checkpoint interval (training iterations)
- `--keep-checkpoints` — number of recent checkpoints to keep
- `--enable-eval` — enable a separate evaluation environment
```
python scripts/run_rllib.py ^
  --checkpoint runs\train_rllib\<exp>\<trial>\checkpoint_000026 ^
  --port 2000 ^
  --tm-port 8000 ^
  --town Town05 ^
  --steps 1200
```

This script restores the policy weights and runs the agent in `CarlaEnv`, with optional depth / LiDAR visualization.
No separate YAML config. Everything is controlled through Python dataclasses and CLI arguments.
- `EnvSection` — host, port, map, timestep, max steps, traffic
- `SensorSection` — depth camera and LiDAR dimensions, FOV, mount pose, sector params
- `RouteSection` — sampling resolution, goal distance range, arrival threshold, max lateral deviation
- `RewardSection` — reward term weights
Controlled via `PPOHParams`:
- number of workers and GPUs
- learning rate, batch size, GAE, entropy, gradient clipping
- whether to use the custom model
- whether to enable evaluation
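Given that list, `PPOHParams` plausibly looks something like the following dataclass (field names and default values are guesses for illustration, not copied from `rl_rllib/ppo.py`):

```python
from dataclasses import dataclass

@dataclass
class PPOHParams:
    """Sketch of the PPO hyperparameter bundle described above."""
    num_workers: int = 4          # parallel rollout workers
    num_gpus: float = 1.0         # GPUs for the learner
    lr: float = 3e-4              # learning rate
    train_batch_size: int = 4000  # samples per training iteration
    gamma: float = 0.99           # discount factor
    gae_lambda: float = 0.95      # GAE lambda
    entropy_coeff: float = 0.01   # exploration bonus
    grad_clip: float = 0.5        # gradient clipping norm
    use_custom_model: bool = True # enable the multimodal fusion model
    enable_eval: bool = False     # separate evaluation environment
```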
When `--use-custom-model` is set, I use `CarlaFusionTorchModel` in `rl_rllib/models_torch.py`.
Each modality is encoded independently:
| Modality | Encoder | Output dim |
|---|---|---|
| `depth` (1×84×84) | `_TinyCNN` (3-layer CNN) | 128 |
| `lidar` (36,) | MLP [128, 128] | 128 |
| `route` (3,) | MLP [64, 64] | 64 |
| `state` (5,) | MLP [64, 64] | 64 |
All four embeddings are concatenated into a 384-dim vector and passed through a fusion MLP [256, 256], producing a 256-dim shared representation that feeds both the policy head and the value head.
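In plain PyTorch, the architecture can be sketched as follows (layer widths follow the table; kernel sizes, activations, and head sizes are assumptions, and the class name is hypothetical, unlike the real `CarlaFusionTorchModel`):

```python
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    """Illustrative four-modality fusion network (not the real model)."""

    def __init__(self):
        super().__init__()
        # Stand-in for _TinyCNN: 3 conv layers, then project to 128 dims.
        self.depth_enc = nn.Sequential(
            nn.Conv2d(1, 16, 8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(128), nn.ReLU(),
        )
        self.lidar_enc = nn.Sequential(nn.Linear(36, 128), nn.ReLU(),
                                       nn.Linear(128, 128), nn.ReLU())
        self.route_enc = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                       nn.Linear(64, 64), nn.ReLU())
        self.state_enc = nn.Sequential(nn.Linear(5, 64), nn.ReLU(),
                                       nn.Linear(64, 64), nn.ReLU())
        # 128 + 128 + 64 + 64 = 384 concatenated features.
        self.fusion = nn.Sequential(nn.Linear(384, 256), nn.ReLU(),
                                    nn.Linear(256, 256), nn.ReLU())
        self.policy_head = nn.Linear(256, 4)  # e.g. mean + log_std for 2 actions
        self.value_head = nn.Linear(256, 1)

    def forward(self, depth, lidar, route, state):
        z = torch.cat([self.depth_enc(depth), self.lidar_enc(lidar),
                       self.route_enc(route), self.state_enc(state)], dim=-1)
        h = self.fusion(z)  # 256-dim shared representation
        return self.policy_head(h), self.value_head(h)
```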
Training pipeline:
1. `scripts/train_rllib.py` — parse CLI args
2. Build runtime info and PPO hyperparams
3. `rl_rllib/register_envs.py` — register `carla_env`
4. `rl_rllib/ports.py` — assign ports per worker
5. `envs/carla_env.py` — create CARLA environment, return Gymnasium interface
6. `rl_rllib/ppo.py` — build `PPOConfig`, launch Ray Tune
7. `rl_rllib/callbacks.py` — aggregate training metrics from `info`

If the custom model is enabled, additionally:
- `rl_rllib/models_torch.py` — build `CarlaFusionTorchModel`
- Encode `depth` / `lidar` / `route` / `state` independently
- Fuse and output policy logits + value
Ray Tune output. Each trial contains:
- `params.json` / `params.pkl`
- `progress.csv`
- `result.json`
- `events.out.tfevents.*`
- `checkpoint_xxxxxx/`
Depth and LiDAR images exported during env testing or policy rollout.
- CARLA PythonAPI path is hardcoded in `route_manager.py`
- Multi-worker training requires all ports to be available locally
- `runs/` and `debug_out/` contain experiment history — this repo is more of a working directory than a clean template