A personal project I built for training an autonomous driving agent in CARLA using Ray RLlib PPO.
The agent perceives the environment through a depth camera and LiDAR, plans along a route, and learns to steer and accelerate via trial-and-error. I use a custom multimodal Torch model that encodes each sensor modality independently before fusing them.
```
DO_RL/
├─ envs/
│  ├─ carla_env.py      # Gymnasium environment (main)
│  ├─ config.py         # Env / Sensor / Route / Reward config dataclasses
│  ├─ sensor.py         # Sensor creation, reading, preprocessing
│  ├─ route_manager.py  # Route planning, lateral/heading error, goal distance
│  └─ reward.py         # Reward term definitions
├─ rl_rllib/
│  ├─ models_torch.py   # Custom multimodal Torch model
│  ├─ ppo.py            # PPOConfig builder and training wrapper
│  ├─ register_envs.py  # RLlib environment registration
│  ├─ callbacks.py      # Training metrics callbacks
│  └─ ports.py          # Per-worker CARLA / TM port assignment
├─ scripts/
│  ├─ env_test.py       # Standalone env test (no RLlib)
│  ├─ train_rllib.py    # RLlib PPO training entry point
│  └─ run_rllib.py      # Load checkpoint and run visually
├─ runs/                # Ray Tune / RLlib training output
└─ debug_out/           # Debug images exported during env test or rollout
```
This project runs locally on Windows with a CARLA server. Required:
- Python 3.10+ (I run it on 3.12)
- CARLA 0.9.16 + its Python API
- `ray[rllib]`, `torch`, `gymnasium`, `numpy`

Optional:
- `opencv-python` — real-time depth / LiDAR display
- `Pillow` — fallback image saving when OpenCV is not available

Install:

```
pip install -r requirements.txt
```

Launch the CARLA server before running any script.
Default ports:
- CARLA server: `2000`
- Traffic Manager: `8000`

With multiple workers, ports are allocated automatically:
- worker 1: `2000` / `8000`
- worker 2: `2002` / `8002`
- worker 3: `2004` / `8004`

See `rl_rllib/ports.py` for the logic.
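The per-worker offsets above suggest logic roughly like the following sketch (the helper name is hypothetical; the actual implementation lives in `rl_rllib/ports.py`):

```python
def worker_ports(worker_index: int,
                 base_port: int = 2000,
                 base_tm_port: int = 8000) -> tuple[int, int]:
    """Return (carla_port, tm_port) for a worker.

    Each worker gets ports offset by 2 from the base, matching the
    table above (worker 1 -> 2000/8000, worker 2 -> 2002/8002, ...).
    Worker indices start at 1, as in RLlib.
    """
    offset = 2 * (worker_index - 1)
    return base_port + offset, base_tm_port + offset
```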
In `envs/route_manager.py` I have:

```python
sys.path.append(r"D:\CARLA_0.9.16\PythonAPI\carla")
```

Change this to match your CARLA installation path, otherwise `GlobalRoutePlanner` will fail to import.
Default args in `scripts/train_rllib.py`:
- `--num-workers 4`
- `--num-gpus 1.0`

Reduce these if your machine has limited resources.
2D continuous vector:
- `action[0]`: `steer` — range `[-1, 1]`
- `action[1]`: `accel` — range `[-1, 1]`; `accel >= 0` maps to `throttle`, `accel < 0` maps to `brake`
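The signed-accel convention can be sketched as below (the function name is hypothetical; the actual mapping lives in `envs/carla_env.py`):

```python
def accel_to_control(accel: float) -> tuple[float, float]:
    """Map the signed accel action in [-1, 1] to (throttle, brake).

    Positive values drive the throttle, negative values the brake;
    the two never fire at the same time.
    """
    accel = max(-1.0, min(1.0, accel))  # clip to the action range
    if accel >= 0.0:
        return accel, 0.0               # accel >= 0 -> throttle
    return 0.0, -accel                  # accel < 0  -> brake
```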
A Dict with four modalities:
- `depth`: `(1, H, W)`, default `(1, 84, 84)`, normalized to `[0, 1]`
- `lidar`: `(36,)` — minimum distance per front sector
- `route`: `(3,)` — `lateral_error`, `heading_error`, `dist_to_goal`
- `state`: `(5,)` — `speed`, `yaw_rate`, `prev_steer`, `prev_throttle`, `prev_brake`
Defined in `envs/reward.py`. Default terms:
- `progress` — reward for moving toward the goal
- `heading` — penalty for heading error
- `lateral` — penalty for lateral deviation
- `speed` — reward for maintaining target speed
- `collision` — strong penalty
- `stuck` — penalty for being stuck
- `termination` — end-of-episode term

Weights are configured in `envs/config.py` under `RewardSection.reward_scales`.
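The scaled terms presumably combine as a weighted sum, along these lines (the weight values here are invented for illustration; the real ones live in `RewardSection.reward_scales`):

```python
# Hypothetical example weights -- NOT the project's actual defaults.
reward_scales = {
    "progress": 1.0, "heading": -0.5, "lateral": -0.5,
    "speed": 0.2, "collision": -10.0, "stuck": -1.0, "termination": 1.0,
}

def total_reward(terms: dict[str, float],
                 scales: dict[str, float] = reward_scales) -> float:
    """Combine per-step reward terms into a single scalar.

    Terms without a configured scale contribute nothing.
    """
    return sum(scales.get(name, 0.0) * value for name, value in terms.items())
```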
Useful flags:
- `--throttle` — fixed throttle value
- `--steer` — fixed steer value
- `--sleep` — pause between steps for visual inspection
- `--save-dir` — save debug images when OpenCV is unavailable
```
python scripts/train_rllib.py ^
  --host 127.0.0.1 ^
  --base-port 2000 ^
  --base-tm-port 8000 ^
  --town Town05 ^
  --num-workers 2 ^
  --num-gpus 1 ^
  --use-custom-model ^
  --stop-timesteps 200000
```

Key flags:
- `--use-custom-model` — enables the multimodal fusion model in `rl_rllib/models_torch.py`
- `--run-dir` — output directory (default: `runs/train_rllib/`)
- `--exp-name` — experiment name
- `--checkpoint-every` — checkpoint interval (training iterations)
- `--keep-checkpoints` — number of recent checkpoints to keep
- `--enable-eval` — enable a separate evaluation environment
```
python scripts/run_rllib.py ^
  --checkpoint runs\train_rllib\<exp>\<trial>\checkpoint_000026 ^
  --port 2000 ^
  --tm-port 8000 ^
  --town Town05 ^
  --steps 1200
```

This script restores the policy weights and runs the agent in `CarlaEnv`, with optional depth / LiDAR visualization.
No separate YAML config. Everything is controlled through Python dataclasses and CLI arguments.
- `EnvSection` — host, port, map, timestep, max steps, traffic
- `SensorSection` — depth camera and LiDAR dimensions, FOV, mount pose, sector params
- `RouteSection` — sampling resolution, goal distance range, arrival threshold, max lateral deviation
- `RewardSection` — reward term weights
Controlled via `PPOHParams`:
- number of workers and GPUs
- learning rate, batch size, GAE, entropy, gradient clipping
- whether to use the custom model
- whether to enable evaluation
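Given that list, `PPOHParams` plausibly looks something like the following dataclass (field names and default values are guesses for illustration, not copied from `rl_rllib/ppo.py`):

```python
from dataclasses import dataclass

@dataclass
class PPOHParams:
    """Sketch of the PPO hyperparameter bundle described above."""
    num_workers: int = 4          # parallel rollout workers
    num_gpus: float = 1.0         # GPUs for the learner
    lr: float = 3e-4              # learning rate
    train_batch_size: int = 4000  # samples per training iteration
    gamma: float = 0.99           # discount factor
    gae_lambda: float = 0.95      # GAE lambda
    entropy_coeff: float = 0.01   # exploration bonus
    grad_clip: float = 0.5        # gradient clipping norm
    use_custom_model: bool = True # enable the multimodal fusion model
    enable_eval: bool = False     # separate evaluation environment
```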
When `--use-custom-model` is set, I use `CarlaFusionTorchModel` in `rl_rllib/models_torch.py`.
Each modality is encoded independently:
| Modality | Encoder | Output dim |
|---|---|---|
| `depth` (1×84×84) | `_TinyCNN` (3-layer CNN) | 128 |
| `lidar` (36,) | MLP [128, 128] | 128 |
| `route` (3,) | MLP [64, 64] | 64 |
| `state` (5,) | MLP [64, 64] | 64 |
All four embeddings are concatenated into a 384-dim vector and passed through a fusion MLP [256, 256], producing a 256-dim shared representation that feeds both the policy head and the value head.
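In plain PyTorch, the architecture can be sketched as follows (layer widths follow the table; kernel sizes, activations, and head sizes are assumptions, and the class name is hypothetical, unlike the real `CarlaFusionTorchModel`):

```python
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    """Illustrative four-modality fusion network (not the real model)."""

    def __init__(self):
        super().__init__()
        # Stand-in for _TinyCNN: 3 conv layers, then project to 128 dims.
        self.depth_enc = nn.Sequential(
            nn.Conv2d(1, 16, 8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(128), nn.ReLU(),
        )
        self.lidar_enc = nn.Sequential(nn.Linear(36, 128), nn.ReLU(),
                                       nn.Linear(128, 128), nn.ReLU())
        self.route_enc = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                       nn.Linear(64, 64), nn.ReLU())
        self.state_enc = nn.Sequential(nn.Linear(5, 64), nn.ReLU(),
                                       nn.Linear(64, 64), nn.ReLU())
        # 128 + 128 + 64 + 64 = 384 concatenated features.
        self.fusion = nn.Sequential(nn.Linear(384, 256), nn.ReLU(),
                                    nn.Linear(256, 256), nn.ReLU())
        self.policy_head = nn.Linear(256, 4)  # e.g. mean + log_std for 2 actions
        self.value_head = nn.Linear(256, 1)

    def forward(self, depth, lidar, route, state):
        z = torch.cat([self.depth_enc(depth), self.lidar_enc(lidar),
                       self.route_enc(route), self.state_enc(state)], dim=-1)
        h = self.fusion(z)  # 256-dim shared representation
        return self.policy_head(h), self.value_head(h)
```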
Training pipeline:
1. `scripts/train_rllib.py` — parse CLI args
2. Build runtime info and PPO hyperparams
3. `rl_rllib/register_envs.py` — register `carla_env`
4. `rl_rllib/ports.py` — assign ports per worker
5. `envs/carla_env.py` — create CARLA environment, return Gymnasium interface
6. `rl_rllib/ppo.py` — build `PPOConfig`, launch Ray Tune
7. `rl_rllib/callbacks.py` — aggregate training metrics from `info`

If the custom model is enabled, additionally:
- `rl_rllib/models_torch.py` — build `CarlaFusionTorchModel`
- Encode `depth` / `lidar` / `route` / `state` independently
- Fuse and output policy logits + value
Ray Tune output. Each trial contains:
- `params.json` / `params.pkl`
- `progress.csv`
- `result.json`
- `events.out.tfevents.*`
- `checkpoint_xxxxxx/`
Depth and LiDAR images exported during env testing or policy rollout.
- CARLA PythonAPI path is hardcoded in `route_manager.py`
- Multi-worker training requires all ports to be available locally
- `runs/` and `debug_out/` contain experiment history — this repo is more of a working directory than a clean template