DO_RL — Dueling for Optimization

A personal project I built for training an autonomous driving agent in CARLA using Ray RLlib PPO.

The agent perceives the environment through a depth camera and LiDAR, plans along a route, and learns to steer and accelerate via trial-and-error. I use a custom multimodal Torch model that encodes each sensor modality independently before fusing them.


Project Structure

DO_RL/
├─ envs/
│   ├─ carla_env.py        # Gymnasium environment (main)
│   ├─ config.py           # Env / Sensor / Route / Reward config dataclasses
│   ├─ sensor.py           # Sensor creation, reading, preprocessing
│   ├─ route_manager.py    # Route planning, lateral/heading error, goal distance
│   └─ reward.py           # Reward term definitions
├─ rl_rllib/
│   ├─ models_torch.py     # Custom multimodal Torch model
│   ├─ ppo.py              # PPOConfig builder and training wrapper
│   ├─ register_envs.py    # RLlib environment registration
│   ├─ callbacks.py        # Training metrics callbacks
│   └─ ports.py            # Per-worker CARLA / TM port assignment
├─ scripts/
│   ├─ env_test.py         # Standalone env test (no RLlib)
│   ├─ train_rllib.py      # RLlib PPO training entry point
│   └─ run_rllib.py        # Load checkpoint and run visually
├─ runs/                   # Ray Tune / RLlib training output
└─ debug_out/              # Debug images exported during env test or rollout

Dependencies

This project runs locally on Windows with a CARLA server. Required:

  • Python 3.10+ (I run it on 3.12)
  • CARLA 0.9.16 + its Python API
  • ray[rllib]
  • torch
  • gymnasium
  • numpy

Optional:

  • opencv-python — real-time depth / LiDAR display
  • Pillow — fallback image saving when OpenCV is not available

Install:

pip install -r requirements.txt

Before You Start

1. Start CARLA

Launch the CARLA server before running any script.

Default ports:

  • CARLA server: 2000
  • Traffic Manager: 8000

With multiple workers, ports are allocated automatically:

  • worker 1: 2000 / 8000
  • worker 2: 2002 / 8002
  • worker 3: 2004 / 8004

See rl_rllib/ports.py for the logic.
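The allocation pattern (each worker offset by 2 from the base ports) can be sketched as follows; the function name is illustrative, the actual logic lives in rl_rllib/ports.py:

```python
def worker_ports(worker_index: int, base_port: int = 2000, base_tm_port: int = 8000):
    """Return (carla_port, tm_port) for a worker; workers are indexed from 1
    and each one is offset by 2 from the base ports."""
    offset = 2 * (worker_index - 1)
    return base_port + offset, base_tm_port + offset
```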

2. Fix the hardcoded CARLA path

In envs/route_manager.py I have:

sys.path.append(r"D:\CARLA_0.9.16\PythonAPI\carla")

Change this to match your CARLA installation path, otherwise GlobalRoutePlanner will fail to import.
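One way to avoid editing the source is to read the path from an environment variable, falling back to the hardcoded default (CARLA_PYAPI is an illustrative variable name, not something the repo defines):

```python
import os
import sys

# Read the CARLA PythonAPI location from an environment variable instead of
# hardcoding it; fall back to the original default path.
carla_api = os.environ.get("CARLA_PYAPI", r"D:\CARLA_0.9.16\PythonAPI\carla")
if carla_api not in sys.path:
    sys.path.append(carla_api)
```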

3. Check GPU and Ray resources

Default args in scripts/train_rllib.py:

  • --num-workers 4
  • --num-gpus 1.0

Reduce these if your machine has limited resources.


Observation, Action and Reward

Action Space

2D continuous vector:

  • action[0]: steer — range [-1, 1]
  • action[1]: accel — range [-1, 1]
    • accel >= 0 maps to throttle
    • accel < 0 maps to brake
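The steer/throttle/brake mapping can be sketched like this (the function name is illustrative; the env's actual implementation lives in envs/carla_env.py):

```python
import numpy as np

def split_action(action):
    """Map the 2D continuous action to CARLA vehicle controls:
    accel >= 0 becomes throttle, accel < 0 becomes brake."""
    steer = float(np.clip(action[0], -1.0, 1.0))
    accel = float(np.clip(action[1], -1.0, 1.0))
    throttle = accel if accel >= 0.0 else 0.0
    brake = -accel if accel < 0.0 else 0.0
    return steer, throttle, brake
```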

Observation Space

A Dict with four modalities:

  • depth: (1, H, W), default (1, 84, 84), normalized to [0, 1]
  • lidar: (36,) — minimum distance per front sector
  • route: (3,) — lateral_error, heading_error, dist_to_goal
  • state: (5,) — speed, yaw_rate, prev_steer, prev_throttle, prev_brake

Reward

Defined in envs/reward.py. Default terms:

  • progress — reward for moving toward the goal
  • heading — penalty for heading error
  • lateral — penalty for lateral deviation
  • speed — reward for maintaining target speed
  • collision — strong penalty
  • stuck — penalty for being stuck
  • termination — end-of-episode term

Weights are configured in envs/config.py under RewardSection.reward_scales.
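The weighted-sum pattern can be sketched as follows (term and scale names are illustrative; the real definitions live in envs/reward.py and envs/config.py):

```python
def total_reward(terms: dict, scales: dict) -> float:
    """Weighted sum of per-step reward terms, in the spirit of
    RewardSection.reward_scales. Unlisted terms contribute nothing."""
    return sum(scales.get(name, 0.0) * value for name, value in terms.items())
```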


Quick Start

0. Test the environment without RLlib

python scripts/env_test.py

Useful flags:

  • --throttle — fixed throttle value
  • --steer — fixed steer value
  • --sleep — pause between steps for visual inspection
  • --save-dir — save debug images when OpenCV is unavailable

1. Train with RLlib PPO

python scripts/train_rllib.py ^
  --host 127.0.0.1 ^
  --base-port 2000 ^
  --base-tm-port 8000 ^
  --town Town05 ^
  --num-workers 2 ^
  --num-gpus 1 ^
  --use-custom-model ^
  --stop-timesteps 200000

Key flags:

  • --use-custom-model — enables the multimodal fusion model in rl_rllib/models_torch.py
  • --run-dir — output directory (default: runs/train_rllib/)
  • --exp-name — experiment name
  • --checkpoint-every — checkpoint interval (training iterations)
  • --keep-checkpoints — number of recent checkpoints to keep
  • --enable-eval — enable a separate evaluation environment

2. Run a checkpoint

python scripts/run_rllib.py ^
  --checkpoint runs\train_rllib\<exp>\<trial>\checkpoint_000026 ^
  --port 2000 ^
  --tm-port 8000 ^
  --town Town05 ^
  --steps 1200

This script restores the policy weights and runs the agent in CarlaEnv, with optional depth / LiDAR visualization.


Configuration

No separate YAML config. Everything is controlled through Python dataclasses and CLI arguments.

Environment — envs/config.py

  • EnvSection — host, port, map, timestep, max steps, traffic
  • SensorSection — depth camera and LiDAR dimensions, FOV, mount pose, sector params
  • RouteSection — sampling resolution, goal distance range, arrival threshold, max lateral deviation
  • RewardSection — reward term weights
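For illustration, a RewardSection-style dataclass might look like this (the field values below are placeholders, not the repo's defaults):

```python
from dataclasses import dataclass, field

@dataclass
class RewardSection:
    """Mirrors the reward-weight section in envs/config.py; values are illustrative."""
    reward_scales: dict = field(default_factory=lambda: {
        "progress": 1.0, "heading": 0.1, "lateral": 0.1,
        "speed": 0.2, "collision": 10.0, "stuck": 1.0, "termination": 1.0,
    })
```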

RLlib training — rl_rllib/ppo.py

Controlled via PPOHParams:

  • number of workers and GPUs
  • learning rate, batch size, GAE, entropy, gradient clipping
  • whether to use the custom model
  • whether to enable evaluation
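A minimal PPOConfig builder in the spirit of rl_rllib/ppo.py might look like this (a sketch, not the repo's exact code; the fluent-API method names vary slightly across Ray versions):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Sketch of building a PPO config for the registered CARLA env.
config = (
    PPOConfig()
    .environment("carla_env")
    .training(lr=3e-4, train_batch_size=4000)
    .resources(num_gpus=1)
)
```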

Model Architecture

When --use-custom-model is set, I use CarlaFusionTorchModel in rl_rllib/models_torch.py.

Each modality is encoded independently:

Modality           Encoder                  Output dim
depth (1×84×84)    _TinyCNN (3-layer CNN)   128
lidar (36,)        MLP [128, 128]           128
route (3,)         MLP [64, 64]             64
state (5,)         MLP [64, 64]             64
All four embeddings are concatenated into a 384-dim vector and passed through a fusion MLP [256, 256], yielding a 256-dim shared representation that feeds both the policy head and the value head.
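The architecture can be sketched in PyTorch as follows. This is a simplified stand-in, not the repo's CarlaFusionTorchModel: the class name, the internal layers of the `_TinyCNN` stand-in, and the 4-dim policy output (mean/std for the 2D continuous action) are assumptions; only the per-modality dims and fusion widths come from the table above.

```python
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    """Simplified sketch of the multimodal fusion idea: encode each modality
    independently, concatenate to 384 dims, fuse, then branch into heads."""
    def __init__(self):
        super().__init__()
        self.depth_enc = nn.Sequential(   # stand-in for _TinyCNN, 3 conv layers -> 128
            nn.Conv2d(1, 16, 8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(128),
        )
        self.lidar_enc = nn.Sequential(nn.Linear(36, 128), nn.ReLU(), nn.Linear(128, 128), nn.ReLU())
        self.route_enc = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
        self.state_enc = nn.Sequential(nn.Linear(5, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
        self.fusion = nn.Sequential(      # 128 + 128 + 64 + 64 = 384 -> 256
            nn.Linear(384, 256), nn.ReLU(), nn.Linear(256, 256), nn.ReLU(),
        )
        self.policy_head = nn.Linear(256, 4)  # assumed: mean/std for 2D continuous action
        self.value_head = nn.Linear(256, 1)

    def forward(self, obs):
        z = torch.cat([
            self.depth_enc(obs["depth"]),
            self.lidar_enc(obs["lidar"]),
            self.route_enc(obs["route"]),
            self.state_enc(obs["state"]),
        ], dim=-1)
        h = self.fusion(z)
        return self.policy_head(h), self.value_head(h)
```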


Code Flow

Training pipeline:

  1. scripts/train_rllib.py — parse CLI args
  2. Build runtime info and PPO hyperparams
  3. rl_rllib/register_envs.py — register carla_env
  4. rl_rllib/ports.py — assign ports per worker
  5. envs/carla_env.py — create CARLA environment, return Gymnasium interface
  6. rl_rllib/ppo.py — build PPOConfig, launch Ray Tune
  7. rl_rllib/callbacks.py — aggregate training metrics from info

If the custom model is enabled, additionally:

  1. rl_rllib/models_torch.py — build CarlaFusionTorchModel
  2. Encode depth / lidar / route / state independently
  3. Fuse and output policy logits + value

Output Directories

runs/train_rllib/

Ray Tune output. Each trial contains:

  • params.json / params.pkl
  • progress.csv
  • result.json
  • events.out.tfevents.*
  • checkpoint_xxxxxx/

debug_out/

Depth and LiDAR images exported during env testing or policy rollout.


Known Limitations

  • CARLA PythonAPI path is hardcoded in route_manager.py
  • Multi-worker training requires all ports to be available locally
  • runs/ and debug_out/ contain experiment history — this repo is more of a working directory than a clean template

About

Research on Autonomous Driving Based on the Carla Simulation Environment
