Website · Docs · Dataset · Model · NAVSIM Model · Supplementary · Paper
An open-source end-to-end driving stack for CARLA.
▶ State-of-the-art performance on all major Leaderboard 2.0 benchmarks ◀
Long Nguyen1,3 ·
Micha Fauth1 ·
Bernhard Jaeger1,3 ·
Daniel Dauner1,3
Maximilian Igl2 ·
Andreas Geiger1,3 ·
Kashyap Chitta2
1University of Tübingen, Tübingen AI Center · 2NVIDIA Research · 3KE:SAI
- Table of Contents
- Updates
- 1. Quick Start
- 2. CARLA Research Cycle
- 3. Extensions
- 4. Project Structure
- 5. Common Issues
- Beyond CARLA: Cross-Benchmark Deployment
- Further Documentation
- Acknowledgements
- Citation
| Date | Content |
|---|---|
| 26.04.14 | WARNING: The parameter transfuser_token_dim's default value should be 256, as used in the paper, and not 64. |
| 26.04.13 | WARNING: In some rare cases, we noticed training instability. See the instructions if you face a similar problem. |
| 26.04.11 | Added Fail2Drive benchmark support, see instructions. |
| 26.03.21 | Added evaluation support for the reinforcement-learning planner CaRL, see instructions. |
| 26.03.18 | Deactivated Kalman Filter and all post-processing heuristics. See performance report here. |
| 26.02.25 | LEAD is accepted to CVPR 2026 🎉 |
| 26.02.25 | NAVSIM extension released. Code and instructions available. Supplementary data coming soon. |
| 26.02.02 | Preliminary support for 123D. See instructions. |
| 26.01.13 | CARLA dataset and training documentation released. |
| 25.12.24 | Initial release — paper, checkpoints, expert driver, and inference code. |
Get LEAD running locally: from cloning the repo and installing dependencies, to downloading a pretrained checkpoint and driving a CARLA Leaderboard 2.0 route end-to-end. We tested the instructions on the following configurations:
| OS | GPU | CUDA | Driver | Inference | Training |
|---|---|---|---|---|---|
| Ubuntu 22.04 | L40S | 13.0 | 580 | ✅ | ✅ |
| Ubuntu 22.04 | A100 | 13.0 | 580 | ❌ | ✅ |
| Ubuntu 24.04 | RTX 5090 | 13.1 | 590 | ✅ | ❌ |
| Ubuntu 22.04 | RTX A4000 | 13.0 | 580 | ✅ | ❌ |
| Ubuntu 22.04 | GTX 1080ti | 13.0 | 580 | ✅ | ❌ |
Clone the repository and register the project root:

```shell
git clone https://github.com/kesai-labs/lead.git
cd lead
# Set project root variable
echo -e "export LEAD_PROJECT_ROOT=$(pwd)" >> ~/.bashrc
# Activate project's hook
echo "source $(pwd)/scripts/main.sh" >> ~/.bashrc
# Reload shell config
source ~/.bashrc
```

Verify that ~/.bashrc reflects these paths correctly.
We use Miniconda as the environment container and uv for the Python dependencies. Runtime and dev dependencies are declared entirely in pyproject.toml.
```shell
# (Optional, needed in some cases) Accept terms of service
conda tos accept --override-channels --channel \
    https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel \
    https://repo.anaconda.com/pkgs/r
# Create a conda environment
conda create -n lead python=3.10 -y
conda activate lead
# Install system-level tools
conda install -c conda-forge \
    ffmpeg parallel tree gcc zip unzip git-lfs uv
# Tell uv to use the conda environment
mkdir -p $CONDA_PREFIX/etc/conda/activate.d \
    $CONDA_PREFIX/etc/conda/deactivate.d
echo 'export VIRTUAL_ENV=$CONDA_PREFIX' > \
    $CONDA_PREFIX/etc/conda/activate.d/uv.sh
echo 'unset VIRTUAL_ENV' > \
    $CONDA_PREFIX/etc/conda/deactivate.d/uv.sh
# Reactivate conda environment
conda activate lead
# Install dependencies
uv sync --active --extra dev
# Optional: activate git hooks
pre-commit install
```

Set up CARLA:
```shell
# Download and set up CARLA at 3rd_party/CARLA_0915
bash scripts/setup_carla.sh
# Or symlink your pre-installed CARLA
ln -s /your/carla/path 3rd_party/CARLA_0915
```

Tip
uv cheatsheet:

```shell
# Add dependency
uv add --active <pkg>
# Add dev dependency
uv add --active --optional dev <pkg>
# Update uv.lock
uv lock
```

Pre-trained checkpoints are hosted on HuggingFace. The following are the results from the paper. To reproduce them, enable the Kalman filter, stop-sign, and creeping heuristics by setting `sensor_agent_creeping=True use_kalman_filter=True slower_for_stop_sign=True` in config_closed_loop.
| Variant | Bench2Drive | Longest6 v2 | Town13 | Checkpoint |
|---|---|---|---|---|
| Full TransFuser V6 | 95 | 62 | 5.24 | Link |
| ResNet34 (60M params) | 94 | 57 | 5.01 | Link |
| + Rear camera | 95 | 53 | TBD | Link |
| − Radar | 94 | 52 | TBD | Link |
| Vision only | 91 | 43 | TBD | Link |
| Town13 held out | 93 | 52 | 3.52 | Link |
Without these heuristics, the performance changes as follows, with the biggest boost observed on Town13.
| Variant | Bench2Drive | Longest6 v2 | Town13 |
|---|---|---|---|
| Full E2E TransFuser V6 | 95 → 94 | 62 → 62 | 5 → 10 |
Download the checkpoints:

```shell
# Download one checkpoint for testing
bash scripts/download_one_checkpoint.sh
# Download all checkpoints
git clone https://huggingface.co/ln2697/tfv6 outputs/checkpoints
cd outputs/checkpoints
git lfs pull
```

Verify your setup with a single route:
```shell
# Start driving environment
bash scripts/start_carla.sh
# Turn on image and video output
export LEAD_CLOSED_LOOP_CONFIG="produce_demo_image=true \
    produce_demo_video=true \
    produce_debug_image=true \
    produce_debug_video=true \
    produce_input_image=true \
    produce_input_video=true"
# Run policy on one route
python lead/leaderboard_wrapper.py \
    --checkpoint outputs/checkpoints/tfv6_resnet34 \
    --routes data/benchmark_routes/bench2drive/23687.xml \
    --bench2drive
```

Driving logs are saved to outputs/local_evaluation/<route_id>/:
| Output | Description |
|---|---|
| `*_debug.mp4` | Debug visualization video |
| `*_demo.mp4` | Demo video |
| `*_grid.mp4` | Grid visualization video |
| `*_input.mp4` | Raw input video |
| `alpasim_metric_log.json` | AlpaSim metric log |
| `checkpoint_endpoint.json` | Checkpoint endpoint metadata |
| `infractions.json` | Detected infractions |
| `metric_info.json` | Evaluation metrics |
| `debug_images/` | Per-frame debug visualizations |
| `demo_images/` | Per-frame demo images |
| `grid_images/` | Per-frame grid visualizations |
| `input_images/` | Per-frame raw inputs |
| `input_log/` | Input log data |
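The per-route JSON files make it easy to aggregate failures across a whole evaluation run. A minimal sketch (the exact schema of infractions.json is an assumption here; adapt the keys to what your files actually contain):

```python
import json
from collections import Counter
from pathlib import Path

def count_infractions(eval_root: str) -> Counter:
    """Tally infraction types across all route directories under eval_root."""
    totals = Counter()
    for f in Path(eval_root).glob("*/infractions.json"):
        data = json.loads(f.read_text())
        # Assumed schema: mapping from infraction type to a list of events.
        for kind, events in data.items():
            totals[kind] += len(events)
    return totals

if __name__ == "__main__":
    for kind, n in count_infractions("outputs/local_evaluation").most_common():
        print(f"{n:4d}  {kind}")
```

This pairs well with the infraction dashboard: the counter tells you which infraction type dominates, the dashboard shows you the corresponding frames.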
Tip
If you run into OOM issues, there are a few options:
- Run the following commands to prevent the last two seeds from being loaded into memory for ensembling:

```shell
mv outputs/checkpoints/tfv6_resnet34/model_0030_1.pth \
   outputs/checkpoints/tfv6_resnet34/_model_0030_1.pth
mv outputs/checkpoints/tfv6_resnet34/model_0030_2.pth \
   outputs/checkpoints/tfv6_resnet34/_model_0030_2.pth
```

- On a local machine, start CARLA with `-quality-level=Poor`. This reduces the rendering quality, but it introduces a distribution shift and should not be used for official evaluation.
Launch the interactive infraction dashboard to analyze driving failures — especially useful for Longest6 or Town13 where iterating over evaluation logs is time-consuming:
```shell
python lead/infraction_webapp/app.py
```

Navigate to http://localhost:5000 and point it at outputs/local_evaluation.
Tip
The app supports browser bookmarking to jump directly to a specific timestamp.
The primary focus of this repository is solving the original CARLA Leaderboard 2.0. This section walks through the full research loop — collecting expert demonstrations, training a TFv6 policy, and benchmarking it closed-loop.
Before collecting, you may want to enlarge or diversify the route set under data/data_routes/. This step is optional — the shipped routes are enough to reproduce the paper results. However, if you want to improve the performance of the model, in particular on Longest6 v2 or Fail2Drive, introducing more routes is the easiest way to achieve this.
Two ways to do it:
- Automatic — sample routes programmatically from a CARLA town (e.g. random start/goal pairs with scenario annotations). Useful when you want large-scale coverage without hand-authoring.
- Manual — use the bundled carla_route_generator, a GUI tool for clicking waypoints on a map and exporting routes. Launch it via:

```shell
cd 3rd_party/carla_route_generator
python3 scripts/window.py
```

Generated XML files can be dropped directly into data/data_routes/ and picked up by the expert during data collection.
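For the automatic option, routes are plain XML files, so they can be emitted programmatically. A minimal sketch of a route writer (the element and attribute names follow the common Leaderboard 2.0 route format, but verify them against the shipped files in data/data_routes/ before relying on them):

```python
import xml.etree.ElementTree as ET

def write_route(path: str, town: str,
                points: list[tuple[float, float, float]]) -> None:
    """Write a single-route Leaderboard-style XML file from (x, y, z) waypoints."""
    routes = ET.Element("routes")
    route = ET.SubElement(routes, "route", id="0", town=town)
    waypoints = ET.SubElement(route, "waypoints")
    for x, y, z in points:
        ET.SubElement(waypoints, "position", x=str(x), y=str(y), z=str(z))
    ET.SubElement(route, "scenarios")  # left empty; scenarios can be added later
    ET.ElementTree(routes).write(path, xml_declaration=True, encoding="utf-8")

# Hypothetical example: a straight 50 m route in Town12
write_route("generated_route.xml", "Town12", [(0.0, 0.0, 0.0), (50.0, 0.0, 0.0)])
```

Sampling the actual waypoints (e.g. random start/goal pairs along the lane graph) would use the CARLA Python API with a running simulator; only the XML serialization is shown here.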
Tip
Out of the box, carla_route_generator is purely mouse-driven (left-click to add/remove waypoints, right-click for scenarios, wheel to pan/zoom). Annotating hundreds of routes this way is slow. A few tricks accelerate the process:
- We strongly recommend extending scripts/window.py with Qt QShortcut/keyPressEvent bindings for the actions you repeat most — e.g. adding a new route. Even one or two keys cut manual annotation time substantially.
- Only annotate the routes manually; add scenarios automatically via Python.
With CARLA running, collect data for a single route via Python (recommended for debugging):

```shell
python lead/leaderboard_wrapper.py \
    --expert \
    --routes data/data_routes/lead/noScenarios/short_route.xml
```

Or via bash (recommended for flexibility):

```shell
bash scripts/eval_expert.sh
```

Collected data is saved to outputs/expert_evaluation/ with the following structure:
| Directory | Content |
|---|---|
| `bboxes/` | 3D bounding boxes per frame |
| `depth/` | Compressed and quantized depth maps |
| `depth_perturbated/` | Depth from the perturbed ego state |
| `hdmap/` | Ego-centric rasterized HD map |
| `hdmap_perturbated/` | HD map aligned to the perturbed ego pose |
| `lidar/` | LiDAR point clouds |
| `metas/` | Per-frame metadata and ego state |
| `radar/` | Radar detections |
| `radar_perturbated/` | Radar from the perturbed ego state |
| `rgb/` | RGB images |
| `rgb_perturbated/` | RGB from the perturbed ego state |
| `semantics/` | Semantic segmentation maps |
| `semantics_perturbated/` | Semantics from the perturbed ego state |
| `results.json` | Route-level summary and evaluation metadata |
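Since every sensor stream is written per frame, a quick sanity check before training is that the streams of a route are aligned. A small sketch over the directory layout above (the choice of which streams to compare is an assumption; adjust to the ones you train on):

```python
from pathlib import Path

def frame_counts(route_dir: str) -> dict[str, int]:
    """Count files in each sensor subdirectory of one collected route."""
    counts = {}
    for sub in sorted(Path(route_dir).iterdir()):
        if sub.is_dir():
            counts[sub.name] = sum(1 for _ in sub.iterdir())
    return counts

def check_aligned(route_dir: str, streams=("rgb", "lidar", "bboxes")) -> bool:
    """Return True if the selected streams have identical frame counts."""
    counts = frame_counts(route_dir)
    present = [counts[s] for s in streams if s in counts]
    return len(set(present)) <= 1
```

Running this over outputs/expert_evaluation/ after a collection run catches routes where the expert crashed mid-episode.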
On a SLURM cluster with 92 GTX 1080 Ti GPUs, data collection typically finishes within two days.
Tip
- To configure camera/lidar/radar calibration, see config_base.py and config_expert.py.
- For large-scale collection on SLURM, see the data collection docs.
- The Jupyter notebooks provide visualization examples.
Download the dataset from HuggingFace:

```shell
# Download all routes
git clone https://huggingface.co/datasets/ln2697/lead_carla data/carla_leaderboard2/zip
cd data/carla_leaderboard2/zip
git lfs pull
# Or download a single route for testing
bash scripts/download_one_route.sh
# Unzip the routes
bash scripts/unzip_routes.sh
# Build data cache
python scripts/build_cache.py
```

Perception pretraining. Logs and checkpoints are saved to outputs/local_training/pretrain:
```shell
# Single GPU
python3 lead/training/train.py \
    logdir=outputs/local_training/pretrain
# Distributed Data Parallel
bash scripts/pretrain_ddp.sh
```

Planning post-training. Logs and checkpoints are saved to outputs/local_training/posttrain:
```shell
# Single GPU
python3 lead/training/train.py \
    logdir=outputs/local_training/posttrain \
    load_file=outputs/local_training/pretrain/model_0030.pth \
    use_planning_decoder=true
# Distributed Data Parallel
bash scripts/posttrain_ddp.sh
```

Tip
- For distributed training on SLURM, see the SLURM training docs.
- For a complete workflow (pretrain → posttrain → eval), see this example.
- For detailed documentation, see the training guide.
With CARLA running, evaluate on any benchmark via Python:

```shell
python lead/leaderboard_wrapper.py \
    --checkpoint outputs/checkpoints/tfv6_resnet34 \
    --routes <ROUTE_FILE> \
    [--bench2drive]
```

| Benchmark | Route file | Extra flag |
|---|---|---|
| Bench2Drive | `data/benchmark_routes/bench2drive/23687.xml` | `--bench2drive` |
| Longest6 v2 | `data/benchmark_routes/longest6/00.xml` | — |
| Town13 | `data/benchmark_routes/Town13/0.xml` | — |
| Fail2Drive | `data/benchmark_routes/fail2drive/Base_Animals_0075.xml` | `--fail2drive` |
Or via bash:

```shell
bash scripts/eval_bench2drive.sh   # Bench2Drive
bash scripts/eval_longest6.sh      # Longest6 v2
bash scripts/eval_town13.sh        # Town13
bash scripts/eval_fail2drive.sh    # Fail2Drive (requires CARLA_F2D)
```

Results are saved to outputs/local_evaluation/ with videos, infractions, and metrics.
Tip
- See the evaluation docs for details.
- For distributed evaluation, see the SLURM evaluation docs.
- Our SLURM wrapper supports WandB for reproducible benchmarking.
Beyond the core Leaderboard 2.0 workflow, LEAD also supports additional benchmarks (Fail2Drive, NAVSIM), an alternative RL policy (CaRL), and an alternative data format (123D). Each extension plugs into the code base with minimal changes.
Fail2Drive is a CARLA Leaderboard 2 benchmark for testing closed-loop generalization on unseen long-tail scenarios.
Setup. Download the Fail2Drive simulator (a custom CARLA build with novel assets):

```shell
mkdir -p 3rd_party/CARLA_F2D
curl -L https://hf.co/datasets/SimonGer/Fail2Drive/resolve/main/fail2drive_simulator.tar.gz \
    | tar -xz -C 3rd_party/CARLA_F2D
```

Evaluate. With CARLA_F2D running, evaluate on a single route:
```shell
# Start the Fail2Drive CARLA simulator
bash 3rd_party/CARLA_F2D/CarlaUE4.sh
# Evaluate model on one route
LEAD_CLOSED_LOOP_CONFIG="sensor_agent_creeping=True \
    use_kalman_filter=True \
    slower_for_stop_sign=True" \
python lead/leaderboard_wrapper.py \
    --checkpoint outputs/checkpoints/tfv6_regnety \
    --routes data/benchmark_routes/fail2drive/Base_Animals_0075.xml \
    --fail2drive
```

Tip
For SLURM evaluation, use evaluate_fail2drive from slurm/init.sh. See existing experiment scripts for usage patterns.
Results.
| Method | Bench2Drive DS ↑ | In-Dist. DS ↑ | In-Dist. SR (%) ↑ | In-Dist. HM ↑ | Gen. DS ↑ | Gen. SR (%) ↑ | Gen. HM ↑ |
|---|---|---|---|---|---|---|---|
| TCP | 59.9 | 24.7 | 39.1 | 30.3 | 24.5 (-0.8%) | 31.4 (-19.7%) | 27.5 (-9.1%) |
| UniAD | 45.8 | 47.5 | 36.3 | 41.2 | 44.0 (-7.4%) | 27.6 (-24.0%) | 33.9 (-17.6%) |
| Orion | 77.8 | 53.0 | 52.0 | 52.5 | 51.2 (-3.4%) | 46.0 (-11.5%) | 48.5 (-7.7%) |
| HiP-AD | 86.8 | 74.1 | 70.7 | 72.4 | 67.1 (-9.4%) | 56.7 (-19.8%) | 61.5 (-15.1%) |
| SimLingo | 85.1 | 82.6 | 79.3 | 80.9 | 71.7 (-13.2%) | 55.0 (-30.6%) | 62.2 (-23.1%) |
| TFv5 | 84.2 | 83.3 | 78.5 | 80.8 | 75.4 (-9.5%) | 61.1 (-22.2%) | 67.5 (-16.5%) |
| TFv6 (Ours) | 95.2 | 90.2 | 93.3 | 91.7 | 79.5 (-11.9%) | 70.7 (-24.2%) | 74.8 (-18.4%) |

"In-Dist." and "Gen." denote the Fail2Drive In-Distribution and Generalization splits.
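The HM column is consistent with the harmonic mean of Driving Score and Success Rate (an inference from the printed values, not stated in the source), which penalizes methods that trade one metric for the other. A quick check against the TFv6 in-distribution row:

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two scores; stays low if either score is low."""
    return 2 * a * b / (a + b)

# TFv6 in-distribution: DS 90.2, SR 93.3
print(round(harmonic_mean(90.2, 93.3), 1))  # → 91.7, matching the table
```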
With CARLA running, evaluate the CaRL agent via Python:

```shell
CUBLAS_WORKSPACE_CONFIG=:4096:8 \
python lead/leaderboard_wrapper.py \
    --checkpoint outputs/checkpoints/CaRL \
    --routes data/benchmark_routes/bench2drive/24240.xml \
    --carl-agent \
    --bench2drive \
    --timeout 900
```

Or via bash:

```shell
bash scripts/eval_carl.sh
```

The results are in outputs/local_evaluation/<route_id>/.
Tip
- With small code changes, you can also integrate CaRL into LEAD's expert-driving pipeline as a hybrid expert policy.
- For large-scale evaluation on SLURM, see this directory.
Setup. Install navtrain and navtest splits following navsimv1.1/docs/install.md, then install the navhard split following navsimv2.2/docs/install.md.
Training. Run perception pretraining (script) followed by planning post-training (script). We use one seed for pretraining and three seeds for post-training to estimate performance variance.
Evaluation. Run evaluation on navtest and navhard.
With CARLA running, collect data in 123D format via Python:

```shell
export LEAD_EXPERT_CONFIG="target_dataset=6 \
    py123d_data_format=true \
    use_radars=false \
    lidar_stack_size=2 \
    save_only_non_ground_lidar=false \
    save_lidar_only_inside_bev=false"
python -u $LEAD_PROJECT_ROOT/lead/leaderboard_wrapper.py \
    --expert \
    --py123d \
    --routes data/data_routes/50x38_Town12/ParkingCrossingPedestrian/3250_1.xml
```

Or via bash:

```shell
bash scripts/eval_expert_123d.sh
```

Output in 123D format is saved to data/carla_leaderboard2_py123d/:
| Directory | Content |
|---|---|
| `logs/train/*.arrow` | Per-route driving logs in Arrow format |
| `logs/train/*.json` | Per-route metadata |
| `maps/carla/*.arrow` | Map data in Arrow format |
To visualize collected scenes in 3D with Viser:

```shell
python scripts/123d_viser.py
```

Tip
This feature is experimental. Change PY123D_DATA_ROOT in scripts/main.sh to set the output directory.
The project is organized into the following top-level directories. See the full documentation for a detailed breakdown.
| Directory | Purpose |
|---|---|
| `lead/` | Main package — model architecture, training, inference, expert driver |
| `3rd_party/` | Third-party dependencies (CARLA, benchmarks, evaluation tools) |
| `data/` | Route definitions; sensor data is stored here, too |
| `scripts/` | Utility scripts for data processing, training, and evaluation |
| `outputs/` | Checkpoints, evaluation results, and visualizations |
| `notebooks/` | Jupyter notebooks for data inspection and analysis |
| `slurm/` | SLURM job scripts for large-scale experiments |
| Issue | Fix |
|---|---|
| Stale or corrupted data errors | Delete and rebuild the training cache / buckets |
| Simulator hangs or is unresponsive | Restart the CARLA simulator |
| Route or evaluation failures | Restart the leaderboard |
| Training instability after PyTorch version update | No fix for now. We tried to upgrade Torch several times but failed to achieve stable training on newer Torch versions. |
| OOM in evaluation | Use a larger GPU. Our submit_job utility function first attempts a smaller GPU partition (1080ti/2080ti) and, after a few failures, automatically switches to a partition with more VRAM. |
| Training instability in general | Turn off mixed-precision training and train in 32-bit precision. |
If you face training instability, as reported in issue #67, try the following solutions:
- Turn off mixed-precision training in the config.
- Purge the Conda environment and reinstall from scratch.
- Try the latest known-working commit, a41d11616.
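To confirm whether mixed precision is the culprit, the first bullet can be tested outside the full training loop. A generic PyTorch sketch (the actual LEAD config flag is not shown here — check the training config; this only illustrates the autocast toggle):

```python
import torch
import torch.nn as nn

def train_step(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
               use_amp: bool) -> float:
    """Run one SGD step, with the forward pass under autocast iff use_amp."""
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    opt.zero_grad()
    with torch.autocast(device_type="cpu", enabled=use_amp):
        loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
    return loss.item()

model = nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
fp32_loss = train_step(model, x, y, use_amp=False)  # full-precision step
```

If losses diverge only with `use_amp=True` on your data, the 32-bit fallback from the table above is the right fix.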
The LEAD pipeline and TFv6 models serve as reference implementations across multiple E2E driving platforms:
| Platform | Model | Highlight |
|---|---|---|
| Waymo E2E Driving Challenge | DiffusionLTF | 2nd place in the inaugural vision-based E2E driving challenge |
| NAVSIM v1 Huggingface Leaderboard | LTFv6 | +3 PDMS over Latent TransFuser baseline on navtest |
| NAVSIM v2 Huggingface Leaderboard | LTFv6 | +6 EPMDS over Latent TransFuser baseline on navhard |
| NVIDIA AlpaSim | TransFuserDriver | Official baseline policy for closed-loop simulation |
For a deeper dive, visit the full documentation site:
Data Collection · Training · Evaluation.
The documentation will be updated regularly.
This project builds on the shoulders of excellent open-source work. Special thanks to carla_garage for the foundational codebase.
PDM-Lite · Leaderboard · Scenario Runner · NAVSIM · Waymo Open Dataset
SimLingo · PlanT2 · Bench2Drive Leaderboard · Bench2Drive · CaRL · Fail2Drive
Long Nguyen led development of the project. Kashyap Chitta, Bernhard Jaeger, and Andreas Geiger contributed through technical discussion and advisory feedback. Daniel Dauner provided guidance with NAVSIM.
If you find this work useful, please consider giving this repository a star and citing our paper:
@inproceedings{Nguyen2026CVPR,
author = {Long Nguyen and Micha Fauth and Bernhard Jaeger and Daniel Dauner and Maximilian Igl and Andreas Geiger and Kashyap Chitta},
title = {LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving},
booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026},
}