Regression-testing benchmark for PoseLib. Each estimator is compared against its pycolmap counterpart where one is available, across four problem categories: relative pose, absolute pose, homography, and depth-aided relative pose.
Install:

```bash
pip install -e .
```

Run the fast subset benchmark (downloads ~10 instances per dataset automatically):
```bash
posebench --subset
```

Since `results/baseline-subsampled-10.json` is committed to the repo, this automatically compares your results against the baseline and shows a regression diff:

```
Running problem absolute pose
Running problem relative pose
Running problem homography
Running problem monodepth pose
Finished running evaluation in 67.1 seconds (57 datasets)
Results saved to results/20260217_230848.json
REF: results/baseline-subsampled-10.json
timestamp: 2026-02-17T23:07:13.446200
poselib: 2.1.0
NEW: results/20260217_230848.json
timestamp: 2026-02-17T23:08:48.016309
poselib: 2.1.0
Values shown are from NEW, with (delta = NEW - REF) in parentheses.
=== absolute pose (average over 9 shared datasets) ===
AUC0 AUC1 AUC5 med_rot med_pos avg_rt med_rt
PnP (poselib) 56.52 (+0.00) 93.27 (+0.00) 98.65 (+0.00) 3.5E-01 (+0.0E+00) 0.0 (+0.0) 9.3ms (+0.0ms) 8.2ms (+0.0ms)
PnP (COLMAP) 54.45 (+0.00) 91.47 (+0.00) 98.47 (+0.00) 4.3E-01 (+0.0E+00) 0.0 (+0.0) 15.0ms (-0.1ms) 13.8ms (-0.0ms)
PnPf (poselib) 31.24 (+0.00) 74.93 (+0.00) 90.43 (+0.00) 1.4E+01 (+0.0E+00) 0.7 (+0.0) 49.7ms (+0.2ms) 43.8ms (+0.2ms)
PnPf (COLMAP) 26.52 (+0.00) 67.40 (+0.00) 90.49 (+0.00) 5.8E-01 (+0.0E+00) 0.4 (+0.0) 13.5ms (-0.1ms) 12.5ms (-0.1ms)
PnPfr (poselib) 31.19 (+0.00) 75.37 (+0.00) 91.46 (+0.00) 1.7E+01 (+0.0E+00) 0.4 (+0.0) 45.6ms (-0.0ms) 36.8ms (+0.1ms)
PnPfr (COLMAP) 26.68 (+0.00) 68.60 (+0.00) 89.03 (+0.00) 5.6E-01 (+0.0E+00) 0.3 (+0.0) 13.2ms (-0.2ms) 12.0ms (-0.2ms)
=== relative pose (average over 22 shared datasets) ===
AUC5 AUC10 AUC20 avg_rt med_rt
E (poselib) 51.89 (+0.00) 65.63 (+0.00) 76.33 (+0.00) 23.2ms (-0.0ms) 21.9ms (-0.1ms)
E (poselib,TS) 51.46 (+0.00) 65.26 (+0.00) 76.14 (+0.00) 28.2ms (-0.1ms) 24.9ms (-0.1ms)
E (COLMAP) 49.00 (+0.00) 64.70 (+0.00) 76.17 (+0.00) 14.8ms (-0.0ms) 12.9ms (+0.0ms)
F (poselib) 33.83 (+0.00) 47.40 (+0.00) 60.47 (+0.00) 22.7ms (+0.1ms) 20.6ms (+0.2ms)
F (COLMAP) 30.38 (+0.00) 43.40 (+0.00) 55.77 (+0.00) 5.5ms (-0.0ms) 4.0ms (-0.0ms)
=== homography (average over 2 shared datasets) ===
AUC5 AUC10 AUC20 avg_rt med_rt
H (poselib) 3.15 (+0.00) 4.08 (+0.00) 10.03 (+0.00) 515.9us (+11.7us) 234.8us (+0.8us)
H (COLMAP) 3.15 (+0.00) 4.08 (+0.00) 4.54 (+0.00) 1.1ms (+0.0ms) 1.1ms (+0.0ms)
=== monodepth pose (average over 24 shared datasets) ===
AUC5 AUC10 AUC20 avg_rt med_rt
E 63.36 (+0.00) 74.67 (+0.00) 82.48 (+0.00) 10.5ms (-0.0ms) 9.9ms (+0.0ms)
RePoseD (calibrated) 62.91 (+0.00) 76.49 (+0.00) 85.24 (+0.00) 25.3ms (+0.3ms) 24.1ms (+0.3ms)
RePoseD (calibrated) with shift 66.45 (+0.00) 77.67 (+0.00) 85.79 (+0.00) 32.7ms (+0.8ms) 31.6ms (+0.8ms)
E + shared f 53.28 (+0.00) 68.28 (+0.00) 79.42 (+0.00) 14.1ms (+0.2ms) 12.1ms (+0.1ms)
RePoseD (shared focal) 58.58 (+0.00) 74.19 (+0.00) 83.61 (+0.00) 19.4ms (+0.7ms) 18.2ms (+0.6ms)
F 44.42 (+0.00) 59.04 (+0.00) 72.03 (+0.00) 9.8ms (+0.1ms) 8.9ms (+0.1ms)
RePoseD (varying focal) 43.01 (+0.00) 58.68 (+0.00) 73.57 (+0.00) 23.0ms (+1.0ms) 22.1ms (+0.9ms)
=== Summary (across all problems) ===
unchanged: 72/72
```

This is particularly useful in CI to catch regressions quickly.
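If you want a scripted gate rather than eyeballing the diff, the saved JSON files can be compared directly. The sketch below is an illustration, not part of the tool: it assumes the files mirror the `{problem: {dataset: {method: {metric: value}}}}` nesting that `run_benchmark` returns (see below), with scalar metadata keys such as `timestamp` alongside; the file names and the 0.01 tolerance are placeholders.

```python
import json
import sys

def load(path):
    with open(path) as f:
        return json.load(f)

ref = load("results/baseline-subsampled-10.json")  # committed reference
new = load("results/latest.json")                  # hypothetical path to the run under test

regressions = []
for problem, datasets in ref.items():
    if not isinstance(datasets, dict):
        continue  # skip scalar metadata such as "timestamp" / "poselib"
    for dataset, methods in datasets.items():
        for method, metrics in methods.items():
            for metric, ref_val in metrics.items():
                if not metric.startswith("AUC"):
                    continue  # only gate on AUC scores; runtimes are machine-dependent
                new_val = new.get(problem, {}).get(dataset, {}).get(method, {}).get(metric)
                if new_val is not None and new_val < ref_val - 0.01:
                    regressions.append((problem, dataset, method, metric, ref_val, new_val))

if regressions:
    for problem, dataset, method, metric, r, n in regressions:
        print(f"REGRESSION {problem}/{dataset}/{method} {metric}: {r:.2f} -> {n:.2f}")
    sys.exit(1)
print("no AUC regressions")
```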
Run the full benchmark:

```bash
posebench
```

This downloads the full dataset on first run. Results are saved to `results/<timestamp>.json`. If `results/baseline.json` exists, the run automatically shows a comparison against it.
```bash
# Save the current results as the subset baseline:
posebench --subset --create_baseline   # → results/baseline-subsampled-10.json

# Save the current results as the full-data baseline:
posebench --create_baseline            # → results/baseline.json
```

Whenever a matching baseline file exists (`results/baseline.json` for full runs, `results/baseline-subsampled-10.json` for `--subset`), the benchmark automatically shows a comparison diff instead of the plain metrics table. Pass `--no-save` to skip writing the new results to file.
Compare any two saved result files explicitly:
```bash
# Compare two specific files:
posebench --compare results/baseline.json results/new.json

# With one argument, results/baseline.json is used as the reference:
posebench --compare results/new.json

# Add --per-dataset for a per-dataset breakdown:
posebench --compare results/baseline.json results/new.json --per-dataset
```

The full CLI:

```
usage: posebench [-h] [--min_iterations N] [--max_iterations N]
                 [--success_prob P] [--initial_lambda L]
                 [--method METHOD] [--dataset DATASET]
                 [--subsample N] [--subset]
                 [--create_subset [N]]
                 [--output OUTPUT] [--no-save]
                 [--create_baseline]
                 [--compare FILE [FILE ...]] [--per-dataset]
options:
  --min_iterations N    RANSAC minimum iterations
  --max_iterations N    RANSAC maximum iterations
  --success_prob P      RANSAC success probability
  --initial_lambda L    Bundle adjustment initial lambda
  --method METHOD       Only run methods whose name contains METHOD (comma-separated)
  --dataset DATASET     Only run datasets whose name contains DATASET (comma-separated)
  --subsample N         Use every Nth instance at runtime (without modifying data files)
  --subset              Use the pre-built subset dataset (~10 instances per dataset)
  --create_subset [N]   Build a subsampled dataset from the full data and zip it.
                        Uniformly samples up to N instances per dataset (default N=10).
                        Output: data-subsampled-N/ directory and data-subsampled-N.zip
  --output OUTPUT       Output path for results JSON (default: results/<timestamp>.json)
  --no-save             Do not save results to file
  --create_baseline     Run benchmark and save to the canonical baseline file
                        (results/baseline.json or results/baseline-subsampled-10.json)
  --compare FILE [FILE ...]
                        Compare result files. With two args: REF NEW. With one arg:
                        compares results/baseline.json vs FILE
  --per-dataset         Show per-dataset breakdown in comparison (use with --compare)
```
The benchmark can also be driven from Python:

```python
from posebench import run_benchmark
all_metrics, dataset_names, saved_path = run_benchmark(
    min_iterations=100,            # RANSAC minimum iterations
    max_iterations=1000,           # RANSAC maximum iterations
    success_prob=0.9999,           # RANSAC success probability
    initial_lambda=1e-3,           # bundle adjustment initial lambda
    method_filter=["poselib"],     # substring filter, as for --method
    dataset_filter=["megadepth"],  # substring filter, as for --dataset
    subsample=10,                  # every Nth instance, as for --subsample
    subset=False,                  # set True to use the pre-built subset dataset
    output="results/my_run.json",  # where to save the results JSON
    no_save=False,                 # set True to skip saving
    print_results=True,            # print the metrics table to stdout
)
```

`run_benchmark` returns a 3-tuple:
- `all_metrics`: `Dict[str, Dict[str, Dict[str, Dict[str, float]]]]`, i.e. `{problem: {dataset: {method: {metric: value}}}}`
- `dataset_names`: list of all dataset names that were evaluated
- `saved_path`: path to the saved JSON file, or `None` if `no_save=True`
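As a usage sketch (not verified against the library: it assumes the problem keys match the names printed during a run, e.g. `"relative pose"`, and the metric keys match the table columns, e.g. `"AUC5"`):

```python
from posebench import run_benchmark

# Run a small filtered benchmark and print one metric per (dataset, method).
all_metrics, dataset_names, saved_path = run_benchmark(
    subset=True, method_filter=["poselib"], print_results=False
)
for dataset, methods in all_metrics.get("relative pose", {}).items():
    for method, metrics in methods.items():
        if "AUC5" in metrics:  # metric key assumed from the tables above
            print(f"{dataset:40s} {method:25s} AUC5={metrics['AUC5']:.2f}")
```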
The benchmark covers four problem categories across 57 datasets from various papers. Datasets are downloaded automatically on first run.
| Category | Datasets |
|---|---|
| Relative pose | MegaDepth1500 (×6 matchers), ScanNet1500 (×5 matchers), IMC (9 scenes), Fisheye (2) |
| Absolute pose | 7-Scenes (2), Cambridge Landmarks (5), ETH3D, MegaScenes32k |
| Homography | Barath et al. (2 scenes) |
| Monodepth relative pose | 6 scenes × 2 depth estimators × 2 matchers |
To download datasets manually instead of automatically:
```bash
sh download_data.sh
```

After adding a new dataset to the benchmark:
```bash
# Rebuild the subset from the full dataset (requires full data in data/):
posebench --create_subset 10

# Upload the resulting data-subsampled-10.zip to storage and update the download URL
# in download_data() in src/posebench/__init__.py.

# Then update the committed baseline:
posebench --subset --create_baseline
git add results/baseline-subsampled-10.json
git commit -m "update subset baseline"
```

TODO:

- Add missing estimators (multi-camera, hybrid, etc.)
- Add refinement benchmark