Jack Langerman (@JackLangerman), Denys Rozumnyi (@DRozumnyi), Yuzhong Huang (@YuzhongHuang), Dmytro Mishkin (@ducha_aiki)
ICCV 2025 Highlight Paper!
This repository contains the code and experiments for understanding how to compare structured (semantic) representations (in this case, wireframes), for proposing and testing new metrics, and for benchmarking reconstruction methods using these metrics.
The basic idea of the paper is the following:
- We have a set of wireframes representing house roofs; one is the ground truth (GT), and the others are outputs of various algorithms. This data is in `data/input`.
- People of different backgrounds (mostly 3D modellers) were asked to rank the wireframes from best to worst w.r.t. the GT; the annotations are in `data/input/human_pairwise_annotation`.
- We ran various metrics, such as corner precision/recall/F1, edge precision/recall/F1, Wireframe Edit Distance (WED), volumetric IoU, etc. The metrics are implemented in `structured_rec_metrics/metrics.py`.
- We then compare the rankings derived from the human annotations to those produced by the metrics.
- We also designed a set of "unit tests" to check whether the metrics satisfy potentially desirable properties, such as the triangle inequality.
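As a toy illustration of the ranking comparison (the actual analysis lives in the notebooks; the ranks below are made up for demonstration), agreement between a human ranking and a metric-induced ranking can be measured with Kendall's tau:

```python
# Illustrative sketch: agreement between a human ranking and a ranking
# induced by a metric, measured with Kendall's tau. The ranks are invented.
from scipy.stats import kendalltau

# Hypothetical quality ranks assigned by humans to 5 wireframes (1 = best)
human_ranks = [1, 2, 3, 4, 5]
# Ranks induced by sorting a metric (e.g. corner F1) over the same wireframes
metric_ranks = [1, 3, 2, 4, 5]

tau, p_value = kendalltau(human_ranks, metric_ranks)
print(f"Kendall tau = {tau:.3f}")  # closer to 1 means better agreement
```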
For fun, we also ran a limited comparison using different VLMs, such as ChatGPT 4o and Grok 2 -- they sound ancient by now, we know. The results are in `data/input/vlm`.
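The corner metrics mentioned above can be sketched as threshold-gated bipartite matching. This is only an illustration of the idea, not the implementation in `structured_rec_metrics/metrics.py`; the function and variable names here are ours:

```python
# A minimal sketch of corner precision/recall/F1 under a distance threshold.
# Predicted corners are matched to GT corners with Hungarian matching, and a
# match counts as a true positive only if the pair is closer than `th`.
import numpy as np
from scipy.optimize import linear_sum_assignment

def corner_prf(gt, pred, th=25.0):
    """Return (precision, recall, F1) for predicted vs. GT corners."""
    if len(gt) == 0 or len(pred) == 0:
        return 0.0, 0.0, 0.0
    # Pairwise Euclidean distances, shape (n_gt, n_pred)
    dists = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(dists)
    tp = int((dists[rows, cols] < th).sum())
    precision = tp / len(pred)
    recall = tp / len(gt)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

gt = np.array([[0.0, 0.0, 0.0], [100.0, 0.0, 0.0]])
pred = np.array([[5.0, 0.0, 0.0], [200.0, 0.0, 0.0]])
print(corner_prf(gt, pred))  # only one of the two corners matches within 25 units
```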
To re-compute the metrics (a CUDA GPU is recommended), run:

```shell
python scripts/compute_metrics.py \
    --gt_path data/input/gt.parquet \
    --pred_dir data/input/s23dr_2024_submissions/ \
    --output_dir data/output/test1 \
    --solutions_csv data/input/submissions_valid.csv \
    --corner_th 25 --edge_th 25 \
    --device cpu
```

To re-use the data we have already computed, skip this step and use the data in `data/output`.
To recompute the table, run:

```shell
python scripts/check_metric_properties.py
```

Note that this will download the rather large HoHo dataset to evaluate the metric properties. You may also need to apply for access and accept the conditions; the dataset is free.
Now you can run the Jupyter notebooks that produce the figures from the paper:
- `notebooks/annotators_metric_agreement.ipynb` -- all the main graphs live here
- `notebooks/figure_5_average_error_num_raters.ipynb` -- plots showing the theoretical stability of the rankings
- `notebooks/figure_5_win_stability.ipynb` -- plots showing the empirical stability of the rankings
- `notebooks/rating_times.ipynb` -- annotation statistics
- `notebooks/unit-tests-metric-table.ipynb` -- the metric-properties table
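As an illustration of the kind of property check behind the metric-properties table, a triangle-inequality test can be sketched as follows. The distance function here is a toy placeholder standing in for any metric under test (e.g. WED); it is not the repository's implementation:

```python
# Sketch of a triangle-inequality check over a set of wireframes:
# d(x, z) <= d(x, y) + d(y, z) must hold for all triples.
import itertools
import numpy as np

def wireframe_distance(a, b):
    # Toy placeholder: symmetrized mean nearest-corner distance between
    # two corner sets of shape (n, d). Any real metric could be plugged in.
    d_ab = np.linalg.norm(a[:, None] - b[None, :], axis=-1)
    return 0.5 * (d_ab.min(axis=1).mean() + d_ab.min(axis=0).mean())

def satisfies_triangle_inequality(wireframes, tol=1e-9):
    """Check the triangle inequality for every ordered triple."""
    for x, y, z in itertools.permutations(wireframes, 3):
        if wireframe_distance(x, z) > (wireframe_distance(x, y)
                                       + wireframe_distance(y, z) + tol):
            return False
    return True
```

For single-corner wireframes the placeholder reduces to the Euclidean distance, so the check passes; richer metrics may fail it, which is exactly what the table reports.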
To install the package and all its dependencies, run:

```shell
pip install -e .
```

Paper: https://arxiv.org/abs/2503.08208
```bibtex
@inproceedings{langerman2025explaininghumanpreferencesmetrics,
  title={Explaining Human Preferences via Metrics for Structured 3D Reconstruction},
  author={Jack Langerman and Denys Rozumnyi and Yuzhong Huang and Dmytro Mishkin},
  year={2025},
  booktitle={ICCV},
  url={https://arxiv.org/abs/2503.08208},
}
```