Uncertainty maps provide a quantitative and visual representation of the estimated confidence of Deep Learning (DL) models in contouring predictions and have been proposed to improve clinicians’ efficiency during manual review. However, uncertainty maps are not currently integrated into clinical workflows, and evidence on their actual benefit in clinical decision-making remains limited. This study investigates the impact of simulated uncertainty maps on clinicians’ behaviour during manual editing of high-quality clinical target volume (CTV) contours in rectal cancer radiotherapy. An inter-observer variability (IOV) dataset of 10 patients was used to simulate high-quality DL uncertainty maps and contours. Six clinicians edited the contours across two editing sessions, with and without uncertainty maps. For each session, editing time, editing amount, questionnaire responses, and interview feedback were collected to assess the impact both quantitatively and qualitatively. Editing time and editing amount were comparable with and without uncertainty maps, while both measures decreased significantly in the second editing session, indicating a learning effect from task repetition. Qualitative feedback showed that clinicians’ decisions were shaped more by human factors, such as workload, mood, memory and anchoring biases, than by the uncertainty maps. Moreover, the study revealed low clinician trust in the uncertainty maps, which were used primarily for confirmation rather than decision-making. The findings suggest that the value of uncertainty maps may be limited for high-quality contours and highlight the need to investigate their relevance for different use cases.
Under review for MIDL2026: https://openreview.net/pdf?id=8G6bouMNzF
Requirements:
- Python 3.9
- Poetry >= 2.0
Please start by cloning this repository. Then install the required dependencies in your local environment using poetry by running:
poetry install
The simulated uncertainty maps and contours are generated from the IOV dataset using generate_uncertainty_maps_from_iov.py, together with the helper modules uncertainty_maps_helpers.py and load_helpers.py.
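As a rough illustration of how an uncertainty map can be simulated from inter-observer variability, the sketch below computes a voxelwise disagreement score from a stack of binary observer contours (fraction of observers labelling each voxel, turned into a normalised binary entropy). The function name and exact formulation are illustrative assumptions, not the implementation in generate_uncertainty_maps_from_iov.py.

```python
import numpy as np

def simulated_uncertainty_map(observer_masks):
    """Voxelwise uncertainty from stacked binary observer masks.

    observer_masks: array of shape (n_observers, ...) with values in {0, 1}.
    Returns the binary entropy of the per-voxel agreement fraction,
    normalised to [0, 1] (0 = full agreement, 1 = maximal disagreement).
    """
    masks = np.asarray(observer_masks, dtype=float)
    p = masks.mean(axis=0)  # fraction of observers labelling each voxel
    eps = 1e-12             # guard against log(0)
    entropy = -(p * np.log2(p + eps) + (1 - p) * np.log2(1 - p + eps))
    return np.clip(entropy, 0.0, 1.0)
```

For example, a voxel contoured by all observers maps to 0, while a voxel contoured by half of them maps to 1.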
For evaluating the results of the observer study, the following scripts are used:
- for assessing geometric metrics: geometric_evaluation_session.py, geometric_evaluation_uncertainty.py and performance_metrics.py
- for assessing IOV: iov_compute_ci.py, iov_distribution.py, iov_heatmaps.py and variability_maps_helpers.py
- for evaluating time and ratings from the questionnaires and plotting figures: Plot_figures.ipynb
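Geometric contour evaluation typically relies on overlap metrics such as the Dice similarity coefficient; a minimal sketch is given below. This is an assumed, generic implementation for illustration, not necessarily the one in performance_metrics.py.

```python
import numpy as np

def dice_coefficient(a, b):
    """Dice similarity between two binary masks (1 = identical, 0 = disjoint)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom
```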