NKI-RT/weakly-supervised-commissioning

Weakly supervised commissioning

When introducing an auto-segmentation model into clinical practice, one would like to assess both the quality of the predicted segmentations and the robustness of the model over a wide range of anatomical variation and/or image quality, representative of daily clinical practice. This is particularly difficult when the model is provided by an external party and the institution introducing the model does not possess a high-quality dataset on which to commission it.

Assuming that a segmentation model is more likely to fail on an atypical case than on a typical one, we propose a methodology that selects cases for commissioning based on unsupervised anomaly detection. For this, the model supplier is required to provide a set of image/shape features that correlate with model performance on the training data. The receiving hospital can then use these features to train an unsupervised anomaly detector on a large dataset of unlabeled cases and use the anomaly scores to select representative cases for commissioning of the model. Since the anomaly detector is trained on unlabeled data, the receiving hospital does not need a large, high-quality, curated dataset.

Compared to a random selection, using the proposed approach to select cases for manual evaluation increases the likelihood of selecting atypical edge cases with low segmentation performance. When including 20 cases, the spread in segmentation performance was 37% larger with the proposed methodology than with a random selection. The selected cases therefore cover a more representative range of the segmentation performance expected in clinical practice. This approach could be used for model commissioning, to increase the confidence that the model performs well over a wide range of expected anatomical variation, or for online model QA after clinical introduction.

See also technical note: Weakly supervised commissioning of externally developed auto-segmentation models and applied to male pelvis MR auto-segmentation

Installation

Requirements:

  • python 3.9
  • poetry >= 2.0

Please start by cloning this repository. Then install the required dependencies in your local environment using poetry by running:

poetry install

Usage

Please have a look at the example settings file wsc_config.json and the code in the run_wsc.py file.

The features and metrics are calculated from the images and contours and saved as "features.csv" and "metrics.csv" in the data folders. For this calculation, we assume that the data folders contain the following subfolders with the images and contours in NIfTI format:

  • supplier_dir:
    • images
    • labels_gt (the ground truth contours)
    • labels_pred (the model predictions)
  • deployer_dir:
    • images
    • labels_pred (the model predictions)
  • test_dir: (only for manuscript)
    • images
    • labels_gt (the ground truth contours)
    • labels_pred (the model predictions)
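Before running the calculation, it can help to verify that a data folder matches this layout. The following is a minimal sketch, not part of the repository: the helper name missing_subfolders and the EXPECTED_SUBFOLDERS mapping are hypothetical and only mirror the list above.

```python
from pathlib import Path

# Subfolders each data folder is expected to contain, mirroring the list above.
# This mapping is an illustration; the repository may check the layout differently.
EXPECTED_SUBFOLDERS = {
    "supplier_dir": ["images", "labels_gt", "labels_pred"],
    "deployer_dir": ["images", "labels_pred"],
    "test_dir": ["images", "labels_gt", "labels_pred"],
}


def missing_subfolders(data_dir, role):
    """Return the expected subfolders that are absent from data_dir.

    role is one of the keys of EXPECTED_SUBFOLDERS.
    """
    root = Path(data_dir)
    return [name for name in EXPECTED_SUBFOLDERS[role] if not (root / name).is_dir()]
```

Calling missing_subfolders with the path of the supplier folder and the role "supplier_dir" returns an empty list when the folder is complete, and otherwise the names of the subfolders that still need to be created.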

Alternatively, you can pre-calculate the features and metrics csv files yourself. For this, please have a look at the calculate_features_for_folder and compare_folders functions.

Model supplier

  • First, the model supplier needs to identify the combination of features that correlates with geometric contouring performance, using an exhaustive search, see run_wsc_model_supplier.
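To illustrate what such an exhaustive search could look like, here is a minimal sketch on synthetic data. It is not the code in run_wsc_model_supplier: the function name search_feature_combinations, the use of an isolation forest to score each subset, and the Spearman rank correlation as the selection criterion are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import IsolationForest


def search_feature_combinations(features, metric, names, max_size=2, seed=0):
    """Try every feature subset up to max_size and return (best_names, best_rho).

    features: (n_cases, n_features) array of image/shape features.
    metric: (n_cases,) array where higher means better segmentation.
    The selection criterion (Spearman rho) is an assumption of this sketch.
    """
    best_names, best_rho = None, -np.inf
    for size in range(1, max_size + 1):
        for subset in combinations(range(len(names)), size):
            columns = features[:, list(subset)]
            forest = IsolationForest(random_state=seed).fit(columns)
            # score_samples is lower for more anomalous cases, so a positive
            # rho means anomalous cases tend to have lower performance.
            scores = forest.score_samples(columns)
            rho, _ = spearmanr(scores, metric)
            if rho > best_rho:
                best_names, best_rho = [names[i] for i in subset], rho
    return best_names, best_rho


# Synthetic example: performance drops for atypical values of one feature only.
rng = np.random.default_rng(0)
informative = rng.normal(size=200)
uninformative = rng.normal(size=200)
metric = 0.9 - 0.1 * np.abs(informative) + rng.normal(scale=0.01, size=200)
features = np.column_stack([informative, uninformative])

best_names, best_rho = search_feature_combinations(
    features, metric, ["informative", "uninformative"]
)
print(best_names, round(float(best_rho), 2))
```

The number of subsets grows quickly with the number of features, so in this sketch max_size caps the subset size to keep the search tractable.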

Model deployer

  • Train an isolation forest on unlabeled data using the features identified by the model supplier, see run_wsc_model_deployer.
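The deployer step can be sketched with scikit-learn's IsolationForest as follows. This is an illustration on synthetic data, not the code in run_wsc_model_deployer: the function name select_cases and the rule of picking the n_select most anomalous cases are assumptions, and the repository may select cases differently.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def select_cases(features, case_ids, n_select=20, seed=0):
    """Fit an isolation forest on unlabeled features and return the IDs of
    the n_select most anomalous cases (selection rule assumed for this sketch)."""
    forest = IsolationForest(random_state=seed).fit(features)
    scores = forest.score_samples(features)  # lower score = more anomalous
    order = np.argsort(scores)  # most anomalous cases first
    return [case_ids[i] for i in order[:n_select]]


# Synthetic example: 300 typical cases plus 5 clearly atypical ones.
rng = np.random.default_rng(0)
typical = rng.normal(size=(300, 2))
atypical = np.array(
    [[12.0, 11.0], [-11.0, 12.0], [12.0, -13.0], [-12.0, -11.0], [0.0, 14.0]]
)
features = np.vstack([typical, atypical])
case_ids = [f"case_{i:03d}" for i in range(len(features))]
outlier_ids = case_ids[-5:]

print(select_cases(features, case_ids, n_select=5))
```

No ground truth contours are needed at this stage: the forest only sees the feature values, which is what allows the deployer to work with a large unlabeled dataset.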

Validation

  • In our case, we validated the weakly supervised commissioning approach using a separate labeled dataset, see run_wsc_validate and the notebook in validate_wsc.
