HaoyuCui/hi-UNI

hi-UNI

Official code for "Prediction of molecular subtypes for endometrial cancer based on hierarchical foundation model", published in Bioinformatics (2025).

Journal link | Cite

hi-UNI (hierarchical UNI) classifies whole slide images using a weakly supervised pipeline. Our method achieved state-of-the-art performance, offering cost-effective and fast molecular subtyping for endometrial cancer.

Overview

[Figure: overview of hi-UNI]

Installation

Install the dependencies

pip install -r requirements.txt

Preprocessing

  1. Data preprocessing lives in a separate repository, WSI_Segmenter, which is also included in the ./preprocess directory. See ./preprocess/readme.md for the detailed patch extraction and segmentation steps.

  2. Extract raw patches at a resolution of at least 1024x1024 using tiatoolbox or DeepZoom. A tumor segmentation network can easily be added to these pipelines.
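Whichever reader you choose, the patch grid itself can be computed independently of the library. The following is a minimal sketch, not code from this repository, tiatoolbox, or DeepZoom: `patch_grid` is a hypothetical helper, and it assumes non-overlapping patches whose coordinates are expressed in pixels of the slide level you read from.

```python
def patch_grid(width, height, patch=1024, stride=None):
    """Yield (x, y) top-left corners of patches that fit entirely inside the slide.

    width, height: slide dimensions in pixels (assumed to be at the level you extract from).
    patch: patch side length; 1024 matches the minimum size recommended above.
    stride: step between patches; defaults to `patch`, i.e. no overlap.
    """
    stride = stride or patch
    # Stop early enough that every patch stays within the slide bounds.
    for y in range(0, height - patch + 1, stride):
        for x in range(0, width - patch + 1, stride):
            yield x, y


# Example: a 4096 x 3000 region gives a 4 x 2 grid of full 1024 x 1024 patches.
coords = list(patch_grid(4096, 3000))
```

Each coordinate pair can then be passed to the region-reading call of your slide library, and patches outside the tumor mask can be skipped before saving.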

Data preparation

  1. Prepare the data in the following structure. Both png and jpeg formats are supported. We recommend extracting patches only from the tumor region.

    ├── data
    │   ├── slide_1
    │   │   ├── patch_1.png
    │   │   ├── patch_2.png
    │   │   ├── ...
    │   ├── slide_2
    │   │   ├── patch_1.png
    │   │   ├── patch_2.png
    │   │   ├── ...
    │   ├── ...
    │   └── slide_n
    │       ├── ...
    │       └── patch_n.png
  2. Create a hierarchical structure for the data.

    python utils/create_hi_patches.py --input <INPUT_DIR> --output <OUTPUT_DIR> --how non-blank

    --how: center (center crop) or non-blank (selective sampling, as proposed in the paper)

  3. Organize your data following example.csv, then create a k-fold split.

    python utils/gen_kfold_split.py --csv <CSV_PATH>  --dir <STEP_2_OUTPUT_DIR> --k 5 --on slide

    --on slide: split the data at the slide level

    --on patient: split the data at the patient level (uses the name column)

    A directory named kf will be created in the current directory.

  4. Apply for access to the UNI model and download pytorch_model.bin.

  5. Modify the config.yaml file to set hyperparameters and UNI's storage path.

    • Hyperparameters: batch_size, lr, epochs, iters_to_val, save_best

    • UNI config: freeze_ratio (for ViT blocks), cmb (hi-UNI combinations), UNI_path

    • Task-specific config: class_names
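To illustrate the difference that `--on patient` makes in step 3, here is a minimal sketch of a patient-level split. It is not the implementation in utils/gen_kfold_split.py: the function name, the `(slide_id, patient_name)` row schema, and the round-robin assignment are all assumptions chosen for illustration. The point it shows is that folds are assigned per patient, so slides from the same patient never end up on both sides of a train/validation split.

```python
import random
from collections import defaultdict


def patient_level_folds(rows, k=5, seed=0):
    """Assign each slide to one of k folds so that no patient spans two folds.

    rows: iterable of (slide_id, patient_name) pairs (hypothetical schema).
    Returns a dict mapping slide_id -> fold index in [0, k).
    """
    slides_by_patient = defaultdict(list)
    for slide_id, patient in rows:
        slides_by_patient[patient].append(slide_id)

    # Sort before shuffling so the result depends only on the seed.
    patients = sorted(slides_by_patient)
    if len(patients) < k:
        raise ValueError("need at least k distinct patients")
    random.Random(seed).shuffle(patients)

    fold_of_slide = {}
    for i, patient in enumerate(patients):
        for slide_id in slides_by_patient[patient]:
            # Round-robin over patients, not slides.
            fold_of_slide[slide_id] = i % k
    return fold_of_slide


# Example with made-up identifiers: P01 and P03 each contribute two slides.
rows = [("slide_1", "P01"), ("slide_2", "P01"), ("slide_3", "P02"),
        ("slide_4", "P03"), ("slide_5", "P03"), ("slide_6", "P04")]
folds = patient_level_folds(rows, k=2)
```

A slide-level split would instead shuffle the slides directly, which is simpler but can leak patient-specific appearance between training and validation when a patient has several slides.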

Train and evaluate

  1. Train a single fold (e.g., fold 1) and evaluate it on the validation set

    python train.py --fold 1
  2. Train & evaluate all folds (for Windows)

    python ./scripts/train_kf.py

    Train & evaluate all folds (for Linux)

    sh ./scripts/train_kf.sh
  3. The results will be saved in the runs/ directory in the following format:

     ├── runs
     │   ├── {cmbs}_{freeze_ratio}  # configuration
     │   │   ├── 1  # fold name
     │   │   │   ├── {fold}_best.pth  # best model
     │   │   │   ├── slide_{iter}.png  # slide-level ROC
     │   │   │   ├── ...
     │   │   ├── ...
     │   ...
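Once several folds have finished, it can be handy to gather the best checkpoint of each one. The sketch below is an illustration only and is not part of this repository: `find_best_checkpoints` is a hypothetical helper that assumes exactly the layout shown above (configuration directory, then fold directory, then a file ending in `_best.pth`).

```python
from pathlib import Path


def find_best_checkpoints(runs_dir):
    """Map (configuration, fold) -> path of the best checkpoint under runs_dir.

    Assumes the layout <runs_dir>/<configuration>/<fold>/<fold>_best.pth.
    Other files in the fold directory (e.g. ROC plots) are ignored.
    """
    found = {}
    for ckpt in sorted(Path(runs_dir).glob("*/*/*_best.pth")):
        fold_dir = ckpt.parent
        found[(fold_dir.parent.name, fold_dir.name)] = ckpt
    return found


if __name__ == "__main__":
    # Print one line per configuration and fold found under ./runs.
    for (config, fold), path in find_best_checkpoints("runs").items():
        print(config, fold, path)
```

The returned paths can then be loaded in your own evaluation script; how a checkpoint is loaded depends on how train.py saves it, which this sketch does not assume.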

Comparison experiments

We are grateful to the authors for sharing their code. We use CLAM for data preprocessing and feature extraction in comparison experiments.

| Model    | Authors        | GitHub link                               |
|----------|----------------|-------------------------------------------|
| CLAM     | Lu et al.      | https://github.com/mahmoodlab/CLAM        |
| DTFD-MIL | Zhang et al.   | https://github.com/hrzhang1123/DTFD-MIL   |
| SETMIL   | Zhao et al.    | https://github.com/Louis-YuZhao/SETMIL    |
| TransMIL | Shao et al.    | https://github.com/szc19990412/TransMIL   |
| im4MEC   | Fremond et al. | https://github.com/AIRMEC/im4MEC          |

License

© IMIC - This code is made available under the GPLv3 License and is available for non-commercial academic purposes.

Reference

If you find our work useful in your research, please consider citing our paper:

Haoyu Cui, Qinhao Guo, Jun Xu, Xiaohua Wu, Chengfei Cai, Yiping Jiao, Wenlong Ming, Hao Wen, Xiangxue Wang, Prediction of molecular subtypes for endometrial cancer based on hierarchical foundation model, Bioinformatics, 2025

@article{10.1093/bioinformatics/btaf059,
    author = {Cui, Haoyu and Guo, Qinhao and Xu, Jun and Wu, Xiaohua and Cai, Chengfei and Jiao, Yiping and Ming, Wenlong and Wen, Hao and Wang, Xiangxue},
    title = {Prediction of molecular subtypes for endometrial cancer based on hierarchical foundation model},
    journal = {Bioinformatics},
    pages = {btaf059},
    year = {2025},
    month = {02},
    issn = {1367-4811},
    doi = {10.1093/bioinformatics/btaf059},
    url = {https://doi.org/10.1093/bioinformatics/btaf059},
}
