Official code for the paper "Prediction of molecular subtypes for endometrial cancer based on hierarchical foundation model" (Bioinformatics, 2025).
hi-UNI (hierarchical UNI) performs whole slide image (WSI) classification with a weakly supervised pipeline. Our method achieved state-of-the-art performance, offering cost-effective and fast molecular subtyping for endometrial cancer.
Install the dependencies:
pip install -r requirements.txt
We have uploaded a separate repository for data preprocessing, WSI_Segmenter, which is also included in the ./preprocess directory. The detailed patch extraction and segmentation steps are described in ./preprocess/readme.md.
Extract raw patches of at least 1024x1024 pixels, using tiatoolbox or DeepZoom for patch extraction. A tumor segmentation network can easily be added to either pipeline.
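tiatoolbox and DeepZoom handle tiling themselves; the sketch below only illustrates the grid arithmetic behind non-overlapping 1024x1024 extraction. It does not call either library, and `tile_coords` is a hypothetical helper that assumes a single pyramid level and discards incomplete tiles at the right and bottom edges.

```python
from typing import Iterator, Tuple


def tile_coords(width: int, height: int, tile: int = 1024,
                stride: int = 1024) -> Iterator[Tuple[int, int]]:
    """Yield the top-left (x, y) of every full tile inside a width x height level.

    Tiles that would extend past the right or bottom edge are skipped
    (an assumption; some pipelines pad them instead).
    """
    for y in range(0, height - tile + 1, stride):
        for x in range(0, width - tile + 1, stride):
            yield x, y
```

For example, a 4096x2500 level yields 4 columns x 2 rows = 8 tiles. Setting `stride` smaller than `tile` gives overlapping patches.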
Prepare the data in the following structure; PNG and JPEG formats are supported. Extracting patches only from the tumor region is recommended.
├── data
│   ├── slide_1
│   │   ├── patch_1.png
│   │   ├── patch_2.png
│   │   ├── ...
│   ├── slide_2
│   │   ├── patch_1.png
│   │   ├── patch_2.png
│   │   ├── ...
│   ├── ...
│   └── slide_n
│       ├── ...
│       └── patch_n.png
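Before running the next step, it can help to check that every slide folder actually contains images. The snippet below is a hypothetical helper, not part of this repository; it assumes one sub-folder per slide and treats `.png`, `.jpg` and `.jpeg` (case-insensitive) as valid patch extensions.

```python
from pathlib import Path
from typing import Dict

# Extensions accepted by this check (assumption: .jpg is also used for JPEG).
ALLOWED = {".png", ".jpg", ".jpeg"}


def count_patches(data_dir: str) -> Dict[str, int]:
    """Return {slide_name: number_of_image_patches} for each slide sub-folder."""
    counts = {}
    for slide_dir in sorted(Path(data_dir).iterdir()):
        if not slide_dir.is_dir():
            continue  # ignore stray files at the top level
        counts[slide_dir.name] = sum(
            1 for p in slide_dir.iterdir() if p.suffix.lower() in ALLOWED
        )
    return counts


def check_layout(data_dir: str) -> None:
    """Raise ValueError if there are no slide folders or a slide has no patches."""
    counts = count_patches(data_dir)
    if not counts:
        raise ValueError(f"no slide folders found in {data_dir}")
    empty = [slide for slide, n in counts.items() if n == 0]
    if empty:
        raise ValueError(f"slides without png/jpeg patches: {empty}")
```

Typical use is `check_layout("data")` on the folder shown above; non-image files such as notes are ignored rather than counted.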
Create a hierarchical structure for the data.
python utils/create_hi_patches.py --input <INPUT_DIR> --output <OUTPUT_DIR> --how non-blank
--how: center (center crop) or non-blank (selective sampling, proposed in the paper)
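The two `--how` modes differ in which sub-region of a large patch is kept. The sketch below is only an illustration of that idea and does not reproduce `utils/create_hi_patches.py`: it assumes "non-blank" means picking, among non-overlapping crops, the one with the smallest fraction of near-white background pixels, with a hypothetical brightness threshold of 220. The paper's selective-sampling rule may differ.

```python
import numpy as np


def center_crop(img: np.ndarray, size: int) -> np.ndarray:
    """Take the size x size crop from the middle of an H x W x C image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]


def blank_fraction(crop: np.ndarray, white_thresh: int = 220) -> float:
    """Fraction of pixels whose channels are all above white_thresh (assumed background)."""
    return float(np.all(crop > white_thresh, axis=-1).mean())


def non_blank_crop(img: np.ndarray, size: int, white_thresh: int = 220) -> np.ndarray:
    """Among non-overlapping size x size crops, return the one with the least background."""
    h, w = img.shape[:2]
    best, best_frac = None, 2.0  # any real fraction is <= 1.0
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            crop = img[top:top + size, left:left + size]
            frac = blank_fraction(crop, white_thresh)
            if frac < best_frac:
                best, best_frac = crop, frac
    return best
```

Center cropping is cheap and deterministic but can land on background when tissue sits near a patch border; a tissue-aware choice avoids that at the cost of scoring every candidate crop.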
Organize your data like example.csv, then create a k-fold split for the data:
python utils/gen_kfold_split.py --csv <CSV_PATH> --dir <STEP_2_OUTPUT_DIR> --k 5 --on slide
- --on slide: split the data at the slide level
- --on patient: split the data at the patient level (uses the name column)

A directory named kf will be created in the current directory.
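The practical difference between `--on slide` and `--on patient` is whether slides from the same patient may end up in different folds. The sketch below is a hypothetical re-implementation of that idea, not the repository's `gen_kfold_split.py`; the `slide` column name is an assumption, while `name` is the patient column mentioned above.

```python
import random
from collections import defaultdict
from typing import Dict, List


def kfold_assign(rows: List[dict], k: int = 5, on: str = "slide",
                 seed: int = 0) -> Dict[str, int]:
    """Map each slide ID to a fold index in 1..k.

    on="slide":   every slide is its own group.
    on="patient": slides sharing the same `name` value stay in one fold,
                  so no patient appears in both training and validation.
    """
    groups = defaultdict(list)
    for row in rows:
        key = row["slide"] if on == "slide" else row["name"]
        groups[key].append(row["slide"])
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)  # reproducible shuffle of groups
    fold_of = {}
    for i, key in enumerate(keys):
        for slide in groups[key]:
            fold_of[slide] = i % k + 1  # round-robin over shuffled groups
    return fold_of
```

Rows can be read with `csv.DictReader`. This round-robin assignment balances the number of groups per fold but is not stratified by class label, which a real split may want to add.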
Apply for access to the UNI model and download its pytorch_model.bin weights.
Modify the config.yaml file to set the hyperparameters and UNI's storage path:
- Hyperparameters: batch_size, lr, epochs, iters_to_val, save_best
- UNI config: freeze_ratio (for the ViT blocks), cmb (hi-UNI combinations), UNI_path
- Task-specific config: class_names
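A quick sanity check of the loaded configuration can catch typos before a long training run. This is a hypothetical sketch: the key names come from the list above, but the flat (non-nested) layout and the interpretation of `freeze_ratio` as "fraction of the first ViT blocks to freeze" are assumptions, as is the block count of 24 (the depth of the ViT-Large backbone used by UNI).

```python
from typing import Any, Dict, List

# Key names from the README; a flat config layout is an assumption.
REQUIRED_KEYS = ["batch_size", "lr", "epochs", "iters_to_val", "save_best",
                 "freeze_ratio", "cmb", "UNI_path", "class_names"]


def check_config(cfg: Dict[str, Any]) -> None:
    """Raise if a required key is missing or freeze_ratio is out of range."""
    missing = [k for k in REQUIRED_KEYS if k not in cfg]
    if missing:
        raise KeyError(f"config.yaml is missing: {missing}")
    if not 0.0 <= cfg["freeze_ratio"] <= 1.0:
        raise ValueError("freeze_ratio is assumed to be a fraction in [0, 1]")


def frozen_block_ids(freeze_ratio: float, n_blocks: int = 24) -> List[int]:
    """Hypothetical mapping: freeze the first round(freeze_ratio * n_blocks) blocks."""
    return list(range(round(freeze_ratio * n_blocks)))
```

With PyYAML installed, the dictionary would typically come from `yaml.safe_load(open("config.yaml"))`. Under this assumed mapping, `freeze_ratio: 0.5` freezes blocks 0 to 11 and fine-tunes the remaining 12.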
Train a single fold (e.g., fold 1) and evaluate it on the validation set:
python train.py --fold 1
Train & evaluate all folds (for Windows)
python ./scripts/train_kf.py
Train & evaluate all folds (for Linux)
sh ./scripts/train_kf.sh
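If neither script suits your environment, the same loop can be written once in Python and run on both Windows and Linux. This is a sketch under the assumption that folds are numbered 1..k and that `train.py --fold i` is all each run needs; it is not the content of `scripts/train_kf.py`.

```python
import subprocess
import sys
from typing import List


def fold_commands(k: int = 5, script: str = "train.py") -> List[List[str]]:
    """Build one `python train.py --fold i` command per fold, i = 1..k."""
    return [[sys.executable, script, "--fold", str(i)] for i in range(1, k + 1)]


def run_all(k: int = 5) -> None:
    """Run the folds sequentially, stopping at the first one that fails."""
    for cmd in fold_commands(k):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_all(5)` from the repository root runs the five folds in order. Using `sys.executable` keeps every fold in the same Python environment as the launcher.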
The results will be saved in the runs/ directory in the following format:
├── runs
│   ├── {cmbs}_{freeze_ratio}   # configuration
│   │   ├── 1                   # fold name
│   │   │   ├── {fold}_best.pth  # best model
│   │   │   ├── slide_{iter}.png # slide-level ROC
│   │   │   ├── ...
│   │   ├── ...
│   ├── ...
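After cross-validation, the best checkpoint of each fold can be located by walking this layout. The helper below is hypothetical and relies only on the directory pattern shown above (configuration folder, then fold folder, then `{fold}_best.pth`).

```python
from pathlib import Path
from typing import Dict


def best_checkpoints(runs_dir: str = "runs") -> Dict[str, Dict[str, Path]]:
    """Map each configuration folder to {fold_name: path of {fold}_best.pth}."""
    found: Dict[str, Dict[str, Path]] = {}
    for ckpt in sorted(Path(runs_dir).glob("*/*/*_best.pth")):
        config, fold = ckpt.parent.parent.name, ckpt.parent.name
        found.setdefault(config, {})[fold] = ckpt
    return found
```

ROC images and other files in the fold folders are ignored because only names ending in `_best.pth` match the glob.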
We are grateful to the authors for sharing their code. We use CLAM for data preprocessing and feature extraction in comparison experiments.
| Model | Authors | GitHub link |
|---|---|---|
| CLAM | Lu et al. | https://github.com/mahmoodlab/CLAM |
| DTFD-MIL | Zhang et al. | https://github.com/hrzhang1123/DTFD-MIL |
| SETMIL | Zhao et al. | https://github.com/Louis-YuZhao/SETMIL |
| TransMIL | Shao et al. | https://github.com/szc19990412/TransMIL |
| im4MEC | Fremond et al. | https://github.com/AIRMEC/im4MEC |
© IMIC - This code is made available under the GPLv3 License and is available for non-commercial academic purposes.
If you find our work useful in your research, please consider citing our paper:
Haoyu Cui, Qinhao Guo, Jun Xu, Xiaohua Wu, Chengfei Cai, Yiping Jiao, Wenlong Ming, Hao Wen, Xiangxue Wang, Prediction of molecular subtypes for endometrial cancer based on hierarchical foundation model, Bioinformatics, 2025
@article{10.1093/bioinformatics/btaf059,
author = {Cui, Haoyu and Guo, Qinhao and Xu, Jun and Wu, Xiaohua and Cai, Chengfei and Jiao, Yiping and Ming, Wenlong and Wen, Hao and Wang, Xiangxue},
title = {Prediction of molecular subtypes for endometrial cancer based on hierarchical foundation model},
journal = {Bioinformatics},
pages = {btaf059},
year = {2025},
month = {02},
issn = {1367-4811},
doi = {10.1093/bioinformatics/btaf059},
url = {https://doi.org/10.1093/bioinformatics/btaf059},
}