Novel nested conformal prediction analysis (NCPA) to unravel complexity in patient subtyping
This repository applies conformal prediction in a nested setting to highlight intra-sample heterogeneity in small datasets. The workflow was applied to the TCGA-BRCA dataset, available at https://portal.gdc.cancer.gov/projects/TCGA-BRCA.
- 00_Branch1_PAM50 is a Jupyter notebook that runs the first branch of the pipeline. Applying NCPA in this branch assigns the samples to their multilabel counterparts.
- 01_Data_Preparation.py and 02_Main_Machine_Learning.py are a first test for Branch2, verifying that the pipeline runs correctly before the true NCPA is applied. The Main script takes two input parameters: the machine learning model to run (one of "LogisticRegression", "KNN", "RandomForest", or "SVC") and the output folder.
- 03_Conformal_Analysis is a Jupyter notebook containing the first conformal prediction analysis, performed on these results.
- 04_Branch2_Data_Preparation.py and 05_Branch2_Main_Machine_Learning.py are the true NCPA scripts. The Main script takes three input parameters: the machine learning model to run (see point 2), the output folder, and the input folder containing the data generated by the Data Preparation script.
- 06_Conformal_Analysis_Multilabel is the final Jupyter notebook, which performs the cross-analysis of Branch1 and Branch2.
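
To illustrate how conformal prediction can yield multilabel assignments, here is a minimal sketch of generic split conformal prediction for classification. It is not the NCPA implementation from the notebooks and scripts above. The function names (`conformal_threshold`, `prediction_set`), the nonconformity score (one minus the probability of the true class), and the toy class probabilities are all assumptions made for illustration only.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.25):
    """Return the nonconformity threshold estimated on a calibration set.

    cal_probs  : list of dicts mapping class name -> predicted probability
    cal_labels : list of true class names, one per calibration sample
    alpha      : tolerated error rate (target coverage is 1 - alpha)
    """
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank of the quantile.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration samples for this alpha: keep every class.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """Return every class whose nonconformity score is within the threshold."""
    return [label for label, p in probs.items() if 1.0 - p <= threshold]


# Toy calibration data (hypothetical values, not results from TCGA-BRCA).
cal_probs = [
    {"LumA": 0.80, "LumB": 0.15, "Basal": 0.05},
    {"LumA": 0.40, "LumB": 0.50, "Basal": 0.10},
    {"LumA": 0.25, "LumB": 0.70, "Basal": 0.05},
    {"LumA": 0.05, "LumB": 0.05, "Basal": 0.90},
]
cal_labels = ["LumA", "LumA", "LumB", "Basal"]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

# An ambiguous sample keeps two labels; a confident one keeps a single label.
ambiguous = {"LumA": 0.50, "LumB": 0.45, "Basal": 0.05}
confident = {"LumA": 0.02, "LumB": 0.03, "Basal": 0.95}

print(prediction_set(ambiguous, threshold))  # ['LumA', 'LumB']
print(prediction_set(confident, threshold))  # ['Basal']
```

In this sketch, a sample whose prediction set contains more than one class is the kind of case a multilabel assignment would capture, while a single-class set corresponds to an unambiguous subtype call.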