This project proposes the SMU-Encoder Classification Model, a multi-modal classification network designed to address missing modality challenges in upper gastrointestinal submucosal lesion classification.
The model integrates style-content feature extraction through the SMUEncoder module and applies a Perceiver Transformer architecture for classification tasks.
A mix-mode training strategy (70% full-modality, 30% missing-modality simulation) is employed to improve robustness while maintaining high classification performance.
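The missing-modality simulation can be pictured as randomly blanking one modality during a fraction of training steps. Below is a minimal self-contained sketch of that idea; the function name, argument names, and the use of zeroing as the masking operation are illustrative assumptions, not the repo's actual API (see the thesis for the exact mix-mode strategy).

```python
import random


def apply_modality_dropout(eus, ing, p_missing=0.3):
    """Mix-mode sketch: with probability p_missing, blank one randomly
    chosen modality. Zero-filling stands in for whatever masking the
    model actually uses; names here are assumptions, not the repo's API."""
    if random.random() < p_missing:
        if random.random() < 0.5:
            eus = [0.0] * len(eus)  # simulate a missing EUS image
        else:
            ing = [0.0] * len(ing)  # simulate a missing ING (OGD) image
    return eus, ing
```

With `p_missing=0.3`, roughly 30% of samples see only one modality, matching the 70/30 split described above.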
The two imaging modalities used in this project are:
- Endoscopic Ultrasound (EUS)
- Endoscopy (OGD, referred to as ING in the code)
For full technical details and methodology, please refer to the [final year thesis].
The experiments are conducted on an Ubuntu 20.04 system with:
- NVIDIA RTX 4090 GPU (24GB)
- Intel(R) Xeon(R) Platinum 8352V CPU (16 cores, 2.10GHz)
- 120GB system memory
The project uses PyTorch 1.11.0+cu113 and Python 3.8, with the following key packages:
| Package | Version |
|---|---|
| torch | 1.11.0+cu113 |
| torchvision | 0.12.0+cu113 |
| torchsummary | 1.5.1 |
| einops | 0.7.0 |
| numpy | 1.22.4 |
| pandas | 2.0.3 |
| matplotlib | 3.5.2 |
| seaborn | 0.13.2 |
| pytorch-lamb | 1.0.0 |
| scikit-learn | 1.3.2 |
| Pillow | 9.1.1 |
| tqdm | 4.61.2 |
| File / Folder | Purpose |
|---|---|
| data_process.py | Preprocess raw dataset images and masks |
| dataset.py | Dataset loading and transformation logic |
| make_json.py | Generate structured JSON file pairs for training/testing |
| train.py | Model training, testing, and evaluation |
| perceiver.py | Perceiver Transformer model implementation |
| vit.py | Vision Transformer (ViT) model implementation |
| image_fusion.py | Fusion of images with corresponding ROI masks |
| roc.py | Generate ROC curves and calculate AUC scores |
| show.py | Visualize and restore images from tensor format |
| visualize.py | Result comparison and visualization of multiple models |
| generate_mask_new.py | Generate binary mask images from labeled JSON files |
The `test1_` to `test6_` folders contain different experiment variants (multi-modal / single-modal, with/without masks).

Raw data is under `./Dataset/train_Dataset` and `./Dataset/test_Dataset`. Processed data is stored in `./Processed_Train` and `./Processed_Test`.
Note: In the code, ING refers to OGD (endoscopic images). You can treat these terms as interchangeable.
Run:

```shell
python data_process.py
```

The preprocessed data will be saved under `Processed_Train` and `Processed_Test`.
In train.py, update the data paths within the make_json.generate_split() function to match your local dataset locations (inside Processed_Train and Processed_Test).
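The JSON pairing that `make_json.py` produces can be sketched as matching EUS and ING images by filename. The snippet below is a self-contained illustration only; the directory layout (`EUS/` and `ING/` subfolders) and the helper name `build_pairs` are assumptions, not the repo's actual `generate_split()` signature.

```python
import json
from pathlib import Path


def build_pairs(root):
    """Sketch of the pairing step: match <root>/EUS/<id>.png with
    <root>/ING/<id>.png and emit JSON-ready records. The layout and
    function name are illustrative assumptions."""
    eus_dir, ing_dir = Path(root) / "EUS", Path(root) / "ING"
    pairs = []
    for eus_path in sorted(eus_dir.glob("*.png")):
        ing_path = ing_dir / eus_path.name
        if ing_path.exists():  # keep only ids present in both modalities
            pairs.append({"EUS": str(eus_path), "ING": str(ing_path)})
    return pairs


def save_pairs(root, out_file):
    """Write the pair list to a JSON file for the training pipeline."""
    Path(out_file).write_text(json.dumps(build_pairs(root), indent=2))
```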
Example command:

```shell
python train.py -l 0.00078 -e 30
```

- `-l`: Learning rate (default: 0.00078)
- `-e`: Number of epochs (default: 30)
- Batch size is configured as 8 in the code.
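The flag handling in `train.py` presumably follows the standard `argparse` pattern; a minimal sketch matching the command above is shown below. The destination names `lr` and `epochs` are assumptions, not the repo's actual attribute names.

```python
import argparse

# Sketch of the CLI exposed by train.py; flags and defaults are taken
# from the example command, attribute names are assumptions.
parser = argparse.ArgumentParser(description="Train the SMU-Encoder model")
parser.add_argument("-l", "--lr", type=float, default=0.00078,
                    help="learning rate")
parser.add_argument("-e", "--epochs", type=int, default=30,
                    help="number of training epochs")

# Parse an explicit argument list here so the sketch runs standalone.
args = parser.parse_args(["-l", "0.001", "-e", "10"])
```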
The function same_seeds() ensures reproducibility by setting fixed random seeds.
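Conceptually, `same_seeds()` re-seeds every random number generator the pipeline touches so repeated runs draw identical values. The stdlib-only sketch below shows the idea; the repo's version additionally seeds NumPy and PyTorch (and CUDA), which is omitted here to keep the example self-contained.

```python
import random


def same_seeds(seed=0):
    """Reproducibility sketch: fix the stdlib RNG seed. The project's
    same_seeds() also seeds NumPy and torch (not shown here)."""
    random.seed(seed)
```

After re-seeding, the RNG replays the same sequence, which is what makes training runs comparable.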
After training, result files (test_results.txt, train_metrics.txt, confusion matrices, and optionally ROC curves) will be generated automatically.
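The confusion matrices in the output files tabulate true versus predicted classes; a minimal dependency-free sketch of that computation (the repo likely uses scikit-learn's equivalent) is:

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Minimal sketch of a confusion matrix:
    rows = true class, columns = predicted class."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m
```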
To visualize the model comparison, use:

```shell
python visualize.py
```

To visualize fused input images:

```shell
python show.py
```

For any questions, please contact:
Li Jiawei (李佳蔚)
📧 [email protected]