Deep Learning for Detecting and Early Predicting Chronic Obstructive Pulmonary Disease from Spirogram Time Series
This repository contains the code accompanying the paper: "Deep Learning for Detecting and Early Predicting Chronic Obstructive Pulmonary Disease from Spirogram Time Series" by Mei, Shuhao, et al., 2025.
Chronic Obstructive Pulmonary Disease (COPD) is a progressive lung condition that obstructs airflow and significantly impacts quality of life. Traditional methods rely on clear, prominent features in spirograms (Volume-Flow time series) for diagnosis but fail to predict long-term COPD risk from subtle patterns in the data.
We present DeepSpiro, a deep learning-based framework designed for both detecting COPD and predicting future COPD risk. Key components of DeepSpiro include:
- SpiroSmoother: Stabilizes Volume-Flow curves to reduce noise and improve input quality.
- SpiroEncoder: Captures variability patterns in spirogram data by extracting key patches of different lengths.
- SpiroExplainer: Integrates heterogeneous data and provides interpretable predictions using volume attention mechanisms.
- SpiroPredictor: Predicts COPD risk for undiagnosed high-risk patients over horizons of 1 to 5 years and beyond.
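To give a feel for what curve smoothing does before a model sees the data, here is a minimal sketch using a centered moving average. This is a generic, swapped-in technique chosen for illustration only; it is not the SpiroSmoother algorithm from the paper, and the function name and window size are assumptions.

```python
def moving_average(values, window=5):
    """Smooth a 1-D series with a centered moving average.

    Illustrative stand-in only: NOT the SpiroSmoother method from the paper.
    `window` must be a positive odd integer; near the edges the window is
    truncated to the samples that exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# A single spike is spread over its neighbours.
print(moving_average([0, 0, 3, 0, 0], window=3))
```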
Highlights:
- Dataset: Evaluated on the UK Biobank dataset.
- Performance: Achieved an AUC of 0.8328 for COPD detection and robust predictive performance for long-term COPD risk (p-value < 0.001).
- Impact: Effectively models disease progression, offering actionable insights for early intervention.
If you use this code in your research, please consider citing our work:
@article{mei2025deep,
  title={Deep learning for detecting and early predicting chronic obstructive pulmonary disease from spirogram time series},
  author={Mei, Shuhao and Li, Xin and Zhou, Yuxi and Xu, Jiahao and Zhang, Yong and Wan, Yuxuan and Cao, Shan and Zhao, Qinghao and Geng, Shijia and Xie, Junqing and Chen, Shengyong and Hong, Shenda},
  journal={npj Systems Biology and Applications},
  volume={11},
  number={1},
  pages={18},
  year={2025},
  publisher={Nature Publishing Group UK London}
}
This repository supports Python 3.8 and PyTorch 2.2.2. To set up the environment, follow these steps:
# Create and activate a virtual environment
conda create -n DeepSpiro python=3.8
conda activate DeepSpiro
# Install dependencies
pip install -r requirements.txt
Depending on your access to data, follow one of the two options below to prepare the required inputs:
If you want to test the model, you can use the example data generated by the script:
- Run the generate_example_data.py script.
- This script will download publicly available Flow Data (spirometry exhalation volume curves) from the UK Biobank website at https://biobank.ndph.ox.ac.uk/showcase/refer.cgi?id=3.
- It then generates a synthetic sample.xlsx file containing the Flow Data along with approximated FVC, FEV1, and PEF values required for testing.
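As a rough illustration of how FVC, FEV1, and PEF can be approximated from an exhalation volume curve, the sketch below applies their standard definitions to a volume-time series. It is not the logic of generate_example_data.py: the function name is hypothetical, and the 10 ms sampling interval and millilitre units are assumptions that should be checked against the actual Flow Data.

```python
def approximate_spirometry_metrics(volumes_ml, dt_s=0.01):
    """Approximate FVC, FEV1 and PEF from cumulative exhaled volume samples.

    Assumptions (not confirmed by the repository): samples are cumulative
    exhaled volume in millilitres, spaced `dt_s` seconds apart, starting at t=0.
    """
    if len(volumes_ml) < 2:
        raise ValueError("need at least two samples")
    volumes_l = [v / 1000.0 for v in volumes_ml]

    # FVC: total volume exhaled during the forced manoeuvre.
    fvc = max(volumes_l)

    # FEV1: volume exhaled by the 1-second mark (last sample if the blow is shorter).
    index_1s = min(round(1.0 / dt_s), len(volumes_l) - 1)
    fev1 = volumes_l[index_1s]

    # PEF: largest flow, taken as the finite-difference slope of volume over time.
    flows = [(b - a) / dt_s for a, b in zip(volumes_l, volumes_l[1:])]
    pef = max(flows)  # litres per second; multiply by 60 for litres per minute

    return {"FVC_l": fvc, "FEV1_l": fev1, "PEF_l_per_s": pef}


# Toy curve: constant 3 L/s flow until 4.5 L has been exhaled.
toy_curve = [min(30 * i, 4500) for i in range(300)]
print(approximate_spirometry_metrics(toy_curve))
```

Real curves are noisy, so a finite-difference PEF is sensitive to single-sample spikes; smoothing the curve first is a common precaution.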
If you have approved access to the UK Biobank dataset:
- Download the required data from the UK Biobank, specifically the following fields:
  - Data-Field 3062: FVC (Forced Vital Capacity)
  - Data-Field 3063: FEV1 (Forced Expiratory Volume in 1 second)
  - Data-Field 3064: PEF (Peak Expiratory Flow)
  - Data-Field 3066: Flow Data (exhalation volume curve)
- Populate the sample.xlsx file (located in the data directory) with the data obtained from the above fields.
- Note: All four data fields (FVC, FEV1, PEF, and Flow Data) are mandatory for accurate analysis.
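Because all four fields are mandatory, it can help to check the spreadsheet header before running the model. The sketch below shows one way to do that. The column names are hypothetical placeholders, since the actual headers expected in sample.xlsx are not listed here; adjust them to match the file in the data directory.

```python
# Hypothetical column names; replace with the headers actually used in sample.xlsx.
REQUIRED_COLUMNS = ("FVC", "FEV1", "PEF", "FlowData")


def missing_columns(header):
    """Return the required columns that are absent from a spreadsheet header row."""
    present = {str(name).strip() for name in header}
    return [column for column in REQUIRED_COLUMNS if column not in present]


# Example: a header row that lacks the PEF column.
print(missing_columns(["FVC", "FEV1", "FlowData"]))
```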
To run the model effectively, you need to provide additional patient information as arguments: Age, Sex, and Smoking Status. These correspond to the following UK Biobank fields:
- Age: Data-Field 21003
- Sex: Data-Field 31
- Smoking Status: Data-Field 1249
After preparing the data:
- Activate the environment:
conda activate DeepSpiro
- Run the prediction script with the required arguments:
python run_predict.py -data ./data/sample.xlsx -age 53 -sex 0 -smoke 1
- -data: Path to the sample.xlsx file containing Flow Data.
- -age: Age of the patient (default: 53).
- -sex: Sex of the patient (0 for female, 1 for male; default: 0).
- -smoke: Smoking status (0 for non-smoker, 1 for smoker; default: 1).
Alternatively, you can omit the arguments to use default values.
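For readers who want to see how such flags and defaults fit together, here is a minimal argparse sketch that mirrors the options listed above. It is not the parser from run_predict.py: the function name is hypothetical, and the default path for -data is an assumption, since only the defaults for -age, -sex, and -smoke are stated.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative sketch only)."""
    parser = argparse.ArgumentParser(
        description="Sketch of the documented run_predict.py options"
    )
    # Assumed default path; the README does not state a default for -data.
    parser.add_argument("-data", default="./data/sample.xlsx",
                        help="path to the spreadsheet containing Flow Data")
    parser.add_argument("-age", type=int, default=53,
                        help="age of the patient")
    parser.add_argument("-sex", type=int, choices=[0, 1], default=0,
                        help="0 for female, 1 for male")
    parser.add_argument("-smoke", type=int, choices=[0, 1], default=1,
                        help="0 for non-smoker, 1 for smoker")
    return parser


# Omitted flags fall back to their defaults; here only the age is overridden.
args = build_parser().parse_args(["-age", "60"])
print(args.data, args.age, args.sex, args.smoke)
```

Restricting -sex and -smoke with `choices` makes a mistyped encoding fail at the command line rather than silently reaching the model.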
The data used in this project is sourced from the UK Biobank, available upon application approval. For more information, visit the UK Biobank website.