yudaleng/COPD-Early-Prediction

Deep Learning for Detecting and Early Predicting Chronic Obstructive Pulmonary Disease from Spirogram Time Series

Overview

This repository contains the code accompanying the paper: "Deep Learning for Detecting and Early Predicting Chronic Obstructive Pulmonary Disease from Spirogram Time Series" by Mei, Shuhao, et al., 2025.

Introduction

Chronic Obstructive Pulmonary Disease (COPD) is a progressive lung condition that obstructs airflow and significantly impacts quality of life. Traditional diagnostic methods rely on clear, prominent features in spirograms (Volume-Flow time series), so they cannot predict long-term COPD risk from subtler patterns in the data.

We present DeepSpiro, a deep learning-based framework designed for both detecting COPD and predicting future COPD risk. Key components of DeepSpiro include:

  1. SpiroSmoother: Stabilizes Volume-Flow curves to reduce noise and improve input quality.
  2. SpiroEncoder: Captures variability patterns in spirogram data by extracting key patches of different lengths.
  3. SpiroExplainer: Integrates heterogeneous data and provides interpretable predictions using volume attention mechanisms.
  4. SpiroPredictor: Predicts COPD risk for undiagnosed high-risk patients over horizons of 1, 2, 3, 4, 5 years, and beyond.
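To make the idea of a stabilized Volume-Flow curve concrete, here is a minimal sketch that turns a volume-time series into (volume, flow) pairs and smooths the flow. It is not the paper's SpiroSmoother: a centered moving average is swapped in purely for illustration. The function names, the 10 ms sampling interval, and the millilitre units are assumptions, not details taken from this repository.

```python
from typing import List, Sequence, Tuple


def volume_to_flow(volume_ml: Sequence[float], dt_s: float = 0.01) -> List[float]:
    """Finite-difference flow in L/s from cumulative exhaled volume in mL.

    Assumes evenly spaced samples (dt_s seconds apart); the first flow is 0.
    """
    flows = [0.0]
    for prev, cur in zip(volume_ml, volume_ml[1:]):
        flows.append((cur - prev) / (1000.0 * dt_s))
    return flows


def moving_average(values: Sequence[float], window: int = 5) -> List[float]:
    """Centered moving average; the window shrinks near both ends."""
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def smooth_volume_flow(
    volume_ml: Sequence[float], dt_s: float = 0.01, window: int = 5
) -> List[Tuple[float, float]]:
    """Return (volume in L, smoothed flow in L/s) pairs for one blow."""
    flow = moving_average(volume_to_flow(volume_ml, dt_s), window)
    return [(v / 1000.0, f) for v, f in zip(volume_ml, flow)]
```

A larger `window` suppresses more noise but also flattens the early flow peak, so any real smoother has to trade the two off.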

Highlights:

  • Dataset: Evaluated on the UK Biobank dataset.
  • Performance: Achieved an AUC of 0.8328 for COPD detection and robust predictive performance for long-term COPD risk (p-value < 0.001).
  • Impact: Effectively models disease progression, offering actionable insights for early intervention.

If you use this code in your research, please consider citing our work:

@article{mei2025deep,
  title={Deep learning for detecting and early predicting chronic obstructive pulmonary disease from spirogram time series},
  author={Mei, Shuhao and Li, Xin and Zhou, Yuxi and Xu, Jiahao and Zhang, Yong and Wan, Yuxuan and Cao, Shan and Zhao, Qinghao and Geng, Shijia and Xie, Junqing and Chen, Shengyong and Hong, Shenda},
  journal={npj Systems Biology and Applications},
  volume={11},
  number={1},
  pages={18},
  year={2025},
  publisher={Nature Publishing Group UK London}
}

Installation

This repository supports Python 3.8 and PyTorch 2.2.2. To set up the environment, follow these steps:

# Create and activate a virtual environment
conda create -n DeepSpiro python=3.8
conda activate DeepSpiro

# Install dependencies
pip install -r requirements.txt

Preparing the Data

Depending on your access to data, follow one of the two options below to prepare the required inputs:

Option 1: Testing the Model with Example Data

If you want to test the model, you can use the example data generated by the script:

  1. Run the generate_example_data.py script.
  2. This script will download publicly available Flow Data (spirometry exhalation volume curves) from the UK Biobank website at https://biobank.ndph.ox.ac.uk/showcase/refer.cgi?id=3.
  3. It will then generate a synthetic sample.xlsx file containing the Flow Data together with approximate FVC, FEV1, and PEF values, which are required for testing.
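One way such approximate values could be derived from a single blow is sketched below. This is not the logic of generate_example_data.py. It assumes the Flow Data is a comma-separated string of cumulative exhaled volumes in millilitres, sampled every 10 ms with the first sample at time zero. It also assumes FVC and FEV1 are reported in litres and PEF in litres per minute. All function names are hypothetical.

```python
from typing import Dict, List, Sequence


def parse_blow(raw: str) -> List[float]:
    """Parse a comma-separated blow string into a list of volumes (mL)."""
    return [float(x) for x in raw.split(",") if x.strip()]


def approximate_metrics(volume_ml: Sequence[float], dt_s: float = 0.01) -> Dict[str, float]:
    """Rough FVC, FEV1 and PEF estimates from one volume-time curve.

    Assumptions: evenly spaced samples, first sample at t = 0, and no
    back-extrapolation of the start of exhalation, so the values are
    approximations rather than clinical-grade measurements.
    """
    if len(volume_ml) < 2:
        raise ValueError("need at least two samples")
    # FVC: largest exhaled volume reached during the blow, in litres.
    fvc_l = max(volume_ml) / 1000.0
    # FEV1: volume exhaled at the one-second mark (clamped to the last sample).
    idx_1s = min(round(1.0 / dt_s), len(volume_ml) - 1)
    fev1_l = volume_ml[idx_1s] / 1000.0
    # PEF: steepest rise between consecutive samples, converted to L/min.
    peak_step_ml = max(b - a for a, b in zip(volume_ml, volume_ml[1:]))
    pef_l_min = peak_step_ml / (1000.0 * dt_s) * 60.0
    return {"FVC": fvc_l, "FEV1": fev1_l, "PEF": pef_l_min}
```

Because PEF is taken from raw sample-to-sample differences, it is sensitive to noise; smoothing the curve first would give a steadier estimate.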

Option 2: Using UK Biobank Data

If you have approved access to the UK Biobank dataset:

  1. Download the required data from the UK Biobank, specifically the following fields:
    • Data-Field 3062: FVC (Forced Vital Capacity),
    • Data-Field 3063: FEV1 (Forced Expiratory Volume in 1 second),
    • Data-Field 3064: PEF (Peak Expiratory Flow),
    • Data-Field 3066: Flow Data (exhalation volume curve).
  2. Populate the sample.xlsx file (located in the data directory) with the data obtained from the above fields.
    • Note: All four data fields (FVC, FEV1, PEF, and Flow Data) are mandatory for accurate analysis.
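Since all four fields are mandatory, a quick completeness check before running the model can save a failed run. The sketch below works on one row represented as a plain dictionary. The column names (`FVC`, `FEV1`, `PEF`, `FlowData`) are hypothetical and may not match the headers used in sample.xlsx, so adjust them to the actual file.

```python
from typing import Dict, List, Mapping

# Hypothetical column names mapped to the UK Biobank Data-Field IDs listed above.
REQUIRED_FIELDS: Dict[str, int] = {
    "FVC": 3062,
    "FEV1": 3063,
    "PEF": 3064,
    "FlowData": 3066,
}


def missing_fields(row: Mapping[str, object]) -> List[str]:
    """Return a description of every required field that is absent or empty."""
    missing = []
    for name, field_id in REQUIRED_FIELDS.items():
        value = row.get(name)
        if value is None or str(value).strip() == "":
            missing.append(f"{name} (Data-Field {field_id})")
    return missing
```

An empty list means the row has a value for each of the four fields; it does not check that the values are physiologically plausible.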

Running the Model with Additional Information

To run the model effectively, you need to provide additional patient information as arguments: Age, Sex, and Smoking Status. These correspond to the following UK Biobank fields:

  • Age: Data-Field 21003
  • Sex: Data-Field 31
  • Smoking Status: Data-Field 1249

Running the Model

After preparing the data:

  1. Activate the environment:
conda activate DeepSpiro
  2. Run the prediction script with the required arguments:
python run_predict.py -data ./data/sample.xlsx -age 53 -sex 0 -smoke 1
  • -data: Path to the sample.xlsx file containing Flow Data.
  • -age: Age of the patient (default: 53).
  • -sex: Sex of the patient (0 for female, 1 for male; default: 0).
  • -smoke: Smoking status (0 for non-smoker, 1 for smoker; default: 1).

Alternatively, you can omit the arguments to use the default values.
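For readers who want to see how a command-line interface with these flags and defaults can be declared, here is a minimal argparse sketch. It mirrors the documented options but is not the actual contents of run_predict.py. The `build_parser` name is hypothetical, and the default for `-data` is an assumption, since the README states defaults only for the other three arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the four documented options with single-dash names."""
    parser = argparse.ArgumentParser(
        description="Sketch of a COPD prediction command-line interface"
    )
    # Assumed default path; the README does not state one for -data.
    parser.add_argument("-data", type=str, default="./data/sample.xlsx",
                        help="path to the sample.xlsx file containing Flow Data")
    parser.add_argument("-age", type=int, default=53,
                        help="age of the patient")
    parser.add_argument("-sex", type=int, choices=[0, 1], default=0,
                        help="0 for female, 1 for male")
    parser.add_argument("-smoke", type=int, choices=[0, 1], default=1,
                        help="0 for non-smoker, 1 for smoker")
    return parser
```

Calling `build_parser().parse_args()` in a script then yields an object with `data`, `age`, `sex`, and `smoke` attributes, and the `choices` restriction rejects any sex or smoking value other than 0 or 1.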

Data Sources

The data used in this project is sourced from the UK Biobank, available upon application approval. For more information, visit the UK Biobank website.
