delnorte-sd/Inflamed-Genes-Regression

Psoriasis Inflammation Prediction

Machine learning analysis to identify key inflammatory genes in psoriasis using RNA-seq data and validate biosensor targets.

Overview

This project uses RNA-seq data from psoriatic and healthy skin samples to:

  • Build a classification model to distinguish inflamed vs. normal tissue
  • Identify the most predictive genes using SHAP analysis
  • Validate NF-κB pathway genes as biomarkers for inflammation
  • Support biosensor development targeting NO and H₂O₂

Dataset

Source: GSE54456 from NCBI GEO

  • Samples: 174 skin biopsies (92 psoriatic, 82 normal)
  • Type: Bulk RNA-seq expression data (RPKM values)
  • File: GSE54456_RPKM_samples.txt

Requirements

pip install pandas numpy scikit-learn matplotlib seaborn shap

Quick Start

  1. Download the expression data file GSE54456_RPKM_samples.txt from GEO
  2. Place it in the same directory as the script
  3. Run the analysis:
python psoriasis_analysis.py

What It Does

1. Data Processing

  • Loads gene expression data for 174 samples
  • Labels samples as inflamed (psoriatic) or normal based on metadata
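  • Filters out low-expression genes, keeping only genes with mean RPKM > 1.0
  • Applies a log2 transformation to compress the range of expression values (RPKM is already normalized for gene length and sequencing depth)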
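A minimal sketch of these preprocessing steps follows. It assumes the file is tab-separated, with gene identifiers in the first column and one numeric column per sample (genes as rows). The function names, the `sep`/`index_col` choices, and the +1 pseudocount inside the log are assumptions, not taken from `psoriasis_analysis.py`.

```python
import numpy as np
import pandas as pd


def load_rpkm(path: str = "GSE54456_RPKM_samples.txt") -> pd.DataFrame:
    """Load a genes x samples RPKM matrix (assumed tab-separated, gene IDs in column 0)."""
    return pd.read_csv(path, sep="\t", index_col=0)


def preprocess_rpkm(rpkm: pd.DataFrame, min_mean_rpkm: float = 1.0) -> pd.DataFrame:
    """Drop low-expression genes, then log2-transform the remaining values."""
    # Keep genes whose mean RPKM across all samples is above the threshold
    keep = rpkm.mean(axis=1) > min_mean_rpkm
    # log2(x + 1): the pseudocount (an assumption) keeps zero RPKM values finite
    return np.log2(rpkm.loc[keep] + 1.0)
```

Usage would be `log_expr = preprocess_rpkm(load_rpkm())`. If the real file carries extra non-numeric annotation columns, drop them before computing the row means.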

2. Feature Selection

  • Selects top 500 most variable genes
  • Force-includes NF-κB pathway genes (NFKB1, NFKB2, RELA, etc.)
  • Creates feature matrix for machine learning
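One way to build the feature matrix, shown as a sketch: rank genes by the variance of their log2 values, keep the top 500, then append any NF-κB genes that were not already selected. The README lists NFKB1, NFKB2 and RELA followed by "etc.", so the remaining entries in the hypothetical `NFKB_GENES` list are placeholders for whatever list the script actually uses.

```python
import pandas as pd

# Hypothetical pathway list: only NFKB1, NFKB2 and RELA are named in the README
NFKB_GENES = ["NFKB1", "NFKB2", "RELA", "RELB", "REL", "NFKBIA", "IKBKB"]


def select_features(log_expr: pd.DataFrame, n_top: int = 500, forced=NFKB_GENES) -> pd.DataFrame:
    """Return a samples x genes feature matrix from a genes x samples log2 matrix."""
    # Rank genes by variance across samples and keep the n_top most variable
    top = log_expr.var(axis=1).nlargest(n_top).index.tolist()
    # Force-include pathway genes that exist in the data but missed the variance cut
    extra = [g for g in forced if g in log_expr.index and g not in top]
    # Transpose so rows are samples and columns are genes, as scikit-learn expects
    return log_expr.loc[top + extra].T
```

If variance ranking runs on all 174 samples before the train/test split, test samples influence which genes are chosen; running it on the training split only avoids that leakage.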

3. Model Training

  • Splits data: 70% training, 30% testing
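  • Trains logistic regression with L2 regularization (C=0.1; in scikit-learn, C is the inverse of regularization strength, so smaller values regularize more strongly)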
  • Evaluates performance with accuracy and ROC-AUC
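A sketch of the split, fit and evaluation with scikit-learn follows. The 70/30 split and C=0.1 come from the README. Stratifying the split by label, the `random_state` seed, `max_iter=1000`, and the function name `train_and_evaluate` are assumptions added for reproducibility and convergence. The demo data under `__main__` is synthetic and says nothing about GSE54456.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(X, y, seed: int = 42):
    """Fit an L2-regularized logistic regression on a 70/30 split and report metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # scikit-learn's default penalty is L2; C=0.1 means fairly strong regularization
    model = LogisticRegression(C=0.1, max_iter=1000)
    model.fit(X_train, y_train)
    # ROC-AUC needs scores, so use the predicted probability of the positive class
    proba = model.predict_proba(X_test)[:, 1]
    metrics = {
        "train_acc": accuracy_score(y_train, model.predict(X_train)),
        "test_acc": accuracy_score(y_test, model.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, proba),
    }
    return model, metrics


if __name__ == "__main__":
    # Synthetic stand-in data (NOT GSE54456): 174 samples, 20 features,
    # with feature 0 shifted upward in the "inflamed" class (label 1)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(174, 20))
    y = np.array([1] * 92 + [0] * 82)
    X[y == 1, 0] += 3.0
    _, metrics = train_and_evaluate(X, y)
    print(metrics)
```

Stratifying keeps the psoriatic/normal ratio similar in both splits, which matters with only about 52 test samples.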

4. SHAP Analysis

  • Calculates SHAP values to identify most predictive genes
  • Ranks genes by their contribution to inflammation prediction
  • Highlights NF-κB pathway genes
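For a linear model such as this logistic regression, and treating features as independent, the SHAP value of gene j for sample i in log-odds units is w_j × (x_ij − mean of x_j over a background set). The sketch below computes that directly with NumPy instead of calling the `shap` package, then ranks genes by mean absolute SHAP value. Whether the script ranks by this exact statistic is an assumption, and `linear_shap` is a hypothetical name. `shap.LinearExplainer` with an independent masker should produce the same per-sample values.

```python
import numpy as np
import pandas as pd


def linear_shap(coef, X_background, X_explain, feature_names):
    """Exact SHAP values for a linear (log-odds) model under feature independence.

    Returns (samples x features SHAP array, genes ranked by mean |SHAP|).
    """
    w = np.asarray(coef, dtype=float).ravel()
    # Baseline: mean feature value over the background (e.g. training) samples
    baseline = np.asarray(X_background, dtype=float).mean(axis=0)
    # phi_ij = w_j * (x_ij - baseline_j), in log-odds units
    shap_values = (np.asarray(X_explain, dtype=float) - baseline) * w
    # Global importance: average magnitude of each gene's contribution
    importance = pd.Series(np.abs(shap_values).mean(axis=0), index=list(feature_names))
    return shap_values, importance.sort_values(ascending=False)
```

With a fitted model this would be called as `linear_shap(model.coef_, X_train, X_test, X.columns)`. Because the model is linear, each row of SHAP values sums to that sample's log-odds minus the mean log-odds over the background set, which is a quick sanity check.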

Key Results

The analysis outputs:

  • Model Performance: Training/testing accuracy, ROC-AUC score
  • Top 20 Predictive Genes: Ranked by SHAP importance
  • NF-κB Pathway Analysis: Specific evaluation of inflammation regulators
  • Biological Interpretation: Connection to NO/H₂O₂ biosensor targets
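The NF-κB pathway evaluation amounts to looking up where each pathway gene falls in the overall SHAP ranking. A sketch, assuming a gene-indexed Series of mean |SHAP| importances like the one produced by the SHAP analysis; the function name `pathway_ranks` is hypothetical.

```python
import pandas as pd


def pathway_ranks(importance: pd.Series, pathway_genes) -> pd.DataFrame:
    """Rank of each pathway gene within the full importance ranking (1 = most important)."""
    ranked = importance.sort_values(ascending=False)
    ranks = pd.Series(range(1, len(ranked) + 1), index=ranked.index)
    # Genes absent from the feature matrix are skipped
    present = [g for g in pathway_genes if g in ranked.index]
    table = pd.DataFrame({"rank": ranks[present], "mean_abs_shap": ranked[present]})
    return table.sort_values("rank")
```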

Why This Matters

Biological Context

  • NF-κB is a master regulator of inflammation
  • In psoriasis, activated NF-κB drives inflammatory gene expression
  • NF-κB activation leads to production of NO and H₂O₂

Biosensor Validation

This analysis supports these biosensor targets because both molecules are produced downstream of NF-κB:

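  • NO production: NF-κB induces expression of NOS2, which encodes inducible nitric oxide synthase (iNOS) → iNOS produces nitric oxide
  • H₂O₂ production: NF-κB induces expression of NADPH oxidase components → NADPH oxidase generates superoxide, which is converted to hydrogen peroxide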

The plasmid circuit detects NO and H₂O₂, so it targets molecules downstream of the key inflammatory genes identified here.

Output

The script prints detailed results including:

  • Data loading and preprocessing statistics
  • Model performance metrics
  • Classification report
  • Top predictive genes with SHAP scores
  • NF-κB pathway gene rankings
  • Biological interpretation

Notes

  • The model uses regularization (C=0.1) to reduce overfitting
  • Sample labels are manually curated from GEO metadata
  • SHAP provides interpretable feature importance
  • NF-κB genes are force-included regardless of variance

References

  • Dataset: Swindell et al., GSE54456, NCBI GEO
  • SHAP: Lundberg & Lee (2017), "A Unified Approach to Interpreting Model Predictions"
  • Biological context: NF-κB pathway in psoriasis pathogenesis

License

Open source for research and educational purposes.