Machine learning analysis to identify key inflammatory genes in psoriasis using RNA-seq data and validate biosensor targets.
This project uses RNA-seq data from psoriatic and healthy skin samples to:
- Build a classification model to distinguish inflamed vs. normal tissue
- Identify the most predictive genes using SHAP analysis
- Validate NF-κB pathway genes as biomarkers for inflammation
- Support biosensor development targeting NO and H₂O₂
Source: GSE54456 from NCBI GEO
- Samples: 174 skin biopsies (92 psoriatic, 82 normal)
- Type: Bulk RNA-seq expression data (RPKM values)
- File: GSE54456_RPKM_samples.txt
pip install pandas numpy scikit-learn matplotlib seaborn shap
- Download the expression data file GSE54456_RPKM_samples.txt from GEO
- Place it in the same directory as the script
- Run the analysis:
python psoriasis_analysis.py
- Loads gene expression data for 174 samples
- Labels samples as inflamed (psoriatic) or normal based on metadata
- Filters out low-expression genes (keeps only genes with mean RPKM > 1.0)
- Applies a log2 transformation to compress the skewed RPKM distribution
- Selects the 500 most variable genes
- Force-includes NF-κB pathway genes (NFKB1, NFKB2, RELA, etc.)
- Builds the samples × genes feature matrix for machine learning
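The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the code in psoriasis_analysis.py. It assumes the RPKM table is a genes × samples DataFrame indexed by gene symbol, uses a pseudocount of 1 in the log2 transform (the README does not state one), and lets forced genes bypass both filters. Apart from NFKB1, NFKB2 and RELA, the pathway gene list is hypothetical, and synthetic gamma-distributed values stand in for the real GSE54456 matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical NF-kB gene list: NFKB1, NFKB2 and RELA come from the README, the rest are assumed
NFKB_GENES = ["NFKB1", "NFKB2", "RELA", "RELB", "REL", "NFKBIA"]


def preprocess(rpkm: pd.DataFrame, n_top: int = 500, min_mean_rpkm: float = 1.0,
               force_genes=NFKB_GENES) -> pd.DataFrame:
    """Turn a genes x samples RPKM table into a samples x genes log2 feature matrix."""
    # Expression filter: keep genes whose mean RPKM across samples exceeds the threshold
    expressed = rpkm[rpkm.mean(axis=1) > min_mean_rpkm]
    # log2 transform with a pseudocount of 1 (assumed) so that RPKM = 0 maps to 0, not -inf
    logged = np.log2(expressed + 1.0)
    # Variance filter: the n_top genes with the highest variance across samples
    top_variable = logged.var(axis=1).sort_values(ascending=False).index[:n_top]
    # Force-include pathway genes present in the raw table, bypassing both filters (assumed)
    forced = [g for g in force_genes if g in rpkm.index]
    # Union of both gene sets, preserving order and dropping duplicates
    keep = list(dict.fromkeys(list(top_variable) + forced))
    # Transpose so rows are samples and columns are genes, as scikit-learn expects
    return np.log2(rpkm.loc[keep] + 1.0).T


# Synthetic stand-in for the RPKM table: 2,000 random genes plus the pathway genes, 20 samples
rng = np.random.default_rng(0)
genes = [f"GENE{i}" for i in range(2000)] + NFKB_GENES
samples = [f"S{j}" for j in range(20)]
rpkm = pd.DataFrame(rng.gamma(shape=0.8, scale=5.0, size=(len(genes), len(samples))),
                    index=genes, columns=samples)

X = preprocess(rpkm)
print(X.shape)  # 20 samples x 500-506 genes, depending on overlap with the forced genes
```

Because the forced genes are appended after the variance ranking, the matrix can hold slightly more than 500 columns whenever a pathway gene is not already among the most variable ones.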
- Splits data: 70% training, 30% testing
- Trains logistic regression with L2 regularization (C=0.1)
- Evaluates performance with accuracy and ROC-AUC
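A minimal sketch of the training and evaluation steps with scikit-learn. It uses synthetic data from `make_classification` in place of the real feature matrix. The stratified split, the `random_state` values and `max_iter` are assumptions rather than settings taken from psoriasis_analysis.py. No feature scaling is applied, matching the steps listed above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 174-sample feature matrix (1 = inflamed, 0 = normal)
X, y = make_classification(n_samples=174, n_features=50, n_informative=10,
                           n_clusters_per_class=1, random_state=0)

# 70/30 split; stratify keeps the class ratio similar in both parts (assumed choice)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Logistic regression with scikit-learn's default L2 penalty; C is the inverse penalty strength
model = LogisticRegression(C=0.1, max_iter=1000)
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
# ROC-AUC is computed from the predicted probability of the positive (inflamed) class
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"train acc={train_acc:.3f}  test acc={test_acc:.3f}  ROC-AUC={test_auc:.3f}")
print(classification_report(y_test, model.predict(X_test),
                            target_names=["normal", "inflamed"]))
```

Comparing training and test accuracy is a quick overfitting check: a large gap suggests the model is memorizing the training samples.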
- Calculates SHAP values to identify the most predictive genes
- Ranks genes by their contribution to inflammation prediction
- Highlights NF-κB pathway genes
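For a linear model such as this logistic regression, SHAP values have a closed form when features are treated as independent: the contribution of gene j to sample i on the log-odds scale is w_j (x_ij − mean_j), with the mean taken over a background set. The sketch below computes this directly with pandas instead of calling the shap package; it should correspond to what `shap.LinearExplainer` reports under its default interventional setting, but that equivalence is stated here as an expectation, not verified against the script. The gene names and data are synthetic and hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

NFKB_GENES = {"NFKB1", "NFKB2", "RELA"}  # pathway genes named in the README


def linear_shap(model: LogisticRegression, background: pd.DataFrame,
                explain: pd.DataFrame) -> pd.DataFrame:
    """Exact SHAP values (log-odds scale) for a fitted linear model, assuming independent features."""
    w = model.coef_[0]                    # one coefficient per gene
    baseline = background.mean(axis=0)    # expected value of each feature
    return (explain - baseline) * w       # phi_ij = w_j * (x_ij - E[x_j])


# Hypothetical feature names and synthetic expression values
rng = np.random.default_rng(1)
genes = ["NFKB1", "RELA", "IL36G", "GENE_A", "GENE_B"]
X = pd.DataFrame(rng.normal(size=(120, len(genes))), columns=genes)
# Synthetic labels driven mostly by NFKB1 and, more weakly, by IL36G
y = (X["NFKB1"] + 0.5 * X["IL36G"] + rng.normal(scale=0.5, size=len(X)) > 0).astype(int)

model = LogisticRegression(C=0.1, max_iter=1000).fit(X, y)
shap_values = linear_shap(model, background=X, explain=X)

# Rank genes by mean absolute SHAP value, a common global importance summary
importance = shap_values.abs().mean().sort_values(ascending=False)
for rank, (gene, score) in enumerate(importance.items(), start=1):
    flag = "  <- NF-kB pathway" if gene in NFKB_GENES else ""
    print(f"{rank:2d}. {gene:<7} {score:.3f}{flag}")
```

Passing the training set as `background` and the test set as `explain` would mirror a typical train/test workflow. For each sample, the SHAP values plus the base value (intercept plus w · mean) add up to the model's log-odds output.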
The analysis outputs:
- Model Performance: Training/testing accuracy, ROC-AUC score
- Top 20 Predictive Genes: Ranked by SHAP importance
- NF-κB Pathway Analysis: Specific evaluation of inflammation regulators
- Biological Interpretation: Connection to NO/H₂O₂ biosensor targets
- NF-κB is a master regulator of inflammation
- In psoriasis, activated NF-κB drives inflammatory gene expression
- NF-κB activation induces enzymes that generate NO and H₂O₂
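This analysis connects to the biosensor targets through two NF-κB-driven pathways:
- NO production: NF-κB induces transcription of NOS2, which encodes the iNOS enzyme → iNOS produces nitric oxide
- H₂O₂ production: NF-κB activates NADPH oxidase → NADPH oxidase produces superoxide, which is converted to hydrogen peroxide
Your plasmid circuit, which detects NO and H₂O₂, therefore senses molecules produced downstream of the key inflammatory genes identified here.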
The script prints detailed results including:
- Data loading and preprocessing statistics
- Model performance metrics
- Classification report
- Top predictive genes with SHAP scores
- NF-κB pathway gene rankings
- Biological interpretation
- The model uses strong L2 regularization (C=0.1, versus scikit-learn's default of 1.0) to reduce overfitting
- Sample labels are manually curated from GEO metadata
- SHAP provides interpretable feature importance
- NF-κB genes are force-included regardless of variance
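To illustrate why a small C counters overfitting: in scikit-learn's `LogisticRegression`, C is the inverse of the L2 penalty strength, so lowering it shrinks the coefficients. The sketch below uses synthetic data with more features than samples, roughly the regime of this analysis (about 500 genes for 174 samples), and records the coefficient norm for several C values. The data and the C grid are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Few samples, many features: the regime in which overfitting is a concern
X, y = make_classification(n_samples=120, n_features=300, n_informative=10,
                           n_clusters_per_class=1, random_state=0)

norms = {}
for C in (0.01, 0.1, 1.0, 10.0):
    model = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    # L2 norm of the coefficient vector; smaller C means a stronger penalty and a smaller norm
    norms[C] = np.linalg.norm(model.coef_)
    print(f"C={C:<5}  ||w||={norms[C]:.3f}")
```

With an L2 penalty the coefficient norm cannot grow as the penalty strengthens, so the printed norms should rise monotonically with C. Choosing C by cross-validation is a common alternative to fixing it in advance.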
- Dataset: Swindell et al., GSE54456, NCBI GEO
- SHAP: Lundberg & Lee (2017), "A Unified Approach to Interpreting Model Predictions"
- Biological context: NF-κB pathway in psoriasis pathogenesis
Open source for research and educational purposes.