Skip to content

usman4373/scRNA-seq-navigator-Seurat

Repository files navigation

cover

πŸ“‘ Table of contents

πŸ“ Overview

scRNA-seq-navigator contains a comprehensive single-cell RNA sequencing (scRNA-seq) analysis pipeline designed for processed data following alignment. The pipeline leverages the robust Seurat (Stuart et al., 2019) toolkit for essential analytical steps, including data integration, quality control, and batch-effect correction, and incorporates the ScType database (Ianevski et al., 2022) for refined cell-type annotation. It further facilitates differential gene expression analysis and multi-dimensional visualization to extract meaningful biological insights from complex single-cell datasets.

  • The analyses provide end-to-end solutions for:
    • Data Integration & Processing: Merging multiple datasets and experimental conditions
    • Batch Effect Correction: Addressing technical variations across different platforms and studies
    • Cell Type Identification: Automated annotation using reference gene signatures
    • Advanced Visualization: 2D/3D embeddings and interactive plots
    • Differential Expression: Statistical analysis of gene expression changes
    • Publication-ready Outputs: High-quality figures and comprehensive results
  • Each script represents a modular component of the analysis workflow, allowing researchers to process single-cell data from raw counts to biological interpretation

Script Description
01-samples_merger.R Merges multiple scRNA-seq samples from different conditions into a unified dataset for analysis
02-scRNA-seqr.R Performs comprehensive scRNA-seq analysis, including QC, clustering, cell type annotation, and differential expression analysis
03-multiple-datasets-integrator.R Integrates multiple scRNA-seq datasets using Harmony to remove batch effects
04-umap-tsne-3d-plotr.R Creates interactive 3D visualizations of UMAP and t-SNE embeddings for enhanced data exploration
05-volcano_plotr.R Generates publication-ready volcano plots to visualize differential gene expression results
06-heatmapr.R Produces clustered heatmaps showing gene expression patterns across cell types and conditions
07-boxplotr.R Creates comparative box plots to analyze gene expression differences across experimental conditions and/or cell types

NOTE: The code snippets in this README are for demonstration and can be found in their respective script files.

Installation

  • Navigate to the directory containing the install_packages.R file
  • Open a terminal and run:
Rscript install_packages.R
  • Alternatively, if the above does not work, open the install_packages.R file in RStudio
  • Click the "Source" button in the top-right of the script editor OR in the R console run:
# Navigate to the script directory first, then:
source("install_packages.R")

Input Files Format

  • The analysis workflow is designed for processed scRNA-seq data following alignment.
  • Input should be in 10X Genomics format, consisting of the three standard files for each sample directory:
    • barcodes.tsv.gz
    • features.tsv.gz
    • matrix.mtx.gz

01. Merging Samples

This script performs the initial data integration step for single-cell RNA sequencing (scRNA-seq) analysis by merging multiple samples from two experimental conditions (e.g, Normal and Parkinson's Disease) into a unified Seurat object.

Purpose

  • Load individual scRNA-seq samples from both control and diseased conditions
  • Merge samples within each condition group
  • Combine both condition groups into a final merged dataset
  • Add appropriate metadata for sample tracking and condition identification
  • Export the consolidated data for downstream analysis

Workflow

A. Reading and Creating Seurat Object

sample1 <- Read10X(data.dir = "samples/Normal/GSM7290760")
sample1 <- CreateSeuratObject(counts = sample1, project = "N1")
  • Read10X() Reads the 10X Genomics format data from the specified directory containing three essential files:
    • barcodes.tsv.gz (cell identifiers)
    • features.tsv.gz (gene information)
    • matrix.mtx.gz (count matrix)
  • CreateSeuratObject() Converts the raw count data into a Seurat object, which is the standard data structure

B. Merging Multiple Samples

  • Merging all control samples:
normal_merged <- merge(sample1, y = c(sample2, sample3, sample4, sample5),
                       add.cell.ids = c("N1", "N2", "N3", "N4", "N5"))
  • Merging all diseased samples:
LM_merged <- merge(sample6, y = c(sample7, sample8, sample9, sample10),
                   add.cell.ids = c("LM1", "LM2", "LM3", "LM4", "LM5"))

C. Adding Condition Metadata

  • Control samples:
normal_merged <- AddMetaData(object = normal_merged,
                             metadata = "Normal", col.name = "condition")
  • Diseased samples:
LM_merged <- AddMetaData(object = LM_merged, 
                         metadata = "LM", col.name = "condition")

D. Final Merging

  • Merging control and diseased samples:
merged_samples <- merge(normal_merged, LM_merged)

02. Single Cell RNA Sequencing Analysis

This script performs comprehensive single-cell RNA sequencing analysis from quality control through cell type annotation and differential expression analysis.

Workflow

A. Data Loading & Quality Control

  • Loads merged dataset from previous script (samples merger)
rawdata <- merged_samples
dim(rawdata)  # 38224 features and 36289 cells
rm(merged_samples)
  • Calculates mitochondrial gene percentage
rawdata[["percent.mt"]] <- PercentageFeatureSet(rawdata, pattern = "^(?i)MT-")
  • Visualizes QC metrics using violin plots
Plot1 <- VlnPlot(rawdata, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"),
                 group.by = "condition", ncol = 3)
Plot1
  • Applies quality filters based on:
    • Gene counts
    • RNA counts
    • Mitochondrial content
filtered_data <- subset(rawdata, subset = nFeature_RNA > 200
                        & nFeature_RNA < 6000 & nCount_RNA > 2000 & percent.mt < 10)
dim(filtered_data) # 38224 features and 16195 cells in filtered data
  • Adjust nFeature_RNA, nCount_RNA, and percent.mt values according to your dataset

Note:

  • Before saving plots, adjust the plot viewing window size in RStudio to match the intended resolution and aspect ratio.
  • After resizing the plot window, save the plot using ggsave() with the following format:
ggsave(file = "raw_data.png", Plot1, dpi = 600, bg = "white")

B. Data Normalization & Feature Selection

  • Normalizes data using log normalization
normalized_data <- NormalizeData(filtered_data, normalization.method = "LogNormalize", scale.factor = 10000)
  • Identifies highly variable genes
normalized_data <- FindVariableFeatures(normalized_data, selection.method = "vst", nfeatures = 2000)
  • Visualizes top variable features
top10 <- head(VariableFeatures(normalized_data), 10)
top10

C. Dimensionality Reduction & Clustering

  • Performs scaling of highly variable features and performs PCA
scaled_data <- ScaleData(normalized_data)
scaled_data <- RunPCA(scaled_data, features = VariableFeatures(object = scaled_data))
pca_plot <- DimPlot(scaled_data, reduction = "pca")
pca_plot

Note:

  • If you want to perform scaling of all genes, then run the following command instead of the previous one:
all.genes <- rownames(normalized_data)
scaled_data <- ScaleData(normalized_data, features = all.genes)
scaled_data <- RunPCA(scaled_data, features = all.genes)
pca_plot <- DimPlot(scaled_data, reduction = "pca")
pca_plot
  • Determines optimal dimensions using elbow plot
elbow <- ElbowPlot(scaled_data, ndim = 50)
elbow
  • Clusters cells using graph-based clustering
data <- FindNeighbors(scaled_data, dims = 1:20)
data <- FindClusters(data, resolution = 0.6, algorithm = 4)
levels(data)
  • Performs non-linear dimensionality reduction (UMAP & t-SNE)
data <- RunUMAP(data, dims = 1:20)
data <- RunTSNE(data, dims = 1:20)
  • Visualizes clusters with and without conditions
umap <- DimPlot(data, reduction = "umap", label = TRUE)
umap

tSNE <- DimPlot(data, reduction = "tsne", label = TRUE)
tSNE

umap_condition <- DimPlot(data, reduction = "umap", label = TRUE,
                          split.by = "condition")
umap_condition

tSNE_condition <- DimPlot(data, reduction = "tsne", label = TRUE,
                          split.by = "condition")
tSNE_condition

PCA and Elbow Plot


UMAP and t-SNE - Cell Clusters


UMAP - Clusters Grouped By Condition

umap-condition


t-SNE - Clusters Grouped By Condition

tsne-condition

D. Cell Type Annotation

  • Uses ScType database for automated cell type identification
  • Annotates clusters based on tissue markers
  • Assigns cell types and handles low-confidence annotations
  • Updates metadata with cell type classifications
# load gene set preparation function
source("https://raw.githubusercontent.com/IanevskiAleksandr/sc-type/master/R/gene_sets_prepare.R")
# load cell type annotation function
source("https://raw.githubusercontent.com/IanevskiAleksandr/sc-type/master/R/sctype_score_.R")

# DB file
db_ <- "https://raw.githubusercontent.com/IanevskiAleksandr/sc-type/master/ScTypeDB_full.xlsx";
# Available tissues: Brain, Lung, Adrenal, Immune system, Eye, Pancreas, Liver, Heart, Intestine, Muscle, Placenta, Spleen, Stomach, Thymus
tissue <- "Immune system"

# prepare gene sets
gs_list <- gene_sets_prepare(db_, tissue)

# ==============================================================================
# Run scType Annotation
# ==============================================================================
# check Seurat object version (scRNA-seq matrix extracted differently in Seurat v4/v5)
seurat_package_v5 <- isFALSE('counts' %in% names(attributes(data[["RNA"]])));
print(sprintf("Seurat object %s is used", ifelse(seurat_package_v5, "v5", "v4")))

# extract scaled scRNA-seq matrix
scRNAseqData_scaled <- if (seurat_package_v5) as.matrix(data[["RNA"]]$scale.data) else as.matrix(data[["RNA"]]@scale.data)

# run ScType
es.max <- sctype_score(scRNAseqData = scRNAseqData_scaled, scaled = TRUE, gs = gs_list$gs_positive, gs2 = gs_list$gs_negative)

# merge by cluster
cL_resutls <- do.call("rbind", lapply(unique(data@meta.data$seurat_clusters), function(cl){
  es.max.cl = sort(rowSums(es.max[ ,rownames(data@meta.data[data@meta.data$seurat_clusters==cl, ])]), decreasing = !0)
  head(data.frame(cluster = cl, type = names(es.max.cl), scores = es.max.cl, ncells = sum(data@meta.data$seurat_clusters==cl)), 10)
}))
sctype_scores <- cL_resutls %>% group_by(cluster) %>% top_n(n = 1, wt = scores)  

# set low-confident (low ScType score) clusters to "unknown"
sctype_scores$type[as.numeric(as.character(sctype_scores$scores)) < sctype_scores$ncells/4] <- "Unknown"
print(sctype_scores[,1:3])
write.csv(sctype_scores, file = "Celltypes.csv")

# Convert sctype_scores to a data frame if it isn't already
sctype_scores <- as.data.frame(sctype_scores)

# Extract the cluster and cell type columns
cluster_ids <- sctype_scores$cluster
cell_types <- sctype_scores$type

# Create a named vector for the new cluster identities
new_cluster_ids <- cell_types
names(new_cluster_ids) <- cluster_ids

# Rename the cluster identities in the object
data <- RenameIdents(data, new_cluster_ids)
levels(data)

# Overlay the identified cell types on UMAP plot
data@meta.data$sctype_classification = ""
for(j in unique(sctype_scores$cluster)){
  cl_type = sctype_scores[sctype_scores$cluster==j,]; 
  data@meta.data$sctype_classification[data@meta.data$seurat_clusters == j] = as.character(cl_type$type[1])
}

E. Manual Cell Type Annotation

  • If you want to annotate cell clusters based on your gene markers, then follow below:
  • Get the ScType database Excel file from here
  • Open the file and delete all existing data rows, keeping only the column headers (tissueType, cellName, geneSymbolmore1, geneSymbolmore2, and shortName)
  • The geneSymbolmore2 and shortName columns are optional and can be left blank.
  • Fill in the columns with your custom data:
    • tissueType: Your tissue of interest (e.g., "Immune")
    • cellName: The specific cell type (e.g., "T-cell")
    • geneSymbolmore1: The positive gene markers for that cell type, separated by commas (e.g., "CD3D,CD3E,CD8A")
    • geneSymbolmore2: Alternative gene marker names for geneSymbolmore1 (e.g., T3D). T3D is an alternative to CD3D
    • shortName: Short name of a specific cell type (e.g., Pro-B)
  • Save the file and provide its path in the code, as shown below
# DB file
db_ <- "data/ScTypeDB_full.xlsx";
tissue <- "Immune system"

F. Visualization & Analysis

  • Generates annotated UMAP/t-SNE plots
  • Creates condition-split visualizations
  • Counts cell distribution across types and conditions
  • Saves annotated dataset
# ==============================================================================
# Count Cells by Cell Type and Condition
# ==============================================================================
# Number of cells in each condition
# Create the table
num_of_cells <- table(data@meta.data$sctype_classification, data@meta.data$condition)
# Convert to data frame
num_of_cells_df <- as.data.frame.matrix(num_of_cells)
num_of_cells_df
# Save as CSV
write.csv(num_of_cells_df, file = "number_of_cells_in_conditions.csv", row.names = TRUE)

# ==============================================================================
# VISUALIZE ANNOTATED DATA
# ==============================================================================

# umap and tsne plots
umap_annt <- DimPlot(data, reduction = "umap", label = FALSE,
                     repel = TRUE, group.by = 'sctype_classification')        
umap_annt

tsne_annt <- DimPlot(data, reduction = "tsne", label = FALSE,
                     repel = TRUE, group.by = 'sctype_classification')        
tsne_annt

umap_annt_condition <- DimPlot(data, reduction = "umap", label = FALSE,
                               repel = TRUE, group.by = 'sctype_classification',
                               split.by = "condition")        
umap_annt_condition

tsne_annt_condition <- DimPlot(data, reduction = "tsne", label = FALSE,
                               repel = TRUE, group.by = 'sctype_classification',
                               split.by = "condition")        
tsne_annt_condition

UMAP and t-SNE Annotated Cell Clusters


UMAP - Annotated Cell Clusters Grouped By Condition

umap-annotated-condition


t-SNE - Annotated Cell Clusters Grouped By Condition

tsne-annotated-condition


G. Differential Expression Analysis

  • Subsets specific cell types if needed
# cells to keep
cells <- c("Memory CD4+ T cells",
           "Naive CD4+ T cells",
           "Cancer cells",
           "CD8+ NKT-like cells",
           "Natural killer  cells")
# Subset based on metadata column
data_subset <- subset(data, subset = sctype_classification %in% cells)
# Verify the subset
levels(data_subset)
table(data_subset@meta.data$sctype_classification)
table(data_subset@meta.data$condition)
saveRDS(data_subset, file = "data_subset.rds")
  • Performs differential expression between:
  • Same cell type across conditions (Normal vs PD)
      data$celltype.condition <- paste(data$sctype_classification, data$condition, sep = "_")
Idents(data) <- "celltype.condition"
levels(data)

dff_markers <- FindMarkers(data, ident.1 = "Memory CD4+ T cells_LM",
                           ident.2 = "Memory CD4+ T cells_Normal", min.pct = 0.1,
                           test.use = "wilcox", verbose = TRUE, only.pos = FALSE)
# Apply FDR
dff_markers$fdr = p.adjust(dff_markers$p_val, method='fdr')
write.csv(dff_markers, file = "Memory CD4+ T cells_DEGs.csv") # Required for volcano plot

dff_up_markers <- subset(dff_markers, avg_log2FC > 0.25)
dff_dn_markers <- subset(dff_markers, avg_log2FC < -0.25)
# Save degs results
write.csv(dff_up_markers, file = "Memory CD4+ T cells_up_markers.csv")
write.csv(dff_dn_markers, file = "Memory CD4+ T cells_dn_markers.csv")
  • Different cell types regardless of condition
Idents(data) <- "sctype_classification"
levels(data)
degs <- FindMarkers(data, ident.1 = "Cancer cells",
                    ident.2 = NULL, test.use = "wilcox",
                    only.pos = FALSE)
# Apply FDR
degs$fdr = p.adjust(degs$p_val, method='fdr')
write.csv(degs, file = "Cancer cells_DEGs.csv") # Required for volcano plot

degs_up <- subset(degs, avg_log2FC > 0.25)
degs_down <- subset(degs, avg_log2FC < -0.25)
# Save degs results
write.csv(degs_up, file = "Cancer_cells_up_markers.csv")
write.csv(degs_down, file = "Cancer_cells_dn_markers.csv")
  • Different cell types in different conditions
data$celltype.condition <- paste(data$sctype_classification, data$condition, sep = "_")
Idents(data) <- "celltype.condition"
levels(data)
# DEGs of Cancer cells in diseased condition vs All other cell types in Normal condition
degs <- FindMarkers(data, ident.1 = "Cancer cells_LM",
                    ident.2 = c("Macrophages_Normal",
                                "Natural killer  cells_Normal",
                                "Basophils_Normal",
                                "Non-classical monocytes_Normal",
                                "CD8+ NKT-like cells_Normal",
                                "Memory CD4+ T cells_Normal",
                                "Myeloid Dendritic cells_Normal",
                                "Pre-B cells_Normal",
                                "Endothelial_Normal",
                                "Classical Monocytes_Normal",
                                "Plasmacytoid Dendritic cells_Normal",
                                "Ξ³Ξ΄-T cells_Normal",
                                "Memory B cells_Normal",
                                "Naive CD4+ T cells_Normal"
                    ), test.use = "wilcox",
                    only.pos = FALSE)
# Apply FDR
degs$fdr = p.adjust(degs$p_val, method='fdr')
write.csv(degs, file = "Cancer cells_DEGs.csv") # Required for volcano plot

degs_up <- subset(degs, avg_log2FC > 0.25)
degs_down <- subset(degs, avg_log2FC < -0.25)
# Save degs results
write.csv(degs_up, file = "Cancer_cells_up_markers.csv")
write.csv(degs_down, file = "Cancer_cells_dn_markers.csv")
  • Identifies up/down-regulated genes
  • Applies statistical filtering and FDR correction
  • Exports marker gene lists

Note:

  • Join layers of the Seurat object before performing differential expression
data <- JoinLayers(data)

Outputs

  • The script produces:
    • Quality control plots
    • Clustering visualizations
    • Annotated cell type maps
    • Differential expression results
    • Processed datasets for downstream analysis (e.g Cell-cell communication, Trajectory analysis etc)

03. Multiple Datasets Integration

This script performs comprehensive integration of multiple single-cell RNA sequencing datasets to address technical variations while preserving biological signals.

Workflow

A. Data Preparation

  • Loads and combines samples from multiple experimental datasets
# Dataset 01 Samples
s1 <- Read10X("samples/GSE231559/GSM7290763")
s1 <- CreateSeuratObject(s1, project = "GSE231559-CRC1")

s2 <- Read10X("samples/GSE231559/GSM7290769")
s2 <- CreateSeuratObject(s2, project = "GSE231559-CRC2")

s3 <- Read10X("samples/GSE231559/GSM7290770")
s3 <- CreateSeuratObject(s3, project = "GSE231559-N1")

s4 <- Read10X("samples/GSE231559/GSM7290772")
s4 <- CreateSeuratObject(s4, project = "GSE231559-N2")

# Dataset 02 Samples
s5 <- Read10X("samples/GSE261012/GSM6432445")
s5 <- CreateSeuratObject(s5, project = "GSE261012-CRC1")

s6 <- Read10X("samples/GSE261012/GSM6432446")
s6 <- CreateSeuratObject(s6, project = "GSE261012-CRC2")

s7 <- Read10X("samples/GSE261012/GSM6432447")
s7 <- CreateSeuratObject(s7, project = "GSE261012-N1")

s8 <- Read10X("samples/GSE261012/GSM6432448")
s8 <- CreateSeuratObject(s8, project = "GSE261012-N2")
  • Adds metadata labels for dataset origin and experimental conditions
# Add metadata labels
s1$dataset <- "GSE231559";  s1$condition <- "CRC"
s2$dataset <- "GSE231559";  s2$condition <- "CRC"
s3$dataset <- "GSE231559";  s3$condition <- "Control"
s4$dataset <- "GSE231559";  s4$condition <- "Control"

s5$dataset <- "GSE261012";  s5$condition <- "CRC"
s6$dataset <- "GSE261012";  s6$condition <- "CRC"
s7$dataset <- "GSE261012";  s7$condition <- "Control"
s8$dataset <- "GSE261012";  s8$condition <- "Control"
  • Merges all samples into a unified analysis object
# combine all samples
combined <- merge(s1, y = c(s2,s3,s4,s5,s6,s7,s8),
                  add.cell.ids = c("GSE231559-CRC1","GSE231559-CRC2","GSE231559-N1","GSE231559-N2",
                                   "GSE261012-CRC1","GSE261012-CRC2","GSE261012-N1","GSE261012-N2"))
# Verify the merge
table(combined$condition)
table(combined$orig.ident)
dim(combined)

B. Quality Control & Preprocessing

  • Performs quality control, normalization, highly variable features, scaling, and PCA of the data (same as scRNA-seqr)
  • Performs cell clustering
data <- FindNeighbors(data, dims = 1:20, reduction = "pca")
data <- FindClusters(data, resolution = 0.6, algorithm = 4,
                         cluster.name = "raw_clusters")
  • Generates low-dimensional embeddings (UMAP/t-SNE) showing raw cluster patterns
data <- RunUMAP(data, dims = 1:20, reduction = "pca",
                    reduction.name = "umap.raw")

data <- RunTSNE(data, dims = 1:20, reduction = "pca",
                    reduction.name = "tsne.raw")
  • Visualizes inherent batch effects and sample separation
umap_data <- DimPlot(data, reduction = "umap.raw")
umap_data

tsne_data <- DimPlot(data, reduction = "tsne.raw")
tsne_data

umap_condition_data <- DimPlot(data, reduction = "umap.raw", group.by = "condition")
umap_condition_data

tsne_condition_data <- DimPlot(data, reduction = "tsne.raw", group.by = "condition")
tsne_condition_data

C. Batch Effect Correction

  • Applies advanced integration methods to remove technical variability
  • Creates harmonized dimensional space while preserving biological variation
  • Maintains dataset-specific biological signals while minimizing batch effects
crt_data <- RunHarmony(data, group.by.vars = "dataset")

E. Integrated Data Analysis

  • Re-clusters cells using integrated dimensions
crt_data <- FindNeighbors(crt_data, reduction = "harmony", dims = 1:20)
crt_data <- FindClusters(crt_data, resolution = 0.6, algorithm = 4,
                         cluster.name = "harmony_clusters")
  • Generates new embeddings on corrected data and visualizes corrected data
crt_data <- RunUMAP(crt_data, reduction = "harmony",
                    dims = 1:20, reduction.name = "umap.harmony")
crt_data <- RunTSNE(crt_data, reduction = "harmony", dims = 1:20,
                    reduction.name = "tsne.harmony", reduction.key = "tsneharmony_")

umap_crt <- DimPlot(crt_data, reduction = "umap.harmony")
umap_crt

tsne_crt <- DimPlot(crt_data, reduction = "tsne.harmony")
tsne_crt

umap_crt_condition <- DimPlot(crt_data, reduction = "umap.harmony", group.by = "condition")
umap_crt_condition

tsne_crt_condition <- DimPlot(crt_data, reduction = "tsne.harmony", group.by = "condition")
tsne_crt_condition
  • Creates comparative visualizations showing integration effectiveness
# Add titles to individual UMAP plots
umap_before <- umap_data + ggtitle("Clusters Before Batch Correction")
umap_after  <- umap_crt  + ggtitle("Clusters After Batch Correction")

# umap
combine_umap <- umap_before + umap_after
combine_umap

# Add titles to individual t-SNE plots
tsne_before <- tsne_data + ggtitle("Clusters Before Batch Correction")
tsne_after  <- tsne_crt  + ggtitle("Clusters After Batch Correction")

# tsne
combine_tsne <- tsne_before + tsne_after
combine_tsne

UMAP - Before and After Batch-Effect Correction

umap-harmony


t-SNE - Before and After Batch-Effect Correction

tsne-harmony

F. Cell Type Annotation

  • Automates cell type identification using reference gene signatures
  • Assigns cell types to clusters based on marker expression patterns
  • Verifies cell type annotation
  • Quantifies cell distribution across types and conditions
  • Enables flexible subsetting for focused analysis of specific populations

G. Differential Expression Analysis

  • Performs comprehensive gene expression comparisons:
    • Within cell types across experimental conditions
    • Between different cell types regardless of conditions
    • Complex multi-factor comparisons between conditions
  • Identifies statistically significant differentially expressed genes

Outputs

  • This script produes:
    • Publication-ready visualizations
    • Fully annotated datasets
    • Comprehensive differential expression results

04. 3D UMAP and t-SNE Plots

This script creates interactive 3D visualizations of single-cell data using UMAP and t-SNE dimensionality reduction methods.

Workflow

A. 3D UMAP and t-SNE Generation

  • Computes UMAP and t-SNE with three components for 3D coordinates
  • Extracts embedding data and cluster information
  • Creates interactive 3D scatter plots colored by cell clusters
  • Exports visualizations in multiple formats (PNG, HTML)

B. Gene Expression Visualization

  • Maps gene expression patterns onto 3D UMAP and t-SNE space
  • Uses HLA-DQA1 gene as an example with adjustable expression scaling
  • Creates color-coded plots showing expression levels
  • Generates interactive plots with cell-level information

C. View 3D UMAP and t-SNE Plots

Description Link
3D umap 3d-umap
Gene Expression in 3D umap 3d-gene-umap
3D tsne 3d-tsne
Gene Expression in 3D tsne 3d-gene-tsne

Outputs

  • The script produces:
    • Interactive 3D UMAP and t-SNE plots
    • Gene expression overlays in 3D space
    • Multiple export formats (static PNG, interactive HTML)
    • Enhanced visualization capabilities for data exploration

05. Volcano Plot

This script creates volcano plots of specific cell types to visualize differentially expressed genes between experimental conditions.

Workflow

A. Data Preparation

  • Loads differential expression results from CSV files
  • Formats data for volcano plot visualization
  • Sets up gene identifiers and statistical columns
file_path <- "Cancer cells_DEGs.csv"
degs <- read_csv(file_path, show_col_types = FALSE)
degs <- as.data.frame(degs)              
rownames(degs) <- degs$gene              
degs$gene <- NULL

B. Plot Generation

  • Uses the EnhancedVolcano package for advanced plotting
  • Maps log2 fold changes against adjusted p-values (FDR)
  • Applies statistical cutoffs for significance
  • Customizes colors for different significance categories
volc <- EnhancedVolcano(
  degs,
  lab              = NA,
  x                = "avg_log2FC",
  y                = "fdr",
  pCutoff          = 0.05,
  FCcutoff         = 2,  # Adjust log FC value cutoff
  title            = "Cancer cells-LM vs All Other Cell Types-Normal",   # Change according to your celltype degs file
  pointSize        = 2.0,
  labSize          = 3.0,
  legendLabSize    = 10,
  legendPosition    = "right",
  legendIconSize   = 4.0,
  drawConnectors   = FALSE,
  widthConnectors  = 0.5,
  gridlines.major  = TRUE,
  gridlines.minor  = TRUE,
  col = c(
    'grey50',             # Insignificant p-value and log2FC
    '#1DCD9F',            # Insignificant log2FC
    'royalblue',          # Insignificant p-value
    '#FF0B55'             # Significant p-value and log2FC
  ))
  • Adjust FCcutoff and title according to your data

C. Visualization & Export

  • Creates publication-quality volcano plots
  • Highlights significantly up/down-regulated genes
  • Saves high-resolution images for analysis

volcano


06. Heatmap

This script generates gene-specific heatmaps to visualize gene expression patterns across cell types and conditions.

Workflow

A. Data Extraction

  • Selects genes of interest for visualization
genes_of_interest <- c(
  "CD247","RAC1","CALM1","CD3D","HLA-DQA1",
  "HLA-DQA2","HLA-DPA1","HLA-DRA","CD74","CD3G"
)
  • Extracts expression data from the Seurat object
  • Combines expression data with metadata
expr_mat <- GetAssayData(data, assay = "RNA", layer = "data")[genes_of_interest, ]
expr_df <- as.data.frame(t(expr_mat))
metadata <- data@meta.data
expr_df$celltype <- metadata$sctype_classification
expr_df$condition <- metadata$condition

B. Expression Analysis

  • Calculates average expression per cell type and condition
  • Creates expression matrices for heatmap plotting
  • Saves expression data for documentation
# Step 01 - Z-SCORE each gene across ALL CELLS (cell-level z-scores)
# This centers & scales each gene using all cells: mean = 0, sd = 1
expr_df_zcell <- expr_df %>%
  mutate(across(all_of(genes_of_interest), ~ scale(.)[, 1]))

Step 02 - AVERAGE z-scores WITHIN each celltype Γ— condition group
z_group_df <- expr_df_zcell %>%
  group_by(celltype, condition) %>%
  summarise(across(all_of(genes_of_interest), mean, .names = "{.col}"), .groups = "drop")

# Create a column name that combines celltype and condition (for rownames later)
z_group_df <- z_group_df %>%
  mutate(group = paste(celltype, condition, sep = "_"))

# Save the group-level z-scores table
write.csv(z_group_df, "Genes-z-scores.csv", row.names = FALSE)

C. Plot Generation

  • Prepares annotation colors for conditions and cell types
  • Uses ComplexHeatmap for clustered visualization
  • Applies color schemes and formatting
  • Exports high-quality heatmap images
# Build matrix genes x groups (ComplexHeatmap expects rows = genes, cols = samples/groups)
expr_mat_group <- as.matrix(z_group_df[, genes_of_interest])  # rows = groups, cols = genes
rownames(expr_mat_group) <- z_group_df$group

# transpose so rows = genes and columns = groups (as in your original)
expr_mat_group <- t(expr_mat_group)

# Get unique conditions and celltypes
conditions <- unique(z_group_df$condition)
celltypes <- unique(z_group_df$celltype)

# Create the desired column order: all CRC groups first, then all Control groups
desired_order <- c()
for(cond in conditions) {  # This will loop through conditions in their natural order
  for(ct in celltypes) {
    group_name <- paste(ct, cond, sep = "_")
    if(group_name %in% colnames(expr_mat_group)) {
      desired_order <- c(desired_order, group_name)
    }
  }
}

# Reorder the matrix columns
expr_mat_group <- expr_mat_group[, desired_order, drop = FALSE]

# Extract condition and celltype per group (match order from reordered matrix)
annotation_col <- data.frame(
  condition = factor(sub(".*_", "", colnames(expr_mat_group))),
  celltype = factor(sub("_.*", "", colnames(expr_mat_group)))
)
rownames(annotation_col) <- colnames(expr_mat_group)

# Define your full color vector
all_colors <- c(
  "#9e0142","#d53e4f","#f46d43","#fdae61","#fee08b",
  "#abdda4","#66c2a5","#3288bd","#5e4fa2","#6090D8",
  "#4C825D","#8CAE9E","#D5AB85","#99610A","#C38F16",
  "#FFCD00","#C0532B","#B50200","#67322E","#B47E83",
  "#807B7F","#8C86A0","#674D53","#508CA7","#0E2A4D"
)

# Extract unique cell types in the order they appear in annotation_col
celltypes <- unique(annotation_col$celltype)

# Assign colors to cell types (truncate color vector if needed)
celltype_colors <- setNames(all_colors[1:length(celltypes)], celltypes)

# Build annotation color list
ann_colors <- list(
  condition = c(LM = "#2c98a0", Normal = "#67dba5"),  # adjust for more conditions if needed
  celltype = celltype_colors
)

# Use a diverging color scale centered at 0 (z-scores)
hcolors <- rev(paletteer::paletteer_c("ggthemes::Red-Blue Diverging", n = 200))

ht <- Heatmap(
  expr_mat_group,
  name = "Z-score",
  col = hcolors,
  cluster_rows = TRUE,
  cluster_columns = TRUE,  # Turn off column clustering to use manual order
  show_row_names = TRUE,
  show_column_names = FALSE,
  top_annotation = HeatmapAnnotation(df = annotation_col, col = ann_colors,
                                     # Add borders to annotation cells
                                     gp = gpar(col = "black", lwd = 0.8)),
  # Add borders to heatmap cells
  rect_gp = gpar(col = "black", lwd = 0.8),
  heatmap_legend_param = list(
    legend_width = unit(0.8, "cm"),
    direction = "vertical",
    title_position = "topleft"
  ),
  use_raster = FALSE
)
ht

# Save as PNG (high res)
png("Heatmap.png", width = 3000, height = 2000, res = 300)
draw(ht)
dev.off()

07. Boxplot

This script creates box plots to compare gene expression across conditions and specific cell types.

Workflow

Condition-Based Comparison

  • Extracts expression data for specific genes
  • Compares the expression between the Control and Diseased conditions
  • Creates box plots with jittered points
  • Applies custom color schemes
# Create a data frame with gene expression and conditions
boxplot_df <- data.frame(gene = GetAssayData(data, assay = "RNA", layer = "data")["HSP90AA1",], 
                         condition = data$condition)

custom_colors <- c("LM" = "#FF3200FF", "Normal" = "#0076BBFF") #, "Metastasis" = "#2A9D8F")

box <- ggplot(boxplot_df, aes(x = condition, y = gene)) + 
  geom_jitter(aes(color = condition), width = 0.2, size = 1.5, alpha = 0.6, show.legend = FALSE) +  
  geom_boxplot(aes(fill = condition), width = 0.7, outlier.shape = NA, alpha = 0.6, color = "black") +  
  scale_fill_manual(values = custom_colors) +
  scale_color_manual(values = custom_colors) +
  labs(title = "HSP90AA1", x = "Condition", y = "Expression") +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
    axis.text.x = element_text(angle = 30, hjust = 1, size = 12),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    axis.line = element_line(color = "black"),
    axis.line.x.top    = element_blank(),
    axis.line.y.right  = element_blank(),
    axis.ticks.x.top   = element_blank(),
    axis.ticks.y.right = element_blank(),
    axis.ticks.x.bottom   = element_line(),
    axis.ticks.y.left = element_line(),
    panel.grid.major = element_line(color = "grey80", size = 0.4)
  )
box
# Save the plot
ggsave("HSP90AA1-condition.png", plot = box, bg = "white", dpi = 600)
  • Adjust gene and condition names according to your data

Cell Type-Specific Analysis

  • Filters data for specific cell types
  • Compares expression within cell types across conditions
  • Generates focused visualizations
# Specify the cell type of interest
target_celltype <- "Memory CD4+ T cells"

# Logical filter: keep rows that start with the cell type (e.g., "Cancer cells")
cell_indices <- startsWith(data$celltype.condition, paste0(target_celltype, "_"))

# Create filtered data frame for box plot
boxplot_df <- data.frame(
  gene = GetAssayData(data, assay = "RNA", layer = "data")["HSP90AA1", cell_indices], 
  condition = data$celltype.condition[cell_indices]
)

# Recalculate group levels for coloring
groups <- unique(boxplot_df$condition)

custom_colors <- setNames(
  ifelse(endsWith(groups, "_LM"), "#FF3200FF", "#0076BBFF"),
  groups
)

# Plot
box <- ggplot(boxplot_df, aes(x = condition, y = gene)) + 
  geom_jitter(aes(color = condition), width = 0.2, size = 1.5, alpha = 0.6, show.legend = FALSE) +  
  geom_boxplot(aes(fill = condition), width = 0.7, outlier.shape = NA, alpha = 0.6, color = "black") +  
  scale_fill_manual(values = custom_colors) +
  scale_color_manual(values = custom_colors) +
  labs(title = paste("HSP90AA1"), x = "Condition", y = "Expression") +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
    axis.text.x = element_text(angle = 30, hjust = 1, size = 12),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    axis.line = element_line(color = "black"),
    axis.line.x.top    = element_blank(),
    axis.line.y.right  = element_blank(),
    axis.ticks.x.top   = element_blank(),
    axis.ticks.y.right = element_blank(),
    axis.ticks.x.bottom   = element_line(),
    axis.ticks.y.left = element_line(),
    panel.grid.major = element_line(color = "grey80", size = 0.4)
  )
box
# Save the plot
ggsave("HSP90AA1-Memory-CD4-T-cells.png", plot = box, bg = "white", dpi = 600)
  • Adjust cell-type, gene, and condition names according to your data

Visualization Customization

  • Uses ggplot2 for advanced plotting
  • Applies consistent theming and styling
  • Creates publication-ready figures

πŸ“š Citation

  • If you use this analysis pipeline in your work, please cite the repository:
scRNA-seq-navigator. GitHub: https://github.com/usman4373/scRNA-seq-navigator

🀝 Acknowledgements

  • This pipeline implements established single-cell analysis methods, including Seurat (Stuart et al., 2019), Harmony (Korsunsky et al., 2019), and ScType (Ianevski et al., 2022). Please also cite the original publications of these tools when using their respective functionalities in your research.

  • Please also cite other tools used in the analysis pipeline:
    • dplyr - For efficient data manipulation and transformation
    • patchwork - For seamless arrangement of multiple ggplot2 plots
    • ggplot2 - For creating elegant and customizable data visualizations
    • openxlsx - For reading and writing Excel files in R
    • HGNChelper - For handling and validating HGNC gene symbols
    • plotly - For creating interactive web-based visualizations
    • ComplexHeatmap - For generating sophisticated and annotated heatmaps
    • circlize - For circular visualization and heatmap enhancements
    • paletteer - For accessing comprehensive color palette collections
    • EnhancedVolcano - For publication-ready volcano plot generation
    • viridis - For colorblind-friendly color scales in visualizations
    • htmlwidgets - For embedding interactive JavaScript visualizations in R
    • webshot2 - For converting interactive plots to static images
    • readr - For fast and tidy data import functionality
    • Other open-source R packages and dependencies

About

scRNA-seq-navigator is a post-alignment pipeline that utilizes Seurat for data processing and integration and ScType for automated cell type annotation.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages