GitHub - usman4373/scRNA-seq-navigator-Seurat: scRNA-seq-navigator is a post-alignment pipeline that utilizes Seurat for data processing and integration and ScType for automated cell type annotation.

📑 Table of contents

📝 Overview
Installation
Input Files Format
01. Merging Samples
02. Single Cell RNA Sequencing Analysis
03. Multiple Datasets Integration
04. 3D UMAP and t-SNE Plots
05. Volcano Plot
06. Heatmap
07. Boxplot
📚 Citation
🤝 Acknowledgements

📝 Overview

scRNA-seq-navigator contains a comprehensive single-cell RNA sequencing (scRNA-seq) analysis pipeline designed for processed data following alignment. The pipeline leverages the robust Seurat (Stuart et al., 2019) toolkit for essential analytical steps, including data integration, quality control, and batch-effect correction, and incorporates the ScType database (Ianevski et al., 2022) for refined cell-type annotation. It further facilitates differential gene expression analysis and multi-dimensional visualization to extract meaningful biological insights from complex single-cell datasets.

The analyses provide end-to-end solutions for:
- Data Integration & Processing: Merging multiple datasets and experimental conditions
- Batch Effect Correction: Addressing technical variations across different platforms and studies
- Cell Type Identification: Automated annotation using reference gene signatures
- Advanced Visualization: 2D/3D embeddings and interactive plots
- Differential Expression: Statistical analysis of gene expression changes
- Publication-ready Outputs: High-quality figures and comprehensive results
Each script represents a modular component of the analysis workflow, allowing researchers to process single-cell data from raw counts to biological interpretation

Script	Description
`01-samples_merger.R`	Merges multiple scRNA-seq samples from different conditions into a unified dataset for analysis
`02-scRNA-seqr.R`	Performs comprehensive scRNA-seq analysis, including QC, clustering, cell type annotation, and differential expression analysis
`03-multiple-datasets-integrator.R`	Integrates multiple scRNA-seq datasets using Harmony to remove batch effects
`04-umap-tsne-3d-plotr.R`	Creates interactive 3D visualizations of UMAP and t-SNE embeddings for enhanced data exploration
`05-volcano_plotr.R`	Generates publication-ready volcano plots to visualize differential gene expression results
`06-heatmapr.R`	Produces clustered heatmaps showing gene expression patterns across cell types and conditions
`07-boxplotr.R`	Creates comparative box plots to analyze gene expression differences across experimental conditions and/or cell types

NOTE: The code snippets in this README are for demonstration and can be found in their respective script files.

Installation

Navigate to the directory containing the install_packages.R file
Open a terminal and run:

Rscript install_packages.R

Alternatively, if the above does not work, open the install_packages.R file in RStudio
Click the "Source" button in the top-right of the script editor OR in the R console run:

# Navigate to the script directory first, then:
source("install_packages.R")

Input Files Format

The analysis workflow is designed for processed scRNA-seq data following alignment.
Input should be in 10X Genomics format, consisting of the three standard files for each sample directory:
- barcodes.tsv.gz
- features.tsv.gz
- matrix.mtx.gz

01. Merging Samples

This script performs the initial data integration step for single-cell RNA sequencing (scRNA-seq) analysis by merging multiple samples from two experimental conditions (e.g, Normal and Parkinson's Disease) into a unified Seurat object.

Purpose

Load individual scRNA-seq samples from both control and diseased conditions
Merge samples within each condition group
Combine both condition groups into a final merged dataset
Add appropriate metadata for sample tracking and condition identification
Export the consolidated data for downstream analysis

Workflow

A. Reading and Creating Seurat Object

sample1 <- Read10X(data.dir = "samples/Normal/GSM7290760")
sample1 <- CreateSeuratObject(counts = sample1, project = "N1")

Read10X() Reads the 10X Genomics format data from the specified directory containing three essential files:
- barcodes.tsv.gz (cell identifiers)
- features.tsv.gz (gene information)
- matrix.mtx.gz (count matrix)
CreateSeuratObject() Converts the raw count data into a Seurat object, which is the standard data structure

B. Merging Multiple Samples

Merging all control samples:

normal_merged <- merge(sample1, y = c(sample2, sample3, sample4, sample5),
                       add.cell.ids = c("N1", "N2", "N3", "N4", "N5"))

Merging all diseased samples:

LM_merged <- merge(sample6, y = c(sample7, sample8, sample9, sample10),
                   add.cell.ids = c("LM1", "LM2", "LM3", "LM4", "LM5"))

C. Adding Condition Metadata

Control samples:

normal_merged <- AddMetaData(object = normal_merged,
                             metadata = "Normal", col.name = "condition")

Diseased samples:

LM_merged <- AddMetaData(object = LM_merged, 
                         metadata = "LM", col.name = "condition")

D. Final Merging

Merging control and diseased samples:

merged_samples <- merge(normal_merged, LM_merged)

02. Single Cell RNA Sequencing Analysis

This script performs comprehensive single-cell RNA sequencing analysis from quality control through cell type annotation and differential expression analysis.

Workflow

A. Data Loading & Quality Control

Loads merged dataset from previous script (samples merger)

rawdata <- merged_samples
dim(rawdata)  # 38224 features and 36289 cells
rm(merged_samples)

Calculates mitochondrial gene percentage

rawdata[["percent.mt"]] <- PercentageFeatureSet(rawdata, pattern = "^(?i)MT-")

Visualizes QC metrics using violin plots

Plot1 <- VlnPlot(rawdata, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"),
                 group.by = "condition", ncol = 3)
Plot1

Applies quality filters based on:
- Gene counts
- RNA counts
- Mitochondrial content

filtered_data <- subset(rawdata, subset = nFeature_RNA > 200
                        & nFeature_RNA < 6000 & nCount_RNA > 2000 & percent.mt < 10)
dim(filtered_data) # 38224 features and 16195 cells in filtered data

Adjust nFeature_RNA, nCount_RNA, and percent.mt values according to your dataset

Note:

Before saving plots, adjust the plot viewing window size in RStudio to match the intended resolution and aspect ratio.
After resizing the plot window, save the plot using ggsave() with the following format:

ggsave(file = "raw_data.png", Plot1, dpi = 600, bg = "white")

B. Data Normalization & Feature Selection

Normalizes data using log normalization

normalized_data <- NormalizeData(filtered_data, normalization.method = "LogNormalize", scale.factor = 10000)

Identifies highly variable genes

normalized_data <- FindVariableFeatures(normalized_data, selection.method = "vst", nfeatures = 2000)

Visualizes top variable features

top10 <- head(VariableFeatures(normalized_data), 10)
top10

C. Dimensionality Reduction & Clustering

Performs scaling of highly variable features and performs PCA

scaled_data <- ScaleData(normalized_data)
scaled_data <- RunPCA(scaled_data, features = VariableFeatures(object = scaled_data))
pca_plot <- DimPlot(scaled_data, reduction = "pca")
pca_plot

Note:

If you want to perform scaling of all genes, then run the following command instead of the previous one:

all.genes <- rownames(normalized_data)
scaled_data <- ScaleData(normalized_data, features = all.genes)
scaled_data <- RunPCA(scaled_data, features = all.genes)
pca_plot <- DimPlot(scaled_data, reduction = "pca")
pca_plot

Determines optimal dimensions using elbow plot

elbow <- ElbowPlot(scaled_data, ndim = 50)
elbow

Clusters cells using graph-based clustering

data <- FindNeighbors(scaled_data, dims = 1:20)
data <- FindClusters(data, resolution = 0.6, algorithm = 4)
levels(data)

Performs non-linear dimensionality reduction (UMAP & t-SNE)

data <- RunUMAP(data, dims = 1:20)
data <- RunTSNE(data, dims = 1:20)

Visualizes clusters with and without conditions

umap <- DimPlot(data, reduction = "umap", label = TRUE)
umap

tSNE <- DimPlot(data, reduction = "tsne", label = TRUE)
tSNE

umap_condition <- DimPlot(data, reduction = "umap", label = TRUE,
                          split.by = "condition")
umap_condition

tSNE_condition <- DimPlot(data, reduction = "tsne", label = TRUE,
                          split.by = "condition")
tSNE_condition

PCA and Elbow Plot

UMAP and t-SNE - Cell Clusters

UMAP - Clusters Grouped By Condition

t-SNE - Clusters Grouped By Condition

D. Cell Type Annotation

Uses ScType database for automated cell type identification
Annotates clusters based on tissue markers
Assigns cell types and handles low-confidence annotations
Updates metadata with cell type classifications

# load gene set preparation function
source("https://raw.githubusercontent.com/IanevskiAleksandr/sc-type/master/R/gene_sets_prepare.R")
# load cell type annotation function
source("https://raw.githubusercontent.com/IanevskiAleksandr/sc-type/master/R/sctype_score_.R")

# DB file
db_ <- "https://raw.githubusercontent.com/IanevskiAleksandr/sc-type/master/ScTypeDB_full.xlsx";
# Available tissues: Brain, Lung, Adrenal, Immune system, Eye, Pancreas, Liver, Heart, Intestine, Muscle, Placenta, Spleen, Stomach, Thymus
tissue <- "Immune system"

# prepare gene sets
gs_list <- gene_sets_prepare(db_, tissue)

# ==============================================================================
# Run scType Annotation
# ==============================================================================
# check Seurat object version (scRNA-seq matrix extracted differently in Seurat v4/v5)
seurat_package_v5 <- isFALSE('counts' %in% names(attributes(data[["RNA"]])));
print(sprintf("Seurat object %s is used", ifelse(seurat_package_v5, "v5", "v4")))

# extract scaled scRNA-seq matrix
scRNAseqData_scaled <- if (seurat_package_v5) as.matrix(data[["RNA"]]$scale.data) else as.matrix(data[["RNA"]]@scale.data)

# run ScType
es.max <- sctype_score(scRNAseqData = scRNAseqData_scaled, scaled = TRUE, gs = gs_list$gs_positive, gs2 = gs_list$gs_negative)

# merge by cluster
cL_resutls <- do.call("rbind", lapply(unique(data@meta.data$seurat_clusters), function(cl){
  es.max.cl = sort(rowSums(es.max[ ,rownames(data@meta.data[data@meta.data$seurat_clusters==cl, ])]), decreasing = !0)
  head(data.frame(cluster = cl, type = names(es.max.cl), scores = es.max.cl, ncells = sum(data@meta.data$seurat_clusters==cl)), 10)
}))
sctype_scores <- cL_resutls %>% group_by(cluster) %>% top_n(n = 1, wt = scores)  

# set low-confident (low ScType score) clusters to "unknown"
sctype_scores$type[as.numeric(as.character(sctype_scores$scores)) < sctype_scores$ncells/4] <- "Unknown"
print(sctype_scores[,1:3])
write.csv(sctype_scores, file = "Celltypes.csv")

# Convert sctype_scores to a data frame if it isn't already
sctype_scores <- as.data.frame(sctype_scores)

# Extract the cluster and cell type columns
cluster_ids <- sctype_scores$cluster
cell_types <- sctype_scores$type

# Create a named vector for the new cluster identities
new_cluster_ids <- cell_types
names(new_cluster_ids) <- cluster_ids

# Rename the cluster identities in the object
data <- RenameIdents(data, new_cluster_ids)
levels(data)

# Overlay the identified cell types on UMAP plot
data@meta.data$sctype_classification = ""
for(j in unique(sctype_scores$cluster)){
  cl_type = sctype_scores[sctype_scores$cluster==j,]; 
  data@meta.data$sctype_classification[data@meta.data$seurat_clusters == j] = as.character(cl_type$type[1])
}

E. Manual Cell Type Annotation

If you want to annotate cell clusters based on your gene markers, then follow below:
Get the ScType database Excel file from here
Open the file and delete all existing data rows, keeping only the column headers (tissueType, cellName, geneSymbolmore1, geneSymbolmore2, and shortName)
The geneSymbolmore2 and shortName columns are optional and can be left blank.
Fill in the columns with your custom data:
- tissueType: Your tissue of interest (e.g., "Immune")
- cellName: The specific cell type (e.g., "T-cell")
- geneSymbolmore1: The positive gene markers for that cell type, separated by commas (e.g., "CD3D,CD3E,CD8A")
- geneSymbolmore2: Alternative gene marker names for geneSymbolmore1 (e.g., T3D). T3D is an alternative to CD3D
- shortName: Short name of a specific cell type (e.g., Pro-B)
Save the file and provide its path in the code, as shown below

# DB file
db_ <- "data/ScTypeDB_full.xlsx";
tissue <- "Immune system"

F. Visualization & Analysis

Generates annotated UMAP/t-SNE plots
Creates condition-split visualizations
Counts cell distribution across types and conditions
Saves annotated dataset

# ==============================================================================
# Count Cells by Cell Type and Condition
# ==============================================================================
# Number of cells in each condition
# Create the table
num_of_cells <- table(data@meta.data$sctype_classification, data@meta.data$condition)
# Convert to data frame
num_of_cells_df <- as.data.frame.matrix(num_of_cells)
num_of_cells_df
# Save as CSV
write.csv(num_of_cells_df, file = "number_of_cells_in_conditions.csv", row.names = TRUE)

# ==============================================================================
# VISUALIZE ANNOTATED DATA
# ==============================================================================

# umap and tsne plots
umap_annt <- DimPlot(data, reduction = "umap", label = FALSE,
                     repel = TRUE, group.by = 'sctype_classification')        
umap_annt

tsne_annt <- DimPlot(data, reduction = "tsne", label = FALSE,
                     repel = TRUE, group.by = 'sctype_classification')        
tsne_annt

umap_annt_condition <- DimPlot(data, reduction = "umap", label = FALSE,
                               repel = TRUE, group.by = 'sctype_classification',
                               split.by = "condition")        
umap_annt_condition

tsne_annt_condition <- DimPlot(data, reduction = "tsne", label = FALSE,
                               repel = TRUE, group.by = 'sctype_classification',
                               split.by = "condition")        
tsne_annt_condition

UMAP and t-SNE Annotated Cell Clusters

UMAP - Annotated Cell Clusters Grouped By Condition

t-SNE - Annotated Cell Clusters Grouped By Condition

G. Differential Expression Analysis

Subsets specific cell types if needed

# cells to keep
cells <- c("Memory CD4+ T cells",
           "Naive CD4+ T cells",
           "Cancer cells",
           "CD8+ NKT-like cells",
           "Natural killer  cells")
# Subset based on metadata column
data_subset <- subset(data, subset = sctype_classification %in% cells)
# Verify the subset
levels(data_subset)
table(data_subset@meta.data$sctype_classification)
table(data_subset@meta.data$condition)
saveRDS(data_subset, file = "data_subset.rds")

Performs differential expression between:
Same cell type across conditions (Normal vs PD)

      data$celltype.condition <- paste(data$sctype_classification, data$condition, sep = "_")
Idents(data) <- "celltype.condition"
levels(data)

dff_markers <- FindMarkers(data, ident.1 = "Memory CD4+ T cells_LM",
                           ident.2 = "Memory CD4+ T cells_Normal", min.pct = 0.1,
                           test.use = "wilcox", verbose = TRUE, only.pos = FALSE)
# Apply FDR
dff_markers$fdr = p.adjust(dff_markers$p_val, method='fdr')
write.csv(dff_markers, file = "Memory CD4+ T cells_DEGs.csv") # Required for volcano plot

dff_up_markers <- subset(dff_markers, avg_log2FC > 0.25)
dff_dn_markers <- subset(dff_markers, avg_log2FC < -0.25)
# Save degs results
write.csv(dff_up_markers, file = "Memory CD4+ T cells_up_markers.csv")
write.csv(dff_dn_markers, file = "Memory CD4+ T cells_dn_markers.csv")

Different cell types regardless of condition

Idents(data) <- "sctype_classification"
levels(data)
degs <- FindMarkers(data, ident.1 = "Cancer cells",
                    ident.2 = NULL, test.use = "wilcox",
                    only.pos = FALSE)
# Apply FDR
degs$fdr = p.adjust(degs$p_val, method='fdr')
write.csv(degs, file = "Cancer cells_DEGs.csv") # Required for volcano plot

degs_up <- subset(degs, avg_log2FC > 0.25)
degs_down <- subset(degs, avg_log2FC < -0.25)
# Save degs results
write.csv(degs_up, file = "Cancer_cells_up_markers.csv")
write.csv(degs_down, file = "Cancer_cells_dn_markers.csv")

Different cell types in different conditions

data$celltype.condition <- paste(data$sctype_classification, data$condition, sep = "_")
Idents(data) <- "celltype.condition"
levels(data)
# DEGs of Cancer cells in diseased condition vs All other cell types in Normal condition
degs <- FindMarkers(data, ident.1 = "Cancer cells_LM",
                    ident.2 = c("Macrophages_Normal",
                                "Natural killer  cells_Normal",
                                "Basophils_Normal",
                                "Non-classical monocytes_Normal",
                                "CD8+ NKT-like cells_Normal",
                                "Memory CD4+ T cells_Normal",
                                "Myeloid Dendritic cells_Normal",
                                "Pre-B cells_Normal",
                                "Endothelial_Normal",
                                "Classical Monocytes_Normal",
                                "Plasmacytoid Dendritic cells_Normal",
                                "γδ-T cells_Normal",
                                "Memory B cells_Normal",
                                "Naive CD4+ T cells_Normal"
                    ), test.use = "wilcox",
                    only.pos = FALSE)
# Apply FDR
degs$fdr = p.adjust(degs$p_val, method='fdr')
write.csv(degs, file = "Cancer cells_DEGs.csv") # Required for volcano plot

degs_up <- subset(degs, avg_log2FC > 0.25)
degs_down <- subset(degs, avg_log2FC < -0.25)
# Save degs results
write.csv(degs_up, file = "Cancer_cells_up_markers.csv")
write.csv(degs_down, file = "Cancer_cells_dn_markers.csv")

Identifies up/down-regulated genes
Applies statistical filtering and FDR correction
Exports marker gene lists

Note:

Join layers of the Seurat object before performing differential expression

data <- JoinLayers(data)

Outputs

The script produces:
- Quality control plots
- Clustering visualizations
- Annotated cell type maps
- Differential expression results
- Processed datasets for downstream analysis (e.g Cell-cell communication, Trajectory analysis etc)

03. Multiple Datasets Integration

This script performs comprehensive integration of multiple single-cell RNA sequencing datasets to address technical variations while preserving biological signals.

Workflow

A. Data Preparation

Loads and combines samples from multiple experimental datasets

# Dataset 01 Samples
s1 <- Read10X("samples/GSE231559/GSM7290763")
s1 <- CreateSeuratObject(s1, project = "GSE231559-CRC1")

s2 <- Read10X("samples/GSE231559/GSM7290769")
s2 <- CreateSeuratObject(s2, project = "GSE231559-CRC2")

s3 <- Read10X("samples/GSE231559/GSM7290770")
s3 <- CreateSeuratObject(s3, project = "GSE231559-N1")

s4 <- Read10X("samples/GSE231559/GSM7290772")
s4 <- CreateSeuratObject(s4, project = "GSE231559-N2")

# Dataset 02 Samples
s5 <- Read10X("samples/GSE261012/GSM6432445")
s5 <- CreateSeuratObject(s5, project = "GSE261012-CRC1")

s6 <- Read10X("samples/GSE261012/GSM6432446")
s6 <- CreateSeuratObject(s6, project = "GSE261012-CRC2")

s7 <- Read10X("samples/GSE261012/GSM6432447")
s7 <- CreateSeuratObject(s7, project = "GSE261012-N1")

s8 <- Read10X("samples/GSE261012/GSM6432448")
s8 <- CreateSeuratObject(s8, project = "GSE261012-N2")

Adds metadata labels for dataset origin and experimental conditions

# Add metadata labels
s1$dataset <- "GSE231559";  s1$condition <- "CRC"
s2$dataset <- "GSE231559";  s2$condition <- "CRC"
s3$dataset <- "GSE231559";  s3$condition <- "Control"
s4$dataset <- "GSE231559";  s4$condition <- "Control"

s5$dataset <- "GSE261012";  s5$condition <- "CRC"
s6$dataset <- "GSE261012";  s6$condition <- "CRC"
s7$dataset <- "GSE261012";  s7$condition <- "Control"
s8$dataset <- "GSE261012";  s8$condition <- "Control"

Merges all samples into a unified analysis object

# combine all samples
combined <- merge(s1, y = c(s2,s3,s4,s5,s6,s7,s8),
                  add.cell.ids = c("GSE231559-CRC1","GSE231559-CRC2","GSE231559-N1","GSE231559-N2",
                                   "GSE261012-CRC1","GSE261012-CRC2","GSE261012-N1","GSE261012-N2"))
# Verify the merge
table(combined$condition)
table(combined$orig.ident)
dim(combined)

B. Quality Control & Preprocessing

Performs quality control, normalization, highly variable features, scaling, and PCA of the data (same as scRNA-seqr)
Performs cell clustering

data <- FindNeighbors(data, dims = 1:20, reduction = "pca")
data <- FindClusters(data, resolution = 0.6, algorithm = 4,
                         cluster.name = "raw_clusters")

Generates low-dimensional embeddings (UMAP/t-SNE) showing raw cluster patterns

data <- RunUMAP(data, dims = 1:20, reduction = "pca",
                    reduction.name = "umap.raw")

data <- RunTSNE(data, dims = 1:20, reduction = "pca",
                    reduction.name = "tsne.raw")

Visualizes inherent batch effects and sample separation

umap_data <- DimPlot(data, reduction = "umap.raw")
umap_data

tsne_data <- DimPlot(data, reduction = "tsne.raw")
tsne_data

umap_condition_data <- DimPlot(data, reduction = "umap.raw", group.by = "condition")
umap_condition_data

tsne_condition_data <- DimPlot(data, reduction = "tsne.raw", group.by = "condition")
tsne_condition_data

C. Batch Effect Correction

Applies advanced integration methods to remove technical variability
Creates harmonized dimensional space while preserving biological variation
Maintains dataset-specific biological signals while minimizing batch effects

crt_data <- RunHarmony(data, group.by.vars = "dataset")

E. Integrated Data Analysis

Re-clusters cells using integrated dimensions

crt_data <- FindNeighbors(crt_data, reduction = "harmony", dims = 1:20)
crt_data <- FindClusters(crt_data, resolution = 0.6, algorithm = 4,
                         cluster.name = "harmony_clusters")

Generates new embeddings on corrected data and visualizes corrected data

crt_data <- RunUMAP(crt_data, reduction = "harmony",
                    dims = 1:20, reduction.name = "umap.harmony")
crt_data <- RunTSNE(crt_data, reduction = "harmony", dims = 1:20,
                    reduction.name = "tsne.harmony", reduction.key = "tsneharmony_")

umap_crt <- DimPlot(crt_data, reduction = "umap.harmony")
umap_crt

tsne_crt <- DimPlot(crt_data, reduction = "tsne.harmony")
tsne_crt

umap_crt_condition <- DimPlot(crt_data, reduction = "umap.harmony", group.by = "condition")
umap_crt_condition

tsne_crt_condition <- DimPlot(crt_data, reduction = "tsne.harmony", group.by = "condition")
tsne_crt_condition

Creates comparative visualizations showing integration effectiveness

# Add titles to individual UMAP plots
umap_before <- umap_data + ggtitle("Clusters Before Batch Correction")
umap_after  <- umap_crt  + ggtitle("Clusters After Batch Correction")

# umap
combine_umap <- umap_before + umap_after
combine_umap

# Add titles to individual t-SNE plots
tsne_before <- tsne_data + ggtitle("Clusters Before Batch Correction")
tsne_after  <- tsne_crt  + ggtitle("Clusters After Batch Correction")

# tsne
combine_tsne <- tsne_before + tsne_after
combine_tsne

UMAP - Before and After Batch-Effect Correction

t-SNE - Before and After Batch-Effect Correction

F. Cell Type Annotation

Automates cell type identification using reference gene signatures
Assigns cell types to clusters based on marker expression patterns
Verifies cell type annotation
Quantifies cell distribution across types and conditions
Enables flexible subsetting for focused analysis of specific populations

G. Differential Expression Analysis

Performs comprehensive gene expression comparisons:
- Within cell types across experimental conditions
- Between different cell types regardless of conditions
- Complex multi-factor comparisons between conditions
Identifies statistically significant differentially expressed genes

Outputs

This script produes:
- Publication-ready visualizations
- Fully annotated datasets
- Comprehensive differential expression results

04. 3D UMAP and t-SNE Plots

This script creates interactive 3D visualizations of single-cell data using UMAP and t-SNE dimensionality reduction methods.

Workflow

A. 3D UMAP and t-SNE Generation

Computes UMAP and t-SNE with three components for 3D coordinates
Extracts embedding data and cluster information
Creates interactive 3D scatter plots colored by cell clusters
Exports visualizations in multiple formats (PNG, HTML)

B. Gene Expression Visualization

Maps gene expression patterns onto 3D UMAP and t-SNE space
Uses HLA-DQA1 gene as an example with adjustable expression scaling
Creates color-coded plots showing expression levels
Generates interactive plots with cell-level information

C. View 3D UMAP and t-SNE Plots

Description	Link
3D umap	3d-umap
Gene Expression in 3D umap	3d-gene-umap
3D tsne	3d-tsne
Gene Expression in 3D tsne	3d-gene-tsne

Outputs

The script produces:
- Interactive 3D UMAP and t-SNE plots
- Gene expression overlays in 3D space
- Multiple export formats (static PNG, interactive HTML)
- Enhanced visualization capabilities for data exploration

05. Volcano Plot

This script creates volcano plots of specific cell types to visualize differentially expressed genes between experimental conditions.

Workflow

A. Data Preparation

Loads differential expression results from CSV files
Formats data for volcano plot visualization
Sets up gene identifiers and statistical columns

file_path <- "Cancer cells_DEGs.csv"
degs <- read_csv(file_path, show_col_types = FALSE)
degs <- as.data.frame(degs)              
rownames(degs) <- degs$gene              
degs$gene <- NULL

B. Plot Generation

Uses the EnhancedVolcano package for advanced plotting
Maps log2 fold changes against adjusted p-values (FDR)
Applies statistical cutoffs for significance
Customizes colors for different significance categories

volc <- EnhancedVolcano(
  degs,
  lab              = NA,
  x                = "avg_log2FC",
  y                = "fdr",
  pCutoff          = 0.05,
  FCcutoff         = 2,  # Adjust log FC value cutoff
  title            = "Cancer cells-LM vs All Other Cell Types-Normal",   # Change according to your celltype degs file
  pointSize        = 2.0,
  labSize          = 3.0,
  legendLabSize    = 10,
  legendPosition    = "right",
  legendIconSize   = 4.0,
  drawConnectors   = FALSE,
  widthConnectors  = 0.5,
  gridlines.major  = TRUE,
  gridlines.minor  = TRUE,
  col = c(
    'grey50',             # Insignificant p-value and log2FC
    '#1DCD9F',            # Insignificant log2FC
    'royalblue',          # Insignificant p-value
    '#FF0B55'             # Significant p-value and log2FC
  ))

Adjust FCcutoff and title according to your data

C. Visualization & Export

Creates publication-quality volcano plots
Highlights significantly up/down-regulated genes
Saves high-resolution images for analysis

06. Heatmap

This script generates gene-specific heatmaps to visualize gene expression patterns across cell types and conditions.

Workflow

A. Data Extraction

Selects genes of interest for visualization

genes_of_interest <- c(
  "CD247","RAC1","CALM1","CD3D","HLA-DQA1",
  "HLA-DQA2","HLA-DPA1","HLA-DRA","CD74","CD3G"
)

Extracts expression data from the Seurat object
Combines expression data with metadata

expr_mat <- GetAssayData(data, assay = "RNA", layer = "data")[genes_of_interest, ]
expr_df <- as.data.frame(t(expr_mat))
metadata <- data@meta.data
expr_df$celltype <- metadata$sctype_classification
expr_df$condition <- metadata$condition

B. Expression Analysis

Calculates average expression per cell type and condition
Creates expression matrices for heatmap plotting
Saves expression data for documentation

# Step 01 - Z-SCORE each gene across ALL CELLS (cell-level z-scores)
# This centers & scales each gene using all cells: mean = 0, sd = 1
expr_df_zcell <- expr_df %>%
  mutate(across(all_of(genes_of_interest), ~ scale(.)[, 1]))

Step 02 - AVERAGE z-scores WITHIN each celltype × condition group
z_group_df <- expr_df_zcell %>%
  group_by(celltype, condition) %>%
  summarise(across(all_of(genes_of_interest), mean, .names = "{.col}"), .groups = "drop")

# Create a column name that combines celltype and condition (for rownames later)
z_group_df <- z_group_df %>%
  mutate(group = paste(celltype, condition, sep = "_"))

# Save the group-level z-scores table
write.csv(z_group_df, "Genes-z-scores.csv", row.names = FALSE)

C. Plot Generation

Prepares annotation colors for conditions and cell types
Uses ComplexHeatmap for clustered visualization
Applies color schemes and formatting
Exports high-quality heatmap images

# Build matrix genes x groups (ComplexHeatmap expects rows = genes, cols = samples/groups)
expr_mat_group <- as.matrix(z_group_df[, genes_of_interest])  # rows = groups, cols = genes
rownames(expr_mat_group) <- z_group_df$group

# transpose so rows = genes and columns = groups (as in your original)
expr_mat_group <- t(expr_mat_group)

# Get unique conditions and celltypes
conditions <- unique(z_group_df$condition)
celltypes <- unique(z_group_df$celltype)

# Create the desired column order: all CRC groups first, then all Control groups
desired_order <- c()
for(cond in conditions) {  # This will loop through conditions in their natural order
  for(ct in celltypes) {
    group_name <- paste(ct, cond, sep = "_")
    if(group_name %in% colnames(expr_mat_group)) {
      desired_order <- c(desired_order, group_name)
    }
  }
}

# Reorder the matrix columns
expr_mat_group <- expr_mat_group[, desired_order, drop = FALSE]

# Extract condition and celltype per group (match order from reordered matrix)
annotation_col <- data.frame(
  condition = factor(sub(".*_", "", colnames(expr_mat_group))),
  celltype = factor(sub("_.*", "", colnames(expr_mat_group)))
)
rownames(annotation_col) <- colnames(expr_mat_group)

# Define your full color vector
all_colors <- c(
  "#9e0142","#d53e4f","#f46d43","#fdae61","#fee08b",
  "#abdda4","#66c2a5","#3288bd","#5e4fa2","#6090D8",
  "#4C825D","#8CAE9E","#D5AB85","#99610A","#C38F16",
  "#FFCD00","#C0532B","#B50200","#67322E","#B47E83",
  "#807B7F","#8C86A0","#674D53","#508CA7","#0E2A4D"
)

# Extract unique cell types in the order they appear in annotation_col
celltypes <- unique(annotation_col$celltype)

# Assign colors to cell types (truncate color vector if needed)
celltype_colors <- setNames(all_colors[1:length(celltypes)], celltypes)

# Build annotation color list
ann_colors <- list(
  condition = c(LM = "#2c98a0", Normal = "#67dba5"),  # adjust for more conditions if needed
  celltype = celltype_colors
)

# Use a diverging color scale centered at 0 (z-scores)
hcolors <- rev(paletteer::paletteer_c("ggthemes::Red-Blue Diverging", n = 200))

ht <- Heatmap(
  expr_mat_group,
  name = "Z-score",
  col = hcolors,
  cluster_rows = TRUE,
  cluster_columns = TRUE,  # Turn off column clustering to use manual order
  show_row_names = TRUE,
  show_column_names = FALSE,
  top_annotation = HeatmapAnnotation(df = annotation_col, col = ann_colors,
                                     # Add borders to annotation cells
                                     gp = gpar(col = "black", lwd = 0.8)),
  # Add borders to heatmap cells
  rect_gp = gpar(col = "black", lwd = 0.8),
  heatmap_legend_param = list(
    legend_width = unit(0.8, "cm"),
    direction = "vertical",
    title_position = "topleft"
  ),
  use_raster = FALSE
)
ht

# Save as PNG (high res)
png("Heatmap.png", width = 3000, height = 2000, res = 300)
draw(ht)
dev.off()

07. Boxplot

This script creates box plots to compare gene expression across conditions and specific cell types.

Workflow

Condition-Based Comparison

Extracts expression data for specific genes
Compares the expression between the Control and Diseased conditions
Creates box plots with jittered points
Applies custom color schemes

# Create a data frame with gene expression and conditions
boxplot_df <- data.frame(gene = GetAssayData(data, assay = "RNA", layer = "data")["HSP90AA1",], 
                         condition = data$condition)

custom_colors <- c("LM" = "#FF3200FF", "Normal" = "#0076BBFF") #, "Metastasis" = "#2A9D8F")

box <- ggplot(boxplot_df, aes(x = condition, y = gene)) + 
  geom_jitter(aes(color = condition), width = 0.2, size = 1.5, alpha = 0.6, show.legend = FALSE) +  
  geom_boxplot(aes(fill = condition), width = 0.7, outlier.shape = NA, alpha = 0.6, color = "black") +  
  scale_fill_manual(values = custom_colors) +
  scale_color_manual(values = custom_colors) +
  labs(title = "HSP90AA1", x = "Condition", y = "Expression") +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
    axis.text.x = element_text(angle = 30, hjust = 1, size = 12),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    axis.line = element_line(color = "black"),
    axis.line.x.top    = element_blank(),
    axis.line.y.right  = element_blank(),
    axis.ticks.x.top   = element_blank(),
    axis.ticks.y.right = element_blank(),
    axis.ticks.x.bottom   = element_line(),
    axis.ticks.y.left = element_line(),
    panel.grid.major = element_line(color = "grey80", size = 0.4)
  )
box
# Save the plot
ggsave("HSP90AA1-condition.png", plot = box, bg = "white", dpi = 600)

Adjust gene and condition names according to your data

Cell Type-Specific Analysis

Filters data for specific cell types
Compares expression within cell types across conditions
Generates focused visualizations

# Specify the cell type of interest
target_celltype <- "Memory CD4+ T cells"

# Logical filter: keep rows that start with the cell type (e.g., "Cancer cells")
cell_indices <- startsWith(data$celltype.condition, paste0(target_celltype, "_"))

# Create filtered data frame for box plot
boxplot_df <- data.frame(
  gene = GetAssayData(data, assay = "RNA", layer = "data")["HSP90AA1", cell_indices], 
  condition = data$celltype.condition[cell_indices]
)

# Recalculate group levels for coloring
groups <- unique(boxplot_df$condition)

custom_colors <- setNames(
  ifelse(endsWith(groups, "_LM"), "#FF3200FF", "#0076BBFF"),
  groups
)

# Plot
box <- ggplot(boxplot_df, aes(x = condition, y = gene)) + 
  geom_jitter(aes(color = condition), width = 0.2, size = 1.5, alpha = 0.6, show.legend = FALSE) +  
  geom_boxplot(aes(fill = condition), width = 0.7, outlier.shape = NA, alpha = 0.6, color = "black") +  
  scale_fill_manual(values = custom_colors) +
  scale_color_manual(values = custom_colors) +
  labs(title = paste("HSP90AA1"), x = "Condition", y = "Expression") +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
    axis.text.x = element_text(angle = 30, hjust = 1, size = 12),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    axis.line = element_line(color = "black"),
    axis.line.x.top    = element_blank(),
    axis.line.y.right  = element_blank(),
    axis.ticks.x.top   = element_blank(),
    axis.ticks.y.right = element_blank(),
    axis.ticks.x.bottom   = element_line(),
    axis.ticks.y.left = element_line(),
    panel.grid.major = element_line(color = "grey80", size = 0.4)
  )
box
# Save the plot
ggsave("HSP90AA1-Memory-CD4-T-cells.png", plot = box, bg = "white", dpi = 600)

Adjust cell-type, gene, and condition names according to your data

Visualization Customization

Uses ggplot2 for advanced plotting
Applies consistent theming and styling
Creates publication-ready figures

📚 Citation

If you use this analysis pipeline in your work, please cite the repository:

scRNA-seq-navigator. GitHub: https://github.com/usman4373/scRNA-seq-navigator

🤝 Acknowledgements

This pipeline implements established single-cell analysis methods, including Seurat (Stuart et al., 2019), Harmony (Korsunsky et al., 2019), and ScType (Ianevski et al., 2022). Please also cite the original publications of these tools when using their respective functionalities in your research.
Please also cite other tools used in the analysis pipeline:
- dplyr - For efficient data manipulation and transformation
- patchwork - For seamless arrangement of multiple ggplot2 plots
- ggplot2 - For creating elegant and customizable data visualizations
- openxlsx - For reading and writing Excel files in R
- HGNChelper - For handling and validating HGNC gene symbols
- plotly - For creating interactive web-based visualizations
- ComplexHeatmap - For generating sophisticated and annotated heatmaps
- circlize - For circular visualization and heatmap enhancements
- paletteer - For accessing comprehensive color palette collections
- EnhancedVolcano - For publication-ready volcano plot generation
- viridis - For colorblind-friendly color scales in visualizations
- htmlwidgets - For embedding interactive JavaScript visualizations in R
- webshot2 - For converting interactive plots to static images
- readr - For fast and tidy data import functionality
- Other open-source R packages and dependencies

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
data		data
.gitattributes		.gitattributes
01-samples_merger.R		01-samples_merger.R
02-scRNA-seqr.R		02-scRNA-seqr.R
03-multiple-datasets-integrator.R		03-multiple-datasets-integrator.R
04-umap-tsne-3d-plotr.R		04-umap-tsne-3d-plotr.R
05-volcano_plotr.R		05-volcano_plotr.R
06-heatmapr.R		06-heatmapr.R
07-boxplotr.R		07-boxplotr.R
LICENSE		LICENSE
README.md		README.md
install_packages.R		install_packages.R

Folders and files

Latest commit

History

Repository files navigation

📑 Table of contents

📝 Overview

NOTE: The code snippets in this README are for demonstration and can be found in their respective script files.

Installation

Input Files Format

01. Merging Samples

Purpose

Workflow

A. Reading and Creating Seurat Object

B. Merging Multiple Samples

C. Adding Condition Metadata

D. Final Merging

02. Single Cell RNA Sequencing Analysis

Workflow

A. Data Loading & Quality Control

B. Data Normalization & Feature Selection

C. Dimensionality Reduction & Clustering

PCA and Elbow Plot

UMAP and t-SNE - Cell Clusters

UMAP - Clusters Grouped By Condition

t-SNE - Clusters Grouped By Condition

D. Cell Type Annotation

E. Manual Cell Type Annotation

F. Visualization & Analysis

UMAP and t-SNE Annotated Cell Clusters

UMAP - Annotated Cell Clusters Grouped By Condition

t-SNE - Annotated Cell Clusters Grouped By Condition

G. Differential Expression Analysis

Outputs

03. Multiple Datasets Integration

Workflow

A. Data Preparation

B. Quality Control & Preprocessing

C. Batch Effect Correction

E. Integrated Data Analysis

UMAP - Before and After Batch-Effect Correction

t-SNE - Before and After Batch-Effect Correction

F. Cell Type Annotation

G. Differential Expression Analysis

Outputs

04. 3D UMAP and t-SNE Plots

Workflow

A. 3D UMAP and t-SNE Generation

B. Gene Expression Visualization

C. View 3D UMAP and t-SNE Plots

Outputs

05. Volcano Plot

Workflow

A. Data Preparation

B. Plot Generation

C. Visualization & Export

06. Heatmap

Workflow

A. Data Extraction

B. Expression Analysis

C. Plot Generation

07. Boxplot

Workflow

Condition-Based Comparison

Cell Type-Specific Analysis

Visualization Customization

📚 Citation

🤝 Acknowledgements

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages