- π Overview
- Installation
- Input Files Format
- 01. Merging Samples
- 02. Single Cell RNA Sequencing Analysis
- 03. Multiple Datasets Integration
- 04. 3D UMAP and t-SNE Plots
- 05. Volcano Plot
- 06. Heatmap
- 07. Boxplot
- π Citation
- π€ Acknowledgements
scRNA-seq-navigator contains a comprehensive single-cell RNA sequencing (scRNA-seq) analysis pipeline designed for processed data following alignment. The pipeline leverages the robust Seurat (Stuart et al., 2019) toolkit for essential analytical steps, including data integration, quality control, and batch-effect correction, and incorporates the ScType database (Ianevski et al., 2022) for refined cell-type annotation. It further facilitates differential gene expression analysis and multi-dimensional visualization to extract meaningful biological insights from complex single-cell datasets.
- The analyses provide end-to-end solutions for:
- Data Integration & Processing: Merging multiple datasets and experimental conditions
- Batch Effect Correction: Addressing technical variations across different platforms and studies
- Cell Type Identification: Automated annotation using reference gene signatures
- Advanced Visualization: 2D/3D embeddings and interactive plots
- Differential Expression: Statistical analysis of gene expression changes
- Publication-ready Outputs: High-quality figures and comprehensive results
-
Each script represents a modular component of the analysis workflow, allowing researchers to process single-cell data from raw counts to biological interpretation
| Script | Description |
|---|---|
01-samples_merger.R |
Merges multiple scRNA-seq samples from different conditions into a unified dataset for analysis |
02-scRNA-seqr.R |
Performs comprehensive scRNA-seq analysis, including QC, clustering, cell type annotation, and differential expression analysis |
03-multiple-datasets-integrator.R |
Integrates multiple scRNA-seq datasets using Harmony to remove batch effects |
04-umap-tsne-3d-plotr.R |
Creates interactive 3D visualizations of UMAP and t-SNE embeddings for enhanced data exploration |
05-volcano_plotr.R |
Generates publication-ready volcano plots to visualize differential gene expression results |
06-heatmapr.R |
Produces clustered heatmaps showing gene expression patterns across cell types and conditions |
07-boxplotr.R |
Creates comparative box plots to analyze gene expression differences across experimental conditions and/or cell types |
NOTE: The code snippets in this README are for demonstration and can be found in their respective script files.
- Navigate to the directory containing the
install_packages.Rfile - Open a terminal and run:
Rscript install_packages.R
- Alternatively, if the above does not work, open the
install_packages.Rfile in RStudio - Click the "Source" button in the top-right of the script editor OR in the R console run:
# Navigate to the script directory first, then:
source("install_packages.R")- The analysis workflow is designed for processed scRNA-seq data following alignment.
- Input should be in 10X Genomics format, consisting of the three standard files for each sample directory:
barcodes.tsv.gzfeatures.tsv.gzmatrix.mtx.gz
This script performs the initial data integration step for single-cell RNA sequencing (scRNA-seq) analysis by merging multiple samples from two experimental conditions (e.g, Normal and Parkinson's Disease) into a unified Seurat object.
- Load individual scRNA-seq samples from both control and diseased conditions
- Merge samples within each condition group
- Combine both condition groups into a final merged dataset
- Add appropriate metadata for sample tracking and condition identification
- Export the consolidated data for downstream analysis
sample1 <- Read10X(data.dir = "samples/Normal/GSM7290760")
sample1 <- CreateSeuratObject(counts = sample1, project = "N1")Read10X()Reads the 10X Genomics format data from the specified directory containing three essential files:barcodes.tsv.gz(cell identifiers)features.tsv.gz(gene information)matrix.mtx.gz(count matrix)
CreateSeuratObject()Converts the raw count data into a Seurat object, which is the standard data structure
- Merging all control samples:
normal_merged <- merge(sample1, y = c(sample2, sample3, sample4, sample5),
add.cell.ids = c("N1", "N2", "N3", "N4", "N5"))- Merging all diseased samples:
LM_merged <- merge(sample6, y = c(sample7, sample8, sample9, sample10),
add.cell.ids = c("LM1", "LM2", "LM3", "LM4", "LM5"))- Control samples:
normal_merged <- AddMetaData(object = normal_merged,
metadata = "Normal", col.name = "condition")- Diseased samples:
LM_merged <- AddMetaData(object = LM_merged,
metadata = "LM", col.name = "condition")- Merging control and diseased samples:
merged_samples <- merge(normal_merged, LM_merged)This script performs comprehensive single-cell RNA sequencing analysis from quality control through cell type annotation and differential expression analysis.
- Loads merged dataset from previous script (samples merger)
rawdata <- merged_samples
dim(rawdata) # 38224 features and 36289 cells
rm(merged_samples)- Calculates mitochondrial gene percentage
rawdata[["percent.mt"]] <- PercentageFeatureSet(rawdata, pattern = "^(?i)MT-")- Visualizes QC metrics using violin plots
Plot1 <- VlnPlot(rawdata, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"),
group.by = "condition", ncol = 3)
Plot1- Applies quality filters based on:
- Gene counts
- RNA counts
- Mitochondrial content
filtered_data <- subset(rawdata, subset = nFeature_RNA > 200
& nFeature_RNA < 6000 & nCount_RNA > 2000 & percent.mt < 10)
dim(filtered_data) # 38224 features and 16195 cells in filtered data- Adjust
nFeature_RNA,nCount_RNA, andpercent.mtvalues according to your dataset
Note:
- Before saving plots, adjust the plot viewing window size in RStudio to match the intended resolution and aspect ratio.
- After resizing the plot window, save the plot using
ggsave()with the following format:
ggsave(file = "raw_data.png", Plot1, dpi = 600, bg = "white")- Normalizes data using log normalization
normalized_data <- NormalizeData(filtered_data, normalization.method = "LogNormalize", scale.factor = 10000)- Identifies highly variable genes
normalized_data <- FindVariableFeatures(normalized_data, selection.method = "vst", nfeatures = 2000)- Visualizes top variable features
top10 <- head(VariableFeatures(normalized_data), 10)
top10- Performs scaling of highly variable features and performs PCA
scaled_data <- ScaleData(normalized_data)
scaled_data <- RunPCA(scaled_data, features = VariableFeatures(object = scaled_data))
pca_plot <- DimPlot(scaled_data, reduction = "pca")
pca_plotNote:
- If you want to perform scaling of all genes, then run the following command instead of the previous one:
all.genes <- rownames(normalized_data)
scaled_data <- ScaleData(normalized_data, features = all.genes)
scaled_data <- RunPCA(scaled_data, features = all.genes)
pca_plot <- DimPlot(scaled_data, reduction = "pca")
pca_plot- Determines optimal dimensions using elbow plot
elbow <- ElbowPlot(scaled_data, ndim = 50)
elbow- Clusters cells using graph-based clustering
data <- FindNeighbors(scaled_data, dims = 1:20)
data <- FindClusters(data, resolution = 0.6, algorithm = 4)
levels(data)- Performs non-linear dimensionality reduction (UMAP & t-SNE)
data <- RunUMAP(data, dims = 1:20)
data <- RunTSNE(data, dims = 1:20)- Visualizes clusters with and without conditions
umap <- DimPlot(data, reduction = "umap", label = TRUE)
umap
tSNE <- DimPlot(data, reduction = "tsne", label = TRUE)
tSNE
umap_condition <- DimPlot(data, reduction = "umap", label = TRUE,
split.by = "condition")
umap_condition
tSNE_condition <- DimPlot(data, reduction = "tsne", label = TRUE,
split.by = "condition")
tSNE_condition
- Uses ScType database for automated cell type identification
- Annotates clusters based on tissue markers
- Assigns cell types and handles low-confidence annotations
- Updates metadata with cell type classifications
# load gene set preparation function
source("https://raw.githubusercontent.com/IanevskiAleksandr/sc-type/master/R/gene_sets_prepare.R")
# load cell type annotation function
source("https://raw.githubusercontent.com/IanevskiAleksandr/sc-type/master/R/sctype_score_.R")
# DB file
db_ <- "https://raw.githubusercontent.com/IanevskiAleksandr/sc-type/master/ScTypeDB_full.xlsx";
# Available tissues: Brain, Lung, Adrenal, Immune system, Eye, Pancreas, Liver, Heart, Intestine, Muscle, Placenta, Spleen, Stomach, Thymus
tissue <- "Immune system"
# prepare gene sets
gs_list <- gene_sets_prepare(db_, tissue)
# ==============================================================================
# Run scType Annotation
# ==============================================================================
# check Seurat object version (scRNA-seq matrix extracted differently in Seurat v4/v5)
seurat_package_v5 <- isFALSE('counts' %in% names(attributes(data[["RNA"]])));
print(sprintf("Seurat object %s is used", ifelse(seurat_package_v5, "v5", "v4")))
# extract scaled scRNA-seq matrix
scRNAseqData_scaled <- if (seurat_package_v5) as.matrix(data[["RNA"]]$scale.data) else as.matrix(data[["RNA"]]@scale.data)
# run ScType
es.max <- sctype_score(scRNAseqData = scRNAseqData_scaled, scaled = TRUE, gs = gs_list$gs_positive, gs2 = gs_list$gs_negative)
# merge by cluster
cL_resutls <- do.call("rbind", lapply(unique(data@meta.data$seurat_clusters), function(cl){
es.max.cl = sort(rowSums(es.max[ ,rownames(data@meta.data[data@meta.data$seurat_clusters==cl, ])]), decreasing = !0)
head(data.frame(cluster = cl, type = names(es.max.cl), scores = es.max.cl, ncells = sum(data@meta.data$seurat_clusters==cl)), 10)
}))
sctype_scores <- cL_resutls %>% group_by(cluster) %>% top_n(n = 1, wt = scores)
# set low-confident (low ScType score) clusters to "unknown"
sctype_scores$type[as.numeric(as.character(sctype_scores$scores)) < sctype_scores$ncells/4] <- "Unknown"
print(sctype_scores[,1:3])
write.csv(sctype_scores, file = "Celltypes.csv")
# Convert sctype_scores to a data frame if it isn't already
sctype_scores <- as.data.frame(sctype_scores)
# Extract the cluster and cell type columns
cluster_ids <- sctype_scores$cluster
cell_types <- sctype_scores$type
# Create a named vector for the new cluster identities
new_cluster_ids <- cell_types
names(new_cluster_ids) <- cluster_ids
# Rename the cluster identities in the object
data <- RenameIdents(data, new_cluster_ids)
levels(data)
# Overlay the identified cell types on UMAP plot
data@meta.data$sctype_classification = ""
for(j in unique(sctype_scores$cluster)){
cl_type = sctype_scores[sctype_scores$cluster==j,];
data@meta.data$sctype_classification[data@meta.data$seurat_clusters == j] = as.character(cl_type$type[1])
}- If you want to annotate cell clusters based on your gene markers, then follow below:
- Get the ScType database Excel file from here
- Open the file and delete all existing data rows, keeping only the column headers (
tissueType,cellName,geneSymbolmore1,geneSymbolmore2, andshortName) - The
geneSymbolmore2andshortNamecolumns are optional and can be left blank. - Fill in the columns with your custom data:
tissueType: Your tissue of interest (e.g., "Immune")cellName: The specific cell type (e.g., "T-cell")geneSymbolmore1: The positive gene markers for that cell type, separated by commas (e.g., "CD3D,CD3E,CD8A")geneSymbolmore2: Alternative gene marker names forgeneSymbolmore1(e.g., T3D). T3D is an alternative to CD3DshortName: Short name of a specific cell type (e.g., Pro-B)
- Save the file and provide its path in the code, as shown below
# DB file
db_ <- "data/ScTypeDB_full.xlsx";
tissue <- "Immune system"- Generates annotated UMAP/t-SNE plots
- Creates condition-split visualizations
- Counts cell distribution across types and conditions
- Saves annotated dataset
# ==============================================================================
# Count Cells by Cell Type and Condition
# ==============================================================================
# Number of cells in each condition
# Create the table
num_of_cells <- table(data@meta.data$sctype_classification, data@meta.data$condition)
# Convert to data frame
num_of_cells_df <- as.data.frame.matrix(num_of_cells)
num_of_cells_df
# Save as CSV
write.csv(num_of_cells_df, file = "number_of_cells_in_conditions.csv", row.names = TRUE)
# ==============================================================================
# VISUALIZE ANNOTATED DATA
# ==============================================================================
# umap and tsne plots
umap_annt <- DimPlot(data, reduction = "umap", label = FALSE,
repel = TRUE, group.by = 'sctype_classification')
umap_annt
tsne_annt <- DimPlot(data, reduction = "tsne", label = FALSE,
repel = TRUE, group.by = 'sctype_classification')
tsne_annt
umap_annt_condition <- DimPlot(data, reduction = "umap", label = FALSE,
repel = TRUE, group.by = 'sctype_classification',
split.by = "condition")
umap_annt_condition
tsne_annt_condition <- DimPlot(data, reduction = "tsne", label = FALSE,
repel = TRUE, group.by = 'sctype_classification',
split.by = "condition")
tsne_annt_condition- Subsets specific cell types if needed
# cells to keep
cells <- c("Memory CD4+ T cells",
"Naive CD4+ T cells",
"Cancer cells",
"CD8+ NKT-like cells",
"Natural killer cells")
# Subset based on metadata column
data_subset <- subset(data, subset = sctype_classification %in% cells)
# Verify the subset
levels(data_subset)
table(data_subset@meta.data$sctype_classification)
table(data_subset@meta.data$condition)
saveRDS(data_subset, file = "data_subset.rds")- Performs differential expression between:
- Same cell type across conditions (Normal vs PD)
data$celltype.condition <- paste(data$sctype_classification, data$condition, sep = "_")
Idents(data) <- "celltype.condition"
levels(data)
dff_markers <- FindMarkers(data, ident.1 = "Memory CD4+ T cells_LM",
ident.2 = "Memory CD4+ T cells_Normal", min.pct = 0.1,
test.use = "wilcox", verbose = TRUE, only.pos = FALSE)
# Apply FDR
dff_markers$fdr = p.adjust(dff_markers$p_val, method='fdr')
write.csv(dff_markers, file = "Memory CD4+ T cells_DEGs.csv") # Required for volcano plot
dff_up_markers <- subset(dff_markers, avg_log2FC > 0.25)
dff_dn_markers <- subset(dff_markers, avg_log2FC < -0.25)
# Save degs results
write.csv(dff_up_markers, file = "Memory CD4+ T cells_up_markers.csv")
write.csv(dff_dn_markers, file = "Memory CD4+ T cells_dn_markers.csv")- Different cell types regardless of condition
Idents(data) <- "sctype_classification"
levels(data)
degs <- FindMarkers(data, ident.1 = "Cancer cells",
ident.2 = NULL, test.use = "wilcox",
only.pos = FALSE)
# Apply FDR
degs$fdr = p.adjust(degs$p_val, method='fdr')
write.csv(degs, file = "Cancer cells_DEGs.csv") # Required for volcano plot
degs_up <- subset(degs, avg_log2FC > 0.25)
degs_down <- subset(degs, avg_log2FC < -0.25)
# Save degs results
write.csv(degs_up, file = "Cancer_cells_up_markers.csv")
write.csv(degs_down, file = "Cancer_cells_dn_markers.csv")- Different cell types in different conditions
data$celltype.condition <- paste(data$sctype_classification, data$condition, sep = "_")
Idents(data) <- "celltype.condition"
levels(data)
# DEGs of Cancer cells in diseased condition vs All other cell types in Normal condition
degs <- FindMarkers(data, ident.1 = "Cancer cells_LM",
ident.2 = c("Macrophages_Normal",
"Natural killer cells_Normal",
"Basophils_Normal",
"Non-classical monocytes_Normal",
"CD8+ NKT-like cells_Normal",
"Memory CD4+ T cells_Normal",
"Myeloid Dendritic cells_Normal",
"Pre-B cells_Normal",
"Endothelial_Normal",
"Classical Monocytes_Normal",
"Plasmacytoid Dendritic cells_Normal",
"Ξ³Ξ΄-T cells_Normal",
"Memory B cells_Normal",
"Naive CD4+ T cells_Normal"
), test.use = "wilcox",
only.pos = FALSE)
# Apply FDR
degs$fdr = p.adjust(degs$p_val, method='fdr')
write.csv(degs, file = "Cancer cells_DEGs.csv") # Required for volcano plot
degs_up <- subset(degs, avg_log2FC > 0.25)
degs_down <- subset(degs, avg_log2FC < -0.25)
# Save degs results
write.csv(degs_up, file = "Cancer_cells_up_markers.csv")
write.csv(degs_down, file = "Cancer_cells_dn_markers.csv")- Identifies up/down-regulated genes
- Applies statistical filtering and FDR correction
- Exports marker gene lists
Note:
- Join layers of the Seurat object before performing differential expression
data <- JoinLayers(data)- The script produces:
- Quality control plots
- Clustering visualizations
- Annotated cell type maps
- Differential expression results
- Processed datasets for downstream analysis (e.g Cell-cell communication, Trajectory analysis etc)
This script performs comprehensive integration of multiple single-cell RNA sequencing datasets to address technical variations while preserving biological signals.
- Loads and combines samples from multiple experimental datasets
# Dataset 01 Samples
s1 <- Read10X("samples/GSE231559/GSM7290763")
s1 <- CreateSeuratObject(s1, project = "GSE231559-CRC1")
s2 <- Read10X("samples/GSE231559/GSM7290769")
s2 <- CreateSeuratObject(s2, project = "GSE231559-CRC2")
s3 <- Read10X("samples/GSE231559/GSM7290770")
s3 <- CreateSeuratObject(s3, project = "GSE231559-N1")
s4 <- Read10X("samples/GSE231559/GSM7290772")
s4 <- CreateSeuratObject(s4, project = "GSE231559-N2")
# Dataset 02 Samples
s5 <- Read10X("samples/GSE261012/GSM6432445")
s5 <- CreateSeuratObject(s5, project = "GSE261012-CRC1")
s6 <- Read10X("samples/GSE261012/GSM6432446")
s6 <- CreateSeuratObject(s6, project = "GSE261012-CRC2")
s7 <- Read10X("samples/GSE261012/GSM6432447")
s7 <- CreateSeuratObject(s7, project = "GSE261012-N1")
s8 <- Read10X("samples/GSE261012/GSM6432448")
s8 <- CreateSeuratObject(s8, project = "GSE261012-N2")- Adds metadata labels for dataset origin and experimental conditions
# Add metadata labels
s1$dataset <- "GSE231559"; s1$condition <- "CRC"
s2$dataset <- "GSE231559"; s2$condition <- "CRC"
s3$dataset <- "GSE231559"; s3$condition <- "Control"
s4$dataset <- "GSE231559"; s4$condition <- "Control"
s5$dataset <- "GSE261012"; s5$condition <- "CRC"
s6$dataset <- "GSE261012"; s6$condition <- "CRC"
s7$dataset <- "GSE261012"; s7$condition <- "Control"
s8$dataset <- "GSE261012"; s8$condition <- "Control"- Merges all samples into a unified analysis object
# combine all samples
combined <- merge(s1, y = c(s2,s3,s4,s5,s6,s7,s8),
add.cell.ids = c("GSE231559-CRC1","GSE231559-CRC2","GSE231559-N1","GSE231559-N2",
"GSE261012-CRC1","GSE261012-CRC2","GSE261012-N1","GSE261012-N2"))
# Verify the merge
table(combined$condition)
table(combined$orig.ident)
dim(combined)- Performs quality control, normalization, highly variable features, scaling, and PCA of the data (same as scRNA-seqr)
- Performs cell clustering
data <- FindNeighbors(data, dims = 1:20, reduction = "pca")
data <- FindClusters(data, resolution = 0.6, algorithm = 4,
cluster.name = "raw_clusters")- Generates low-dimensional embeddings (UMAP/t-SNE) showing raw cluster patterns
data <- RunUMAP(data, dims = 1:20, reduction = "pca",
reduction.name = "umap.raw")
data <- RunTSNE(data, dims = 1:20, reduction = "pca",
reduction.name = "tsne.raw")- Visualizes inherent batch effects and sample separation
umap_data <- DimPlot(data, reduction = "umap.raw")
umap_data
tsne_data <- DimPlot(data, reduction = "tsne.raw")
tsne_data
umap_condition_data <- DimPlot(data, reduction = "umap.raw", group.by = "condition")
umap_condition_data
tsne_condition_data <- DimPlot(data, reduction = "tsne.raw", group.by = "condition")
tsne_condition_data- Applies advanced integration methods to remove technical variability
- Creates harmonized dimensional space while preserving biological variation
- Maintains dataset-specific biological signals while minimizing batch effects
crt_data <- RunHarmony(data, group.by.vars = "dataset")- Re-clusters cells using integrated dimensions
crt_data <- FindNeighbors(crt_data, reduction = "harmony", dims = 1:20)
crt_data <- FindClusters(crt_data, resolution = 0.6, algorithm = 4,
cluster.name = "harmony_clusters")- Generates new embeddings on corrected data and visualizes corrected data
crt_data <- RunUMAP(crt_data, reduction = "harmony",
dims = 1:20, reduction.name = "umap.harmony")
crt_data <- RunTSNE(crt_data, reduction = "harmony", dims = 1:20,
reduction.name = "tsne.harmony", reduction.key = "tsneharmony_")
umap_crt <- DimPlot(crt_data, reduction = "umap.harmony")
umap_crt
tsne_crt <- DimPlot(crt_data, reduction = "tsne.harmony")
tsne_crt
umap_crt_condition <- DimPlot(crt_data, reduction = "umap.harmony", group.by = "condition")
umap_crt_condition
tsne_crt_condition <- DimPlot(crt_data, reduction = "tsne.harmony", group.by = "condition")
tsne_crt_condition- Creates comparative visualizations showing integration effectiveness
# Add titles to individual UMAP plots
umap_before <- umap_data + ggtitle("Clusters Before Batch Correction")
umap_after <- umap_crt + ggtitle("Clusters After Batch Correction")
# umap
combine_umap <- umap_before + umap_after
combine_umap
# Add titles to individual t-SNE plots
tsne_before <- tsne_data + ggtitle("Clusters Before Batch Correction")
tsne_after <- tsne_crt + ggtitle("Clusters After Batch Correction")
# tsne
combine_tsne <- tsne_before + tsne_after
combine_tsne- Automates cell type identification using reference gene signatures
- Assigns cell types to clusters based on marker expression patterns
- Verifies cell type annotation
- Quantifies cell distribution across types and conditions
- Enables flexible subsetting for focused analysis of specific populations
- Performs comprehensive gene expression comparisons:
- Within cell types across experimental conditions
- Between different cell types regardless of conditions
- Complex multi-factor comparisons between conditions
- Identifies statistically significant differentially expressed genes
- This script produes:
- Publication-ready visualizations
- Fully annotated datasets
- Comprehensive differential expression results
This script creates interactive 3D visualizations of single-cell data using UMAP and t-SNE dimensionality reduction methods.
- Computes UMAP and t-SNE with three components for 3D coordinates
- Extracts embedding data and cluster information
- Creates interactive 3D scatter plots colored by cell clusters
- Exports visualizations in multiple formats (PNG, HTML)
- Maps gene expression patterns onto 3D UMAP and t-SNE space
- Uses
HLA-DQA1gene as an example with adjustable expression scaling - Creates color-coded plots showing expression levels
- Generates interactive plots with cell-level information
| Description | Link |
|---|---|
| 3D umap | 3d-umap |
| Gene Expression in 3D umap | 3d-gene-umap |
| 3D tsne | 3d-tsne |
| Gene Expression in 3D tsne | 3d-gene-tsne |
- The script produces:
- Interactive 3D UMAP and t-SNE plots
- Gene expression overlays in 3D space
- Multiple export formats (static PNG, interactive HTML)
- Enhanced visualization capabilities for data exploration
This script creates volcano plots of specific cell types to visualize differentially expressed genes between experimental conditions.
- Loads differential expression results from CSV files
- Formats data for volcano plot visualization
- Sets up gene identifiers and statistical columns
file_path <- "Cancer cells_DEGs.csv"
degs <- read_csv(file_path, show_col_types = FALSE)
degs <- as.data.frame(degs)
rownames(degs) <- degs$gene
degs$gene <- NULL- Uses the
EnhancedVolcanopackage for advanced plotting - Maps log2 fold changes against adjusted p-values (FDR)
- Applies statistical cutoffs for significance
- Customizes colors for different significance categories
volc <- EnhancedVolcano(
degs,
lab = NA,
x = "avg_log2FC",
y = "fdr",
pCutoff = 0.05,
FCcutoff = 2, # Adjust log FC value cutoff
title = "Cancer cells-LM vs All Other Cell Types-Normal", # Change according to your celltype degs file
pointSize = 2.0,
labSize = 3.0,
legendLabSize = 10,
legendPosition = "right",
legendIconSize = 4.0,
drawConnectors = FALSE,
widthConnectors = 0.5,
gridlines.major = TRUE,
gridlines.minor = TRUE,
col = c(
'grey50', # Insignificant p-value and log2FC
'#1DCD9F', # Insignificant log2FC
'royalblue', # Insignificant p-value
'#FF0B55' # Significant p-value and log2FC
))- Adjust
FCcutoffandtitleaccording to your data
- Creates publication-quality volcano plots
- Highlights significantly up/down-regulated genes
- Saves high-resolution images for analysis
This script generates gene-specific heatmaps to visualize gene expression patterns across cell types and conditions.
- Selects genes of interest for visualization
genes_of_interest <- c(
"CD247","RAC1","CALM1","CD3D","HLA-DQA1",
"HLA-DQA2","HLA-DPA1","HLA-DRA","CD74","CD3G"
)- Extracts expression data from the Seurat object
- Combines expression data with metadata
expr_mat <- GetAssayData(data, assay = "RNA", layer = "data")[genes_of_interest, ]
expr_df <- as.data.frame(t(expr_mat))
metadata <- data@meta.data
expr_df$celltype <- metadata$sctype_classification
expr_df$condition <- metadata$condition- Calculates average expression per cell type and condition
- Creates expression matrices for heatmap plotting
- Saves expression data for documentation
# Step 01 - Z-SCORE each gene across ALL CELLS (cell-level z-scores)
# This centers & scales each gene using all cells: mean = 0, sd = 1
expr_df_zcell <- expr_df %>%
mutate(across(all_of(genes_of_interest), ~ scale(.)[, 1]))
Step 02 - AVERAGE z-scores WITHIN each celltype Γ condition group
z_group_df <- expr_df_zcell %>%
group_by(celltype, condition) %>%
summarise(across(all_of(genes_of_interest), mean, .names = "{.col}"), .groups = "drop")
# Create a column name that combines celltype and condition (for rownames later)
z_group_df <- z_group_df %>%
mutate(group = paste(celltype, condition, sep = "_"))
# Save the group-level z-scores table
write.csv(z_group_df, "Genes-z-scores.csv", row.names = FALSE)- Prepares annotation colors for conditions and cell types
- Uses
ComplexHeatmapfor clustered visualization - Applies color schemes and formatting
- Exports high-quality heatmap images
# Build matrix genes x groups (ComplexHeatmap expects rows = genes, cols = samples/groups)
expr_mat_group <- as.matrix(z_group_df[, genes_of_interest]) # rows = groups, cols = genes
rownames(expr_mat_group) <- z_group_df$group
# transpose so rows = genes and columns = groups (as in your original)
expr_mat_group <- t(expr_mat_group)
# Get unique conditions and celltypes
conditions <- unique(z_group_df$condition)
celltypes <- unique(z_group_df$celltype)
# Create the desired column order: all CRC groups first, then all Control groups
desired_order <- c()
for(cond in conditions) { # This will loop through conditions in their natural order
for(ct in celltypes) {
group_name <- paste(ct, cond, sep = "_")
if(group_name %in% colnames(expr_mat_group)) {
desired_order <- c(desired_order, group_name)
}
}
}
# Reorder the matrix columns
expr_mat_group <- expr_mat_group[, desired_order, drop = FALSE]
# Extract condition and celltype per group (match order from reordered matrix)
annotation_col <- data.frame(
condition = factor(sub(".*_", "", colnames(expr_mat_group))),
celltype = factor(sub("_.*", "", colnames(expr_mat_group)))
)
rownames(annotation_col) <- colnames(expr_mat_group)
# Define your full color vector
all_colors <- c(
"#9e0142","#d53e4f","#f46d43","#fdae61","#fee08b",
"#abdda4","#66c2a5","#3288bd","#5e4fa2","#6090D8",
"#4C825D","#8CAE9E","#D5AB85","#99610A","#C38F16",
"#FFCD00","#C0532B","#B50200","#67322E","#B47E83",
"#807B7F","#8C86A0","#674D53","#508CA7","#0E2A4D"
)
# Extract unique cell types in the order they appear in annotation_col
celltypes <- unique(annotation_col$celltype)
# Assign colors to cell types (truncate color vector if needed)
celltype_colors <- setNames(all_colors[1:length(celltypes)], celltypes)
# Build annotation color list
ann_colors <- list(
condition = c(LM = "#2c98a0", Normal = "#67dba5"), # adjust for more conditions if needed
celltype = celltype_colors
)
# Use a diverging color scale centered at 0 (z-scores)
hcolors <- rev(paletteer::paletteer_c("ggthemes::Red-Blue Diverging", n = 200))
ht <- Heatmap(
expr_mat_group,
name = "Z-score",
col = hcolors,
cluster_rows = TRUE,
cluster_columns = TRUE, # Turn off column clustering to use manual order
show_row_names = TRUE,
show_column_names = FALSE,
top_annotation = HeatmapAnnotation(df = annotation_col, col = ann_colors,
# Add borders to annotation cells
gp = gpar(col = "black", lwd = 0.8)),
# Add borders to heatmap cells
rect_gp = gpar(col = "black", lwd = 0.8),
heatmap_legend_param = list(
legend_width = unit(0.8, "cm"),
direction = "vertical",
title_position = "topleft"
),
use_raster = FALSE
)
ht
# Save as PNG (high res)
png("Heatmap.png", width = 3000, height = 2000, res = 300)
draw(ht)
dev.off()This script creates box plots to compare gene expression across conditions and specific cell types.
- Extracts expression data for specific genes
- Compares the expression between the Control and Diseased conditions
- Creates box plots with jittered points
- Applies custom color schemes
# Create a data frame with gene expression and conditions
boxplot_df <- data.frame(gene = GetAssayData(data, assay = "RNA", layer = "data")["HSP90AA1",],
condition = data$condition)
custom_colors <- c("LM" = "#FF3200FF", "Normal" = "#0076BBFF") #, "Metastasis" = "#2A9D8F")
box <- ggplot(boxplot_df, aes(x = condition, y = gene)) +
geom_jitter(aes(color = condition), width = 0.2, size = 1.5, alpha = 0.6, show.legend = FALSE) +
geom_boxplot(aes(fill = condition), width = 0.7, outlier.shape = NA, alpha = 0.6, color = "black") +
scale_fill_manual(values = custom_colors) +
scale_color_manual(values = custom_colors) +
labs(title = "HSP90AA1", x = "Condition", y = "Expression") +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
axis.text.x = element_text(angle = 30, hjust = 1, size = 12),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
axis.line = element_line(color = "black"),
axis.line.x.top = element_blank(),
axis.line.y.right = element_blank(),
axis.ticks.x.top = element_blank(),
axis.ticks.y.right = element_blank(),
axis.ticks.x.bottom = element_line(),
axis.ticks.y.left = element_line(),
panel.grid.major = element_line(color = "grey80", size = 0.4)
)
box
# Save the plot
ggsave("HSP90AA1-condition.png", plot = box, bg = "white", dpi = 600)- Adjust gene and condition names according to your data
- Filters data for specific cell types
- Compares expression within cell types across conditions
- Generates focused visualizations
# Specify the cell type of interest
target_celltype <- "Memory CD4+ T cells"
# Logical filter: keep rows that start with the cell type (e.g., "Cancer cells")
cell_indices <- startsWith(data$celltype.condition, paste0(target_celltype, "_"))
# Create filtered data frame for box plot
boxplot_df <- data.frame(
gene = GetAssayData(data, assay = "RNA", layer = "data")["HSP90AA1", cell_indices],
condition = data$celltype.condition[cell_indices]
)
# Recalculate group levels for coloring
groups <- unique(boxplot_df$condition)
custom_colors <- setNames(
ifelse(endsWith(groups, "_LM"), "#FF3200FF", "#0076BBFF"),
groups
)
# Plot
box <- ggplot(boxplot_df, aes(x = condition, y = gene)) +
geom_jitter(aes(color = condition), width = 0.2, size = 1.5, alpha = 0.6, show.legend = FALSE) +
geom_boxplot(aes(fill = condition), width = 0.7, outlier.shape = NA, alpha = 0.6, color = "black") +
scale_fill_manual(values = custom_colors) +
scale_color_manual(values = custom_colors) +
labs(title = paste("HSP90AA1"), x = "Condition", y = "Expression") +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
axis.text.x = element_text(angle = 30, hjust = 1, size = 12),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
axis.line = element_line(color = "black"),
axis.line.x.top = element_blank(),
axis.line.y.right = element_blank(),
axis.ticks.x.top = element_blank(),
axis.ticks.y.right = element_blank(),
axis.ticks.x.bottom = element_line(),
axis.ticks.y.left = element_line(),
panel.grid.major = element_line(color = "grey80", size = 0.4)
)
box
# Save the plot
ggsave("HSP90AA1-Memory-CD4-T-cells.png", plot = box, bg = "white", dpi = 600)- Adjust cell-type, gene, and condition names according to your data
- Uses ggplot2 for advanced plotting
- Applies consistent theming and styling
- Creates publication-ready figures
- If you use this analysis pipeline in your work, please cite the repository:
scRNA-seq-navigator. GitHub: https://github.com/usman4373/scRNA-seq-navigator
-
This pipeline implements established single-cell analysis methods, including Seurat (Stuart et al., 2019), Harmony (Korsunsky et al., 2019), and ScType (Ianevski et al., 2022). Please also cite the original publications of these tools when using their respective functionalities in your research.
- Please also cite other tools used in the analysis pipeline:
- dplyr - For efficient data manipulation and transformation
- patchwork - For seamless arrangement of multiple ggplot2 plots
- ggplot2 - For creating elegant and customizable data visualizations
- openxlsx - For reading and writing Excel files in R
- HGNChelper - For handling and validating HGNC gene symbols
- plotly - For creating interactive web-based visualizations
- ComplexHeatmap - For generating sophisticated and annotated heatmaps
- circlize - For circular visualization and heatmap enhancements
- paletteer - For accessing comprehensive color palette collections
- EnhancedVolcano - For publication-ready volcano plot generation
- viridis - For colorblind-friendly color scales in visualizations
- htmlwidgets - For embedding interactive JavaScript visualizations in R
- webshot2 - For converting interactive plots to static images
- readr - For fast and tidy data import functionality
- Other open-source R packages and dependencies




















