This session will introduce participants to the statistical foundations required for analyzing RNA-seq count data. We will focus on generalized linear models (GLMs) and the principles of statistical testing that underpin differential gene expression analysis. Participants will learn how to pre-process and filter a count matrix, fit appropriate models, and interpret key statistical outputs that set the stage for downstream analyses.
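The workflow described above (filter a count matrix, fit a model, read the test output) can be sketched in a few lines of R. This is only a minimal illustration, not the workshop notebook: the counts are simulated with `rnbinom`, and the gene names, group labels, and effect sizes are invented for the example. The edgeR calls used (`DGEList`, `filterByExpr`, `calcNormFactors`, `estimateDisp`, `glmQLFit`, `glmQLFTest`, `topTags`) are standard, but defaults can differ between edgeR versions.

```r
library(edgeR)

set.seed(1)

# Simulated data (assumption): 200 genes, 2 groups x 3 replicates
n_genes <- 200
group <- factor(rep(c("A", "B"), each = 3))
counts <- matrix(rnbinom(n_genes * 6, mu = 100, size = 10), nrow = n_genes)
# Make the first 20 genes more highly expressed in group B
counts[1:20, 4:6] <- rnbinom(20 * 3, mu = 400, size = 10)
rownames(counts) <- paste0("gene", seq_len(n_genes))
colnames(counts) <- paste0("sample", 1:6)

# Build the edgeR object and drop genes with too few counts
y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes = FALSE]

# Normalize for composition differences between libraries (TMM by default)
y <- calcNormFactors(y)

# Negative binomial GLM: estimate dispersion, fit, test the group B vs A coefficient
design <- model.matrix(~ group)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
res <- glmQLFTest(fit, coef = 2)

# Table of top genes with log fold change, p-value, and FDR
tab <- topTags(res, n = 10)$table
print(tab)
```

For a simple two-group design like this one, `exactTest(y)` could be used after `estimateDisp` in place of the GLM fit and test. The GLM route is the one that extends to multigroup comparisons.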
- Experience with R (RStudio installed)
- An understanding of how an RNA-seq count matrix is generated
- edgeR
- ggplot2
- dplyr
- MASS
- Task of differential gene expression: pairwise vs multigroup comparisons
- Preprocessing for fair comparison: filtering and normalization
- Statistical testing: exact test for pairwise comparisons
- Linear regression
- Poisson and negative binomial distributions
- Generalized linear model
- Sample analysis using edgeR (R notebook)
- Materials created by Ryan Huang, with figures from the following sources:
- Past MiCM slides: Intro to RNA-seq and Statistics in R (Adrien Osakwe)
- QLSC600 slides: Ryan Huang and Megan Ng
- RNA-seq lecture by Peter N. Robinson
- Tutorial from Berge and Clement: https://statomics.github.io/SGA/sequencing_countData.html
Workshop created as part of the McGill Initiative in Computational Medicine