An R package for analyzing intron splicing order using Integer Linear Programming (ILP) algorithms. This package processes RNA-seq data to determine the most likely splicing order of introns within transcripts.
- Complete analysis pipeline: From BAM files to comprehensive reports
- ILP-based algorithms: Find most likely splicing orders using optimization methods
- Interactive visualizations: Generate HTML reports with interactive MLO networks
#install.packages("Rsymphony", repos = "https://cran.r-project.org")
#options(BioC_mirror = "https://mirrors.westlake.edu.cn/bioconductor")
BiocManager::install("lpSolve")
# Install the package from GitHub
devtools::install_github("limeng12/intronOrder")
The package has the following dependencies, which will be installed automatically:
options(BioC_mirror = "https://mirrors.westlake.edu.cn/bioconductor")
options(repos = c(CRAN = "https://mirrors.westlake.edu.cn/CRAN/"))
# CRAN
install.packages(c("devtools", "Rcpp", "plyr", "dplyr", "igraph", "stringr",
"dbscan", "ggplot2", "ggraph", "tidygraph", "reshape2",
"gridExtra", "data.table", "plotly", "DT", "jsonlite",
"htmltools", "scales", "testthat", "knitr", "rmarkdown", "readr"))
# Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("GenomicAlignments", "Rsamtools", "GenomicRanges", "BiocStyle"))
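Before installing intronOrder, it can help to confirm which dependencies are already available. The following is a minimal base R sketch; the `needed` vector is only an illustrative subset of the packages listed above, not an authoritative dependency list.

```r
# Illustrative subset of the dependencies listed above (not exhaustive).
needed <- c("Rcpp", "dplyr", "igraph", "lpSolve",
            "GenomicAlignments", "Rsamtools", "GenomicRanges")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed.
missing_pkgs <- names(installed)[!installed]

if (length(missing_pkgs) == 0) {
  message("All listed dependencies are available.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

Any names reported as missing can then be passed to install.packages() or BiocManager::install() as shown above.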
conda create -n r_iso -c conda-forge -c bioconda \
r-base=4.4 r-devtools \
r-rcpp r-plyr r-dplyr r-igraph r-stringr r-dbscan \
r-ggplot2 r-ggraph r-tidygraph r-reshape2 r-gridextra \
r-data.table r-plotly r-dt r-jsonlite r-htmltools \
r-scales r-testthat r-knitr r-rmarkdown r-readr \
bioconductor-genomicalignments bioconductor-rsamtools \
bioconductor-genomicranges bioconductor-biocstyle -y
# Load the package
library(intronOrder)
#library(Rsymphony)
library(lpSolve)
# Get example data paths from the package: a BAM file and a 12-column BED file
bedfile <- system.file("extdata",
"Schizosaccharomyces_pombe.ASM294v2.43.chr_nothick.bed",
package = "intronOrder")
bamfile <- system.file("extdata",
"SRR6144325_junction_only.bam",
package = "intronOrder")
#idmap <- system.file("extdata", "pombe_ensembl_gene_id_trans_id_map.tsv", package = "intronOrder")
# Step 1: Generate intron splicing order data from BAM
iso_results <- getIsoFromBam(
bed_file = bedfile,
bam_file = bamfile,
output_file = "example_iso.tsv"
)
# Step 2: Run complete analysis pipeline
results <- run_iso_analysis(
bed_file = bedfile,
iso_files = c("example_iso.tsv"),
output_file = "results.tsv",
gene_trans_map = "",#idmap,
read_cov_threshold = 0.95,
trim_trans_id_by_dot = FALSE,
alpha = 0.1
)
# Step 3: Generate reports and visualizations
igraph_list <- results$key_re_list
# Interactive HTML report
generate_splicing_order_report(
igraph_list,
output_file = "results.splicing_order_report.html"
)
# PDF visualizations
#draw_mlo_plot(igraph_list, "results.plot.pdf")
#draw_mol_table_plot(igraph_list, "results.table.pdf")
The package provides a comprehensive workflow:
- Input: BED annotation file + BAM alignment file
- Output: Intron splicing order pairs (iso_file)
- Intron extraction: Parse BED file for exon/intron positions
- Isoform building: Construct adjacency matrices from splicing data
- Order calculation: Use ILP algorithms to find most likely order
- Statistical analysis: Calculate entropy, correlation, p-values
- Heterogeneity: Measure splicing pattern variability
- Interactive HTML report: Browse transcripts, view MLO networks
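To make the order-calculation step concrete, here is a small base R sketch of what "most likely order" can mean. It is not the package's ILP implementation: it uses exhaustive enumeration of all permutations, which is only feasible for a handful of introns. The pairwise matrix `p`, its values, and the scoring rule (sum of log pairwise probabilities) are assumptions made for illustration.

```r
# Hypothetical pairwise matrix: p[i, j] = estimated probability that
# intron i is spliced before intron j (values invented for illustration).
p <- matrix(c(
  NA,  0.9, 0.8,
  0.1, NA,  0.7,
  0.2, 0.3, NA
), nrow = 3, byrow = TRUE)

# Enumerate all permutations of a vector (recursive, base R only).
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (k in seq_along(v)) {
    for (rest in permutations(v[-k])) {
      out[[length(out) + 1]] <- c(v[k], rest)
    }
  }
  out
}

# Score an order: sum of log p[a, b] over every pair where a precedes b.
order_loglik <- function(ord, p) {
  total <- 0
  n <- length(ord)
  for (a in seq_len(n - 1)) {
    for (b in (a + 1):n) {
      total <- total + log(p[ord[a], ord[b]])
    }
  }
  total
}

perms <- permutations(seq_len(nrow(p)))
scores <- vapply(perms, order_loglik, numeric(1), p = p)

# The highest-scoring permutation is the most likely order under this scoring.
best_order <- perms[[which.max(scores)]]
print(best_order)  # 1 2 3 for this matrix
```

The number of permutations grows factorially with the intron count, which is why an optimization formulation such as ILP is the practical choice for real transcripts.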
After running the pipeline, you'll get:
- Tabular results (results.tsv): Main analysis results
- HTML report (*.splicing_order_report.html): Interactive visualization
- PDF plots (*.plot.pdf): MLO network visualizations
- Table plots (*.table.pdf): Probability matrices
# Process multiple BAM files
bam_files <- c("sample1.bam", "sample2.bam", "sample3.bam")
iso_files <- c()
for (i in seq_along(bam_files)) {
iso_file <- paste0("sample", i, "_iso.tsv")
getIsoFromBam(
bed_file = "annotation.bed",
bam_file = bam_files[i],
output_file = iso_file,
n_threads = 4 # Use multiple threads
)
iso_files <- c(iso_files, iso_file)
}
# Combine and analyze
results <- run_iso_analysis(
bed_file = "annotation.bed",
iso_files = iso_files,
output_file = "combined_results.tsv"
)
- run_iso_analysis(): Main analysis pipeline
- getIsoFromBam(): Generate iso_file from BAM
- calculate_most_likely_order(): ILP-based order calculation
- generate_splicing_order_report(): Interactive HTML report
- draw_mlo_plot(): MLO network PDF plots
- draw_mol_table_plot(): Probability matrix PDF plots
- Standard 12-column BED format without thick
- Must contain exon block information (columns 10-12)
- Transcript IDs should be unique
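The requirements above can be checked before running the pipeline. The sketch below is a base R illustration, not part of intronOrder: the two BED records are made up, and `check_bed12()` and `count_fields()` are hypothetical helpers. It relies only on the standard BED12 layout, where columns 10 to 12 hold blockCount, blockSizes, and blockStarts.

```r
# Two made-up BED12 records (tab-separated), used only for illustration.
bed_lines <- c(
  "I\t100\t600\ttx1\t0\t+\t100\t100\t0\t3\t100,100,100,\t0,200,400,",
  "I\t1000\t1500\ttx2\t0\t-\t1000\t1000\t0\t2\t150,150,\t0,350,"
)
bed <- read.table(text = bed_lines, sep = "\t", header = FALSE,
                  colClasses = "character")

# Count comma-separated entries, ignoring an optional trailing comma.
count_fields <- function(x) lengths(strsplit(sub(",$", "", x), ","))

# Hypothetical helper: basic structural checks on a BED12 data frame.
check_bed12 <- function(bed) {
  stopifnot(ncol(bed) == 12)
  block_count <- as.integer(bed[[10]])
  c(
    unique_ids   = anyDuplicated(bed[[4]]) == 0,
    sizes_match  = all(count_fields(bed[[11]]) == block_count),
    starts_match = all(count_fields(bed[[12]]) == block_count)
  )
}

checks <- check_bed12(bed)
print(checks)

# A transcript with k exon blocks has k - 1 introns.
n_introns <- as.integer(bed[[10]]) - 1L
```

To check a real annotation, replace the `text = bed_lines` argument with the path to the BED file.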
gene_id transcript_id gene_symbol
ENSG000001 transcript1 GeneA
ENSG000002 transcript2 GeneB
transcript_id left_intron right_junction strand cover_count junction_count
- "No valid transcripts found"
  - Check BED file format and chromosome naming
- "No reads in BAM file for transcript"
  - Verify BAM file has reads in transcript regions
  - Check alignment quality and MAPQ scores
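Chromosome naming mismatches (for example "chrI" in the BAM header versus "I" in the BED file) are a common cause of both messages. The sketch below compares two sets of sequence names in base R. `compare_seqnames()` is a hypothetical helper, not an intronOrder function, and the example names are invented.

```r
# Hypothetical helper: report which sequence names the two files share.
compare_seqnames <- function(bam_names, bed_names) {
  list(
    shared   = intersect(bed_names, bam_names),
    bed_only = setdiff(bed_names, bam_names),
    bam_only = setdiff(bam_names, bed_names)
  )
}

# Invented example: the BAM uses a "chr" prefix, the BED does not.
bam_names <- c("chrI", "chrII", "chrIII")
bed_names <- c("I", "II", "III")

cmp <- compare_seqnames(bam_names, bed_names)
if (length(cmp$shared) == 0) {
  message("No shared chromosome names; check for a 'chr' prefix mismatch.")
}

# With real files, the names could be obtained along these lines
# (paths are placeholders):
#   bam_names <- names(Rsamtools::scanBamHeader("sample1.bam")[[1]]$targets)
#   bed_names <- unique(read.delim("annotation.bed", header = FALSE)[[1]])
```

If `shared` is empty or much smaller than expected, renaming the chromosomes in one of the two files so that they agree is usually the first thing to try.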
If you use intronOrder in your research, please cite:
Li, M. (2020). Calculating the most likely intron splicing orders in S. pombe, fruit fly, Arabidopsis thaliana, and humans.
For issues, feature requests, or questions:
- Create an issue on GitHub
- Check the documentation: ?intronOrder
- See function help: ?run_iso_analysis
MIT License - see LICENSE file for details.

