chae-jh/wasp-qtl-pipeline

WASP Snakemake Pipeline 1.0.0

Authors

Jeongho Chae
Benjamin McMichael
Date: 2025-01-24

Quickstart

This analysis uses a Snakemake pipeline to process ATAC-seq data. First, clone the pipeline into the analysis directory (preferably in scratch space) using the command:

git clone https://sc.unc.edu/dept-fureylab/wasp_chromatin.git

There are no other prerequisites for running the pipeline. The command for executing the WASP_chromatin pipeline is:

sbatch Snakemake_SLURMsubmission.sbatch

Updates to previous scripts

  • Updated module versions to reflect newer versions of tools
  • Organized the temp directory and final results directory for clarity
  • Implemented the moveOutFiles function

Available Module Versions (from project_config.yaml)

The previous versions are kept available for comparison with past WASP results.
Previous versions should be used together as a set rather than mixed with current versions.

  • WASP
    • Previous WASP version : 2019.12
    • Current WASP version : 2023.02
  • python
    • Previous python version : 3.6.6
    • Current python version : 3.9.6
  • bowtie2
    • Previous bowtie2 version : 2.4.1
    • Current bowtie2 version : 2.4.5
  • samtools
    • Previous samtools version : 1.12
    • Current samtools version : 1.21
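
To illustrate the "do not mix sets" advice, here is a minimal Python sketch that mirrors the version table above and checks whether a chosen combination comes from a single set. The dictionary layout and the names `MODULE_VERSIONS`, `select_versions`, and `is_mixed` are assumptions for illustration; the actual key names in project_config.yaml may differ.

```python
# Hypothetical mirror of the version table above; the real key names in
# project_config.yaml may differ.
MODULE_VERSIONS = {
    "previous": {
        "wasp": "2019.12",
        "python": "3.6.6",
        "bowtie2": "2.4.1",
        "samtools": "1.12",
    },
    "current": {
        "wasp": "2023.02",
        "python": "3.9.6",
        "bowtie2": "2.4.5",
        "samtools": "1.21",
    },
}


def select_versions(version_set):
    """Return one complete set of tool versions ("previous" or "current")."""
    if version_set not in MODULE_VERSIONS:
        raise ValueError(f"unknown version set: {version_set!r}")
    return dict(MODULE_VERSIONS[version_set])


def is_mixed(chosen):
    """Return True if the chosen versions do not all come from a single set."""
    return not any(chosen == versions for versions in MODULE_VERSIONS.values())


if __name__ == "__main__":
    chosen = select_versions("previous")
    print(chosen["samtools"], is_mixed(chosen))
```

Selecting a whole set at once, rather than one tool at a time, keeps results comparable with past WASP runs.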

Snakemake Pipeline

  • Updated the Perl scripts and created a Snakemake pipeline
  • Scripts used previously in the analysis of a subset of ATAC data can be found here:
/proj/fureylab/projects/CD_allelic_imbalance/wasp/wasp_scripts

Existing genotype data

  • WASP uses genotype data to infer SNPs and allelic usage
  • Genotype data for use with WASP can be found here:
/proj/fureylab/data/Genotypes/human/imputed_vcfs_hg38

Pipeline Rules

By default, the pipeline assumes that the reads supplied to WASP have already had duplicates removed, so the rmdup_pe rule is skipped to avoid redundant work. However, if removeDupReads is set to TRUE in project_config.yaml, the remove duplicates rule will be executed.

  1. Rule all
  • Defines the final expected output files.
  2. Rule find_intersecting_snps
  • Identifies sequencing reads that overlap known SNPs.
  • Generates three sets of reads:
    • Reads that require remapping (alternative alleles substituted) -> FASTQ file
    • Reads that require remapping (reference alleles retained) -> BAM file
    • Reads that do not require remapping -> BAM file
  3. Rule remap_bowtiew2
  • Re-aligns the reads that require remapping to the reference genome using Bowtie2.
  • Produces a BAM file with the newly mapped reads.
  4. Rule sort_index_remapBam
  • Sorts and indexes the remapped BAM file for efficient processing.
  5. Rule filter_remapped_reads
  • Compares remapped reads with their original versions.
  • Discards reads that do not map back to their original locations, reducing allele-specific mapping bias.
  6. Rule merge_bams
  • Merges the filtered remapped BAM file with the original non-remapped BAM file generated by Rule 2.
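
The remapping idea behind find_intersecting_snps and filter_remapped_reads can be sketched in plain Python. This is a simplified illustration, not the WASP implementation: it assumes ungapped reads, swaps one SNP at a time rather than enumerating allele combinations, and represents a mapping location as a hypothetical `(chromosome, position)` tuple. The function names `allele_swapped_versions` and `keep_read` are invented for this example.

```python
def allele_swapped_versions(seq, start, snps):
    """Return copies of `seq` with the other allele substituted at each overlapping SNP.

    seq   : read sequence (assumed ungapped, no clipping)
    start : 0-based reference position of the first base of the read
    snps  : dict mapping 0-based position -> (ref_allele, alt_allele)
    """
    versions = []
    for pos, (ref, alt) in sorted(snps.items()):
        offset = pos - start
        if not 0 <= offset < len(seq):
            continue  # this SNP does not overlap the read
        base = seq[offset]
        if base == ref:
            swapped = alt
        elif base == alt:
            swapped = ref
        else:
            continue  # the read carries neither known allele at this site
        versions.append(seq[:offset] + swapped + seq[offset + 1:])
    return versions


def keep_read(original_locus, remapped_loci, n_versions):
    """Keep a read only if every swapped version remapped to the original locus."""
    if len(remapped_loci) != n_versions:
        return False  # at least one swapped version failed to map
    return all(locus == original_locus for locus in remapped_loci)


if __name__ == "__main__":
    versions = allele_swapped_versions("ACGTAC", 100, {102: ("G", "T")})
    print(versions)
    print(keep_read(("chr1", 100), [("chr1", 100)], len(versions)))
```

A read whose swapped version lands elsewhere, or fails to map at all, is dropped, because its placement depends on which allele it carries.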

Rules 7 and 8 are executed only when removeDupReads is set to TRUE in project_config.yaml.

  7. Rule sort_index_mergeBam
  • Sorts and indexes the merged BAM file for efficient processing.
  8. Rule rmdup_pe
  • Removes PCR duplicates from paired-end reads.
  • Keeps only unique reads to prevent bias in allele-specific analysis.
  9. Rule sort_index_rmdup_pe
  • Sorts and indexes the final BAM file for downstream analysis.
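
One common way to make rules conditional in Snakemake is to change the targets requested by rule all, since Snakemake only runs rules whose outputs are needed. The sketch below shows that pattern in plain Python; it is an assumption about how the switch could be wired, not a copy of this pipeline's Snakefile. The file names, the `results` directory, and the helpers `as_bool` and `final_bam` are hypothetical.

```python
def as_bool(value):
    """Interpret YAML-style flags such as TRUE, "TRUE", "true", or True."""
    return str(value).strip().upper() == "TRUE"


def final_bam(sample, config, results_dir="results"):
    """Return the final BAM path for a sample (hypothetical file names).

    When removeDupReads is TRUE, the deduplicated BAM is requested, which
    pulls in the duplicate-removal rules; otherwise the merged BAM is final.
    """
    if as_bool(config.get("removeDupReads", False)):
        return f"{results_dir}/{sample}.merged.rmdup.sorted.bam"
    return f"{results_dir}/{sample}.merged.bam"


if __name__ == "__main__":
    for flag in ("TRUE", "FALSE"):
        print(flag, final_bam("sample1", {"removeDupReads": flag}))
```

In a Snakefile, a function like this would typically feed the input list of rule all, so the duplicate-removal rules are scheduled only when their output is requested.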

moveOutFiles

Moves final results files for locally run samples to permanent space. After running the pipeline and checking that everything ran correctly, set moveOutFiles to TRUE in project_config.yaml and rerun the pipeline using the same submission statement.
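
The behaviour described above can be sketched as a small Python helper. This is a hedged illustration only: the function name `move_out_files`, its arguments, and the flat directory layout are assumptions, and the pipeline's own implementation may differ.

```python
import shutil
from pathlib import Path


def move_out_files(config, results_dir, permanent_dir):
    """Move result files to permanent space when moveOutFiles is TRUE.

    `results_dir` and `permanent_dir` are hypothetical paths supplied by the
    caller. Returns the list of destination paths that were written.
    """
    if str(config.get("moveOutFiles", False)).strip().upper() != "TRUE":
        return []  # flag not set: leave results where they are

    destination = Path(permanent_dir)
    destination.mkdir(parents=True, exist_ok=True)

    moved = []
    for path in sorted(Path(results_dir).iterdir()):
        if path.is_file():
            target = destination / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Gating the move on the flag means a first run leaves results in place for inspection, and only the rerun with moveOutFiles set to TRUE relocates them.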

About

A reproducible Snakemake pipeline for WASP (allele-specific read mapping) to eliminate mapping bias through automated remapping and deduplication.
