Jeongho Chae
Benjamin McMichael
Date: 2025-01-24
This analysis uses a Snakemake pipeline to process ATAC-seq data. First, clone the pipeline into the analysis directory (preferably in scratch space) using the command:
git clone https://sc.unc.edu/dept-fureylab/wasp_chromatin.git
Cloning the repository is the only prerequisite for running the pipeline. The command for executing the WASP_chromatin pipeline is:
sbatch Snakemake_SLURMsubmission.sbatch
- Updated module versions to reflect newer versions of tools
- Organized the temp directory and final results directory for clarity
- Implemented the moveout function
Use the previous versions when comparing against past WASP results.
Previous versions should be used together as a set rather than mixed with current versions.
- WASP
  - Previous WASP version: 2019.12
  - Current WASP version: 2023.02
- python
  - Previous python version: 3.6.6
  - Current python version: 3.9.6
- bowtie2
  - Previous bowtie2 version: 2.4.1
  - Current bowtie2 version: 2.4.5
- samtools
  - Previous samtools version: 1.12
  - Current samtools version: 1.21
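The rule about not mixing version sets can be checked mechanically. The following is a minimal sketch and not part of the pipeline: the `VERSION_SETS` dictionary only restates the versions listed above, and `which_version_set` is a hypothetical helper name.

```python
# Tool versions copied from the list above. The dictionary layout and the
# function name are illustrative assumptions, not code from the pipeline.
VERSION_SETS = {
    "previous": {
        "WASP": "2019.12",
        "python": "3.6.6",
        "bowtie2": "2.4.1",
        "samtools": "1.12",
    },
    "current": {
        "WASP": "2023.02",
        "python": "3.9.6",
        "bowtie2": "2.4.5",
        "samtools": "1.21",
    },
}


def which_version_set(loaded):
    """Return 'previous' or 'current' if `loaded` matches one set exactly.

    Raises ValueError when versions from the two sets are mixed or unknown.
    """
    for name, versions in VERSION_SETS.items():
        if loaded == versions:
            return name
    raise ValueError(f"Mixed or unknown tool versions: {loaded}")
```

Passing a dictionary that combines, for example, the previous WASP with the current samtools raises a `ValueError` instead of silently proceeding.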
- Updated Perl scripts and created a Snakemake pipeline
- Scripts used previously in the analysis of a subset of ATAC data can be found here:
/proj/fureylab/projects/CD_allelic_imbalance/wasp/wasp_scripts
- WASP uses genotype data to infer SNPs and allelic usage
- Genotype data for use with WASP can be found here:
/proj/fureylab/data/Genotypes/human/imputed_vcfs_hg38
The pipeline assumes that the reads supplied to WASP have already had duplicates removed, so the rmdup_pe rule is skipped by default to avoid redundant work. If removeDupReads is set to TRUE in project_config.yaml, the remove-duplicates rule is executed.
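The effect of the removeDupReads switch can be sketched in plain Python. This is a hedged illustration, not the pipeline's Snakefile: the function names and the output file suffixes are hypothetical, and treating a missing key as "off" is an assumption based on the default behaviour described above.

```python
def should_remove_duplicates(config):
    """Return True only when removeDupReads is explicitly enabled."""
    value = config.get("removeDupReads", False)
    # Accept either a parsed YAML boolean or the literal string "TRUE".
    if isinstance(value, str):
        return value.strip().upper() == "TRUE"
    return bool(value)


def final_target(sample, config):
    """Return the BAM requested as the final output for one sample.

    The file-name suffixes are hypothetical placeholders.
    """
    if should_remove_duplicates(config):
        # Duplicate removal is requested, so the deduplicated BAM is final.
        return f"{sample}.rmdup.bam"
    # Otherwise the merged BAM is treated as the final output.
    return f"{sample}.merged.bam"
```

With an empty configuration, `final_target("sampleA", {})` returns the merged BAM name, so no duplicate-removal step is requested.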
- Rule all
  - Defines the final expected output files.
- Rule find_intersecting_snps
  - Identifies sequencing reads that overlap known SNPs.
  - Generates three sets of reads:
    - Reads that require remapping (alternative alleles substituted) -> FASTQ file
    - Reads that require remapping (reference alleles retained) -> BAM file
    - Reads that do not require remapping -> BAM file
- Rule remap_bowtie2
  - Re-aligns the reads that require remapping to the reference genome using Bowtie2.
  - Produces a BAM file with the newly mapped reads.
- Rule sort_index_remapBam
  - Sorts and indexes the remapped BAM file for efficient processing.
- Rule filter_remapped_reads
  - Compares remapped reads with their original versions.
  - Discards reads that do not map back to their original locations, reducing allele-specific mapping bias.
- Rule merge_bams
  - Merges the filtered remapped BAM file with the original non-remapped BAM file generated by Rule 2.
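The idea behind find_intersecting_snps and filter_remapped_reads can be illustrated with a toy example. This is a simplified sketch under stated assumptions, not WASP's implementation: it handles a single SNP on an ungapped, single-end read, and the function names and the `(chrom, start)` position tuples are hypothetical.

```python
def swap_allele(read_seq, read_start, snp_pos, ref, alt):
    """Return the read with the other allele substituted at the SNP.

    Returns None when the read does not overlap the SNP or carries
    neither allele, meaning no remapping is needed in this toy model.
    """
    offset = snp_pos - read_start
    if not 0 <= offset < len(read_seq):
        return None
    base = read_seq[offset]
    if base == ref:
        new_base = alt
    elif base == alt:
        new_base = ref
    else:
        return None
    return read_seq[:offset] + new_base + read_seq[offset + 1:]


def keep_read(original_pos, remapped_pos):
    """Keep a read only if the allele-swapped copy maps to the same place.

    Positions are (chrom, start) tuples; None means the swapped read
    failed to map.
    """
    return remapped_pos is not None and remapped_pos == original_pos
```

A read whose swapped copy maps elsewhere, or not at all, is dropped, because its original alignment depended on which allele it carried.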
Rules 7 and 8 are executed only when removeDupReads is set to TRUE in project_config.yaml.
- Rule sort_index_mergeBam
  - Sorts and indexes the merged BAM file for efficient processing.
- Rule rmdup_pe
  - Removes PCR duplicates from paired-end reads.
  - Keeps only unique reads to prevent bias in allele-specific analysis.
- Rule sort_index_rmdup_pe
  - Sorts and indexes the final BAM file for downstream analysis.
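Duplicate removal for paired-end reads can be sketched as grouping pairs by their mapped coordinates and keeping one pair per group. The example below is a minimal illustration, not the pipeline's code. Choosing the survivor at random, rather than by score, is an assumption about how to avoid favouring one allele, and the tuple layout and function name are hypothetical.

```python
import random


def remove_duplicate_pairs(pairs, seed=0):
    """Keep one randomly chosen pair per (chrom, start1, start2) key.

    `pairs` is a list of (name, chrom, start1, start2) tuples, where
    start1 and start2 are the mapped starts of the two mates.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    groups = {}
    for pair in pairs:
        _name, chrom, start1, start2 = pair
        groups.setdefault((chrom, start1, start2), []).append(pair)
    # One survivor per coordinate group, in first-seen order.
    return [rng.choice(group) for group in groups.values()]
```

Two pairs that share the same chromosome and mate start positions count as duplicates here, so only one of them survives.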
Moves the final result files for locally run samples to permanent space. After running the pipeline and checking that everything has run correctly, set moveOutFiles to TRUE in project_config.yaml and rerun the pipeline using the same submission command.
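The move-out step amounts to a guarded file move. The sketch below shows one way it could behave, assuming hypothetical `results_dir` and `permanent_dir` arguments. It is not the pipeline's own moveout implementation.

```python
import shutil
from pathlib import Path


def move_out(results_dir, permanent_dir, move_out_files):
    """Move result files to permanent space when the flag is enabled.

    Returns the list of destination paths, or an empty list when
    `move_out_files` is False and nothing is touched.
    """
    if not move_out_files:
        return []
    dest = Path(permanent_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(results_dir).iterdir()):
        if path.is_file():
            target = dest / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Keeping the flag off during the first run leaves the results in place, so they can be inspected before anything is moved.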