PATCH pipeline implementation
Nextflow pipeline for processing host transcriptomic and genomic sequencing data.
The pipeline was written by the Cancer Bioinformatics and Translational Systems Biology group at King's College London, UK.
- After QC (FastQC, trimmomatic), sequencing reads are aligned to the host reference genome (HISAT2 for transcriptomics and bwa for whole genome sequencing).
- Extraction of unaligned reads (SAMtools)
- De novo assembly of host-unmapped reads (SPAdes)
- Pathogen classification using three tools (Kraken2, BLASTn, Centrifuge); a classification supported by two or more of the tools is taken forward.
- Classified reads from the pathogen of interest are extracted and functionally annotated using BLASTn against indexed RefSeq transcripts/genomes of the pathogen of interest.
- A custom combined reference genome is created from the host and pathogen-of-interest reference genomes (bwa).
- Whole genome sequencing data are aligned to the combined reference genome (bwa).
- Discordant read pairs, where one read maps to the pathogen of interest and its mate maps to the host reference genome, are extracted (SAMtools).
- Filtering of duplicate reads and of alignments by mapping quality (MAPQ score) (Picard tools, SAMtools)
- As before, classified reads from the pathogen of interest are extracted and functionally annotated using BLASTn against indexed RefSeq transcripts/genomes of the pathogen of interest.
- Extraction of discordant read coordinates (Bedtools)
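The unaligned-read extraction step keeps the records the host aligner could not place. As a minimal sketch, assuming plain-text SAM input (the pipeline itself uses SAMtools for this, and the helper name `unmapped_reads` is hypothetical), this is the FLAG bit 0x4 test that `samtools view -f 4` applies:

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def unmapped_reads(sam_lines):
    """Yield (read_name, sequence) for every unmapped record in SAM text."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & UNMAPPED:
            yield fields[0], fields[9]  # QNAME and SEQ columns


# Hypothetical example: one mapped and one unmapped read.
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
]
print(list(unmapped_reads(sam)))  # [('r2', 'TTGA')]
```

For paired-end data, requiring both mates to be unmapped would also test bit 0x8 (the equivalent of `samtools view -f 12`).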
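The two-out-of-three consensus rule for pathogen classification reduces to a vote over the calls made for each read or contig. The sketch below is illustrative only: the input dictionary, the tool keys and the taxon labels are assumptions, not the pipeline's actual data structures.

```python
from collections import Counter


def consensus_call(calls, min_votes=2):
    """Return the taxon reported by at least `min_votes` tools, else None.

    `calls` maps a tool name to its taxon label, or None if that tool
    left the sequence unclassified.
    """
    votes = Counter(taxon for taxon in calls.values() if taxon is not None)
    if not votes:
        return None
    # With three tools and min_votes=2, at most one taxon can reach the threshold.
    taxon, count = votes.most_common(1)[0]
    return taxon if count >= min_votes else None


print(consensus_call({"kraken2": "taxon_A", "blastn": "taxon_A", "centrifuge": "taxon_B"}))  # taxon_A
print(consensus_call({"kraken2": "taxon_A", "blastn": None, "centrifuge": "taxon_B"}))  # None
```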
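Extracting the reads assigned to the pathogen of interest can be sketched from Kraken2's standard per-read output, whose tab-separated columns begin with the C/U classification flag, the read ID and the assigned taxonomy ID. Which classifier's output the pipeline uses for this step is not stated here, so treat both helpers and the taxid `12345` as hypothetical:

```python
def read_ids_for_taxon(kraken_lines, taxid):
    """Collect IDs of reads that Kraken2 classified ('C') directly to `taxid`."""
    wanted = set()
    for line in kraken_lines:
        status, read_id, assigned, *_ = line.rstrip("\n").split("\t")
        if status == "C" and assigned == str(taxid):
            wanted.add(read_id)
    return wanted


def filter_fastq(fastq_lines, keep_ids):
    """Yield the lines of FASTQ records whose ID is in `keep_ids`.

    Assumes well-formed four-line records (header, sequence, '+', qualities).
    """
    it = iter(fastq_lines)
    for header in it:
        record = [header, next(it), next(it), next(it)]
        read_id = header[1:].split()[0]  # drop '@' and any header comment
        if read_id in keep_ids:
            yield from record


# Hypothetical data: r1 is assigned to the pathogen taxid, r3 to human (9606).
kraken = ["C\tr1\t12345\t150\t12345:116", "U\tr2\t0\t150\t0:116", "C\tr3\t9606\t150\t9606:116"]
fastq = ["@r1 sample", "ACGT", "+", "IIII", "@r2", "TTTT", "+", "IIII"]
keep = read_ids_for_taxon(kraken, 12345)
print(keep)  # {'r1'}
print(list(filter_fastq(fastq, keep)))  # ['@r1 sample', 'ACGT', '+', 'IIII']
```

This matches only reads assigned exactly to the given taxid; collecting reads assigned to descendant taxa would additionally require walking the NCBI taxonomy tree.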
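Building the combined host and pathogen reference amounts to concatenating the two FASTA inputs before indexing. A minimal sketch, assuming in-memory FASTA lines and hypothetical contig names, that also refuses a contig name appearing in both inputs (which would make alignments to that name ambiguous):

```python
def combine_fasta(host_lines, pathogen_lines):
    """Concatenate host and pathogen FASTA lines, rejecting duplicate contig names."""
    seen = set()
    combined = []
    for line in list(host_lines) + list(pathogen_lines):
        if line.startswith(">"):
            name = line[1:].split()[0]  # contig name is the first word of the header
            if name in seen:
                raise ValueError(f"duplicate contig name: {name}")
            seen.add(name)
        combined.append(line)
    return combined


host = [">chr1 host", "ACGTACGT"]
pathogen = [">pathogen_contig1", "GGCCTTAA"]
print(combine_fasta(host, pathogen))
```

The written-out result would then be indexed with `bwa index` (for example `bwa index combined.fa`, a hypothetical file name) before alignment.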
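The discordant-pair criterion (one mate on a pathogen contig, the other on the host) can be read directly from the SAM RNAME and RNEXT columns. This sketch assumes plain-text SAM from an alignment to the combined reference and a known set of pathogen contig names; the function name and the choice to skip secondary and supplementary alignments are assumptions, not confirmed pipeline settings:

```python
PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def host_pathogen_pairs(sam_lines, pathogen_contigs):
    """Return names of read pairs with one mate on a pathogen contig and one on the host."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, rnext = int(fields[1]), fields[2], fields[6]
        # Keep primary alignments of paired reads where both mates are mapped.
        if not flag & PAIRED or flag & (UNMAPPED | MATE_UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if rnext == "=":  # "=" means the mate is on the same reference sequence
            rnext = rname
        # Exactly one of the two mates lies on a pathogen contig.
        if (rname in pathogen_contigs) != (rnext in pathogen_contigs):
            names.add(fields[0])
    return names


sam = [
    "r1\t65\tpathogen_contig1\t500\t60\t4M\tchr3\t10000\t0\tACGT\tIIII",
    "r1\t129\tchr3\t10000\t60\t4M\tpathogen_contig1\t500\t0\tTTGA\tIIII",
    "r2\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
]
print(host_pathogen_pairs(sam, {"pathogen_contig1"}))  # {'r1'}
```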
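Duplicate and mapping-quality filtering relies on two SAM fields: Picard MarkDuplicates marks duplicates with FLAG bit 0x400, and MAPQ is the fifth column. A minimal sketch of the combined test, mirroring `samtools view -F 1024 -q 30`; the threshold of 30 and the function name are illustrative assumptions, not the pipeline's documented settings:

```python
DUPLICATE = 0x400  # SAM FLAG bit set by Picard MarkDuplicates


def passes_filters(sam_line, min_mapq=30):
    """True if the alignment is not a marked duplicate and has MAPQ >= min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    return (not flag & DUPLICATE) and mapq >= min_mapq


print(passes_filters("r1\t65\tchr1\t100\t60\t4M\t=\t300\t0\tACGT\tIIII"))  # True
print(passes_filters("r2\t1089\tchr1\t100\t60\t4M\t=\t300\t0\tACGT\tIIII"))  # False (duplicate)
print(passes_filters("r3\t65\tchr1\t100\t10\t4M\t=\t300\t0\tACGT\tIIII"))  # False (low MAPQ)
```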
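Extracting discordant read coordinates turns each alignment into a genomic interval. A minimal sketch of the conversion, similar to the interval `bedtools bamtobed` reports for an alignment: SAM POS is 1-based while BED starts are 0-based with an exclusive end, and the interval spans the reference bases consumed by the CIGAR string. The function name is hypothetical:

```python
import re

REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def sam_to_bed(sam_line):
    """Return (chrom, start, end, name) for one SAM alignment in BED coordinates."""
    fields = sam_line.rstrip("\n").split("\t")
    name, rname, pos, cigar = fields[0], fields[2], int(fields[3]), fields[5]
    ref_len = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )
    start = pos - 1  # SAM POS is 1-based; BED start is 0-based
    return rname, start, start + ref_len, name


# 2S3M1D2M consumes 3 + 1 + 2 = 6 reference bases (soft clips do not count).
print(sam_to_bed("r1\t65\tchr3\t10000\t60\t2S3M1D2M\t=\t10200\t0\tACGTACG\tIIIIIII"))
# ('chr3', 9999, 10005, 'r1')
```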
Pipeline development and implementation by Radhika Kataria.
Study concept and design: Radhika Kataria, Anita Grigoriadis, Saeed Shoaie
