UPHL-BioNGS/Grandeur

Grandeur

Named after Grandeur Peak.

Image Credit: ryancornia. Location: 40.707, -111.76; summit at 8,299 ft (2,530 m). Trail Info: https://utah.com/hiking/grandeur-peak


Grandeur is a species-agnostic sequencing analysis workflow developed by @erinyoung at the Utah Public Health Laboratory (UPHL). Built on Nextflow, the pipeline provides quality control (QC), de novo assembly, taxonomic profiling, and in silico serotyping for paired-end Illumina data.

While intended to augment the CDC's PHOENIX workflow, Grandeur also functions as a powerful standalone pipeline.

Quick Start

Dependencies

Nextflow, plus Docker or Singularity to match the profile used in the commands below.

Basic Usage

# Execution with FASTQ reads (Singularity)
nextflow run UPHL-BioNGS/Grandeur -profile singularity --reads <path_to_fastqs>

# Execution with existing Assemblies (Docker)
nextflow run UPHL-BioNGS/Grandeur -profile docker --fastas <path_to_fastas>

# Execution of a full Phylogenetic Analysis
nextflow run UPHL-BioNGS/Grandeur -profile singularity --sample_sheet samples.csv --msa

Workflow Architecture

Grandeur is modular and executes the following stages based on inputs and flags:

  1. De Novo Assembly: Cleaning reads with fastp and assembling with SPAdes.
  2. Taxonomic Profiling: Rapid identification via SKANI, Kraken2, Mash, and Sylph.
  3. Quality Assessment: Read QC via FastQC and assembly metrics via QUAST and CheckM2.
  4. Subtyping: Organism-specific serotyping (e.g., Kleborate, SeqSero2, Legionella SBT).
  5. Phylogenetic Analysis (Optional): Core genome alignment and Maximum Likelihood trees.
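
As a rough illustration of how inputs and flags select what runs, the following shell sketch prints (without executing) a Grandeur command for a given input type. The function name `grandeur_cmd` and its argument order are hypothetical; only the `-profile`, `--reads`, `--fastas`, `--sample_sheet`, and `--msa` options come from this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Grandeur): build a command string
# from a profile, an input type, an input path, and an optional msa toggle.
grandeur_cmd() {
  local profile="$1" input_type="$2" input_path="$3" msa="${4:-false}"
  local flag cmd

  # Map the input type onto one of the input options listed in this README.
  case "$input_type" in
    reads)        flag="--reads" ;;
    fastas)       flag="--fastas" ;;
    sample_sheet) flag="--sample_sheet" ;;
    *)
      echo "unknown input type: $input_type" >&2
      return 1
      ;;
  esac

  cmd="nextflow run UPHL-BioNGS/Grandeur -profile $profile $flag $input_path"

  # --msa switches on the optional phylogenetic analysis stage.
  if [ "$msa" = "true" ]; then
    cmd="$cmd --msa"
  fi

  echo "$cmd"
}
```

For example, `grandeur_cmd singularity sample_sheet samples.csv true` prints the phylogenetic analysis command shown under Basic Usage. Printing rather than running keeps the sketch safe to try on a machine without Nextflow installed.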

Key Parameters

| Parameter | Description |
| --- | --- |
| --sample_sheet | CSV with sample,fastq_1,fastq_2 |
| --reads | Directory with paired-end FASTQ files |
| --fastas | Directory with FASTA files |
| --outdir | Directory to save results (default: grandeur) |
| --msa | Toggle: Run phylogenetic analysis |
| --skip_extras | Toggle: Run only core assembly/QC |
| --current_datasets | Toggle: Download NCBI references via datasets |
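
A sample sheet can be generated from a directory of paired-end reads. The sketch below is one way to do it, under several assumptions: the helper name `make_sample_sheet` is hypothetical, files are assumed to be named `<sample>_R1*.fastq.gz` with a matching `_R2` mate, and the header follows the `sample,fastq_1,fastq_2` layout given above. The help message lists the columns as `sample,read1,read2`, so check the header against the wiki before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Grandeur): write a sample sheet to stdout.
# Assumes <sample>_R1*.fastq.gz / <sample>_R2*.fastq.gz naming.
make_sample_sheet() {
  local dir="${1%/}"          # strip a trailing slash, if any
  local r1 r2 base sample

  printf 'sample,fastq_1,fastq_2\n'

  for r1 in "$dir"/*_R1*.fastq.gz; do
    # When nothing matches, the glob stays literal; skip it.
    [ -e "$r1" ] || continue

    base="$(basename "$r1")"
    sample="${base%%_R1*}"            # file name up to the first _R1
    r2="$dir/${base/_R1/_R2}"         # swap the first _R1 for _R2

    if [ -f "$r2" ]; then
      printf '%s,%s,%s\n' "$sample" "$r1" "$r2"
    else
      printf 'warning: no R2 mate for %s\n' "$r1" >&2
    fi
  done
}
```

Redirecting the output, as in `make_sample_sheet /path/to/fastqs > samples.csv`, produces a file that can be passed to `--sample_sheet`. Unpaired files are reported on stderr and left out of the sheet rather than written with an empty column.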

Documentation

Detailed guides, FAQ, and process explanations are located in the Grandeur Wiki.

nf-core style docs can be found in the docs directory.

Help message

nextflow run UPHL-BioNGS/Grandeur --help

Running the workflow with --help should display a help message like the following.


 N E X T F L O W   ~  version 25.10.4

Launching `UPHL-BioNGS/Grandeur` [furious_bardeen] DSL2 - revision: 8a4591f1a0

Typical pipeline command:

  nextflow run UPHL-BioNGS/Grandeur -profile docker --sample_sheet samplesheet.csv --outdir grandeur

--help                [boolean, string] Show the help message for all top level parameters. When a parameter is given to `--help`, the full help message of that parameter will be printed.
--help_full           [boolean]         Show the help message for all non-hidden parameters.
--show_hidden         [boolean]         Show all hidden parameters in the help message. This needs to be used in combination with `--help` or `--help_full`.

Input/output options
  --sample_sheet      [string] csv with sample,read1,read2
  --fasta_list        [string] A sample sheet for fasta files
  --outdir            [string] The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure. [default: grandeur]

Reference files/paths
  --checkm2_db        [string] prepared checkm2 reference file
  --kraken2_db        [string] directory of kraken2 database
  --mash_db           [string] prepared mash reference msh file
  --sylph_db          [string] prepared sylph reference file
  --reference_genomes [string] list of genomes (in fasta format) for ANI references

workflow values
  --min_core_genes    [integer] minimum number of genes in core genome alignment for iqtree2 (default is 500) [default: 500]
  --min_core_per      [number]  minimum percentage number of core genes in core genome alignment for iqtree2 (default is 0.5 or 50%) [default: 0.5]

Subworkflow toggles
  --msa               [boolean] toggles whether or not phylogenetic analysis will be run on samples

 !! Hiding 27 param(s), use the `--show_hidden` parameter to show them !!
------------------------------------------------------
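
The parameters in the help message can also be collected in a file instead of being typed on the command line. The sketch below writes a `params.yaml` and shows, as a comment, how it might be passed with Nextflow's standard `-params-file` option. The keys mirror the parameter names in the help output and the numeric values are the listed defaults; the file name `params.yaml` and the `samplesheet.csv` path are placeholders.

```shell
#!/usr/bin/env bash
# Write a parameters file whose keys match the help message above.
# Values shown are the documented defaults, except msa, which is
# switched on here so the core-genome thresholds are relevant.
cat > params.yaml <<'EOF'
sample_sheet: samplesheet.csv
outdir: grandeur
msa: true
min_core_genes: 500
min_core_per: 0.5
EOF

# Assumed invocation (commented out so the sketch runs without Nextflow):
# nextflow run UPHL-BioNGS/Grandeur -profile docker -params-file params.yaml
```

Keeping the thresholds in a file makes it easier to record which `--min_core_genes` and `--min_core_per` values were used for a given run.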

Technical Support

Issues and problems should be submitted to the GitHub Issues page.

Acknowledgements

Grandeur wouldn't be possible without the following tools:

  • nf-tools - for keeping the schema functional
  • nf-docs - creation of nf-core style docs found in docs
  • amrfinderplus - identification of genes associated with antimicrobial resistance
  • bakta - gene prediction
  • checkm2 - assembly QC
  • datasets - downloads genomes from NCBI
  • drprg - TB AMR predictions
  • elgato - Legionella pneumophila Sequence Based Typing (SBT)
  • emmtyper - Group A Strep "emm" typing
  • enatools - download fastq files from the ENA
  • fastp - cleaning reads
  • fastqc - fastq file QC
  • gotree - stats and visualization of newick files
  • heatcluster - visualizes SNP matrix from SNP dists
  • iqtree - phylogenetic tree creation - used after core genome alignment
  • kaptive - Vibrio and Acinetobacter subtyping
  • kleborate - Klebsiella and Escherichia serotyping
  • kraken2 - contamination
  • mash - species identifier
  • mashtree - tree based on mash distances (not impacted by size of core genome)
  • meningotype - Neisseria subtyping
  • mlst - identification of MLST subtype
  • multiqc - summarizes QC efforts
  • mykrobe - Mycobacterium subtyping
  • ngmaster - Neisseria subtyping
  • panaroo - core genome alignment - optional (set with params.msa = true)
  • pbptyper - Penicillin Binding Protein (PBP) typer for Streptococcus pneumoniae assemblies
  • plasmidfinder - MLST typing for plasmids
  • prokka - gene annotation - used for core genome alignment
  • quast - contig QC
  • roary - core genome alignment - optional (set with params.msa = true)
  • seqsero2 - Salmonella serotyping
  • seqsero2S - Salmonella serotyping
  • serotypefinder - E. coli serotyping
  • shigapass - Shigella serotyping
  • ska2 - sequence comparison
  • skani - ANI comparison
  • snp-dists - SNP matrix - used after core genome alignment
  • spades - de novo assembly
  • spestimator - species estimation
  • sylph - taxonomic profiling

These tools are split across multiple processes. Each process has its own wiki page, which we encourage users to read.