Named after Grandeur Peak.
Image credit: ryancornia. Location: 40.707, -111.76; 8,299 ft (2,530 m) summit. Trail info: https://utah.com/hiking/grandeur-peak
Grandeur is a species-agnostic sequencing analysis workflow developed by @erinyoung at the Utah Public Health Laboratory (UPHL). Built on Nextflow, the pipeline provides quality control (QC), de novo assembly, taxonomic profiling, and in silico serotyping for paired-end Illumina data.
While intended to augment the CDC's PHOENIX workflow, Grandeur also functions as a powerful standalone pipeline.
- Nextflow (>= 25.0.0)
- Apptainer/Singularity or Docker
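The requirements above can be checked before launching. The following is a minimal sketch, not part of Grandeur: `version_ge` and `pick_profile` are hypothetical helper names, the version comparison relies on `sort -V` (GNU coreutils and recent BSD/macOS `sort`), and mapping an Apptainer install to the `singularity` profile is an assumption.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints the first container profile whose runtime is on PATH, or "none".
pick_profile() {
  if command -v docker >/dev/null 2>&1; then
    echo docker
  elif command -v apptainer >/dev/null 2>&1 || command -v singularity >/dev/null 2>&1; then
    echo singularity   # assumption: Apptainer is driven through the singularity profile
  else
    echo none
  fi
}

# Example with a hard-coded version string; in practice, read it from `nextflow -version`.
installed="25.10.4"
if version_ge "$installed" "25.0.0"; then
  echo "Nextflow $installed meets the minimum; suggested profile: $(pick_profile)"
else
  echo "Nextflow $installed is older than 25.0.0" >&2
fi
```

The suggested profile can then be passed to `-profile` in the commands that follow.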
# Execution with FASTQ reads (Singularity)
nextflow run UPHL-BioNGS/Grandeur -profile singularity --reads <path_to_fastqs>
# Execution with existing Assemblies (Docker)
nextflow run UPHL-BioNGS/Grandeur -profile docker --fastas <path_to_fastas>
# Execution of a full Phylogenetic Analysis
nextflow run UPHL-BioNGS/Grandeur -profile singularity --sample_sheet samples.csv --msa

Grandeur is modular and executes the following stages based on inputs and flags:
- De Novo Assembly: Cleaning reads with `fastp` and assembling with `SPAdes`.
- Taxonomic Profiling: Rapid identification via `SKANI`, `Kraken2`, `Mash`, and `Sylph`.
- Quality Assessment: Read and assembly metrics via `QUAST`, `CheckM2`, and `FastQC`.
- Subtyping: Organism-specific serotyping (e.g., `Kleborate`, `SeqSero2`, `LegionellaSBT`).
- Phylogenetic Analysis (Optional): Core genome alignment and Maximum Likelihood trees.
| Parameter | Description |
|---|---|
| `--sample_sheet` | CSV with `sample,fastq_1,fastq_2` |
| `--reads` | Directory with paired-end FASTQ files |
| `--fastas` | Directory with FASTA files |
| `--outdir` | Directory to save results (default: `grandeur`) |
| `--msa` | Toggle: Run phylogenetic analysis |
| `--skip_extras` | Toggle: Run only core assembly/QC |
| `--current_datasets` | Toggle: Download NCBI references via datasets |
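A sample sheet for `--sample_sheet` can be generated from a directory of reads. The sketch below is illustrative only: `make_sample_sheet` is a hypothetical helper, and it assumes files are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`, which may not match your data. The header follows the parameter table; the `--help` output describes the columns as `sample,read1,read2`, so confirm the expected header against the wiki before relying on it.

```shell
# Usage: make_sample_sheet <fastq_dir> <out_csv>
make_sample_sheet() {
  local fastq_dir="$1" out_csv="$2"
  local r1 r2 sample

  # Header as listed in the parameter table.
  printf 'sample,fastq_1,fastq_2\n' > "$out_csv"

  for r1 in "$fastq_dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                 # glob matched nothing
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"      # expected mate file
    if [ ! -e "$r2" ]; then
      printf 'warning: no R2 mate for %s, skipping\n' "$r1" >&2
      continue
    fi
    sample="$(basename "$r1" _R1.fastq.gz)"  # sample name = file name minus suffix
    printf '%s,%s,%s\n' "$sample" "$r1" "$r2" >> "$out_csv"
  done
}

# Example (paths are placeholders):
# make_sample_sheet /path/to/fastqs samples.csv
```

Samples with a missing mate are skipped with a warning rather than written as incomplete rows, since the pipeline is described as working on paired-end data.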
Detailed guides, FAQ, and process explanations are located in the Grandeur Wiki.
nf-core style docs can be found in `docs`.
nextflow run UPHL-BioNGS/Grandeur --help

Running the workflow with `--help` should display a help message like the following.
N E X T F L O W ~ version 25.10.4
Launching `UPHL-BioNGS/Grandeur` [furious_bardeen] DSL2 - revision: 8a4591f1a0
Typical pipeline command:
nextflow run UPHL-BioNGS/Grandeur -profile docker --sample_sheet samplesheet.csv --outdir grandeur
--help [boolean, string] Show the help message for all top level parameters. When a parameter is given to `--help`, the full help message of that parameter will be printed.
--help_full [boolean] Show the help message for all non-hidden parameters.
--show_hidden [boolean] Show all hidden parameters in the help message. This needs to be used in combination with `--help` or `--help_full`.
Input/output options
--sample_sheet [string] csv with sample,read1,read2
--fasta_list [string] A sample sheet for fasta files
--outdir [string] The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure. [default: grandeur]
Reference files/paths
--checkm2_db [string] prepared checkm2 reference file
--kraken2_db [string] directory of kraken2 database
--mash_db [string] prepared mash reference msh file
--sylph_db [string] prepared sylph reference file
--reference_genomes [string] list of genomes (in fasta format) for ANI references
workflow values
--min_core_genes [integer] minimum number of genes in core genome alignment for iqtree2 (default is 500) [default: 500]
--min_core_per [number] minimum percentage number of core genes in core genome alignment for iqtree2 (default is 0.5 or 50%) [default: 0.5]
Subworkflow toggles
--msa [boolean] toggles whether or not phylogenetic analysis will be run on samples
!! Hiding 27 param(s), use the `--show_hidden` parameter to show them !!
------------------------------------------------------
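Parameters from the help message can also be kept in a file and passed with Nextflow's `-params-file` option instead of being repeated on the command line. This is a sketch under assumptions: the file name `grandeur-params.yml` is arbitrary, the database path is a placeholder rather than a shipped default, and the keys simply mirror the `--` parameter names shown above.

```shell
# Write a params file; keys mirror the parameter names from the help message.
cat > grandeur-params.yml <<'EOF'
sample_sheet: samplesheet.csv
outdir: grandeur
kraken2_db: /path/to/kraken2_db   # placeholder path
msa: true
min_core_genes: 500
min_core_per: 0.5
EOF

# Launch with the file (left as a comment so this sketch only writes the YAML):
# nextflow run UPHL-BioNGS/Grandeur -profile docker -params-file grandeur-params.yml
```

Keeping the file under version control alongside the sample sheet makes a run easier to reproduce.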
Issues and problems should be submitted to the GitHub Issues page.
Grandeur wouldn't be possible without the following tools:
- nf-tools - for keeping the schema functional
- nf-docs - creation of nf-core style docs found in docs
- amrfinderplus - identification of genes associated with antimicrobial resistance
- bakta - gene prediction
- checkm2 - assembly QC
- datasets - downloads genomes from NCBI
- drprg - TB AMR predictions
- elgato - Legionella pneumophila Sequence Based Typing (SBT)
- emmtyper - Group A Strep "emm" typing
- enatools - download fastq files from the ENA
- fastp - cleaning reads
- fastqc - fastq file QC
- gotree - stats and visualization of newick files
- heatcluster - visualizes SNP matrix from SNP dists
- iqtree - phylogenetic tree creation - used after core genome alignment
- kaptive - Vibrio and Acinetobacter subtyping
- kleborate - Klebsiella and Escherichia serotyping
- kraken2 - contamination
- mash - species identifier
- mashtree - tree based on mash distances (not impacted by size of core genome)
- meningotype - Neisseria subtyping
- mlst - identification of MLST subtype
- multiqc - summarizes QC efforts
- mykrobe - Mycobacterium subtyping
- ngmaster - Neisseria subtyping
- panaroo - core genome alignment - optional (set with params.msa = true)
- pbptyper - Penicillin Binding Protein (PBP) typer for Streptococcus pneumoniae assemblies
- plasmidfinder - identification of plasmid replicons
- prokka - gene annotation - used for core genome alignment
- quast - contig QC
- roary - core genome alignment - optional (set with params.msa = true)
- seqsero2 - Salmonella serotyping
- seqsero2S - Salmonella serotyping
- serotypefinder - E. coli serotyping
- shigapass - Shigella serotyping
- ska2 - sequencing comparison
- skani - ANI comparison
- snp-dists - SNP matrix - used after core genome alignment
- spades - de novo assembly
- spestimator - species estimation
- sylph - taxonomic profiling
The expected tools are split into multiple processes. Each process has its own wiki page that we encourage users to view.
