manastast/t2t-qc

###################################################################################

T2T–QC PIPELINE

Telomere-to-Telomere Assembly Quality Check

###################################################################################

License: MIT

t2t-qc

Pipeline for telomere-to-telomere QC of genome assemblies.

Performs:

  • Basic assembly length statistics
  • Telomere motif scanning at contig ends
  • Identification of “primary” contigs (those covering ~98% of the assembly length)
  • N-run (gap) detection and per-contig summary
  • Clustering of gaps into biological gap groups
  • Genome-wide and per-chromosome gap-distribution plots

INPUT

Required input: a genome assembly FASTA file.

Run with:

qsub -v FASTA=assembly.fa,PREFIX=my_sample t2t_qc.pbs

Optional parameters:

TELOMERE_MOTIF     (default: TTTAGGG)
TELOMERE_WINDOW    (default: 200000)
THREADS            (default: 8)
GAP_CLUSTER_DIST   (default: 1000000)
CONDA_ENV          (path to Conda environment)
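
As an illustration of how the defaults above could be resolved inside a helper script, here is a minimal Python sketch. It assumes the variables passed with `qsub -v` appear in the job's environment. The names `DEFAULTS` and `resolve_params` are hypothetical and not part of t2t-qc, and the pipeline's own scripts may handle this differently.

```python
import os

# Defaults copied from the parameter list above.
DEFAULTS = {
    "TELOMERE_MOTIF": "TTTAGGG",
    "TELOMERE_WINDOW": "200000",
    "THREADS": "8",
    "GAP_CLUSTER_DIST": "1000000",
}

def resolve_params(env=None):
    """Return pipeline parameters, falling back to the documented defaults.

    Hypothetical helper: unset or empty variables use the default value.
    """
    env = os.environ if env is None else env
    params = {key: env.get(key) or default for key, default in DEFAULTS.items()}
    # Numeric parameters are converted so later arithmetic works on integers.
    for key in ("TELOMERE_WINDOW", "THREADS", "GAP_CLUSTER_DIST"):
        params[key] = int(params[key])
    return params

if __name__ == "__main__":
    print(resolve_params())
```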

Create a conda environment:

conda create -n t2tqc samtools gnuplot python minimap2 -c bioconda -c conda-forge

Run using an activated environment:

conda activate t2tqc
qsub -v FASTA=assembly.fa,PREFIX=my_sample t2t_qc.pbs

Or pass the environment directly:

qsub -v FASTA=assembly.fa,PREFIX=my_sample,CONDA_ENV=/path/to/env t2t_qc.pbs

OUTPUT FILES

The pipeline generates the following files, where PREFIX is the value passed as PREFIX=:

Basic statistics

  • PREFIX.lengths.tsv
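
The column layout of the lengths file is not documented here. As a rough sketch of what basic length statistics involve, the Python below computes per-contig lengths and N50 from a FASTA stream. The function names and demo sequences are illustrative assumptions, not the code in 00_basic_stats.sh.

```python
from io import StringIO

def fasta_lengths(handle):
    """Return {contig_name: length} for an open FASTA stream."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths

def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the total."""
    values = sorted(lengths, reverse=True)
    total = sum(values)
    running = 0
    for length in values:
        running += length
        if running * 2 >= total:
            return length
    return 0

# Tiny made-up assembly for demonstration only.
demo = StringIO(">chr1 demo\nACGTACGT\nACGT\n>chr2\nACGTAC\n>scaf3\nAC\n")
demo_lengths = fasta_lengths(demo)
demo_n50 = n50(demo_lengths.values())
```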

Telomere scan results

  • PREFIX.contigs_telomeres.tsv
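
To make the idea of scanning contig ends concrete, the following Python sketch counts a motif and its reverse complement in the first and last `window` bases of a sequence. It uses the documented default motif TTTAGGG. Counting both strands, the function names, and the toy sequence are assumptions for illustration, so the actual logic in 01_telomere_scan.sh may differ.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T only, case preserved)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def count_motif(seq, motif):
    """Count non-overlapping hits of the motif on either strand."""
    seq = seq.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    count = seq.count(motif)
    if rc != motif:
        count += seq.count(rc)
    return count

def telomere_ends(seq, motif="TTTAGGG", window=200000):
    """Return (hits in the first window, hits in the last window).

    Assumes window >= 1; windows longer than the contig cover all of it.
    """
    return count_motif(seq[:window], motif), count_motif(seq[-window:], motif)

# Toy contig: CCCTAAA repeats at the start, TTTAGGG repeats at the end.
toy = "CCCTAAA" * 5 + "ACGT" * 10 + "TTTAGGG" * 3
start_hits, end_hits = telomere_ends(toy, window=40)
```

A contig with many hits at both ends is a candidate telomere-to-telomere sequence. The threshold that separates signal from background is a separate choice that this sketch does not make.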

Primary contigs

  • PREFIX.contigs_primary.tsv
  • primary_chroms.tsv
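
How the ~98% cut is applied is not spelled out in this README. One plausible reading, sketched below in Python, is to take contigs from largest to smallest until their cumulative length reaches the target fraction of the assembly. The function name, the `fraction` parameter, and the demo lengths are hypothetical.

```python
def primary_contigs(lengths, fraction=0.98):
    """Pick the largest contigs whose combined length reaches `fraction` of the total.

    `lengths` maps contig name to length. This is an assumed selection rule,
    not necessarily the one used by 02_primary_contigs.sh.
    """
    total = sum(lengths.values())
    selected = []
    running = 0
    for name, length in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
        if running >= fraction * total:
            break
        selected.append(name)
        running += length
    return selected

# Made-up lengths: three chromosome-scale contigs plus two small scaffolds.
demo_lengths = {"chr1": 500, "scaf1": 6, "chr2": 300, "scaf2": 4, "chr3": 190}
demo_primary = primary_contigs(demo_lengths)
```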

Gap detection

  • PREFIX.gaps.stats.tsv
  • PREFIX.gaps.bed
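
N-run detection can be sketched as a regular-expression scan that reports each run of N bases as a BED-style interval (0-based start, exclusive end). The Python below is only an illustration under that assumption. The function name and the `min_len` parameter are hypothetical, and 03_gap_stats.sh may apply different rules.

```python
import re

def n_runs(name, seq, min_len=1):
    """Yield (contig, start, end) for each run of N/n at least `min_len` long.

    Coordinates follow the BED convention: 0-based start, end exclusive.
    """
    for match in re.finditer(r"[Nn]+", seq):
        if match.end() - match.start() >= min_len:
            yield (name, match.start(), match.end())

# Toy contig with a 4-base gap and a 2-base gap.
demo_gaps = list(n_runs("chr1", "ACGTNNNNACGTNNACGT"))

# Tab-separated BED lines, as they might be written to a file.
demo_bed = ["\t".join(map(str, gap)) for gap in demo_gaps]
```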

Gap clustering

  • PREFIX.gaps.clustered.bed
  • PREFIX.gaps.clusters.stats.tsv
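
Gap clustering can be pictured as merging gaps on the same contig whenever the distance from the end of the current cluster to the start of the next gap is at most GAP_CLUSTER_DIST. The Python sketch below assumes that edge-to-edge distance rule. The function name and demo intervals are made up, and 04_cluster_gaps.sh may measure distance differently.

```python
def cluster_gaps(gaps, max_dist=1_000_000):
    """Merge nearby gaps into clusters.

    `gaps` is an iterable of (contig, start, end) tuples. Returns a list of
    (contig, start, end, n_gaps) tuples, one per cluster.
    """
    clusters = []
    for chrom, start, end in sorted(gaps):
        last = clusters[-1] if clusters else None
        if last and last[0] == chrom and start - last[2] <= max_dist:
            # Close enough to the current cluster: extend it and count the gap.
            last[2] = max(last[2], end)
            last[3] += 1
        else:
            clusters.append([chrom, start, end, 1])
    return [tuple(cluster) for cluster in clusters]

# Made-up gaps; a small max_dist keeps the example readable.
demo_gaps = [("chr1", 100, 200), ("chr1", 5000, 5100), ("chr2", 50, 60), ("chr1", 500, 600)]
demo_clusters = cluster_gaps(demo_gaps, max_dist=1000)
```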

Plots

  • PREFIX.gaps_per_chr.N_clusters.png
  • PREFIX.gaps_genome.N_clusters.png

DIRECTORY STRUCTURE

t2t-qc/
  t2t_qc.pbs
  README.md
  scripts/
    00_basic_stats.sh
    01_telomere_scan.sh
    02_primary_contigs.sh
    03_gap_stats.sh
    04_cluster_gaps.sh
    05_plot_per_chr.sh
    06_plot_genome.sh

CITATION

If you use t2t-qc in your research, please cite:

Boutsika, A. (2025). t2t-qc: Pipeline for telomere-to-telomere genome assembly QC  
(Version 1.0.0) [Computer software]. Zenodo.  
https://doi.org/10.5281/zenodo.17855666

BibTeX:

@software{boutsika_2025_t2tqc,
  author       = {Boutsika, Anastasia},
  title        = {t2t-qc: Pipeline for telomere-to-telomere genome assembly QC},
  month        = jan,
  year         = 2025,
  publisher    = {Zenodo},
  version      = {1.0.0},
  doi          = {10.5281/zenodo.17855666},
  url          = {https://doi.org/10.5281/zenodo.17855666}
}
