SVJedi-graph is a structural variation (SV) genotyper for long-read data. It takes as input a variant file (VCF), a reference genome (FASTA) and a long-read file (FASTA/FASTQ), and outputs the initial variant file with an additional column containing genotyping information (VCF).
SVJedi-graph is based on a representation of the genome and the different SV alleles in a variation graph. After building this variation graph from the reference genome sequence and the input variant file, long reads are mapped to the graph using minigraph (Li et al., 2020). SVJedi-graph then estimates the genotype of each variant in a given individual sample based on allele-specific alignment counts.
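To make the last step more concrete, here is a minimal sketch of how a genotype could be called from allele-specific alignment counts. The binomial likelihood model, the `error_rate` value and the function name `call_genotype` are illustrative assumptions and are not necessarily what SVJedi-graph implements. Only the minimum-support threshold (default 3) comes from the `-ms` option of the tool.

```python
import math


def call_genotype(ref_count, alt_count, min_support=3, error_rate=0.05):
    """Return '0/0', '0/1', '1/1' or './.' from allele-specific alignment counts.

    Assumption: a simple binomial likelihood model chosen for illustration,
    not SVJedi-graph's exact method.
    """
    total = ref_count + alt_count
    if total < min_support:
        return "./."  # too few informative alignments to genotype

    # Probability that one alignment supports the alternative allele
    # under each candidate genotype (hypothetical error_rate).
    p_alt = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}

    # The binomial coefficient is identical for all genotypes, so it is omitted.
    log_lik = {
        gt: alt_count * math.log(p) + ref_count * math.log(1.0 - p)
        for gt, p in p_alt.items()
    }
    return max(log_lik, key=log_lik.get)


print(call_genotype(10, 0))  # reads support the reference allele only
print(call_genotype(6, 5))   # balanced support
print(call_genotype(0, 9))   # reads support the alternative allele only
print(call_genotype(1, 1))   # below the minimum support
```

With these made-up counts, the four calls return a homozygous reference, a heterozygous, a homozygous alternative and a missing genotype, respectively.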
Currently, SVJedi-graph can genotype five types of SVs: deletions, insertions, duplications, inversions and translocations (intra- and inter-chromosomal).
SVJedi-graph requires:
- Python (3.8.13 or higher)
- minigraph
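A quick way to verify both requirements before running the tool is sketched below. The helper name `check_requirements` is hypothetical, and the check assumes that minigraph is expected to be reachable through the `PATH`.

```python
import shutil
import sys

MIN_PYTHON = (3, 8, 13)  # minimum version listed above


def check_requirements(min_python=MIN_PYTHON, tool="minigraph"):
    """Return a list of human-readable problems; empty if everything is found."""
    problems = []
    current = tuple(sys.version_info[:3])
    if current < tuple(min_python):
        found = ".".join(map(str, current))
        needed = ".".join(map(str, min_python))
        problems.append(f"Python {needed} or higher is required, found {found}")
    # Assumption: the mapper is looked up on the PATH.
    if shutil.which(tool) is None:
        problems.append(f"{tool} was not found on the PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```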
Install with conda:
conda install -c bioconda svjedi-graph
Or clone the repository:
git clone https://gitlab.inria.fr/sromain/svjedi-graph.git
Usage:
./svjedi-graph.py -v <inputVCF> -r <refFA> -q <longreadsFQ> [ -p <output_prefix> -t <threads> -ms <minsupport> ]
For all variants, the SVTYPE tag must be present in the INFO field (SVTYPE=DEL or SVTYPE=INS or SVTYPE=INV or SVTYPE=BND). Insertions need to be sequence-resolved, with the full inserted sequence characterized and reported in the ALT field of the VCF file. As duplications are a special case of insertions, SVJedi-graph also supports duplications, as long as their duplicated sequence is characterized and reported in the same way as for insertions. More details are given in SV representation in VCF.
To check that SVJedi-graph behaves as expected on your device, you can run:
cd test-dir/
./run_test.sh
To explore the output files on a small dataset, run:
mkdir outputfiles
cd outputfiles
./../svjedi-graph.py -v ../test-dir/test.vcf -r ../test-dir/reference_genome.fasta -q ../test-dir/simulated_reads.fastq.gz -p test
- -v, --vcf: VCF file containing the set of SVs to genotype.
- -r, --ref: FASTA file containing the reference genome (on which the SVs have been identified).
- -q, --reads: FASTQ file containing the long reads used for genotyping. If you have multiple FASTQ files for one individual, use a comma (,) as the filename separator.
- -p, --prefix: Prefix of output files.
- -t, --threads: Number of threads to use for the mapping step.
- -ms, --minsupport: Minimum number of alignments required to genotype an SV (default: 3).
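When genotyping several samples, it can be convenient to assemble the command line from a script. The sketch below only uses the options documented above. The helper name `build_command` and the file names are hypothetical.

```python
def build_command(vcf, ref, reads, prefix=None, threads=None, min_support=None,
                  script="./svjedi-graph.py"):
    """Build the argument list for one SVJedi-graph run.

    `reads` may be a single path or a list of paths; multiple files are
    joined with a comma, as described for the -q option.
    """
    if isinstance(reads, (list, tuple)):
        reads = ",".join(reads)
    cmd = [script, "-v", vcf, "-r", ref, "-q", reads]
    # Optional arguments are only added when a value is given.
    if prefix is not None:
        cmd += ["-p", prefix]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if min_support is not None:
        cmd += ["-ms", str(min_support)]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_command("sample.vcf", "ref.fasta",
                    ["run1.fastq.gz", "run2.fastq.gz"],
                    prefix="sample", threads=8)
print(" ".join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` to launch the run.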
Main output file:
- <prefix>_genotype.vcf: Genotyped SV set in VCF format.
Intermediate output files:
- <prefix>.gfa: Variation graph in GFA format.
- <prefix>.gaf: Mapping results from minigraph in GAF format.
- <prefix>_informative_aln.json: JSON dictionary of read supports for each input SV's alleles.
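As a starting point for exploring the main output file, the sketch below tallies genotypes from VCF lines. It assumes that the added genotype column follows the usual VCF layout, with a `GT` key in the FORMAT column and the value in the last sample column. The function name `count_genotypes` and the example records are hypothetical and are not real SVJedi-graph output.

```python
from collections import Counter


def count_genotypes(vcf_lines):
    """Count GT values found in the last sample column of VCF lines."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this record
        keys = fields[8].split(":")
        values = fields[-1].split(":")
        if "GT" in keys and keys.index("GT") < len(values):
            counts[values[keys.index("GT")]] += 1
    return counts


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n",
    "chr1\t1000\tdel1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=1500\tGT\t0/1\n",
    "chr1\t5000\tins1\tN\tNACGT\t.\tPASS\tSVTYPE=INS\tGT\t1/1\n",
    "chr2\t800\tinv1\tN\t<INV>\t.\tPASS\tSVTYPE=INV;END=2000\tGT\t0/1\n",
]
print(count_genotypes(example))
```

For the small test run above, the same function could be applied to an open file handle on `test_genotype.vcf`.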
The information SVJedi-graph needs to genotype each SV type is listed below. All variants must have the CHROM and POS fields defined, and chromosome names must be identical in the reference genome file and the variant file. The SVTYPE tag must be present in the INFO field (SVTYPE=DEL or SVTYPE=INS or SVTYPE=INV or SVTYPE=BND). Additional information is then required depending on the SV type:
- Deletion
  - INFO field must contain SVTYPE=DEL
  - INFO field must contain END=pos (with pos being the end position of the deleted segment)
- Insertion
  - INFO field must contain SVTYPE=INS
  - ALT field must contain the sequence of the insertion
- Duplication
  - must be defined as an insertion event with:
    - CHROM and POS corresponding to the position of insertion of the novel copy
    - INFO field must contain SVTYPE=INS
    - ALT field must contain the sequence of the duplication
- Inversion
  - INFO field must contain SVTYPE=INV
  - INFO field must contain the END=pos tag, with pos being the second breakpoint position
- Intra-chromosomal translocation
  - INFO field must contain SVTYPE=BND
  - ALT field must be formatted as t[pos[, t]pos], ]pos]t or [pos[t, with pos indicating the second breakpoint position and the bracket directions indicating which parts of the two chromosomes should be joined together
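The requirements above can be turned into a small pre-flight check on the input VCF. The following is a minimal sketch and not part of SVJedi-graph: the names `parse_info` and `check_record`, the optional `ref_names` argument and the example records are all hypothetical.

```python
import re

# Breakend ALT forms: t[pos[, t]pos], ]pos]t, [pos[t (both brackets must match).
BND_ALT = re.compile(
    r"^(?:[A-Za-z]+([\[\]])[^\[\]]+\1|([\[\]])[^\[\]]+\2[A-Za-z]+)$"
)


def parse_info(info):
    """Turn 'SVTYPE=DEL;END=1500' into {'SVTYPE': 'DEL', 'END': '1500'}."""
    out = {}
    for item in info.split(";"):
        if item:
            key, _, value = item.partition("=")
            out[key] = value
    return out


def check_record(line, ref_names=None):
    """Return a list of problems found in one tab-separated VCF record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return ["record has fewer than 8 columns"]
    chrom, pos, alt = fields[0], fields[1], fields[4]
    info = parse_info(fields[7])
    problems = []
    if not chrom or not pos.isdigit():
        problems.append("CHROM and POS must be defined")
    if ref_names is not None and chrom not in ref_names:
        problems.append(f"chromosome {chrom} is not in the reference genome")
    svtype = info.get("SVTYPE")
    if svtype not in {"DEL", "INS", "INV", "BND"}:
        problems.append("INFO must contain SVTYPE=DEL, INS, INV or BND")
    elif svtype in {"DEL", "INV"} and not info.get("END", "").isdigit():
        problems.append(f"{svtype} needs END=pos in INFO")
    elif svtype == "INS" and not re.fullmatch(r"[ACGTNacgtn]+", alt):
        problems.append("INS needs the inserted sequence in ALT")
    elif svtype == "BND" and not BND_ALT.match(alt):
        problems.append("BND needs ALT formatted as t[pos[, t]pos], ]pos]t or [pos[t")
    return problems


# Made-up records for illustration only.
print(check_record("chr1\t1000\tdel1\tN\t<DEL>\t.\t.\tSVTYPE=DEL;END=1500"))
print(check_record("chr1\t5000\tins1\tN\t<INS>\t.\t.\tSVTYPE=INS"))
print(check_record("chr1\t700\tbnd1\tN\tN[chr1:9000[\t.\t.\tSVTYPE=BND"))
```

In this example, the first and third records pass, while the second is flagged because its ALT field holds a symbolic allele instead of the inserted sequence. Duplications are checked like insertions, since they must be encoded as SVTYPE=INS.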
Sandra Romain, Claire Lemaitre, SVJedi-graph: improving the genotyping of close and overlapping structural variants with long reads using a variation graph, Bioinformatics, Volume 39, Issue Supplement_1, June 2023, Pages i270–i278, https://doi.org/10.1093/bioinformatics/btad237
SVJedi-graph is a Genscale tool developed by Sandra Romain and Claire Lemaitre. For any bug report or feedback, please use the GitHub Issues form.
Footnotes
- Li, H., Feng, X. & Chu, C. The design and construction of reference pangenome graphs with minigraph. Genome Biol 21, 265 (2020). https://doi.org/10.1186/s13059-020-02168-z