Tags: ACEnglish/kanpig
Faster, Smaller, More Accurate Genotyping

Kanpig now has sub-commands `gt` and `plup`. The new `plup` command extracts reads and their SVs from a BAM file into a small file that is useful for long-term storage of reads. The `gt` command can parse these plup files much more quickly than it can parse a BAM. The `gt` command can now (optionally) parse PS and HP tags in a BAM to increase genotyping accuracy and to record long-range phasing information in SV genotypes. The `gt` command now uses k-medoids clustering instead of k-means, resulting in a modest improvement to genotyping accuracy.
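To illustrate the difference between the two clustering approaches, here is a minimal sketch of k-medoids with k=2. It is not kanpig's implementation: the function names, the Manhattan distance, and the exhaustive search over medoid pairs are all assumptions chosen for clarity. The property it shows is general: k-medoids picks actual input points as cluster centers, whereas k-means uses computed averages.

```python
from itertools import combinations


def manhattan(a, b):
    """Sum of absolute coordinate differences between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def two_medoids(points, dist=manhattan):
    """Exhaustive 2-medoids (hypothetical helper, not kanpig's code).

    Tries every pair of input points as medoids and keeps the pair that
    minimizes the total distance from each point to its nearest medoid.
    """
    best = None
    for i, j in combinations(range(len(points)), 2):
        cost = sum(min(dist(p, points[i]), dist(p, points[j])) for p in points)
        if best is None or cost < best[0]:
            best = (cost, i, j)
    cost, i, j = best
    # Assign each point to the closer of the two chosen medoids.
    labels = [0 if dist(p, points[i]) <= dist(p, points[j]) else 1 for p in points]
    return (points[i], points[j]), labels, cost


# Toy one-dimensional "read feature" vectors forming two groups.
points = [(0,), (1,), (2,), (10,), (11,), (12,)]
medoids, labels, cost = two_medoids(points)
```

Because each center must be one of the observed points, a single outlying read cannot drag a cluster center to a position that no read actually supports.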
Consistency Upgrades

* New filtering of haplotypes without paths increases accuracy
* New path scoring improves accuracy and consistency
* ZS and SS FORMAT fields replaced by KS reporting the score
* Requiring reads to span the full variant graph window, including the `--chunksize` buffer, increases accuracy
* Exhaustive search of partial haplotypes
* Slight runtime reduction from avoidance of redundant path searches
Improvements

* ~8% speed increase from less work in the path-searching
* Partial haplotypes bug fix increases accuracy
* Fixed SQ and FT fields
* Dedicated writing thread helps reduce memory usage by preventing a backlog of completed variants while reading
* Default `--out` is stdout to allow easier compression/indexing (e.g. `kanpig .. | bcftools sort -O z -o out.vcf.gz`)
* IUPAC codes are fixed by kanpig according to VCF specifications (Issue #1)
* Fixed filtering of symbolic alts and BNDs
* Argument validation
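As a rough illustration of the IUPAC fix, the VCF specification says that an ambiguous reference base should be reduced to the concrete base that comes first alphabetically (so `R`, meaning A or G, becomes `A`). The sketch below applies that rule. Whether kanpig uses exactly this mapping, and to which fields, is an assumption, and `normalize_ref` is a hypothetical name.

```python
# IUPAC ambiguity code -> alphabetically first base it can represent.
IUPAC_FIRST_BASE = {
    "R": "A",  # A/G
    "Y": "C",  # C/T
    "S": "C",  # C/G
    "W": "A",  # A/T
    "K": "G",  # G/T
    "M": "A",  # A/C
    "B": "C",  # C/G/T
    "D": "A",  # A/G/T
    "H": "A",  # A/C/T
    "V": "A",  # A/C/G
}


def normalize_ref(seq):
    """Replace ambiguity codes with a concrete base (hypothetical helper).

    A, C, G, T and N pass through unchanged; replaced bases are
    emitted in uppercase regardless of the input case.
    """
    return "".join(IUPAC_FIRST_BASE.get(base.upper(), base) for base in seq)


fixed = normalize_ref("ACRTN")
```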
v0.2.0

* Up to 40% reduction in runtime
* Hemizygous and sex chromosome aware genotyping with new `--ploidy-bed`
* Variants with star (`*`) alternate alleles, monozygotic reference, and BNDs are filtered out
* PathScores are now compared using the average of size and sequence similarity, for increased accuracy
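The averaged score in the last item can be sketched as follows. This is an illustration under stated assumptions, not kanpig's scoring code: size similarity is taken as the ratio of the shorter length to the longer, and sequence similarity uses Python's `difflib.SequenceMatcher` as a stand-in for whatever measure kanpig actually uses. All function names are hypothetical.

```python
from difflib import SequenceMatcher


def size_similarity(a_len, b_len):
    """Ratio of the shorter length to the longer, in [0, 1]."""
    if a_len == 0 and b_len == 0:
        return 1.0
    return min(a_len, b_len) / max(a_len, b_len)


def sequence_similarity(a, b):
    """Stand-in similarity measure (assumption): difflib's matching ratio."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def path_score(path_seq, hap_seq):
    """Average of size similarity and sequence similarity."""
    size = size_similarity(len(path_seq), len(hap_seq))
    seq = sequence_similarity(path_seq, hap_seq)
    return (size + seq) / 2


identical = path_score("ACGT", "ACGT")
partial = path_score("AAAA", "AA")
```

Averaging the two terms means a candidate path has to match the haplotype in both length and content to score highly; a path with the right size but the wrong sequence, or the reverse, is penalized.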