Release notes from deepvariant

DeepVariant 1.10.0

2026-03-05T23:36:43Z

DeepVariant:

Continuous phasing: Long-read variant calls (PacBio and ONT) are now natively phased, and phased output is generated for both VCF and gVCF formats.
Fuzzy channels: Added “fuzzy channel” logic to the ONT model for better homopolymer resolution. This results in a ~20-25% error reduction compared to existing methods.
RNA-seq support: RNA-seq is now supported as a model type. A case study has been added for RNA-seq data.
Postprocessing improvement: Implemented a new multiallelic variant post-processing method called “product”, which is enabled for all modes except WES.
Streamlining input parameters: run_deepvariant and run_deepsomatic now read parameters from model.example_info.json files, which must be present alongside the models in order to run.

DeepSomatic:

Small model in DeepSomatic: Introduced small models for tumor-normal modes in DeepSomatic, reducing runtime by 12% to 40%.

Pangenome-aware DeepVariant:

Local reassembly improvements: Improved the local reassembly process with a de Bruijn graph, reducing total errors by ~18% on the HG002 T2T truth set.

Contributions:

Ehud Amitai (@ehudamitai) from Ultima Genomics for the algorithm development of the multiallelic variant post-processing method that is available as the “product” option.
Vasiliy Strelnikov (@vaxyzek) for streamlining the run_deepvariant script by enabling automatic flag loading using model.example_info.json files.
Sowmiya Nagarajan (@sonagarajan) for helping to update the RNA-seq model.
Shezan Rohinton Mirzan (@shezanmirzan) for migrating the small model to Keras 3 and modernizing core infrastructure.
Francisco Unda (@fcoUnda) for enhancing read sampling stability, fixing non-determinism, and creating a robust read sampling approach at high coverages.
Alec Zhang (@az-e) for providing essential internal updates and maintenance to the codebase.

v1.10.0-beta

2025-10-09T17:48:00Z

This beta release focuses solely on DeepVariant, with no updates for pangenome-aware DeepVariant or DeepTrio. We encourage users to provide feedback, report bugs, and offer suggestions to help us improve.

Code is available on the r1.10.0-beta branch.
Docker: google/deepvariant:1.10.0-beta
Docker (GPU): google/deepvariant:1.10.0-beta-gpu
We have updated the metrics page with the latest accuracy / runtime results.

Key updates are detailed below.

Continuous Phasing

It is now possible for DeepVariant to natively emit a phased VCF for long reads (PacBio and ONT), leveraging the long-range information from these reads to accurately phase variants and assign a haplotype.

To enable this feature, you must set the following flags when running with run_deepvariant:

--make_examples_extra_args="phase_reads=true,output_phase_info=true,output_local_read_phasing=/tmp/read-phasing_debug@${N_SHARDS}.tsv" \
--postprocess_variants_extra_args="phased_reads_input_path=/tmp/read-phasing_debug@${N_SHARDS}.tsv"

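Make sure that N_SHARDS matches the number of shards set globally for the run (for example, the value passed to --num_shards in run_deepvariant), so that both stages refer to the same sharded file.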
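
As a minimal sketch of keeping the two flag values consistent, the Python helper below builds both extra-args strings from a single shard count. The helper name phasing_extra_args and the default prefix argument are assumptions made for illustration; only the flag names and the /tmp/read-phasing_debug path come from the notes above.

```python
def phasing_extra_args(n_shards, prefix="/tmp/read-phasing_debug"):
    """Build both extra-args strings from one shard count (hypothetical helper)."""
    if n_shards < 1:
        raise ValueError("n_shards must be a positive integer")
    # The same sharded path is written by make_examples and read by
    # postprocess_variants, so it is constructed exactly once.
    sharded_path = f"{prefix}@{n_shards}.tsv"
    return {
        "make_examples_extra_args": (
            "phase_reads=true,output_phase_info=true,"
            f"output_local_read_phasing={sharded_path}"
        ),
        "postprocess_variants_extra_args": (
            f"phased_reads_input_path={sharded_path}"
        ),
    }


args = phasing_extra_args(n_shards=16)
for flag, value in args.items():
    print(f'--{flag}="{value}"')
```

Deriving both strings from one value avoids the mismatch that the note about N_SHARDS warns against.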

model.example_info.json

Models can now be packaged with an extra file called model.example_info.json which carries the flags needed to generate examples (model inputs) when running inference. Here is an example of what this looks like:

{
  "version": "1.10.0-beta",
  "shape": [100, 147, 10],
  "channels": [1, 2, 3, 4, 5, 6, 7, 26, 9, 10],
  "flags_for_calling": {
    "alt_aligned_pileup": "diff_channels",
    "call_small_model_examples": true,
    "keep_supplementary_alignments": true,
    "max_reads_per_partition": 600,
    "min_mapping_quality": 1,
    "parse_sam_aux_fields": true,
    "partition_size": 25000,
    "phase_reads": true,
    "pileup_image_height": 100,
    "pileup_image_width": 147,
    "realign_reads": false,
    "small_model_indel_gq_threshold": 16,
    "small_model_snp_gq_threshold": 15,
    "small_model_vaf_context_window_size": 51,
    "sort_by_haplotypes": true,
    "track_ref_reads": true,
    "trained_small_model_path": "/opt/smallmodels/pacbio",
    "trim_reads_for_pileup": true,
    "vsc_min_fraction_indels": 0.12
  }
}

The flags used to generate examples are specific to each model, and it is important that they are set correctly for a given model to match the characteristics the model was trained on.
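
One way to sanity-check such a file before running inference is sketched below in Python. The relationships it tests (shape is [height, width, number of channels], and matches pileup_image_height and pileup_image_width) are inferred from the example above rather than taken from a documented schema, and check_example_info is a hypothetical helper, not part of DeepVariant.

```python
import json


def check_example_info(info):
    """Return a list of inconsistencies found in a parsed example-info dict."""
    problems = []
    # Assumption: shape is ordered as [height, width, number_of_channels].
    height, width, n_channels = info["shape"]
    flags = info.get("flags_for_calling", {})
    if len(info["channels"]) != n_channels:
        problems.append(
            f"{len(info['channels'])} channels listed, shape expects {n_channels}"
        )
    for flag, expected in (
        ("pileup_image_height", height),
        ("pileup_image_width", width),
    ):
        if flag in flags and flags[flag] != expected:
            problems.append(f"{flag}={flags[flag]} does not match shape value {expected}")
    return problems


# Abbreviated copy of the example above (most flags omitted for brevity).
EXAMPLE = json.loads("""
{
  "version": "1.10.0-beta",
  "shape": [100, 147, 10],
  "channels": [1, 2, 3, 4, 5, 6, 7, 26, 9, 10],
  "flags_for_calling": {"pileup_image_height": 100, "pileup_image_width": 147}
}
""")

print(check_example_info(EXAMPLE))  # prints []
```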

How is model.example_info.json useful?

DeepVariant can be run in two ways. The first way is to use the run_deepvariant command, which automatically sets options and runs each stage of DeepVariant.

The second way is to run these stages (make_examples, call_variants, and postprocess_variants) individually. This method can be significantly faster and more efficient because make_examples and call_variants can be parallelized, even across multiple machines. Previously, however, this approach required the flags for make_examples to be set manually, which made constructing efficient pipelines tricky. With this change, users can pass the --checkpoint flag to the make_examples stage, and the model.example_info.json file packaged with the model will be read and used to set the flags appropriate for that model.

Using model.example_info.json:

Here is an example illustrating how you could make use of this setup:

make_examples \
  --mode calling \
  --ref hg38.fa \
  --reads pacbio_input.bam \
  --examples "[email protected]" \
  --checkpoint "/opt/models/pacbio" \
  --task=1

The logs should report the flags that are then set using model.example_info.json:

[make_examples_core.py:3794] Flags for calling:
alt_aligned_pileup: diff_channels
call_small_model_examples: True
keep_supplementary_alignments: True
…
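
Because make_examples can be parallelized by task, the invocation above is typically repeated once per task. The Python sketch below generates one command line per task using only the flags shown in the example. It assumes tasks are numbered from 0 to N-1; make_examples_commands is a hypothetical helper, and EXAMPLES_PATH_PLACEHOLDER stands in for the sharded examples path, which is not reproduced here.

```python
import shlex


def make_examples_commands(n_tasks, ref, reads, examples, checkpoint):
    """Return one make_examples command line per task (hypothetical helper)."""
    commands = []
    for task in range(n_tasks):  # assumption: tasks are numbered 0..n_tasks-1
        argv = [
            "make_examples",
            "--mode", "calling",
            "--ref", ref,
            "--reads", reads,
            "--examples", examples,
            # The model directory also supplies model.example_info.json,
            # so no per-model flags need to be listed here.
            "--checkpoint", checkpoint,
            f"--task={task}",
        ]
        commands.append(shlex.join(argv))
    return commands


commands = make_examples_commands(
    n_tasks=4,
    ref="hg38.fa",
    reads="pacbio_input.bam",
    examples="EXAMPLES_PATH_PLACEHOLDER",  # placeholder, not a real path
    checkpoint="/opt/models/pacbio",
)
for line in commands:
    print(line)
```

Each generated line can then be handed to whatever job runner the pipeline uses; the commands are independent of one another.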

Docker Images are Streamlined

Docker images have been simplified to have fewer layers and to remove unnecessary files / layers. The table below illustrates the difference in terms of disk size and the number of layers.

Version	Size	Number of Layers
1.9	6.1GB	114
1.10.0-beta	4.8GB	23

This reduces the size by ~21% and the number of layers by ~80%.
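
The percentages follow directly from the table; a short Python check of the arithmetic (the function name percent_reduction is just for illustration):

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


size_reduction = percent_reduction(6.1, 4.8)   # image size in GB
layer_reduction = percent_reduction(114, 23)   # number of layers

print(f"size: {size_reduction:.1f}%, layers: {layer_reduction:.1f}%")
```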

Additional Updates

This list is not exhaustive, and smaller bug fixes and improvements may not be listed here.

The ONT model now uses a new input channel, READ_SUPPORTS_VARIANT_FUZZY, which indicates support for a variant based on a fuzzy, rather than exact, match.
The ONT model now sets alt_aligned_pileup='rows', meaning that alternative alignments are encoded using additional pileup rows in the model input, rather than additional channels.
The PacBio model now uses the --keep_supplementary_alignments flag, which leads to a slight improvement in accuracy.
TensorFlow has been updated from 2.13.1 to 2.16.1.
CUDA has been updated from 11.8 to 12.3 and cuDNN has been updated from 8.6.0 to 8.9.0 in our GPU docker image.
Pileup image rows are now sorted with std::stable_sort instead of std::sort. This leads to consistent pileup image generation.
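
To illustrate why a stable sort gives consistent row order: std::sort makes no guarantee about the relative order of elements that compare equal, whereas std::stable_sort preserves their input order. Python's sorted() is guaranteed to be stable, so it can stand in for std::stable_sort in a small sketch. The read names and the integer sort key below are made up for illustration and are not DeepVariant's actual sort criteria.

```python
# Each tuple is (read_name, sort_key); several reads share the same key.
reads = [("read_a", 1), ("read_b", 0), ("read_c", 1), ("read_d", 0)]

# A stable sort keeps tied elements in their original input order, so the
# same input always yields the same row order.
rows = sorted(reads, key=lambda read: read[1])

print([name for name, _ in rows])
# prints ['read_b', 'read_d', 'read_a', 'read_c']
```

With an unstable sort, read_b and read_d (or read_a and read_c) could legally come out in either order, which is enough to change the resulting pileup image.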

Note: Some outputs (e.g. VCF) may still report v1.9 in the header as we did not update all version references.

DeepVariant 1.9.0

2025-05-13T20:01:31Z

DeepVariant:

In this version we have updated our training scheme for the HG002 sample with the newly released HG002-T2T truth set, which improves accuracy against that truth set.
Our labeling method has been updated to accommodate the complex representation of variants, which are more common in the new HG002 T2T truth set.
Faster inference (~20% runtime reduction), achieved by improving numpy array and tensor handling in call_variants.

DeepSomatic:

In this release, we are introducing FFPE_WGS_TUMOR_ONLY and FFPE_WES_TUMOR_ONLY models.
The WGS and WGS_TUMOR_ONLY models have been retrained with all datasets described in the manuscript, tumor-in-normal and normal contamination datasets.
Overall, we see improved generalization because of training dataset updates. We highly recommend updating to 1.9.0 for DeepSomatic analysis.

DeepTrio:

Very large speed improvement - reduced runtime by 80%. This is achieved by introducing the small model scheme to DeepTrio. We observe similar or better accuracy compared to previous versions.
We observe that including the small model improves de novo variant accuracy for DeepTrio.

Pangenome-aware DeepVariant:

All models have been trained with the HG002 T2T truth set, which improves accuracy against the new T2T truth set.

We are thankful for the contributions from:

Ben Soudry (@ben-soudry) -- For helping to refactor the channels interface and simplifying the process of adding new channels.
Mike Kruskal (@mkruskal-google) -- For helping to upgrade tensorflow and protobuf versions.
Sowmiya Nagarajan (@strangest-quark) -- Working on phasing candidate variants.
Suchismita Tripathy (@sushi15) -- Improving the SNP and INDEL metrics reporting during training.
Francisco Unda (@fcoUnda) -- Improving the downsampling approach in make_examples to improve representations for low allele frequency variants.
Vasiliy Strelnikov (@vaxyzek) -- Adding DeepSomatic capabilities to nf-core: nf-core/modules#6622
Sam Yadav (@yadavs33-roche) and Seraj Ahmad (@ahmads9-roche) for their contribution to improve the examples shuffle code.

Student researchers:

Mobin Asri (@mobinasri) -- Further improving the implementation of pangenome-aware DeepVariant.
Farica Zhuang (@faricazjj) -- For contributing to the phasing method within DeepVariant.

DeepVariant 1.8.0

2024-12-09T23:51:20Z

In this release:

Small model integration: Speed increased by ~1.7x (40% runtime reduction) for WGS, PacBio, and ONT by introducing an additional small model. The small model identifies easy-to-call sites and invokes the standard DeepVariant model for harder sites. We observe similar or improved accuracies and confidence calibration with this combination. Use of the small model can be disabled with the --disable_small_model=true option. For details, please see the small model details doc.
Pangenome-aware variant calling: Added a new ability to directly use information from a pangenome in the process of variant calling. This improves accuracy both with BAMs mapped with standard BWA and with BAMs mapped to a pangenome with vg-Giraffe. Error reduction is ~30% with vg-Giraffe-mapped WGS, 10% with BWA-mapped WGS, and 5% for BWA-mapped WES. See details in the metrics page.
Configure a fast pipeline: An optional mode that increases efficiency for high-throughput GPU implementations by pipelining example generation with GPU-based variant calling, which increases utilization of GPU resources. See the case study for details.
Introduced new Mas-Seq models for variant calling with Kinnex kits/Mas-Seq data. See case study for details.
PacBio models are now trained with labels from the Platinum Pedigree, which reduces errors by 34% on this more comprehensive truth set including very difficult parts of the genome.
Added SPRQ data to PacBio training datasets, improving accuracy for SPRQ chemistry. Updated the PacBio case study data to the 2024 SPRQ release. Reduced error on SPRQ chemistry by 27% relative to DeepVariant v1.6. Updating to DeepVariant v1.8 is recommended for SPRQ.
Updated how model file metadata is specified, to accommodate more flexible ways of specifying channels. Custom models now require an accompanying example_info.json file containing the image shape details generated during training image generation in the make_examples and call_variants stages. One example of using a custom model is the T7 case study, where the example_info.json file is downloaded in order to run DeepVariant successfully.
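
The small model item in the list above quotes both a speedup factor and a runtime reduction. The two are related by reduction = 1 - 1/speedup; the Python sketch below (function names are illustrative only) shows that ~1.7x and ~40% are the same figure to within rounding.

```python
def runtime_reduction_from_speedup(speedup):
    """Fraction of runtime saved when a job runs `speedup` times faster."""
    return 1 - 1 / speedup


def speedup_from_runtime_reduction(reduction):
    """Speedup factor implied by saving `reduction` (0..1) of the runtime."""
    return 1 / (1 - reduction)


print(f"{runtime_reduction_from_speedup(1.7):.1%}")   # about 41%
print(f"{speedup_from_runtime_reduction(0.40):.2f}x")  # about 1.67x
```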

We are thankful for the contributions from:

Mobin Asri (@mobinasri) and Juan Carlos Mier (@jmier2) on pangenome-aware DeepVariant work.
Ralf W. Grosse-Kunstleve (@rwgk) for helping to migrate from CLIF to pybind.
Shiyi Yin (@yinshiyi) for Mas-Seq model work.
Maya Venkatraman (@mv2731) for helping to explore model architectures.
Ben Soudry (@ben-soudry) for helping to streamline channel inputs.
Atilla Kiraly (@akiraly1) and Yuchen Zhou (@Yuchen-95) on explainability work.
Jorge Gonzalez Mendez (@jgonzalezmendez) on improving the C++ code quality.
Stephanie Steele (@stesteele) for helping migrate python code to C++.

r1.8.0: Fix typos and doc formatting

2024-12-04T04:48:58Z

PiperOrigin-RevId: 702568997

DeepVariant 1.6.1

2024-03-19T19:20:10Z

In this release:

We fixed a bug in call_variants that caused the step to freeze in cases where there were no examples. This bug was observed and reported in #764, #769, google/deepsomatic#8.
Updated libssw library from 1.2.4 to 1.2.5.
The same model files are used for v1.6.0 and v1.6.1 for all technologies.

DeepVariant 1.6.0

2023-10-24T05:09:46Z

Improved support for haploid regions, chrX and chrY. Users can specify haploid regions with a flag. Updated case studies show usage and metrics.
Added a pangenome workflow (FASTQ-to-VCF mapping with VG and DeepVariant calling). A case study demonstrates improved accuracy.
Substantial improvements to DeepTrio de novo accuracy by specifically training DeepTrio for this use case (for chr20 at 30x HG002-HG003-HG004, false negatives reduced from 8 to 0 with DeepTrio v1.4, false positives reduced from 5 to 0).
We have added multi-processing ability in postprocess_variants, which reduces runtime from 48 minutes to 30 minutes for Illumina WGS and from 56 minutes to 33 minutes for PacBio.
We have added new models trained with Complete Genomics data, and added case studies.
We have added NovaSeqX to the training data for the WGS model.
We have migrated our training and inference platform from Slim to Keras.
Force calling with approximate phasing is now available.

We are sincerely grateful to

@wkwan and @paulinesho for their contributions to the move to Keras.
@lucasbrambrink for enabling multiprocessing in postprocess_variants.
@MSamman, @akiraly1 for their contributions.
PacBio: William Rowell (@williamrowell), Nathaniel Echols for their feedback and testing.
UCSC: Benedict Paten (@benedictpaten), Shloka Negi (@shlokanegi), Jimin Park (@jimin001), Mobin Asri (@mobinasri) for their feedback.

DeepVariant 1.5.0

2023-02-28T18:11:51Z

New model datatype: --model_type ONT_R104 is a new option. Starting from v1.5, DeepVariant natively supports ONT R10.4 simplex and duplex data.
- For older ONT chemistry, please continue to use PEPPER-Margin-DeepVariant.
Incorporated PacBio Revio training data in DeepVariant PacBio model. In our evaluations this single model performs well on both Sequel II and Revio datatypes. Please use DeepVariant v1.5 and later for Revio data.
Incorporated Element Biosciences data in WGS models. We found that we could jointly train a short-read WGS model with both Illumina and Element data. Inclusion of Element data improves accuracy on Element without negative effect on Illumina. Please use the WGS model for best results on either Illumina or Element data.
Added vg/Giraffe-mapped BAMs to DeepVariant WGS training data (alongside existing BWA). We observed that a single model can be trained for strong results with both BWA and vg/Giraffe.
Improved DeepVariant WES model for 100bps exome sequencing thanks to user-reported issues (including #586 and #592).
Thanks to Tong Zhu from Nvidia for his suggestion to improve the logic for shuffling reads.
Thanks to Doron Shem-Tov (@doron-st) and Ilya Soifer (@ilyasoifer) from Ultima Genomics for adding new functionalities enabled by flags --enable_joint_realignment and --p_error.
Thanks to Dennis Yelizarov for improving Google-internal infrastructure for running make_examples.
Updated TensorFlow version to 2.11.0. Updated htslib version to 1.13.

DeepVariant 1.4.0

2024-05-15T20:11:42Z

Simplified DeepVariant PacBio by introducing approximate haplotagging. This means PacBio users who run DeepVariant no longer need to run DeepVariant+WhatsHap+DeepVariant. See PacBio case study for more information.
For Illumina WGS and WES, we added read insert size (insert_size) as an additional feature. This reduces errors by 4-10% for the Illumina WGS and WES models. Thanks @lucasbrambrink for implementing this feature.
Reduced the runtime of the postprocess_variants step by 10-30%. Thanks @MosheWagner for optimizing the code.
Included experimental code which explores use of Keras for model architecture. This is not used in production methods, but may be informative to developers seeking examples of Keras applied to similar problems. Thanks @wkwan and @paulinesho for their contributions.
We did not include OpenVINO by default in the Docker images we released. Users can still build their own Docker images with the option turned on as needed.
Updated 2022-10-17: We have released an Illumina RNA-seq model and added an RNA-seq case study.

DeepVariant 1.3.0

2021-12-10T07:12:31Z

Improved the DeepTrio PacBio models on PacBio Sequel II Chemistry v2.2 by including this data in the training dataset.
Improved call_variants speed for PacBio models (both DeepVariant and DeepTrio) by reducing the default window width from 221 to 199, without tradeoff on accuracy. Thanks to @lucasbrambrink for conducting the experiments to find a better window width for PacBio.
Introduced a new flag --normalize_reads in make_examples, which normalizes indel candidates at the read level. This flag is useful to reduce rare cases where an indel variant is not left-normalized. This feature is mainly relevant to joint calling of large cohorts, or cases where read mappings have been surjected from one reference to another. It is currently set to False by default. To enable it, add --normalize_reads=true directly to the make_examples binary. If you’re using the run_deepvariant one-step approach, add --make_examples_extra_args="normalize_reads=true". Currently we don’t recommend turning this flag on for long reads due to the potential runtime increase.
Added an --aux_fields_to_keep flag to the make_examples step, and set the default to only the auxiliary fields that DeepVariant currently uses. This reduces memory use for input BAM files that have large auxiliary fields that aren’t used in variant calling. Thanks to @williamrowell and @rhallPB for reporting this issue.
Reduced the frequency of logging in make_examples as well as call_variants to address the issue reported in #491.
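
For readers unfamiliar with the left-normalization mentioned in the --normalize_reads item above, the Python sketch below shows the general idea on a single variant: an indel inside a repeat is shifted to its leftmost equivalent position. This is the commonly used trim-and-extend procedure applied to a (position, ref, alt) record, not DeepVariant's read-level implementation; the function name and the 0-based coordinate convention are assumptions for illustration.

```python
def left_normalize(ref_seq, pos, ref, alt):
    """Left-align and trim one variant; `pos` is 0-based into `ref_seq`."""
    if ref == alt:
        raise ValueError("ref and alt alleles are identical")
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if not ref or not alt:
            if pos == 0:
                raise ValueError("cannot extend past the start of the reference")
            pos -= 1
            ref, alt = ref_seq[pos] + ref, ref_seq[pos] + alt
            changed = True
    # Trim shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


reference = "TGCACACAG"
# A "CA" deletion written at the right end of the CA repeat...
print(left_normalize(reference, 5, "ACA", "A"))
# ...moves to the left end of the repeat: prints (1, 'GCA', 'G')
```

Both representations produce the same haplotype (TGCACAG); normalization only picks one canonical way to write it, which is what makes calls comparable across samples in a cohort.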