Released by cmclean on Feb 9, 2018
This release fixes issue #27 and adds support for creating the MIN_DP field in gVCF records.
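In a gVCF, MIN_DP appears in the FORMAT column of non-variant reference blocks and records the minimum read depth observed across the block. A minimal sketch of reading it with plain Python; the record below is an illustrative example, not actual DeepVariant output:

```python
# Parse the MIN_DP FORMAT field from a gVCF reference block.
# Hypothetical record: a non-variant block from position 100 to 150.
record = "chr20\t100\t.\tA\t<NON_REF>\t0\t.\tEND=150\tGT:DP:MIN_DP\t0/0:34:30"

fields = record.split("\t")
fmt_keys = fields[8].split(":")      # ['GT', 'DP', 'MIN_DP']
sample_vals = fields[9].split(":")   # ['0/0', '34', '30']
sample = dict(zip(fmt_keys, sample_vals))

# MIN_DP is the minimum depth across the whole block, a conservative
# per-block depth estimate used by downstream joint-genotyping tools.
min_dp = int(sample["MIN_DP"])
print(min_dp)  # 30
```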
Released by pichuan on Jan 30, 2018
Releases two separate models for calling whole-genome and exome sequencing data, with a significant improvement in indel F1 on exome data.
Adds the ability to produce gVCF files as output from DeepVariant [doc]. gVCF files are required as input for analyses that build a variant set across a cohort of individuals, such as cohort merging or joint genotyping.
Training data: all models are trained with a benchmarking-compatible strategy; that is, we never train on any data from the HG002 sample, or from chromosome 20 of any sample.
Whole-genome sequencing model: trained on both whole-genome and exome sequencing data.
To increase the diversity of the training data, we also used the downsample_fraction flag when making training examples.
Whole-exome sequencing model: starting from the trained WGS model as a checkpoint, we continued training on the WES data described above, again using various downsample fractions for the training data.
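The idea behind downsampling is to emit training examples from artificially lower-coverage versions of the same data, so the model sees a wider range of coverages. A minimal sketch of the technique, keeping each read independently with a given probability; the names here are illustrative, not DeepVariant's implementation:

```python
import random

def downsample_reads(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction`.

    A fixed seed makes the selection reproducible across runs.
    """
    rng = random.Random(seed)
    return [r for r in reads if rng.random() < fraction]

reads = [f"read_{i}" for i in range(1000)]
kept = downsample_reads(reads, 0.5)
print(len(kept) / len(reads))  # a value near 0.5
```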
DeepVariant now provides deterministic output by rounding the QUAL field to one digit past the decimal point when writing to VCF.
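Rounding makes output byte-identical across runs: two QUAL values that differ only by floating-point noise format to the same string. A minimal sketch of the idea; format_qual is an illustrative name, not DeepVariant's function:

```python
def format_qual(qual: float) -> str:
    # Round to one digit past the decimal so runs that differ only by
    # floating-point noise emit byte-identical VCF lines.
    return f"{qual:.1f}"

print(format_qual(37.2999999))  # 37.3
print(format_qual(37.3000001))  # 37.3
```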
Updates the model input data representation from seven channels to six.
Adds a post-processing step to variant calls that eliminates rare inconsistent haplotypes [description].
Expands the excluded-contigs list to include common problematic contigs on GRCh38 [GitHub issue].
DeepVariant workflows can now run on GCP with preemptible GPUs.
Released by scott7z on Dec 13, 2017
This fixes a problem with htslib_gcp_oauth when network access is unavailable.
Released by rpoplin on Dec 4, 2017
This is the initial open source release of DeepVariant!
It includes a model trained on 9 replicates of NA12878 / HG001, as well as a copy of each replicate downsampled to 50% coverage. In our tests, this additional training data helps DeepVariant generalize to a wider variety of input sequencing data. This produced approximately 100 million training examples. We used the v3.3.2 truth set from Genome in a Bottle for training. The underlying model is Inception V3.
See historical release notes for more details.