Nat Commun. 2019 Mar 1;10(1):998. doi: 10.1038/s41467-019-09025-z.

A multi-task convolutional deep neural network for variant calling in single molecule sequencing.

Author information

1. Department of Computer Science, The University of Hong Kong, Hong Kong, 999077, China. rbluo@cs.hku.hk.
2. Department of Computer Science, Johns Hopkins University, Baltimore, 21218, MD, USA. rbluo@cs.hku.hk.
3. Human Genome Sequencing Center, Baylor College of Medicine, Houston, 77030, TX, USA.
4. Department of Computer Science, The University of Hong Kong, Hong Kong, 999077, China.
5. Department of Computer Science, Johns Hopkins University, Baltimore, 21218, MD, USA.

Abstract

The accurate identification of DNA sequence variants is an important but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5-15%. To meet this need, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele, and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieves F1-scores of 99.67%, 95.78%, and 90.53% on 1KP common variants, and 98.65%, 92.57%, and 87.26% for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows that Clairvoyante is sample agnostic, and it finds variants in less than 2 h on a standard server. Furthermore, we present 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source (https://github.com/aquaskyline/Clairvoyante), with modules to train, use, and visualize the model.
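The F1-score used in the abstract is the harmonic mean of precision and recall. The sketch below shows that calculation for a variant call set compared against a truth set. It is a minimal illustration only: the function name `f1_score` and the counts are hypothetical and are not taken from the paper or from the Clairvoyante code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        # No correct calls: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # fraction of called variants that are correct
    recall = tp / (tp + fn)     # fraction of true variants that were called
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only for illustration (not results from the paper).
example = f1_score(tp=90, fp=10, fn=10)
print(f"F1 = {example:.2%}")
```

Because F1 penalizes both false calls and missed variants, a single percentage per platform summarizes both kinds of error.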

PMID: 30824707
PMCID: PMC6397153
DOI: 10.1038/s41467-019-09025-z
[Indexed for MEDLINE]
Free PMC Article
