Nat Biotechnol. 2019 Aug 12. doi: 10.1038/s41587-019-0217-9. [Epub ahead of print]

Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome.

Author information

1. Pacific Biosciences, Menlo Park, CA, USA.
2. Google Inc., Mountain View, CA, USA.
3. Center for Bioinformatics, Saarland University, Saarbrücken, Germany.
4. Max Planck Institute for Informatics, Saarbrücken, Germany.
5. Graduate School of Computer Science, Saarland University, Saarbrücken, Germany.
6. DNAnexus, Mountain View, CA, USA.
7. National Institute of Standards and Technology, Gaithersburg, MD, USA.
8. Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
9. Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
10. Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA.
11. Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany.
12. Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Shenzhen, China.
13. Dana-Farber Cancer Institute, Boston, MA, USA.
14. Pacific Biosciences, Menlo Park, CA, USA. drank@pacb.com.
15. Pacific Biosciences, Menlo Park, CA, USA. mhunkapiller@pacb.com.

Abstract

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
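
For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal Python sketch of how a per-base accuracy maps to a Phred-scaled quality value and how a contig N50 is computed. It is an illustration only, not the authors' pipeline: the function names `phred_from_accuracy` and `n50` and the example contig lengths are hypothetical, and the N50 definition used is the common one (the length of the contig at which the cumulative sum of contigs, sorted longest first, reaches half of the total assembly length).

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Convert a per-base accuracy (strictly between 0 and 1) to a Phred-scaled quality."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    # Phred quality Q = -10 * log10(error probability)
    return -10.0 * math.log10(1.0 - accuracy)


def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted longest first; the N50 is the length of the contig
    at which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Accuracies quoted in the abstract: 99.8% read accuracy, 99.997% assembly concordance.
    print(f"99.8% accuracy    -> Q{phred_from_accuracy(0.998):.1f}")
    print(f"99.997% accuracy  -> Q{phred_from_accuracy(0.99997):.1f}")
    # Hypothetical contig lengths (arbitrary units), for illustration only.
    print("N50 of toy assembly:", n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Under these definitions, 99.8% accuracy corresponds to roughly Q27 and 99.997% concordance to roughly Q45; both follow directly from the formula rather than from any figure reported in the paper.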

PMID: 31406327
DOI: 10.1038/s41587-019-0217-9
