Fast and accurate genomic analyses using genome graphs

Nat Genet. 2019 Feb;51(2):354-362. doi: 10.1038/s41588-018-0316-4. Epub 2019 Jan 14.

Abstract

The human reference genome serves as the foundation for genomics by providing a scaffold for alignment of sequencing reads, but currently only reflects a single consensus haplotype, thus impairing analysis accuracy. Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million insertions and deletions (indels). The pipeline processes one whole-genome sequencing sample in 6.5 h using a system with 36 CPU cores. We show that using a graph genome reference improves read mapping sensitivity and produces a 0.5% increase in variant calling recall, with unaffected specificity. Structural variations incorporated into a graph genome can be genotyped accurately under a unified framework. Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is an important advance toward fulfilling the promise of graph genomes to radically enhance the scalability and accuracy of genomic analyses.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genome, Human / genetics*
  • Genomics / methods
  • Humans
  • Polymorphism, Single Nucleotide / genetics
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods
  • Sequence Deletion / genetics
  • Whole Genome Sequencing / methods