Display Settings:

Format

Send to:

Choose Destination
We are sorry, but NCBI web applications do not support your browser and may not function properly. More information
Nat Genet. 2011 May;43(5):491-8. doi: 10.1038/ng.806. Epub 2011 Apr 10.

A framework for variation discovery and genotyping using next-generation DNA sequencing data.

Author information

  • 1Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. depristo@broadinstitute.org

Abstract

Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.

PMID:
21478889
[PubMed - indexed for MEDLINE]
PMCID:
PMC3083463
Free PMC Article

Images from this publication.See all images (6)Free text

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
PubMed Commons home

PubMed Commons

0 comments
How to join PubMed Commons

    Supplemental Content

    Icon for Nature Publishing Group Icon for PubMed Central
    Loading ...
    Write to the Help Desk