    BMC Genomics. 2011 Jul 25;12:375.

    Identification of genomic indels and structural variations using split reads.

    Source

    Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA. zhengdong.zhang@einstein.yu.edu

    Abstract

    BACKGROUND:

    Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs) in the human population. With the development of next-generation sequencing technologies, high-throughput surveys of SVs at the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC), a sequence-based method for SV detection.

    RESULTS:

    We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then, to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g., giving higher scores to events gapped in the center of a read). All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g., calling more deletions than insertions). A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models). This allows us to calculate sensitivity and positive predictive value under different parameter-value scenarios and for different classes of events (e.g., long deletions vs. short insertions). We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for events of different lengths) allows us to construct a relatively unbiased estimate of the total number of SVs in the human genome across a wide range of length scales. In particular, we estimate that an individual genome contains ~670,000 indels/SVs.
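
The calibration idea can be illustrated with a little arithmetic. The sketch below is not the authors' implementation. It assumes the simplest possible correction: multiply the observed calls in each event class by the positive predictive value (to discount false calls) and divide by the sensitivity (to account for missed events). The function name `calibrated_count`, the event classes, and all numbers are hypothetical and chosen only to show the calculation.

```python
def calibrated_count(observed_calls: int, ppv: float, sensitivity: float) -> float:
    """Estimate the true number of events in one class from raw calls.

    Assumed correction (not taken from the paper):
      observed_calls * ppv   -> expected true positives among the calls
      ... / sensitivity      -> scaled up for true events that were missed
    """
    if not 0.0 <= ppv <= 1.0:
        raise ValueError("ppv must lie in [0, 1]")
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must lie in (0, 1]")
    return observed_calls * ppv / sensitivity


# Hypothetical per-class values: (observed calls, PPV, sensitivity).
# In practice PPV and sensitivity would come from simulated reads.
classes = {
    "short insertion": (1000, 0.90, 0.50),
    "long deletion": (200, 0.80, 0.40),
}

# Correct each class separately, then sum across classes.
estimates = {name: calibrated_count(*values) for name, values in classes.items()}
total = sum(estimates.values())

for name, value in estimates.items():
    print(f"{name}: {value:.0f} estimated events")
print(f"total: {total:.0f} estimated events")
```

Correcting each class separately matters because, as the abstract notes, callers are biased differently for different event types and lengths, so a single global correction factor would carry those biases into the total.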

    CONCLUSIONS:

    Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole size spectrum for deletions. Moreover, with the advent of third-generation sequencing technologies that produce longer reads, we expect our method to be even more useful.
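
To show why a gapped alignment of a single read can yield exact breakpoints and inserted sequence, here is a minimal sketch. It assumes alignments are reported with SAM-style CIGAR strings, which the abstract does not specify. The function name `indels_from_alignment` and the example read are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_alignment(ref_start: int, cigar: str, read_seq: str):
    """Walk a CIGAR string and report the gaps it contains.

    Returns tuples (kind, ref_pos, length, inserted_seq), where ref_pos is
    the first deleted reference base for a deletion, or the first reference
    base after the inserted sequence for an insertion.
    """
    events = []
    ref_pos = ref_start  # reference coordinate of the next aligned base
    read_pos = 0         # 0-based offset into read_seq
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op in "M=X":      # aligned bases consume read and reference
            ref_pos += length
            read_pos += length
        elif op == "I":      # insertion consumes read only
            inserted = read_seq[read_pos:read_pos + length]
            events.append(("insertion", ref_pos, length, inserted))
            read_pos += length
        elif op == "D":      # deletion consumes reference only
            events.append(("deletion", ref_pos, length, ""))
            ref_pos += length
        elif op == "N":      # skipped reference region
            ref_pos += length
        elif op == "S":      # soft clip consumes read only
            read_pos += length
        # H (hard clip) and P (padding) consume neither sequence
    return events


# Hypothetical read aligned at reference position 100.
events = indels_from_alignment(100, "5M3I4M2D6M", "ACGTATTTCCGGACGTAC")
for event in events:
    print(event)
```

Because the gap lies inside the read, both its position and, for insertions, its sequence are read directly from the alignment. Read-depth and read-pair signals do not give this directly, which is the contrast the conclusion draws.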

    PMID: 21787423 [PubMed - indexed for MEDLINE]
    PMCID: PMC3161018 (free PMC article)
