Bioinformatics. 2014 Dec 15;30(24):3458-66. doi: 10.1093/bioinformatics/btu714. Epub 2014 Oct 28.

Characterization of structural variants with single molecule and hybrid sequencing approaches.

Author information

Department of Computer Science, Brown University, RI
Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, NY
Institute for Genomics and Multiscale Biology, Icahn School of Medicine, Mount Sinai, NY
School of Natural Sciences, University of California, Merced, CA
Pacific Biosciences, Menlo Park, CA
Center for Computational Molecular Biology, Brown University, RI

Abstract

MOTIVATION:

Structural variation is common in human and cancer genomes. High-throughput DNA sequencing has enabled genome-scale surveys of structural variation. However, the short reads produced by these technologies limit the study of complex variants, particularly those involving repetitive regions. Recent 'third-generation' sequencing technologies provide single-molecule templates and longer sequencing reads, but at the cost of higher per-nucleotide error rates.

RESULTS:

We present MultiBreak-SV, an algorithm to detect structural variants (SVs) from single molecule sequencing data, paired read sequencing data, or a combination of sequencing data from different platforms. We demonstrate that combining low-coverage third-generation data from Pacific Biosciences (PacBio) with high-coverage paired read data is advantageous on simulated chromosomes. We apply MultiBreak-SV to PacBio data from four human fosmids and show that it detects known SVs with high sensitivity and specificity. Finally, we perform a whole-genome analysis on PacBio data from a complete hydatidiform mole cell line and predict 1002 high-probability SVs, over half of which are confirmed by an Illumina-based assembly.
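The abstract does not describe how MultiBreak-SV models its evidence. What follows is therefore only a generic sketch, not the authors' algorithm. It shows one common way that read alignments can signal a structural variant, and how calls from two platforms might be pooled. Assumptions: a read split into two collinear, same-strand aligned segments implies a deletion when its reference gap exceeds its read gap. Nearby breakpoint calls are merged with a fixed tolerance, which loosely accounts for imprecise breakpoints from error-prone long reads. The names `Segment`, `deletion_from_split`, `cluster_calls`, `min_size` and `tol` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a read (hypothetical structure for illustration)."""
    read_start: int  # 0-based start on the read
    read_end: int    # end (exclusive) on the read
    ref_start: int   # 0-based start on the reference
    ref_end: int     # end (exclusive) on the reference


def deletion_from_split(a: Segment, b: Segment,
                        min_size: int = 50) -> Optional[Tuple[int, int, int]]:
    """Return (ref_left, ref_right, size) if two collinear, same-strand segments
    of one read imply a deletion of at least min_size bases, else None."""
    # Order the two segments by where they lie on the read.
    first, second = sorted((a, b), key=lambda s: s.read_start)
    read_gap = second.read_start - first.read_end
    ref_gap = second.ref_start - first.ref_end
    # Reference bases skipped beyond what the read itself skips.
    size = ref_gap - read_gap
    if ref_gap < 0 or size < min_size:
        return None
    return (first.ref_end, second.ref_start, size)


def cluster_calls(calls: List[Tuple[int, int, str]],
                  tol: int = 100) -> List[Dict]:
    """Greedily merge (left, right, platform) breakpoint calls lying within tol
    bases of an existing cluster; the first call in a cluster sets its
    coordinates. Support is counted per sequencing platform."""
    clusters: List[Dict] = []
    for left, right, platform in sorted(calls):
        for c in clusters:
            if abs(c["left"] - left) <= tol and abs(c["right"] - right) <= tol:
                c["support"][platform] = c["support"].get(platform, 0) + 1
                break
        else:
            clusters.append({"left": left, "right": right,
                             "support": {platform: 1}})
    return clusters


if __name__ == "__main__":
    # A long read whose second half aligns 1 kb further along the reference.
    call = deletion_from_split(Segment(0, 1000, 10000, 11000),
                               Segment(1000, 2000, 12000, 13000))
    print(call)
    print(cluster_calls([(11000, 12000, "pacbio"), (11040, 11980, "illumina"),
                         (11020, 12010, "illumina"), (50000, 50500, "pacbio")]))
```

In this toy example, one long-read call and two short-read calls agree on the same deletion within the tolerance. Counting support per platform is one simple way to let sparse long-read evidence and dense paired-read evidence reinforce each other. The paper's own approach is probabilistic and may differ substantially.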

PMID:
25355789
PMCID:
PMC4253835
DOI:
10.1093/bioinformatics/btu714
[Indexed for MEDLINE]
Free PMC Article
