Nat Methods. 2011 Jan;8(1):61-5. doi: 10.1038/nmeth.1527. Epub 2010 Nov 21.

Limitations of next-generation genome sequence assembly.

Author information

Department of Genome Sciences, University of Washington School of Medicine and Howard Hughes Medical Institute, Seattle, Washington, USA.

Abstract

High-throughput sequencing technologies promise to transform the fields of genetics and comparative biology by delivering tens of thousands of genomes in the near future. Although de novo genome assemblies can now be constructed in a few months, relatively little attention has been paid to what is lost when short sequence reads are used alone. We compared recent de novo assemblies of the genomes of a Han Chinese individual and a Yoruban individual, generated with the short oligonucleotide analysis package (SOAP), to experimentally validated genomic features. We found that the de novo assemblies were 16.2% shorter than the reference genome and that 420.2 megabase pairs of common repeats and 99.1% of validated duplicated sequences were missing from the assemblies. Consequently, more than 2,377 coding exons were completely missing. We conclude that high-quality sequencing approaches must be considered in conjunction with high-throughput sequencing for comparative genomics analyses and studies of genome evolution.

PMID: 21102452
PMCID: PMC3115693
DOI: 10.1038/nmeth.1527
[Indexed for MEDLINE]