PLoS One. 2011;6(8):e23501. doi: 10.1371/journal.pone.0023501. Epub 2011 Aug 18.

Meraculous: de novo genome assembly with short paired-end reads.

Author information

  • U.S. Department of Energy Joint Genome Institute, Walnut Creek, California, United States of America. jchapman@lbl.gov

Abstract

We describe a new algorithm, Meraculous, for whole genome assembly of deep paired-end short reads, and apply it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4 megabase genome of the haploid yeast Pichia stipitis. More than 95% of the genome is recovered, with no errors; half the assembled sequence is in contigs longer than 101 kilobases and in scaffolds longer than 269 kilobases. Incorporating fosmid ends recovers entire chromosomes. Meraculous relies on an efficient and conservative traversal of the subgraph of the k-mer (de Bruijn) graph of oligonucleotides with unique high quality extensions in the dataset, avoiding an explicit error correction step as used in other short-read assemblers. A novel memory-efficient hashing scheme is introduced. The resulting contigs are ordered and oriented using paired reads separated by ∼280 bp or ∼3.2 kbp, and many gaps between contigs can be closed using paired-end placements. Practical issues with the dataset are described, and prospects for assembling larger genomes are discussed.
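
The idea of traversing only k-mers with a unique, well-supported extension can be illustrated with a small sketch. This is not the Meraculous implementation: it is a toy under stated assumptions. It works on the forward strand only, extends to the right only, and uses a hypothetical count threshold (`min_count`) as a stand-in for the quality-based filtering the abstract mentions. The function names `unique_extensions` and `extend_right` are invented for illustration.

```python
from collections import defaultdict


def unique_extensions(reads, k, min_count=2):
    """Map each k-mer to its next base, but only when exactly one
    next base is seen at least `min_count` times (assumed threshold)."""
    counts = defaultdict(lambda: defaultdict(int))
    for read in reads:
        # Every k-mer that has a following base within the read.
        for i in range(len(read) - k):
            counts[read[i:i + k]][read[i + k]] += 1

    ext = {}
    for kmer, next_bases in counts.items():
        supported = [b for b, c in next_bases.items() if c >= min_count]
        if len(supported) == 1:  # ambiguous or unsupported k-mers are dropped
            ext[kmer] = supported[0]
    return ext


def extend_right(seed, ext, k):
    """Grow a contig from `seed` while the last k-mer has a unique extension.
    Stops conservatively at forks, dead ends, and repeated k-mers."""
    contig = seed
    seen = {seed}
    while True:
        kmer = contig[-k:]
        base = ext.get(kmer)
        if base is None:
            break
        nxt = kmer[1:] + base
        if nxt in seen:  # avoid looping around a cycle
            break
        seen.add(nxt)
        contig += base
    return contig


if __name__ == "__main__":
    genome = "ACGTTGCAAGGCTA"
    # Two error-free copies plus one read carrying a single wrong base.
    reads = [genome, genome, "ACGTA"]
    ext = unique_extensions(reads, k=4)
    print(extend_right("ACGT", ext, k=4))
```

In this toy, the erroneous extension `ACGT` followed by `A` is seen only once, so it falls below the threshold and is ignored without any separate error-correction pass. A k-mer with two well-supported extensions is left out of the table entirely, so the contig stops there instead of guessing.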

PMID: 21876754 [PubMed - indexed for MEDLINE]
PMCID: PMC3158087
Free PMC Article
