Display Settings:

Format

Send to:

Choose Destination
See comment in PubMed Commons below
J Comput Biol. 2009 Aug;16(8):1101-16. doi: 10.1089/cmb.2009.0047.

Maximum likelihood genome assembly.

Author information

  • 1Department of Computer Science, University of Toronto , Toronto, Canada.

Abstract

Whole genome shotgun assembly is the process of taking many short sequenced segments (reads) and reconstructing the genome from which they originated. We demonstrate how the technique of bidirected network flow can be used to explicitly model the double-stranded nature of DNA for genome assembly. By combining an algorithm for the Chinese Postman Problem on bidirected graphs with the construction of a bidirected de Bruijn graph, we are able to find the shortest double-stranded DNA sequence that contains a given set of k-long DNA molecules. This is the first exact polynomial time algorithm for the assembly of a double-stranded genome. Furthermore, we propose a maximum likelihood framework for assembling the genome that is the most likely source of the reads, in lieu of the standard maximum parsimony approach (which finds the shortest genome subject to some constraints). In this setting, we give a bidirected network flow-based algorithm that, by taking advantage of high coverage, accurately estimates the copy counts of repeats in a genome. Our second algorithm combines these predicted copy counts with matepair data in order to assemble the reads into contigs. We run our algorithms on simulated read data from Escherichia coli and predict copy counts with extremely high accuracy, while assembling long contigs.

PMID:
19645596
[PubMed - indexed for MEDLINE]
PMCID:
PMC3154397
Free PMC Article

Images from this publication.See all images (5)Free text

FIG. 1.
FIG. 2
FIG. 3.
FIG. 4.
FIG. 5.
PubMed Commons home

PubMed Commons

0 comments
How to join PubMed Commons

    Supplemental Content

    Icon for Mary Ann Liebert, Inc. Icon for PubMed Central
    Loading ...
    Write to the Help Desk