J Comput Biol. 2013 Apr;20(4):359-71. doi: 10.1089/cmb.2012.0098. Epub 2012 Jul 17.

Pathset graphs: a novel approach for comprehensive utilization of paired reads in genome assembly.

Author information

Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA.

Abstract

One of the key advances in genome assembly that has led to a significant improvement in contig lengths has been improved algorithms for utilization of paired reads (mate-pairs). While in most assemblers, mate-pair information is used in a post-processing step, the recently proposed Paired de Bruijn Graph (PDBG) approach incorporates the mate-pair information directly in the assembly graph structure. However, the PDBG approach faces difficulties when the variation in the insert sizes is high. To address this problem, we first transform mate-pairs into edge-pair histograms that allow one to better estimate the distance between edges in the assembly graph that represent regions linked by multiple mate-pairs. Further, we combine the ideas of mate-pair transformation and PDBGs to construct new data structures for genome assembly: pathsets and pathset graphs.
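
The abstract does not spell out how an edge-pair histogram is built, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each read of a mate-pair has already been mapped to an edge of the assembly graph at a known offset, and that a single nominal insert size is available. The names `edge_pair_histograms`, `estimate_distance`, and the tuple layout are hypothetical. Each mate-pair then votes for an implied distance between the starts of the two edges, and pooling many votes gives an estimate that is less sensitive to insert-size variation than any single pair.

```python
from collections import Counter, defaultdict
from statistics import median


def edge_pair_histograms(mate_pairs, nominal_insert):
    """Group mate-pairs by the pair of edges they link (illustrative only).

    mate_pairs: iterable of (edge_a, offset_a, edge_b, offset_b), where each
    offset is the read's start position within its edge (assumed layout).
    nominal_insert: assumed distance between the starts of the two reads.

    Returns a dict mapping (edge_a, edge_b) to a Counter of implied
    distances between the start of edge_a and the start of edge_b.
    """
    histograms = defaultdict(Counter)
    for edge_a, offset_a, edge_b, offset_b in mate_pairs:
        # start(b) + offset_b - (start(a) + offset_a) = insert size, so
        # start(b) - start(a) = insert size + offset_a - offset_b.
        implied = nominal_insert + offset_a - offset_b
        histograms[(edge_a, edge_b)][implied] += 1
    return histograms


def estimate_distance(histogram):
    """Use the median vote as a simple, outlier-tolerant distance estimate."""
    return median(histogram.elements())


# Three mate-pairs linking the same two (hypothetical) edges.
pairs = [
    ("e1", 100, "e2", 10),
    ("e1", 150, "e2", 40),
    ("e1", 120, "e2", 35),
]
hists = edge_pair_histograms(pairs, nominal_insert=400)
print(estimate_distance(hists[("e1", "e2")]))  # 490
```

The median is a design choice made for this sketch; any robust summary of the histogram would serve the same illustrative purpose.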

PMID: 22803627
PMCID: PMC3619201
DOI: 10.1089/cmb.2012.0098
[Indexed for MEDLINE]
Free PMC Article