Nat Biotechnol. 2019 May;37(5):540-546. doi: 10.1038/s41587-019-0072-8. Epub 2019 Apr 1.

Assembly of long, error-prone reads using repeat graphs.

Author information

1. Department of Computer Science and Engineering, University of California, San Diego, CA, USA.
2. Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA, USA.
3. Research School of Computer Science, Australian National University, Canberra, Australian Capital Territory, Australia.
4. Department of Computer Science and Engineering, University of California, San Diego, CA, USA. ppevzner@ucsd.edu.

Abstract

Accurate genome assembly is hampered by repetitive regions. Although long single molecule sequencing reads are better able to resolve genomic repeats than short-read data, most long-read assembly algorithms do not provide the repeat characterization necessary for producing optimal assemblies. Here, we present Flye, a long-read assembly algorithm that generates arbitrary paths in an unknown repeat graph, called disjointigs, and constructs an accurate repeat graph from these error-riddled disjointigs. We benchmark Flye against five state-of-the-art assemblers and show that it generates better or comparable assemblies, while being an order of magnitude faster. Flye nearly doubled the contiguity of the human genome assembly (as measured by the NGA50 assembly quality metric) compared with existing assemblers.
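The abstract names NGA50 without defining it. As commonly defined in assembly evaluation, NGA50 is the NG50 statistic computed on aligned blocks rather than raw contigs: contigs are aligned to a reference and broken at misassemblies, and NGA50 is the largest block length L such that blocks of length at least L cover at least half of the reference genome. The sketch below illustrates only that final arithmetic step. The function name `ng50` and the toy block lengths are hypothetical, and the aligned block lengths are assumed to come from a separate alignment step that is not shown; this is not the evaluation code used in the paper.

```python
from typing import Iterable, Optional


def ng50(block_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Return the largest length L such that blocks of length >= L
    together cover at least half of genome_size.

    Passing aligned block lengths (contigs split at misassemblies)
    yields NGA50; passing raw contig lengths yields NG50.
    Returns None if the blocks never reach half of the genome size.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")

    half = genome_size / 2
    covered = 0
    # Walk blocks from longest to shortest, accumulating covered bases.
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return None


# Hypothetical toy input: four aligned blocks against a 1000 bp reference.
aligned_blocks = [400, 300, 200, 100]
print(ng50(aligned_blocks, 1000))
```

Because the threshold is half of the reference length rather than half of the assembly length, an assembly that covers less than half of the genome has no defined value, which the sketch reports as `None`.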

PMID: 30936562
DOI: 10.1038/s41587-019-0072-8
[Indexed for MEDLINE]
