Bioinformatics. 2014 Jan 1;30(1):31-7. doi: 10.1093/bioinformatics/btt310. Epub 2013 Jun 3.

Informed and automated k-mer size selection for genome assembly.

Author information

Department of Computer Science and Engineering and Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA.

Abstract

MOTIVATION:

Genome assembly tools based on the de Bruijn graph framework rely on a parameter k, which represents a trade-off between several competing effects that are difficult to quantify. There is currently a lack of tools that would automatically estimate the best k to use and/or quickly generate histograms of k-mer abundances that would allow the user to make an informed decision.

RESULTS:

We develop a fast and accurate sampling method that constructs approximate abundance histograms with several orders of magnitude performance improvement over traditional methods. We then present a fast heuristic that uses the generated abundance histograms for putative k values to estimate the best possible value of k. We test the effectiveness of our tool using diverse sequencing datasets and find that its choice of k leads to some of the best assemblies.
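The sampling idea can be illustrated with a minimal sketch. This is not the KmerGenie implementation: the function names, the choice of hash, and the parameter `eps` (keep roughly one distinct k-mer in `eps`) are assumptions made for illustration only. The sketch counts only those canonical k-mers whose hash falls in a fixed residue class, then scales the number of distinct k-mers at each abundance by `eps` to approximate the full histogram.

```python
import hashlib
from collections import Counter

# Translation table for reverse-complementing DNA strings.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (hypothetical choice of hash function)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sampled_kmer_histogram(reads, k, eps):
    """Approximate the k-mer abundance histogram from a hash-based sample.

    Only k-mers whose hash is divisible by `eps` are counted, so about one
    distinct k-mer in `eps` is tracked. Every occurrence of a sampled k-mer
    is counted, so its abundance is exact; the number of distinct k-mers at
    each abundance is multiplied by `eps` to estimate the full histogram.
    With eps == 1 the result is the exact histogram.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip k-mers containing N or other ambiguous bases
            kmer = canonical(kmer)
            if kmer_hash(kmer) % eps == 0:
                counts[kmer] += 1
    # Map abundance -> estimated number of distinct k-mers with that abundance.
    histogram = Counter(counts.values())
    return {abundance: n * eps for abundance, n in sorted(histogram.items())}
```

Because the sampling decision depends only on the k-mer itself, memory use shrinks by roughly a factor of `eps` while the overall shape of the histogram is preserved in expectation. Running the function once per candidate k would give the set of histograms that a selection heuristic could then compare.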

AVAILABILITY:

Our tool KmerGenie is freely available at: http://kmergenie.bx.psu.edu/.

PMID: 23732276
DOI: 10.1093/bioinformatics/btt310
[Indexed for MEDLINE]
