Send to

Choose Destination
Bioinformatics. 2014 Jan 1;30(1):31-7. doi: 10.1093/bioinformatics/btt310. Epub 2013 Jun 3.

Informed and automated k-mer size selection for genome assembly.

Author information

Department of Computer Science and Engineering and Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA.



Genome assembly tools based on the de Bruijn graph framework rely on a parameter k, which represents a trade-off between several competing effects that are difficult to quantify. There is currently a lack of tools that would automatically estimate the best k to use and/or quickly generate histograms of k-mer abundances that would allow the user to make an informed decision.


We develop a fast and accurate sampling method that constructs approximate abundance histograms with several orders of magnitude performance improvement over traditional methods. We then present a fast heuristic that uses the generated abundance histograms for putative k values to estimate the best possible value of k. We test the effectiveness of our tool using diverse sequencing datasets and find that its choice of k leads to some of the best assemblies.


Our tool KmerGenie is freely available at:

[Indexed for MEDLINE]

Supplemental Content

Full text links

Icon for Silverchair Information Systems
Loading ...
Support Center