BMC Evol Biol. 2015 Feb 10;15:13. doi: 10.1186/s12862-015-0283-7.

Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates.

Author information

Office of Research Information Services, Office of the CIO, Smithsonian Institution, Washington, D.C., USA.
Department of Entomology, Rutgers University, New Brunswick, New Jersey, USA.
School of Life Sciences, Arizona State University, Tempe, AZ, USA.
Zoologisches Forschungsmuseum Alexander Koenig (ZFMK)/Zentrum für Molekulare Biodiversitätsforschung (ZMB), Bonn, Germany.
Ecology Evolution and Genetics, Research School of Biology, Australian National University, Canberra, ACT, Australia.
National Evolutionary Synthesis Center, Durham, NC, USA.
Department of Biological Sciences, Macquarie University, Sydney, Australia.



Model selection is a vital part of most phylogenetic analyses, and accounting for the heterogeneity in evolutionary patterns across sites is particularly important. Mixture models and partitioning are commonly used to account for this variation, and partitioning is the most popular approach. Most current partitioning methods require some a priori partitioning scheme to be defined, typically guided by known structural features of the sequences, such as gene boundaries or codon positions. Recent evidence suggests that these a priori boundaries often fail to adequately account for variation in rates and patterns of evolution among sites. Furthermore, new phylogenomic datasets such as those assembled from ultra-conserved elements lack obvious structural features on which to define a priori partitioning schemes. The upshot is that, for many phylogenetic datasets, partitioned models of molecular evolution may be inadequate, thus limiting the accuracy of downstream phylogenetic analyses.


We present a new algorithm that automatically selects a partitioning scheme via the iterative division of the alignment into subsets of similar sites based on their rates of evolution. We compare this method to existing approaches using a wide range of empirical datasets, and show that it consistently leads to large increases in the fit of partitioned models of molecular evolution when measured using AICc and BIC scores. In doing so, we demonstrate that some related approaches to solving this problem may have been associated with a small but important bias.
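The iterative division described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes per-site rates have already been estimated and stay fixed, it uses a plain one-dimensional two-means split, and it accepts a split only when a user-supplied criterion (lower is better, as with AICc or BIC) improves. The names `two_means_1d`, `iterative_split`, and `make_toy_score` are hypothetical, and the toy score is a stand-in for a likelihood-based criterion, which in practice would require fitting a substitution model to each subset.

```python
import math
from typing import Callable, List, Sequence

Scheme = List[List[int]]  # a partitioning scheme: subsets of site indices


def aicc(log_lik: float, k: int, n: int) -> float:
    """Corrected AIC for log-likelihood log_lik, k parameters, n sites."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion for the same quantities."""
    return -2.0 * log_lik + k * math.log(n)


def two_means_1d(values: Sequence[float], max_iter: int = 100) -> List[int]:
    """Cluster values into two groups with 1D k-means; returns 0/1 labels."""
    lo, hi = min(values), max(values)
    if lo == hi:  # identical values cannot be split
        return [0] * len(values)
    centers = [lo, hi]  # start from the extremes so both clusters are non-empty
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1 for v in values
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def iterative_split(
    rates: Sequence[float], score: Callable[[Scheme], float], min_size: int = 1
) -> Scheme:
    """Repeatedly split subsets by rate, keeping a split only if score drops."""
    pending: Scheme = [list(range(len(rates)))]
    final: Scheme = []
    while pending:
        subset = pending.pop()
        labels = two_means_1d([rates[i] for i in subset])
        parts = [[i for i, lab in zip(subset, labels) if lab == j] for j in (0, 1)]
        others = final + pending
        if (
            min(len(p) for p in parts) >= min_size
            and score(others + parts) < score(others + [subset])
        ):
            pending.extend(parts)  # accepted: try to split the halves further
        else:
            final.append(subset)  # rejected: this subset is kept as it is
    return sorted(final)


def make_toy_score(rates: Sequence[float], penalty: float) -> Callable[[Scheme], float]:
    """Toy criterion (assumption): within-subset squared error + per-subset penalty."""

    def score(scheme: Scheme) -> float:
        total = penalty * len(scheme)
        for subset in scheme:
            mean = sum(rates[i] for i in subset) / len(subset)
            total += sum((rates[i] - mean) ** 2 for i in subset)
        return total

    return score


if __name__ == "__main__":
    demo_rates = [0.1, 0.12, 0.11, 2.0, 2.1, 1.9, 9.0, 9.2]
    print(iterative_split(demo_rates, make_toy_score(demo_rates, penalty=1.0)))
```

The per-subset penalty plays the role that the parameter-count terms play in AICc and BIC: a split is kept only when the gain in fit outweighs the cost of the extra subset.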


Our method provides an alternative to traditional approaches to partitioning, such as dividing alignments by gene and codon position. Because our method is data-driven, it can be used to estimate partitioned models for all types of alignments, including those that are not amenable to traditional approaches to partitioning.
