Towards optimal distance functions for stochastic substitution models

J Theor Biol. 2009 Sep 21;260(2):294-307. doi: 10.1016/j.jtbi.2009.05.028. Epub 2009 Jun 6.

Abstract

Distance-based methods for reconstructing phylogenetic trees consist of two independent parts: first, inter-species distances are inferred under some stochastic model of sequence evolution; then the inferred distances are used to construct a tree. In this paper we concentrate on the task of inter-species distance estimation. Specifically, we characterize the family of valid distance functions for the assumed substitution model and show that a deliberate choice of distance function significantly improves the accuracy of the distance estimates and, consequently, the accuracy of the reconstructed tree. Our contribution has three parts: first, we present a general framework for constructing families of additive distance functions for stochastic evolutionary models; second, we present a method for selecting (near-)optimal distance functions; and finally, we present simulation results that support our theoretical analysis.
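The abstract does not name a particular distance function. As a familiar illustration of what "additive distance function" means, and not the framework or selection method of this paper, the sketch below uses the classical Jukes-Cantor (JC69) correction for the simplest four-state substitution model. Under JC69, two sequences separated by t expected substitutions per site differ at a site with probability p(t) = 3/4 (1 - exp(-4t/3)); inverting this gives d(p) = -3/4 ln(1 - 4p/3), which recovers t and is therefore additive along tree paths, whereas the raw mismatch proportion p is not. The function names are hypothetical, chosen only for this example.

```python
import math


def expected_mismatch(t: float) -> float:
    """JC69 probability that two sequences differ at a site after t expected
    substitutions per site (illustrative helper, not from the paper)."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))


def jc69_from_p(p: float) -> float:
    """Jukes-Cantor correction: map an observed mismatch proportion p to an
    additive distance (expected substitutions per site)."""
    if p < 0.0:
        raise ValueError("mismatch proportion must be non-negative")
    if p >= 0.75:
        return math.inf  # saturated: the logarithm is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """Estimate the JC69 distance between two aligned sequences (hypothetical helper)."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return jc69_from_p(mismatches / len(seq_a))


if __name__ == "__main__":
    t1, t2 = 0.1, 0.25  # branch lengths on a two-edge path
    # Raw mismatch proportions do not add along the path...
    print(expected_mismatch(t1) + expected_mismatch(t2), expected_mismatch(t1 + t2))
    # ...but the corrected distance recovers the path length t1 + t2.
    print(jc69_from_p(expected_mismatch(t1 + t2)))
    print(jc69_distance("ACGTACGTAC", "ACGTTCGTAA"))
```

Additivity here holds in expectation; with finite sequences the estimate is noisy, and it is the accuracy of such estimates that the choice of distance function is meant to improve.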

MeSH terms

  • Base Sequence
  • DNA / genetics
  • Evolution, Molecular
  • Markov Chains
  • Models, Genetic*
  • Phylogeny*
  • Stochastic Processes

Substances

  • DNA