Quantifying the impact of dependent evolution among sites in phylogenetic inference

Syst Biol. 2011 Jan;60(1):60-73. doi: 10.1093/sysbio/syq074. Epub 2010 Nov 15.

Abstract

Nearly all commonly used methods of phylogenetic inference assume that characters in an alignment evolve independently of one another. This assumption is attractive for its simplicity and computational tractability but is not biologically reasonable for RNAs and proteins, which have secondary and tertiary structures. Here, we simulate RNA and protein-coding DNA sequence data under a general model of dependence in order to assess the robustness of traditional methods of phylogenetic inference to violation of the assumption of independence among sites. We find that dependence among sites reduces the accuracy of independence-assuming methods; the reduction is relatively mild for proteins but may be substantial for RNA. We introduce the concept of effective sequence length and discuss its utility for considering information content in phylogenetics.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Animals
  • Base Sequence
  • Bombyx / genetics
  • Computer Simulation
  • DNA / chemistry
  • DNA / genetics
  • Escherichia coli / genetics
  • Evolution, Molecular*
  • Models, Genetic*
  • Myoglobin / genetics
  • Nucleic Acid Conformation
  • Phylogeny*
  • Protein Conformation
  • Proteins / chemistry*
  • Proteins / genetics*
  • RNA / chemistry*
  • RNA / genetics*
  • Sequence Alignment / methods
  • Sperm Whale / genetics

Substances

  • Myoglobin
  • Proteins
  • RNA
  • DNA