Influence function for robust phylogenetic reconstructions

Mol Biol Evol. 2008 May;25(5):869-73. doi: 10.1093/molbev/msn030. Epub 2008 Feb 7.

Abstract

Based on the computation of the influence function, a tool to measure the impact of each piece of sampled data on the statistical inference of a parameter, we propose to analyze the support of the maximum-likelihood (ML) tree for each site. We provide a new tool for filtering data sets (nucleotides, amino acids, and others) in the context of ML phylogenetic reconstructions. Because different sites support different phylogenetic topologies in different ways, outlier sites, that is, sites with a very negative influence value, are important: they can drastically change the topology resulting from the statistical inference. Therefore, these outlier sites must be clearly identified and their effects accounted for before drawing biological conclusions from the inferred tree. A matrix containing 158 fungal terminals, all belonging to Chytridiomycota, Zygomycota, and Glomeromycota, is analyzed. We show that removing the strongest outlier from the analysis strikingly modifies the ML topology, with a loss of as many as 20% of the internal nodes. Moreover, estimating the topology on the filtered data set yields a tree with enhanced bootstrap support. This analysis reinforces the polyphyletic status of the fungal phyla Chytridiomycota and Zygomycota, suggesting the necessity of revisiting the systematics of these fungal groups. We show the ability of the influence function to produce new evolutionary hypotheses.
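
The idea of an influence value per observation can be illustrated with a minimal leave-one-out sketch. It is not the authors' implementation: the scaling `(n - 1) * (T(all) - T(without i))` is the standard jackknife form of an empirical influence value, the toy estimator is a plain mean, and the numbers, the function names, and the threshold of -5.0 are hypothetical. In a phylogenetic setting the estimator would presumably be a full ML tree inference, which is not shown here.

```python
from typing import Callable, List, Sequence


def leave_one_out_influence(
    data: Sequence[float],
    estimator: Callable[[Sequence[float]], float],
) -> List[float]:
    """Jackknife-style empirical influence of each observation.

    For observation i this returns (n - 1) * (T(all) - T(without i)),
    where T is the estimator. This is a generic statistical definition,
    assumed here for illustration only.
    """
    n = len(data)
    full_estimate = estimator(data)
    values = []
    for i in range(n):
        # Re-estimate with observation i removed.
        reduced = list(data[:i]) + list(data[i + 1:])
        values.append((n - 1) * (full_estimate - estimator(reduced)))
    return values


def flag_negative_outliers(values: Sequence[float], threshold: float) -> List[int]:
    """Return indices whose influence value falls below the threshold."""
    return [i for i, v in enumerate(values) if v < threshold]


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


# Hypothetical per-site scores (toy numbers, not data from the study).
scores = [-2.0, -2.0, -2.0, -2.0, -12.0]

influence = leave_one_out_influence(scores, mean)
outliers = flag_negative_outliers(influence, threshold=-5.0)

print(influence)  # [2.0, 2.0, 2.0, 2.0, -8.0]
print(outliers)   # [4]
```

In this toy case the last observation pulls the estimate down far more than the others, so it receives a strongly negative influence value and is the only one flagged. How the cutoff should be chosen for real alignments is not specified in the abstract.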

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biological Evolution*
  • Data Interpretation, Statistical
  • Fungi / classification
  • Fungi / genetics
  • Likelihood Functions
  • Phylogeny*
  • Software
  • Statistics as Topic*