A method for accurate inference of population size from serially sampled genealogies distorted by selection

Mol Biol Evol. 2011 Nov;28(11):3171-81. doi: 10.1093/molbev/msr153. Epub 2011 Jun 16.

Abstract

The serial coalescent extends traditional coalescent theory to include genealogies in which not all individuals were sampled at the same time. Inference in this framework is powerful because population size and evolutionary rate may be estimated independently. However, when the sequences in question are affected by selection acting at many sites, the genealogies may differ significantly from their neutral expectation, and inference of demographic parameters may become inaccurate. I demonstrate that this inaccuracy is severe when the mutation rate and strength of selection are jointly large, and I develop a new likelihood calculation that, while approximate, improves the accuracy of population size estimates. When used in a Bayesian parameter estimation context, the new calculation allows for estimation of the shape of the pairwise coalescent rate function and can be used to detect the presence of selection acting at many sites in a sequence. Using the new method, I investigate two sets of dengue virus sequences from Puerto Rico and Thailand, and show that both genealogies are likely to have been distorted by selection.
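
For orientation, the neutral expectation that the abstract refers to can be sketched as the standard constant-size coalescent likelihood for a serially sampled genealogy. This is only the neutral baseline, not the new approximate likelihood developed in the paper, whose form is not given here. The function names, the event encoding, and the assumption that time is measured in generations (so that each pair of lineages coalesces at rate 1/N) are illustrative choices, not taken from the source.

```python
import math

def neutral_serial_coalescent_loglik(events, n_e):
    """Log-likelihood of a genealogy under the neutral constant-size coalescent.

    events: list of (time_before_present, kind) sorted by increasing time,
            where kind is "sample" (a lineage is added) or "coal"
            (two lineages merge). Hypothetical encoding for illustration.
    n_e:    population size N; each pair coalesces at rate 1/N.
    """
    lineages = 0
    prev_time = events[0][0]
    loglik = 0.0
    for time, kind in events:
        dt = time - prev_time
        pairs = lineages * (lineages - 1) / 2
        # Probability that no pair coalesced during this interval.
        loglik -= pairs * dt / n_e
        if kind == "sample":
            lineages += 1
        elif kind == "coal":
            if lineages < 2:
                raise ValueError("coalescence requires at least two lineages")
            # Density contribution of one specific pair coalescing.
            loglik -= math.log(n_e)
            lineages -= 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
        prev_time = time
    return loglik

def neutral_mle_population_size(events):
    """Closed-form maximizer of the log-likelihood above:
    total pair-time divided by the number of coalescent events."""
    lineages = 0
    prev_time = events[0][0]
    pair_time = 0.0
    n_coal = 0
    for time, kind in events:
        pair_time += lineages * (lineages - 1) / 2 * (time - prev_time)
        if kind == "sample":
            lineages += 1
        elif kind == "coal":
            lineages -= 1
            n_coal += 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
        prev_time = time
    return pair_time / n_coal

# Toy genealogy: two tips sampled at time 0, a third sampled at time 1,
# coalescences at times 2 and 5 (all times in generations before present).
toy_events = [(0.0, "sample"), (0.0, "sample"), (1.0, "sample"),
              (2.0, "coal"), (5.0, "coal")]
n_hat = neutral_mle_population_size(toy_events)
```

Under this neutral model every pair of lineages coalesces at the same constant rate. Selection acting at many sites changes the effective pairwise rate over the depth of the tree, which is why an estimate such as `n_hat` can be inaccurate when the neutral likelihood is applied to a distorted genealogy.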

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bayes Theorem
  • Dengue Virus / genetics
  • Evolution, Molecular*
  • Genetics, Population / methods*
  • Likelihood Functions
  • Models, Genetic*
  • Mutation / genetics
  • Population Density*
  • Puerto Rico
  • Selection, Genetic*
  • Thailand