Analysis of the overdispersed clock in the short-term evolution of hepatitis C virus: Using the E1/E2 gene sequences to infer infection dates in a single source outbreak

Mol Biol Evol. 2006 Jun;23(6):1242-53. doi: 10.1093/molbev/msk012. Epub 2006 Apr 3.

Abstract

The assumption of a molecular clock for dating events from sequence information is often frustrated by the presence of heterogeneity among evolutionary rates due, among other factors, to positively selected sites. In this work, our goal is to explore methods to estimate infection dates from sequence analysis. One such method, based on site stripping for clock detection, was proposed to unravel the clocklike molecular evolution in sequences showing high variability of evolutionary rates and in the presence of positive selection. Alternative approaches accommodate heterogeneity in evolutionary rates at various levels, without eliminating any information from the data. Here we present the analysis of a data set of hepatitis C virus (HCV) sequences from 24 patients infected by a single individual with known dates of infection. We first used a simple criterion of relative substitution rate for site removal prior to a regression analysis. Time was regressed on maximum likelihood pairwise evolutionary distances between the sequences sampled from the source individual and infected patients. We show that it is indeed the fastest evolving sites that disturb the molecular clock and that these sites correspond to positively selected codons. The high computational efficiency of the regression analysis allowed us to compare the site-stripping scheme with random removal of sites. We demonstrate that removing the fast-evolving sites significantly increases the accuracy of estimation of infection times based on a single substitution rate. However, the time-of-infection estimates improved substantially when a more sophisticated and computationally demanding Bayesian method was used. This method was used with the same data set but keeping all the sequence positions in the analysis. Consequently, despite the distortion introduced by positive selection on evolutionary rates, it is possible to obtain quite accurate estimates of infection dates, a result of special relevance for molecular epidemiology studies.
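The site-stripping and regression steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice to force the regression through the origin (zero distance at the moment of infection), and the toy numbers are all assumptions made for illustration. The abstract does not state how the regression was constrained, and real distances would come from a maximum likelihood analysis of the E1/E2 alignment.

```python
from typing import List, Sequence


def strip_fastest_sites(site_rates: Sequence[float], fraction: float) -> List[int]:
    """Return indices of the sites kept after removing the fastest ones.

    `site_rates` holds one relative substitution rate per alignment column
    (hypothetical input); `fraction` is the share of columns to remove.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    n_remove = int(len(site_rates) * fraction)
    # Sort column indices from slowest to fastest, then drop the fast tail.
    order = sorted(range(len(site_rates)), key=lambda i: site_rates[i])
    kept = order[: len(order) - n_remove]
    return sorted(kept)


def fit_time_on_distance(distances: Sequence[float], times: Sequence[float]) -> float:
    """Least-squares slope for time = slope * distance (no intercept).

    Forcing the line through the origin is an assumption of this sketch.
    """
    if len(distances) != len(times):
        raise ValueError("distances and times must have the same length")
    denominator = sum(d * d for d in distances)
    if denominator == 0:
        raise ValueError("all distances are zero")
    return sum(d * t for d, t in zip(distances, times)) / denominator


def estimate_time(distance: float, slope: float) -> float:
    """Predict elapsed time since infection from a source-to-patient distance."""
    return slope * distance


if __name__ == "__main__":
    # Made-up values, for illustration only.
    rates = [0.1, 2.5, 0.3, 1.0]
    print("columns kept:", strip_fastest_sites(rates, 0.25))

    known_distances = [0.01, 0.02, 0.04]  # substitutions per site
    known_times = [5.0, 10.0, 20.0]       # years since infection
    slope = fit_time_on_distance(known_distances, known_times)
    print("estimated years for d = 0.03:", estimate_time(0.03, slope))
```

In practice the distances would be recomputed on the stripped alignment before fitting, and the fitted slope would then be applied to patients whose infection dates are unknown. The Bayesian alternative mentioned in the abstract keeps every site and models rate variation directly, so it has no equivalent of the stripping step.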

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Disease Outbreaks*
  • Evolution, Molecular*
  • Hepacivirus / genetics*
  • Hepatitis C / epidemiology*
  • Hepatitis C / genetics
  • Humans
  • Molecular Epidemiology
  • Phylogeny
  • RNA, Viral / genetics
  • Viral Envelope Proteins / genetics*

Substances

  • E1 protein, Hepatitis C virus
  • RNA, Viral
  • Viral Envelope Proteins
  • glycoprotein E2, Hepatitis C virus