Phylogenetic Analysis Reveals That ERVs "Die Young" but HERV-H Is Unusually Conserved

PLoS Comput Biol. 2016 Jun 13;12(6):e1004964. doi: 10.1371/journal.pcbi.1004964. eCollection 2016 Jun.

Abstract

About 8% of the human genome is made up of endogenous retroviruses (ERVs). Though most human endogenous retroviruses (HERVs) are thought to be irrelevant to our biology, notable exceptions include members of the HERV-H family that are necessary for the correct functioning of stem cells. ERVs are commonly found in two forms, the full-length proviral form and the more numerous solo-LTR form, the latter thought to result from homologous recombination events. Here we introduce a phylogenetic framework to study ERV insertion and solo-LTR formation. We then apply the framework to site patterns sampled from a set of long alignments covering six primate genomes. Studying six categories of ERVs, we quantitatively recapitulate patterns of insertional activity that are usually described in qualitative terms in the literature. A slowdown in insertional activity is observed in most ERV groups, but we suggest that HERV-K activity may have increased in humans since they diverged from chimpanzees. We find that the rate of solo-LTR formation decreases rapidly as a function of ERV age and that an age-dependent model of solo-LTR formation describes the history of ERVs more accurately than the commonly used exponential decay model. We also demonstrate that HERV-H loci are markedly less likely to form solo-LTRs than ERVs from other families. We conclude that the slower dynamics of HERV-H suggest a host role for the internal regions of these exapted elements, and we posit that in future it will be possible to use the relationship between full-length proviruses and solo-LTRs to help identify large-scale co-options in distant vertebrate genomes.
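The abstract contrasts the commonly used exponential decay model of solo-LTR formation, in which a full-length provirus converts to a solo-LTR at a constant rate, with an age-dependent model in which that rate falls as the element ages. The Python sketch below illustrates the qualitative difference using survival probabilities. It is not the paper's model: the exponentially declining hazard r(a) = r0 * exp(-k * a), the function names, and the parameter values are all hypothetical choices made for illustration only.

```python
import math


def p_full_length_exponential(t: float, r: float) -> float:
    """Probability that a provirus is still full-length after time t when
    solo-LTRs form at a constant rate r (exponential decay model)."""
    return math.exp(-r * t)


def p_full_length_age_dependent(t: float, r0: float, k: float) -> float:
    """Probability that a provirus is still full-length after time t when the
    solo-LTR formation rate declines with age.

    ASSUMPTION (hypothetical, for illustration): hazard r(a) = r0 * exp(-k * a).
    Integrating this hazard from 0 to t gives the cumulative hazard
    (r0 / k) * (1 - exp(-k * t)); expm1 keeps that term accurate for small k*t.
    """
    if k == 0:
        # With no decline in the rate, the model reduces to exponential decay.
        return p_full_length_exponential(t, r0)
    cumulative_hazard = (r0 / k) * (-math.expm1(-k * t))
    return math.exp(-cumulative_hazard)


if __name__ == "__main__":
    r0, k = 0.5, 2.0  # hypothetical values, not estimates from the paper
    for t in (0.0, 0.5, 1.0, 2.0, 5.0):
        print(
            f"t={t:4.1f}  "
            f"exponential={p_full_length_exponential(t, r0):.3f}  "
            f"age-dependent={p_full_length_age_dependent(t, r0, k):.3f}"
        )
```

Under this assumed hazard the cumulative hazard saturates at r0 / k, so the probability of remaining full-length levels off at exp(-r0 / k) instead of falling towards zero. This is one simple way a rate that "decreases rapidly as a function of ERV age" lets proviruses that survive their early history persist for a long time.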

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Conserved Sequence
  • Endogenous Retroviruses / genetics*
  • Evolution, Molecular
  • Genome, Human / genetics*
  • Humans
  • Models, Genetic*
  • Phylogeny
  • Primates / genetics

Grants and funding

PG was supported by the Engineering and Physical Sciences Research Council and AK is funded by the Royal Society. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.