Long perfect dinucleotide repeats are typical of vertebrates, show motif preferences and size convergence

Mol Biol Evol. 2004 Jul;21(7):1226-33. doi: 10.1093/molbev/msh108. Epub 2004 Mar 10.

Abstract

Microsatellites are simple sequence repeats (SSRs) showing complex patterns of length, motif sizes, motif sequences, and repeat perfection. We studied the structure of the dinucleotide SSR population at the genome level by analyzing assembled DNA sequence across species. Three dinucleotide populations were distinguished when SSR genome frequency was analyzed as a function of repeat length and repeat perfection. A population of low-perfection SSRs was identified, which consists of short repeats and represents the vast majority of genomic dinucleotide SSRs across eukaryotic genomes. In turn, the highly perfect repeats are 30 to 50 times less frequent and, in addition to short repeats, also contain a long repeat population that is uniquely represented in vertebrate species. Distinctive features of this population include the modal peak in the frequency distribution of repeat length and the strong preferential usage of the repeat motifs AC and AG. These results raise the hypothesis that the ability to carry a distinct population of long, highly perfect dinucleotide repeats in the genome is a late acquisition in chordate evolution. Our analysis also suggests that different dinucleotide repeat populations have different dynamics and are likely to be underlain by different molecular mechanisms of generation and maintenance in the genome. Thus, these observations imply that caution should be taken in extrapolating results from studies on SSR mutability and on SSR phylogenetic comparisons that do not take into account the stratification of dinucleotide populations in the eukaryotic genome.
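
The kind of tally described above, repeat frequency as a function of repeat length and motif, can be illustrated with a minimal Python sketch. It is not the authors' method, which the abstract does not describe. It assumes that only fully perfect runs are counted, that a run needs at least `min_units` repeat units, and that motifs are pooled across phase and strand (so AC, CA, GT and TG all count as AC). The names `perfect_dinucleotide_repeats`, `canonical_motif` and `length_histogram` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a "perfect" repeat is an uninterrupted run of one two-base unit
# whose two bases differ (so AA, CC, GG and TT runs are excluded).
_PERFECT_DINUC = re.compile(r"([ACGT])(?!\1)([ACGT])(?:\1\2)+")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Pool a dinucleotide motif with its phase shift and reverse complement."""
    rev_comp = motif.translate(_COMPLEMENT)[::-1]
    # For a two-base motif, a phase shift is the same as reversing the string.
    return min(motif, motif[::-1], rev_comp, rev_comp[::-1])


def perfect_dinucleotide_repeats(seq, min_units=3):
    """Yield (start, canonical motif, number of units) for each perfect run."""
    for match in _PERFECT_DINUC.finditer(seq.upper()):
        units = len(match.group(0)) // 2
        if units >= min_units:
            yield match.start(), canonical_motif(match.group(1) + match.group(2)), units


def length_histogram(seq, min_units=3):
    """Count perfect runs by (canonical motif, length in repeat units)."""
    counts = Counter()
    for _, motif, units in perfect_dinucleotide_repeats(seq, min_units):
        counts[(motif, units)] += 1
    return counts
```

Applied per genome, a histogram like this is where a modal peak at long repeat lengths would show up. Scoring imperfect repeats would need an explicit perfection measure, which this sketch deliberately leaves out.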

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Dinucleotide Repeats / genetics*
  • Evolution, Molecular
  • Genome
  • Vertebrates / classification
  • Vertebrates / genetics*