Comput Chem. 1996 Mar;20(1):41-8.

Evolution of simple sequence repeats.

Author information

Theoretical Biology and Biophysics MS K710, Los Alamos National Laboratory, NM 87545, USA.


Simple sequence repeats (SSRs) are common and frequently polymorphic in eukaryotic DNA. Many are subject to high rates of length mutation, most often observed as a gain or loss of one repeat unit. Can the observed abundances and length distributions of SSRs be explained as the result of an unbiased random walk starting from some initial repeat length? To address this question, we have considered two models for an unbiased random walk on the integers n (n ≥ n0). The first is a continuous-time process (Birth and Death Model, or BDM) in which the probability per unit time of a transition to n + 1 or to n - 1 is λk, with k = n - n0 + 1. The second is a discrete-time model (Random Walk Model, or RWM) in which a transition is made at each time step, either to n - 1 or to n + 1. In each case the walks start at length n0, with new walks generated at a steady rate S, the source rate, determined by the rate of base-substitution mutation from neighboring sequences. Each walk terminates when n reaches n0 - 1 or at some time T, which reflects the contamination of pure repeat sequences by other mutations that remove them from consideration, either because they fail to satisfy the criteria for repeat selection from some database or because they can no longer undergo efficient length mutations. For infinite T, the results for N(k), the expected number of repeats of length n = k + n0 - 1, are particularly simple: N(k) = S/(kλ) for BDM and N(k) = 2S for RWM. For finite T, each model has a cut-off value of k, namely k = Tλ ln 2 for BDM and k = 0.57√T for RWM; for larger values of k, N(k) rapidly becomes smaller than the infinite-time limit. We argue that these results may be compared with SSR length distributions averaged over many loci, but not with the distribution at a particular locus, for which founder effects are important.
For the data of Beckmann & Weber [(1992), Genomics 12, 627] on GT.AC repeats in the human, each model gives a reasonable fit to the data, with the source at two repeat units (n0 = 2). Both the absolute number of loci and their length distribution are well represented.
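
The RWM result can be checked numerically. The sketch below is not the authors' code. It is a minimal illustration under stated assumptions: each walk starts at k = 1 (n = n0), steps to k - 1 or k + 1 with probability 1/2 per time step, is absorbed at k = 0 (n = n0 - 1), and is discarded once it has taken T steps. With S new walks per step, the steady-state expected number at k is S times the summed occupation probability over walk ages 0 to T - 1. The function name rwm_expected_counts and its parameters are hypothetical.

```python
def rwm_expected_counts(T, S=1.0):
    """Expected steady-state number of walks at each k = n - n0 + 1 (RWM sketch).

    Assumptions (not taken from the paper): walks of ages 0..T-1 are present,
    each age weighted by the source rate S; k = 0 is absorbing.
    Returns a list `counts` where counts[k] approximates N(k).
    """
    size = T + 3                      # a walk of age T-1 can reach at most k = T
    p = [0.0] * size                  # p[k]: probability an unabsorbed walk is at k
    p[1] = 1.0                        # every walk starts at k = 1
    counts = [0.0] * size
    for _ in range(T):                # loop over walk ages 0 .. T-1
        for k in range(1, T + 1):
            counts[k] += p[k]         # accumulate occupation probability
        new = [0.0] * size
        for k in range(1, T + 1):
            half = 0.5 * p[k]
            new[k - 1] += half        # unbiased step down
            new[k + 1] += half        # unbiased step up
        new[0] = 0.0                  # walks reaching k = 0 terminate
        p = new
    return [S * c for c in counts]


if __name__ == "__main__":
    T = 400
    counts = rwm_expected_counts(T)
    for k in (1, 5, 10, 20, 40):
        print(k, round(counts[k], 3))
```

For k small compared with √T the computed values stay close to the infinite-time limit 2S, and they fall off quickly once k exceeds a modest multiple of √T, in line with the cut-off behaviour described above.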

[Indexed for MEDLINE]