Wide variations in neighbor-dependent substitution rates

J Mol Biol. 1994 Mar 4;236(4):1022-33. doi: 10.1016/0022-2836(94)90009-4.

Abstract

The pattern of 20,200 point substitutions in the 16 unique neighbor-pair environments has been determined from aligned gene/pseudogene sequences in the current database of human DNA sequences. Substitution rates, representing averages over those for different regions of the genome, are distributed over a 60-fold range with strong biases in particular neighbor-pair environments. The rates for substitutions involving the CG doublet are the most rapid overall, where changes of the C.G pair vary over a tenfold range depending on the type of substitution and the 5' neighbor-pair. In general, the rates are fastest in alternating purine-pyrimidine sequences and slowest in purine.pyrimidine tracts, suggesting that the frequencies of one or both key molecular misadventures that can occur during replication, dNTP misinsertion and transient misalignment, may be associated with structural alternations and flexibility of the backbone. By contrast, purine.pyrimidine tracts are less flexible, less prone to substitution, and therefore their proportions accumulate in sequences over time. Characteristic biases of the content and arrangement of oligonucleotide strings or tuples in all sequence elements, but particularly in non-coding regions, appear to be due to the pattern of different neighbor-dependent substitution rates. Computer simulations of numerous replicative cycles have been carried out with substitutions occurring on the same schedule found in this study for pseudogenes. Statistical analyses of tuple frequencies at periodic intervals during the simulation experiment indicate that sequences slowly change in lexical complexity toward a quasi-equilibrium state that corresponds to that for introns.
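The simulation described in the abstract (repeated replicative cycles with neighbor-dependent substitutions, and tuple frequencies tallied at periodic intervals) can be illustrated with a minimal sketch. This is not the authors' program. The function names, the synchronous update scheme, and the rate values (`base_rate`, `cg_factor`) are assumptions chosen for illustration only. The single elevated rate for the CG doublet stands in for the full table of rates over the 16 neighbor-pair environments, which the abstract does not give.

```python
import random
from collections import Counter

BASES = "ACGT"


def substitution_prob(left, center, right, base_rate=0.001, cg_factor=10.0):
    """Per-cycle substitution probability for `center` given its 5' and 3' neighbors.

    Placeholder values, NOT the rates measured in the study: every context
    gets `base_rate`, except bases inside a CG doublet, which are boosted.
    """
    in_cg = (center == "C" and right == "G") or (center == "G" and left == "C")
    return base_rate * cg_factor if in_cg else base_rate


def replicate(seq, rng, **rate_kwargs):
    """One replicative cycle: each interior base may be substituted.

    Contexts are read from the parent sequence, so all positions are
    updated synchronously (a simplifying assumption).
    """
    out = list(seq)
    for i in range(1, len(seq) - 1):
        p = substitution_prob(seq[i - 1], seq[i], seq[i + 1], **rate_kwargs)
        if rng.random() < p:
            out[i] = rng.choice([b for b in BASES if b != seq[i]])
    return "".join(out)


def tuple_counts(seq, k=2):
    """Count all overlapping k-tuples (oligonucleotide strings) in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def simulate(seq, cycles, sample_every, seed=0, k=2, **rate_kwargs):
    """Run `cycles` replicative cycles, recording tuple counts periodically."""
    rng = random.Random(seed)
    snapshots = []
    for cycle in range(1, cycles + 1):
        seq = replicate(seq, rng, **rate_kwargs)
        if cycle % sample_every == 0:
            snapshots.append((cycle, tuple_counts(seq, k)))
    return seq, snapshots


if __name__ == "__main__":
    start_rng = random.Random(1)
    start = "".join(start_rng.choice(BASES) for _ in range(2000))
    final, snapshots = simulate(start, cycles=500, sample_every=100)
    for cycle, counts in snapshots:
        print(cycle, counts["CG"], counts["AA"])
```

Tracking a tuple such as `CG` across the snapshots shows how context-dependent rates can shift tuple frequencies over many cycles. Reproducing the quasi-equilibrium reported in the abstract would require the measured rate table rather than these placeholder values.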

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Base Composition
  • Base Sequence
  • Computer Simulation
  • DNA / genetics*
  • Databases, Factual
  • Genes
  • Humans
  • Models, Genetic
  • Point Mutation*
  • Pseudogenes
  • Sequence Deletion
  • Software Design

Substances

  • DNA