Am J Hum Genet. Aug 2000; 67(2): 345–356.
Published online Jul 7, 2000. doi:  10.1086/303013
PMCID: PMC1287183

Repeat Polymorphisms within Gene Regions: Phenotypic and Evolutionary Implications

Abstract

We have developed an algorithm that predicted 11,265 potentially polymorphic tandem repeats within transcribed sequences. We estimate that 22% (2,207/9,717) of the annotated clusters within UniGene contain at least one potentially polymorphic locus. Our predictions were tested by allelotyping a panel of ~30 individuals for 5% of these regions, confirming polymorphism for more than half the loci tested. Our study indicates that tandem-repeat polymorphisms in genes are more common than is generally believed. Approximately 8% of these loci are within coding sequences and, if polymorphic, would result in frameshifts. Our catalogue of putative polymorphic repeats within transcribed sequences comprises a large set of potentially phenotypic or disease-causing loci. In addition, from the anomalous character of the repetitive sequences within unannotated clusters, we also conclude that the UniGene cluster count substantially overestimates the number of genes in the human genome. We hypothesize that polymorphisms in repeated sequences occur with some baseline distribution, on the basis of repeat homogeneity, size, and sequence composition, and that deviations from that distribution are indicative of the nature of selection pressure at that locus. We find evidence of selective maintenance of the ability of some genes to respond very rapidly, perhaps even on intragenerational timescales, to fluctuating selective pressures.

Introduction

The association between repeating microsatellite elements and polymorphism, caused by the expansion and contraction of the core repetitive unit via slipped-strand mispairing, uneven recombination, or some combination of both, has been well documented (Jeffreys et al. 1988; Zuliani and Hobbs 1990; Jakupciak and Wells 1999; Karthikeyan et al. 1999). The potential for such elements to cause disease has been highlighted by the linkage of several inherited neurological disorders to increases in the copy number of various trinucleotide repeats. For some of these diseases, such as Machado-Joseph disease (CAG repeat), Haw River syndrome (CAG repeat), Huntington disease (CAG repeat), and some forms of fragile-X syndrome (CGG repeat), the repetitive element occurs within the coding sequence (Verkerk et al. 1991; Kawaguchi et al. 1994). For others, including Friedreich ataxia (GAA repeat), myotonic dystrophy (CTG repeat), and another form of fragile-X syndrome, the expanded repeats lie in an intron, the 3′ UTR, and the 5′ UTR, respectively (Smits et al. 1993; Jansen et al. 1994; Bidichandani et al. 1998). There are excellent reviews available for those interested in the molecular basis for the instability of some of these repeats and how they contribute to disease (Wells 1996; Hancock and Santibanez-Koref 1998).

Building on the success of POMPOUS (polymorphic marker prediction of ubiquitous simple sequences), a program that we developed to identify tandem-repeat polymorphisms in genomic sequences as genetic markers (Fondon et al. 1998), we developed a program, REP-X, to increase the predictive accuracy for repeat polymorphisms in transcribed sequences. This is achieved by requiring perfect homogeneity of the repetitive unit, allowing shorter repeats, and including mononucleotide repeats. This generates fewer predictions, but ones with a higher expected probability of being polymorphic. POMPOUS and REP-X both were used to generate the initial set of predictions tested, so that the role of homogeneity in the sensitivity and specificity of polymorphism predictions could be analyzed.

We applied both of these informatics tools to the UniGene database of human cDNA sequences and selected, for further study, 146 genes predicted to harbor repeat polymorphisms of a variety of types. We tested our predictions by designing primers flanking the predicted polymorphisms, PCR amplifying, and then allelotyping them for either of two panels of individuals (see the Material and Methods section).

After establishing the predictive power of our program, we surveyed amino acid repeats in the human genome, for repeat polymorphisms in coding regions. This analysis supports several new conclusions with respect to the functional and evolutionary importance of polymorphic repeat sequences.

Material and Methods

Computational Tools

The REP-X and POMPOUS programs were run on a Hewlett Packard Exemplar supercomputer running SPP-UX 5.2. For selection of our allelotyping test set and validation of predictive accuracy, both codes were run on the annotated portion of the June 1999 release of the UniGene database of expressed human sequences. For all other analyses, REP-X was run on the January 2000 release of UniGene. The longest sequence with the fewest ambiguous bases in each UniGene cluster was used for analysis. REP-X identifies repeats by comparing a sequence to itself and identifying the longest similar sequence, for each position in the sequence. Mononucleotide A/T repeats within 5% of the end of the sequences were excluded from further analysis.
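The end-exclusion rule for mononucleotide A/T tracts can be sketched as a small filter. This is only an illustration, not the REP-X source: the function name, the half-open coordinate convention, and the reading of "within 5% of the end" as "overlapping the first or last 5% of the sequence length" are all assumptions.

```python
def is_terminal_at_tract(start, end, unit, seq_len, frac=0.05):
    """Return True for an A or T mononucleotide tract near a sequence end.

    start/end are 0-based, half-open coordinates of the tract (assumed
    convention). A tract counts as terminal if any part of it lies in the
    first or last `frac` of the sequence (assumed interpretation).
    """
    if unit.upper() not in ("A", "T"):
        return False  # only mononucleotide A/T tracts are subject to the rule
    margin = frac * seq_len
    return start < margin or end > seq_len - margin


# A poly-A run close to the 3' end of a 1,000-nt sequence is excluded,
# whereas the same run in the middle of the sequence is kept.
print(is_terminal_at_tract(980, 995, "A", 1000))  # True
print(is_terminal_at_tract(400, 415, "A", 1000))  # False
```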

Polymorphism Prediction Criteria

For a stretch of repeated nucleotides, the minimum number of occurrences of the tandemly repeated unit necessary for it to be considered polymorphic depends on its size and homogeneity (Fondon et al. 1998). Fractional numbers of repeating units were rounded to the nearest integer. To be scored as polymorphic by POMPOUS, repeated DNA sequences had to be eight units long for dimers, whereas trimers, tetramers, pentamers to nonamers, and repeat units of lengths ≥10 required seven, six, five, and four repeats, respectively. POMPOUS permits up to 10% of the nucleotides within a repeat to deviate from the core repetitive unit. REP-X predictions included monomers, dimers, trimers, tetramers, pentamers to nonamers, and repeats with unit size ≥10, for which the minimum numbers of repeated units were 12, 6.5, 5.5, 4.5, 3.5, and 2.5, respectively. REP-X permits no deviations from the core repetitive element. Statistics for intronic sequences were obtained by running REP-X on a subset of the GenBank primate database that includes annotated intron sequences for humans only.
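The REP-X thresholds above can be illustrated with a minimal scanner for perfect tandem repeats. This is a sketch under stated assumptions, not the REP-X algorithm itself (which works by self-comparison of the sequence): the function names, the `max_unit` limit of 12, and the decision to count fractional copies as tract length divided by unit length, compared directly with the thresholds and without the rounding mentioned in the text, are all hypothetical choices made for illustration.

```python
def repx_min_copies(unit_len):
    """Minimum copy number for a perfect repeat, per the REP-X criteria in the text."""
    if unit_len < 1:
        raise ValueError("unit length must be positive")
    if unit_len == 1:
        return 12.0
    if unit_len == 2:
        return 6.5
    if unit_len == 3:
        return 5.5
    if unit_len == 4:
        return 4.5
    if unit_len <= 9:
        return 3.5
    return 2.5


def is_primitive(unit):
    """True if the unit is not itself a tandem repeat of a shorter unit."""
    k = len(unit)
    return not any(k % d == 0 and unit[:d] * (k // d) == unit for d in range(1, k))


def find_perfect_repeats(seq, max_unit=12):
    """Return (start, unit, copies) for perfect tandem tracts meeting the thresholds."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for unit_len in range(1, max_unit + 1):
        i = 0
        while i + unit_len <= n:
            unit = seq[i:i + unit_len]
            if not is_primitive(unit):
                i += 1
                continue
            # Extend the tract while the sequence keeps period `unit_len`.
            j = i + unit_len
            while j < n and seq[j] == seq[j - unit_len]:
                j += 1
            copies = (j - i) / unit_len  # fractional copies are allowed
            if copies >= repx_min_copies(unit_len):
                hits.append((i, unit, copies))
                i = j  # skip past the tract so shifted copies are not re-reported
            else:
                i += 1
    return hits


print(find_perfect_repeats("GGC" + "CAG" * 6 + "TT"))  # [(3, 'CAG', 6.0)]
```

Because no mismatches are tolerated, a single interruption splits a tract into two shorter ones, each of which must meet the threshold on its own. The 10% deviation allowance of POMPOUS is not modeled here.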

Analysis of Peptide Repeats

Comparison of peptide repeats with nucleotide repeats was done by translating the UniGene nucleotide sequences into protein sequences by use of the annotated start and stop sites. Protein sequences were scanned for occurrences of four perfectly repeated residues, and these were then extended, permitting mismatches, provided that two consecutive matches immediately followed the mismatch. These repeats were then scored for polymorphic potential on the basis of REP-X parameters. For gene fragments with a stop site, but not a start site, annotated, open-reading frames (ORFs) uninterrupted by stop codons were chosen. Ambiguous ORFs were discarded.
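The seed-and-extend rule for peptide repeats can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the function name is hypothetical, extension is performed only toward the carboxyl end (an assumed simplification), and the subsequent scoring by REP-X parameters is not included.

```python
def find_peptide_repeats(protein, seed=4):
    """Return (start, end, residue) for homopolymeric peptide tracts.

    A tract is seeded by `seed` identical consecutive residues and then
    extended to the right. A mismatch is tolerated only when the two
    residues immediately after it both match the repeated residue.
    Coordinates are 0-based and half-open.
    """
    hits = []
    n = len(protein)
    i = 0
    while i <= n - seed:
        aa = protein[i]
        if protein[i:i + seed] != aa * seed:
            i += 1
            continue
        end = i + seed
        while end < n:
            if protein[end] == aa:
                end += 1
            elif protein[end + 1:end + 3] == aa * 2:
                end += 3  # accept one mismatch followed by two matches
            else:
                break
        hits.append((i, end, aa))
        i = end
    return hits


# The proline interrupts the glutamine tract but is followed by "QQ", so the
# tract is extended through it; the alanine is followed by "QL", so the tract
# stops before it.
print(find_peptide_repeats("MKQQQQPQQQAQLL"))  # [(2, 10, 'Q')]
```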

Statistics for repeats within genomic DNA were obtained by running REP-X on 273 MB of high-throughput genome sequencing (HTGS) human DNA sequence obtained from the National Center for Biotechnology Information.

Allelotyping

For the loci chosen for allelotyping, primers were synthesized using our Mermade oligonucleotide synthesizer (Rayner et al. 1998). For 64 of the genes in our analysis, genomic DNA was extracted, by standard methods, from 30 Epstein-Barr virus–immortalized B lymphoblastoid cell lines of small-cell, non–small-cell, and adenocarcinoma lung cancer patients. For 40 of the genes analyzed, genomic DNA was obtained from the peripheral leukocytes of 36 individuals, 12 of whom had a diagnosis of hypertrophic cardiomyopathy.

Genomic DNA was amplified by PCR using the “touchdown” methodology, with an initial denaturation step at 95°C for 10 min. This was followed by 10 touchdown cycles of 30 s at 94°C, 30 s at 70°C (with a decrease, in the annealing temperature, by 1°C each cycle), and 30 s at 72°C. This was followed by 30 cycles of 30 s at 94°C, 30 s at 60°C, and 30 s at 72°C, with a final extension at 72°C for 10 min. DNA (~50–100 ng of genomic DNA) was amplified in 20-μl reaction volumes containing 50 mM KCl, 10 mM Tris (pH 8.3), 1.5 mM MgCl2, 200 μM each dNTP, 1 μM each primer, 0.5 U of Amplitaq Gold (PE Biosystems), and 2 μCi of [32P]-dCTP (Amersham). The samples were heat denatured, snap chilled, and run on a 6.8% polyacrylamide gel (acrylamide:bis acrylamide ratio 19:1) containing 10 M urea. The gels were dried and exposed overnight using BioMax film (Kodak).
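The thermal-cycling program described above can be written out step by step. The sketch below is only a bookkeeping aid with a hypothetical function name. It assumes that the first touchdown cycle anneals at 70°C and that the 1°C decrement applies from the second cycle onward, so that the tenth touchdown cycle anneals at 61°C.

```python
def touchdown_program():
    """Return the PCR program as a list of (step, temperature_C, seconds)."""
    steps = [("initial denaturation", 95.0, 600)]
    # Ten touchdown cycles: annealing starts at 70 C and drops 1 C per cycle
    # (assumed to reach 61 C in the tenth cycle).
    for cycle in range(10):
        steps.append(("denature", 94.0, 30))
        steps.append(("anneal", 70.0 - cycle, 30))
        steps.append(("extend", 72.0, 30))
    # Thirty further cycles at a fixed 60 C annealing temperature.
    for _ in range(30):
        steps.append(("denature", 94.0, 30))
        steps.append(("anneal", 60.0, 30))
        steps.append(("extend", 72.0, 30))
    steps.append(("final extension", 72.0, 600))
    return steps


program = touchdown_program()
anneal_temps = [temp for name, temp, _ in program if name == "anneal"]
hold_minutes = sum(seconds for _, _, seconds in program) / 60
print(anneal_temps[:10])  # [70.0, 69.0, 68.0, 67.0, 66.0, 65.0, 64.0, 63.0, 62.0, 61.0]
print(hold_minutes)       # 80.0 (hold times only; ramping is not included)
```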

For some genes, the PCR products were also sequenced. For better separation of the different alleles, the samples were run on a 0.5× mutation-detection enhancement gel. Shifted bands were excised from the gel, and DNA was eluted with distilled water and was reamplified using the original PCR primers. The PCR product was run on a 2% agarose gel and was purified by Genlute Agarose spin columns (Sigma). Automated bidirectional sequencing was performed by ABI 377 Dye Terminator cycle sequencing. Sequences were analyzed and were compared with the sequences downloaded from GenBank by DNAStar software (DNAStar).

Results

Polymorphism Predictions

Of the 11,265 putative polymorphic loci identified in coding regions and UTRs, 2,769 are in annotated UniGene clusters. This allowed us to categorize each locus as occurring within either a coding region or the 5′ or 3′ UTRs. A total of 146 of the 2,769 were chosen for analysis, on the basis of medical interest and as a representative sample of repeat types (table 1; GenBank accession numbers are listed in the Electronic-Database Information section), and 102 were successfully amplified within three attempts of primer design. Of these 102, 54 (53%) were verified to be polymorphic, defined as having at least two alleles among a sample of 60–74 chromosomes. Examples of results are shown in figure 1. The results for all loci tested are summarized in table 2, and, as anticipated, more-stringent homogeneity requirements resulted in higher polymorphism levels.

Figure  1
Two examples of REP-X prediction of polymorphisms. Top, Polymorphism in HVEC, encoding eight to nine polyglutamic acids residues located in the cytoplasmic portion of this transmembrane protein. Polyglutamic acid tracts have been associated with microtubule ...
Table 1
Genes Allelotyped for Polymorphisms
Table 2
Polymorphism Prediction Accuracy, by Gene Region

Effect of Homogeneity of Repeats on Polymorphism Levels

Increasing the homogeneity requirements for polymorphism predictions increased prediction accuracy. A total of 54 (53%) of 102 POMPOUS predictions were found to be polymorphic, whereas 50 (67%) of 75 REP-X predictions were found to be polymorphic (tables 1 and 2). Within the polymorphisms tested, only four of 27 tandem repeats containing deviations from the canonical repeat unit were found to harbor polymorphisms. These results are consistent with the previously observed positive correlation between repeat homogeneity and polymorphism levels (Kunst et al. 1997).
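The accuracy figures in this paragraph can be checked with a few lines of arithmetic. The variable names are illustrative. The additive relationship between the three groups (75 perfect repeats plus 27 imperfect repeats giving the 102 loci tested) is an inference from the counts and is not stated explicitly in the text.

```python
def percent(hits, total):
    """Percentage of confirmed polymorphisms, rounded to the nearest integer."""
    return round(100 * hits / total)


pompous = (54, 102)   # confirmed, tested
repx = (50, 75)       # perfect repeats only
imperfect = (4, 27)   # repeats deviating from the canonical unit

print(percent(*pompous))    # 53
print(percent(*repx))       # 67
print(percent(*imperfect))  # 15

# The counts are mutually consistent if the POMPOUS set is the union of the
# perfect (REP-X) and imperfect repeats (an inference, see above).
print(repx[0] + imperfect[0] == pompous[0])  # True
print(repx[1] + imperfect[1] == pompous[1])  # True
```

On this reading, the drop from 67% to 53% is attributable to the imperfect repeats, of which only about 15% proved polymorphic.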

Distribution of Repetitive Unit Lengths

Once the suitability of the new polymorphism criteria for intragenic sequences was established, REP-X was used to generate predictions of repeat polymorphisms in human, mouse, and rat cDNA sequences from the January 2000 UniGene release. Because sequences determined from 3′ UTRs are overrepresented in the UniGene database (Boguski and Schuler 1995), the frequencies of repeats in each of these regions are reported per nucleotide scanned, to permit direct comparison (table 3). Furthermore, only those entries for which reliable translational start and stop sites are known are used for comparisons of coding 5′ and 3′ UTR sequences.

Table 3
Predicted Repeat Polymorphisms, by Species and Location

That UTRs harbor more repetitive and polymorphic elements than are seen in coding sequences is expected; what is surprising is the number of repeat polymorphisms occurring within the coding regions of genes. If, as with the test set, two thirds of these predictions are correct, then ~3.7% of human genes contain at least one fairly common repeat polymorphism. Note that the high frequency of repeat polymorphisms for unannotated sequences in table 3 is due primarily to mononucleotide repeats (as shown in table 4).

Table 4
Percentage Distribution of Repeats—Unit Sizes Considered Potentially Polymorphic, by Annotated Region

Coding sequences, introns, and 3′ and 5′ UTRs have characteristic distributions of repetitive unit lengths (table 4). More than 92% of the predicted polymorphisms within coding sequences have unit lengths that are a multiple of 3, which would give protection against frameshift mutations (but see Ohno 1984). However, this does leave 0.5% (51) of the annotated data set entries with potentially frameshifting loci.
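The frameshift argument reduces to a divisibility check: gaining or losing copies of a repeat unit changes the coding length by unit length times the number of copies, and the reading frame is preserved only when that change is a multiple of 3. The sketch below uses a hypothetical function name. The denominator of 9,717 annotated clusters is taken from the abstract and is assumed to be the data set that the 0.5% refers to.

```python
def shifts_reading_frame(unit_len, copies_changed):
    """True if adding or removing `copies_changed` units alters the reading frame."""
    return (unit_len * copies_changed) % 3 != 0


print(shifts_reading_frame(3, 1))   # False: a trinucleotide unit never shifts the frame
print(shifts_reading_frame(2, 1))   # True: one dinucleotide unit shifts the frame
print(shifts_reading_frame(2, 3))   # False: three dinucleotide units restore it
print(shifts_reading_frame(1, -1))  # True: loss of a single nucleotide

# Share of annotated entries with a potentially frameshifting locus
# (assuming 9,717 annotated clusters, as given in the abstract).
print(round(100 * 51 / 9717, 1))    # 0.5
```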

Peptide Repeats and DNA Repeats

Specific amino acids have an increased proclivity to form homopolymeric runs (Sumiyama et al. 1996). Because of the redundancy of the genetic code, it is not necessary for repeated tracts of amino acids to be encoded by homogeneous trinucleotide repeats (except for methionine and tryptophan homopolymers, which are very rare). We examined all peptide homopolymers of length greater than or equal to five, to determine polymorphic potential (fig. 2).

Figure  2
Amino acid repeats from transcribed UniGene entries, varying in both number and potential for polymorphism. Tandem repeats of at least five amino acids from annotated UniGene entries are shown, grouped first by the number of corresponding codons available ...

Although hydrophobic repeats tend to be located in amino-terminal signaling peptides (fig. 2), we found that some amino acids (Ile, Val, Met, Cys, Asn, Phe, Trp, and Tyr) are rarely, if at all, found repeated in human genes. Several potential explanations for these observations come to mind: in the case of Trp or Tyr, this is possibly because their bulkiness could contribute to unstable structures through steric interference; in the case of Cys, it is likely because it could contribute to anomalous cross-linking. Other amino acids vary in the frequency with which they are encoded by potentially polymorphic elements, ranging from 6% for Arg to 62% for His. There is a tendency for homopolymeric runs of residues with more codons to have lower homogeneity in their encoding DNA (e.g., His and Gln > Thr and Gly > Arg and Ser), although there are some deviations from this trend (e.g., Leu > Pro and Gly > Lys).

Discussion

Polymorphism Profiles of Gene Regions

The 5′ and 3′ UTRs are known to harbor more genetic variation than is seen in coding sequences, and this is borne out in our results; relative decreases in both heterozygosity levels and number of alleles were observed for the coding-sequence polymorphisms. Such regional variances may help to identify in which region of a gene an unknown expressed-sequence tag (EST) is located. This variation is presumably due primarily to two factors: the presence, in the UTRs, of repetitive sequences with regulatory functions (e.g., mRNA stability) and, within coding sequences, selection against repeat polymorphisms. Unlike the 3′ UTR, the 5′ UTR exhibits a strong bias toward specific trinucleotide repeats (Stallings 1994). Of the 136 trinucleotide repeats identified in 5′ UTRs, 101 of them were CGG or CCG (data not shown), which have been shown to serve as binding sites for nuclear proteins (Richards et al. 1993; Stallings 1994). The 3′ UTR regions display a broad distribution of repeat-unit sizes but are biased toward mononucleotide repeats (poly-A tails within 5% of the sequence ends were excluded from the analysis). Intronic sequences were found to have a repeat-unit profile very similar to that of genomic DNA.

Approximately 90% of UniGene clusters lack annotation. Each class of transcribed sequence (5′, 3′, intronic, and coding) has distinct distributions of repeat type, frequency, and unit size. These distributions may serve to help classify sequences of unknown origin. The difference in distribution of repeat types within these “unknown” sequences and “known” genes indicates that a significant proportion of UniGene clusters may not represent genuine genes. With respect to their repetitive character, these “unknowns,” in the aggregate, do not resemble transcribed DNA at all (as shown in table 4), a pattern that could be explained if they contain a substantial fraction of cloning or sequencing artifacts. For example, unlike coding sequence, they are biased away from trinucleotide repeats, contain more monomers than are seen in genomic DNA, and contain a higher frequency of Alu sequences than is seen in any transcribed region. Attempts to infer coding sequences from these entries by using conventional “longest ORF” methods, as well as more-sophisticated algorithms (Burge and Karlin 1997), yielded low-confidence coding predictions and repeated amino acid profiles distinctly different from those of the annotated sequences (data not shown). This is possibly due to the fact that ~34,500 (37.5%) of UniGene clusters in this “build” contain only one sequence (UniGene Build #113); such clusters represent either very rare transcripts or sequencing artifacts. In addition, 16,513 of these single-sequence clusters contain anomalous poly-A and poly-T tracts after exclusion of 3′ poly-A tails. These single-sequence clusters have been deposited in GenBank over the years from a variety of sources, and many of them are likely single-read sequences with low-quality base calls. If the 24,115 anomalous poly-A and poly-T repeats found within the 16,513 single-sequence clusters are discarded, this leaves 8,496 polymorphic loci predictions. The repeat frequency and size distribution of this reduced set then begin to more closely resemble some of the other categories in table 4, such as the 3′ UTR or genomic sequences. If we assume that at least these 16,513 clusters are not true genes, then the number of valid UniGene clusters becomes 75,706, and inferring the number of human genes from the number of UniGene clusters results in an 18% overestimation. When these 8,496 predictions are added to the 2,769 predictions from the annotated clusters, the result is a set of 11,265 loci most likely to be repeated regions in true genes.
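The cluster counts in this paragraph can be cross-checked with simple arithmetic. The variable names below are illustrative, and the total cluster count is derived by adding the two figures given in the text rather than being quoted from it.

```python
anomalous_clusters = 16_513       # single-sequence clusters with anomalous poly-A/T tracts
valid_clusters = 75_706           # clusters remaining after their removal
single_sequence_clusters = 34_500 # approximate figure from the text

total_clusters = valid_clusters + anomalous_clusters
print(total_clusters)  # 92219

# The 18% figure matches the anomalous share of all clusters.
share_of_total = 100 * anomalous_clusters / total_clusters
print(round(share_of_total, 1))  # 17.9

# Expressed relative to the valid clusters, the excess would be larger.
excess_over_valid = 100 * anomalous_clusters / valid_clusters
print(round(excess_over_valid, 1))  # 21.8

# Single-sequence clusters as a share of the total (text: ~37.5%).
print(round(100 * single_sequence_clusters / total_clusters, 1))  # 37.4

# Final catalogue size: retained unannotated predictions plus annotated ones.
print(8_496 + 2_769)  # 11265
```

The check suggests that the 18% is expressed as a fraction of all UniGene clusters, not as an excess over the valid count.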

Evolutionary and Phenotypic Implications

The redundancy of the genetic code renders it unnecessary to use a perfect DNA repeat to encode a peptide repeat, and it is biologically intuitive to assume that evolution will tend to exploit this redundancy, to fix the number of repeated elements in a gene at some optimal level. To find evidence of this, we sought to compare the homogeneity of all peptide repeats to “expected” levels, for each amino acid. Because of DNA’s natural propensity for self-similarity, random models of expected homogeneity are unsuitable (Tautz et al. 1986). By examining, within genomic DNA, the distributions of sequences that, if translated, would yield peptide repeats, we can estimate the expected levels of homogeneity for a peptide repeat in the absence of selective pressure on the encoded protein. Selection is clearly acting to influence the length of peptide repeats, so comparisons of homogeneity in genomic versus coding repeats are paired by “peptide” type and repeat length (fig. 2). As anticipated (Schmid et al. 1999), selection appears to depress polymorphism levels in repeated coding sequences, by peppering repeats of “optimal” length with synonymous substitutions (fig. 3B and 3C). However, for some specific proteins (data not shown), and even for some entire classes of peptide repeats (fig. 3D), peptide repeats appear to be under positive selection for elevated homogeneity and, thus, higher polymorphism; for these loci, silent substitutions might indeed be deadly. It has been noted that the lengths of repeats with higher homogeneity tend to diverge between species such as mice and humans (Alba et al. 1999). Furthermore, these peptide repeats are more common in eukaryotes than in prokaryotes, and such hypermutable elements may be a mechanism for more-rapid protein evolution (Marcotte et al. 1999).

Figure  3
Selection for or against allelic plasticity, reflected in repeat homogeneity. A, Homogeneity distributions for DNA encoding four repeated amino acids, including alanine. These are almost identical. Because, regardless of their homogeneity, repeats of ...

If allelic diversity is advantageous for a population, such as in genes involved in host-pathogen interactions, then balancing selection has little trouble maintaining multiple alleles, even when the alleles are relatively immutable. However, if the “optimal” number of repeated elements in a gene varies over time, the fittest allele may ultimately be the one with maximal plasticity. Selection may be actively maintaining this elevated plasticity, for some genes, by preserving high homogeneity in tandem repeats. We hypothesize that, for many of the highly homogeneous coding repeats predicted, by our algorithms, to be polymorphic, this may indeed be the case. And, given that the rate of expansion/contraction of pure, long tandem repeats is high enough that somatic mosaicism is commonplace (Leeflang et al. 1999), it is possible that there is considerable allelic diversity and, thus, potential competition and evolution among cells within an individual (somatic and/or germline).

The location and type of polymorphic repeats can facilitate the building of hypotheses about the potential functional roles that a gene region may have in physiology. Differences in tandem-repeat lengths in 5′ UTR promoter elements, for example, can lead to the modulation of the level of gene transcription, either directly (Shimajiri et al. 1999; Yamada et al. 2000) or indirectly (Mooser et al. 1995; Valenti et al. 1999), whereas AU-rich elements in the 3′ UTRs have been shown to affect mRNA stability (Gay and Babajko 2000). Given the variety of amino acid properties, there are a large number of ways in which polymorphic repeats in coding sequences could affect protein function. For example, in yeast, tandem peptide repeats are found to be overrepresented in certain functional classes of genes, such as transcription factors (Mar Alba et al. 1999).

The polymorphisms that we discovered in two of the genes in our subset—those for herpes viral entry protein C (HVEC; fig. 1, top) and sperm acrosomal protein (ACRP; fig. 1, bottom)—are useful examples of how the location of a polymorphism can be used to construct a hypothesis about its potential effect. In HVEC, the repetitive region encodes eight to nine glutamic acid residues located in the cytoplasmic portion of this transmembrane protein, whereas the ACRP gene has a polymorphic TG-dinucleotide repeat beginning near the 3′ end of its coding sequence, with stop codons in all three frames after 5, 7, or 14 residues (alternative translations are MCVCV, VCVCVRV, and CVCVCESVNAQVGI). ACRP is found in maturing and elongating spermatid heads and is suspected to be involved in penetration of the oocyte zona pellucida (Beaton et al. 1995). Although HVEC could be responsible for some of the known population variance in susceptibility to herpesvirus infection, ACRP could, similarly, have an impact on fertility.

Polymorphic repeats in genes can not only provide useful information about selection forces acting on a gene—and, thereby, aid in generating a hypothesis about the physiological role of the gene—but are also useful as extremely tightly linked markers for mapping studies. We have developed and tested a method optimized to find tandem-repeat polymorphisms in cDNA sequences, where they are considered rare (Nakamura et al. 1987). We have shown that there are a surprisingly large number of these elements undiscovered and uncharacterized in humans and rodents, some of which may provide functional information about the proteins that contain them, whereas others may provide important leads to potential disease-causing mechanisms. The details of the predicted polymorphisms in gene regions described here and in 11,265 others are available for download as a text file at The Garner Lab at UTSW.

Acknowledgments

This research was funded by Special Projects Open Research Environment grant P50CA70907, the Patrick O'Brien Montgomery Distinguished Chair, and the D.W. Reynolds Cardiovascular Clinical Research Center. We would like to thank Hewlett-Packard for the loan of an Exemplar supercomputer.

Electronic-Database Information

Accession numbers and URLs for data in this article are as follows:

Garner Lab at UTSW, The, http://pompous.swmed.edu
GenBank Overview, http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html (for human databases [accession numbers Y00285, D86407, M60052, AF047437, D83492, Y11525, AF032886, M60315, X82209, U60325, U49020, AF017789, M60052, AF060231, AF013956, D86550, AF042838, T62484, T63962, R42196, X78261, T70173, R12160, T47177, X55313, L08835, M64347, D14838, M55047, X70811, U36798, U36336, K02402, X04412, U75285, X78520, U68723, AF065482, M36089, L04489, AB015132, X15949, AF022654, U38276, S83513, U29589, X06374, AB002454, D16532, U92436, U38810, AL021155, X60188, U43292, M75866, M73980, U94333, U21858, D55655, U34962, U47741, U02031, U23752, AF002715, AF010403, S62539, AB005216, AB011792, X05299, M55514, L06147, X05299, AF053944, U68063, Y00764, X53416, AF008192, U13616, AF051946, U88153, T87413, R33865, T62835, T80553, T70304, T60175, L14837, Y00285, U17327, NM_004691, X02812, M14764, AB010710, M12783, U67784, U52152, Y00815, M74525, AF075292, U58334, X02812, L08488, and X17360])
Entrez Nucleotide, http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=Nucleotide (for nucleotide sequences)

References

Alba MM, Santibanez-Koref MF, Hancock JM (1999) Conservation of polyglutamine tract size between mice and humans depends on codon interruption. Mol Biol Evol 16:1641–1644 [PubMed]
Beaton S, ten Have J, Cleary A, Bradley MP (1995) Cloning and partial characterization of the cDNA encoding the fox sperm protein FSA-Acr.1 with similarities to the SP-10 antigen. Mol Reprod Dev 40:242–252 [PubMed]
Bidichandani SI, Ashizawa T, Patel PI (1998) The GAA triplet-repeat expansion in Friedreich ataxia interferes with transcription and may be associated with an unusual DNA structure. Am J Hum Genet 62:111–121 [PMC free article] [PubMed]
Boguski MS, Schuler GD (1995) ESTablishing a human transcript map. Nat Genet 10:369–371 [PubMed]
Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78–94 [PubMed]
Deininger PL, Batzer MA (1999) Alu repeats and human disease. Mol Genet Metab 67:183–193 [PubMed]
Fondon JW III, Mele GM, Brezinschek RI, Cummings D, Pande A, Wren J, O'Brien KM, et al (1998) Computerized polymorphic marker identification: experimental validation and a predicted human polymorphism catalog. Proc Natl Acad Sci USA 95:7514–7519 [PMC free article] [PubMed]
Gay E, Babajko S (2000) AUUUA sequences compromise human insulin-like growth factor binding protein-1 mRNA stability. Biochem Biophys Res Commun 267:509–515 [PubMed]
Hancock JM, Santibanez-Koref MF (1998) Trinucleotide expansion diseases in the context of micro- and minisatellite evolution, Hammersmith Hospital, April 1–3, 1998. EMBO J 17:5521–5524 [PMC free article] [PubMed]
Jakupciak JP, Wells RD (1999) Genetic instabilities in (CTG.CAG) repeats occur by recombination. J Biol Chem 274:23468–23479 [PubMed]
Jansen G, Willems P, Coerwinkel M, Nillesen W, Smeets H, Vits L, Howeler C, et al (1994) Gonosomal mosaicism in myotonic dystrophy patients: involvement of mitotic events in (CTG)n repeat variation and selection against extreme expansion in sperm. Am J Hum Genet 54:575–585 [PMC free article] [PubMed]
Jeffreys AJ, Royle NJ, Wilson V, Wong Z (1988) Spontaneous mutation rates to new length alleles at tandem-repetitive hypervariable loci in human DNA. Nature 332:278–281 [PubMed]
Karthikeyan G, Chary KV, Rao BJ (1999) Fold-back structures at the distal end influence DNA slippage at the proximal end during mononucleotide repeat expansions. Nucleic Acids Res 27:3851–3858 [PMC free article] [PubMed]
Kawaguchi Y, Okamoto T, Taniwaki M, Aizawa M, Inoue M, Katayama S, Kawakami H, et al (1994) CAG expansions in a novel gene for Machado-Joseph disease at chromosome 14q32.1. Nat Genet 8:221–228 [PubMed]
Kunst CB, Leeflang EP, Iber JC, Arnheim N, Warren ST (1997) The effect of FMR1 CGG repeat interruptions on mutation frequency as measured by sperm typing. J Med Genet 34:627–631 [PMC free article] [PubMed]
Leeflang EP, Tavare S, Marjoram P, Neal CO, Srinidhi J, MacFarlane H, MacDonald ME, et al (1999) Analysis of germline mutation spectra at the Huntington's disease locus supports a mitotic mutation mechanism. Hum Mol Genet 8:173–183 [erratum: Hum Mol Genet 8:717] [PubMed]
Mar Alba M, Santibanez-Koref MF, Hancock JM (1999) Amino acid reiterations in yeast are overrepresented in particular classes of proteins and show evidence of a slippage-like mutational process. J Mol Evol 49:789–797 [PubMed]
Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D (1999) A census of protein repeats. J Mol Biol 293:151–160 [PubMed]
Mooser V, Mancini FP, Bopp S, Petho-Schramm A, Guerra R, Boerwinkle E, Muller HJ, et al (1995) Sequence polymorphisms in the apo(a) gene associated with specific levels of Lp(a) in plasma. Hum Mol Genet 4:173–181 [PubMed]
Nakamura Y, Leppert M, O'Connell P, Wolff R, Holm T, Culver M, Martin C, et al (1987) Variable number of tandem repeat (VNTR) markers for human gene mapping. Science 235:1616–1622 [PubMed]
Ohno S (1984) Repeats of base oligomers as the primordial coding sequences of the primeval earth and their vestiges in modern genes. J Mol Evol 20:313–321 [PubMed]
Rayner S, Brignac S, Bumeister R, Belosludtsev Y, Ward T, Grant O, O'Brien K, et al (1998) MerMade: an oligodeoxyribonucleotide synthesizer for high throughput oligonucleotide production in dual 96-well plates. Genome Res 8:741–747 [PMC free article] [PubMed]
Richards RI, Holman K, Yu S, Sutherland GR (1993) Fragile X syndrome unstable element, p(CCG)n, and other simple tandem repeat sequences are binding sites for specific nuclear proteins. Hum Mol Genet 2:1429–1435 [PubMed]
Schmid KJ, Nigro L, Aquadro CF, Tautz D (1999) Large number of replacement polymorphisms in rapidly evolving genes of drosophila: implications for genome-wide surveys of dna polymorphism. Genetics 153:1717–1729 [PMC free article] [PubMed]
Shimajiri S, Arima N, Tanimoto A, Murata Y, Hamada T, Wang KY, Sasaguri Y (1999) Shortened microsatellite d(CA)21 sequence down-regulates promoter activity of matrix metalloproteinase 9 gene. FEBS Lett 455:70–74 [PubMed]
Smits AP, Dreesen JC, Post JG, Smeets DF, de Die-Smulders C, Spaans-van der Bijl T, Govaerts LC, et al (1993) The fragile X syndrome: no evidence for any recent mutations. J Med Genet 30:94–96 [PMC free article] [PubMed]
Stallings RL (1994) Distribution of trinucleotide microsatellites in different categories of mammalian genomic sequence: implications for human genetic diseases. Genomics 21:116–121 [PubMed]
Sumiyama K, Washio-Watanabe K, Saitou N, Hayakawa T, Ueda S (1996) Class III POU genes: generation of homopolymeric amino acid repeats under GC pressure in mammals. J Mol Evol 43:170–178 [PubMed]
Tautz D, Trick M, Dover GA (1986) Cryptic simplicity in DNA is a major source of genetic variation. Nature 322:652–656 [PubMed]
Valenti K, Aveynier E, Leaute S, Laporte F, Hadjian AJ (1999) Contribution of apolipoprotein(a) size, pentanucleotide TTTTA repeat and C/T(+93) polymorphisms of the apo(a) gene to regulation of lipoprotein(a) plasma levels in a population of young European Caucasians. Atherosclerosis 147:17–24 [PubMed]
Verkerk AJ, Pieretti M, Sutcliffe JS, Fu YH, Kuhl DP, Pizzuti A, Reiner O, et al (1991) Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile X syndrome. Cell 65:905–914 [PubMed]
Wells RD (1996) Molecular basis of genetic instability of triplet repeats. J Biol Chem 271:2875–2878 [PubMed]
Yamada N, Yamaya M, Okinaga S, Nakayama K, Sekizawa K, Shibahara S, Sasaki H (2000) Microsatellite polymorphism in the heme oxygenase-1 gene promoter is associated with susceptibility to emphysema. Am J Hum Genet 66:187–195 [PMC free article] [PubMed]
Zuliani G, Hobbs HH (1990) A high frequency of length polymorphisms in repeated sequences adjacent to Alu sequences. Am J Hum Genet 46:963–969 [PMC free article] [PubMed]

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics