Genome Res. 2008 Oct; 18(10): 1545–1553.
PMCID: PMC2556271

Abundance and length of simple repeats in vertebrate genomes are determined by their structural properties

Abstract

Microsatellites are abundant in vertebrate genomes, but their sequence representation and length distributions vary greatly within each family of repeats (e.g., tetranucleotides). Biophysical studies of 82 synthetic single-stranded oligonucleotides comprising all tetra- and trinucleotide repeats revealed an inverse correlation between the stability of folded-back hairpin and quadruplex structures and the sequence representation for repeats ≥30 bp in length in nine vertebrate genomes. Alternatively, the predicted energies of base-stacking interactions correlated directly with the longest length distributions in vertebrate genomes. Genome-wide analyses indicated that unstable sequences, such as CAG:CTG and CCG:CGG, were over-represented in coding regions and that micro/minisatellites were recruited in genes involved in transcription and signaling pathways, particularly in the nervous system. Microsatellite instability (MSI) is a hallmark of cancer, and length polymorphism within genes can confer susceptibility to inherited disease. Sequences that manifest the highest MSI values also displayed the strongest base-stacking interactions; analyses of 62 tri- and tetranucleotide repeat-containing genes associated with human genetic disease revealed enrichments similar to those noted for micro/minisatellite-containing genes. We conclude that DNA structure and base-stacking determined the number and length distributions of microsatellite repeats in vertebrate genomes over evolutionary time and that micro/minisatellites have been recruited to participate in both gene and protein function.

DNA microsatellites, tandem arrays of simple repeats such as mono-, di-, tri-, and tetranucleotides, are common in eukaryotic genomes (Lander et al. 2001; Waterston et al. 2002; Sharma et al. 2007). However, different types of sequences display widely variable abundances within each class (such as the trinucleotide and tetranucleotide repeats, triNRs and tetraNRs, respectively) (Subramanian et al. 2003), particularly at lengths ≥30 nt. For example, >16,000 tracts of mononucleotides comprised of ≥30 As or Ts are present in the human genome, but only seven analogous tracts of Gs or Cs are found (Bacolla et al. 2006).

5′CpG3′-(CpG)-containing repeats are generally rare, suggesting that cytosine methylation and subsequent deamination leading to T:A transitions from 5mC:G base pairs (Walsh and Xu 2006), a frequent cause of human gene mutation (Mort et al. 2008), may have been involved (Kelkar et al. 2008). However, because the rates of cytosine methylation (Bacolla et al. 2001) and 5mC deamination (Lindahl and Nyberg 1974; Frederico et al. 1993) decrease with increasing DNA stability, CpG-containing triNR and tetraNR are expected to display varying transition rates according to their C+G content (Elango et al. 2008). These relationships remain to be clarified. Other repeats, such as ATGC, AGCT, ACCC, and ACT, are also rare, although they do not contain the CpG step. Therefore, the underlying mechanisms of biased sequence representation remain poorly understood.

Another characteristic of simple repeats is the lengths attained genome-wide by specific sequences such as AAG repeats, which are consistently the longest by quite a margin within the triNR family (Clark et al. 2006). This behavior is also enigmatic.

The number of repeat units is polymorphic at certain loci, and this variability can play specific roles both in physiology and pathology (for review, see Supplemental Table 1). Also, more than 20 neurological diseases (Supplemental Table 1) are caused by the expansion of triNRs within the coding or untranslated regions of genes (for review, see Wells and Ashizawa 2006; Mirkin 2007; Orr and Zoghbi 2007; Kovtun and McMurray 2008). Among these, recessive Friedreich ataxia (FA) is atypical since, contrary to all other triNR diseases, the expanded GAA:TTC tract (up to >1000 repeat copies) in the first intron of the FXN gene is somatically stable in the transmitting parents (De Biase et al. 2006; Pandolfo 2006). The reasons for this behavior are unclear. Elevated microsatellite alterations at selected tetraNRs (EMAST) have been noted in mismatch repair-proficient cancers of the respiratory tract, skin, and bladder, and they involve preferentially biased purine:pyrimidine (R:Y) sequences (for review, see Supplemental Table 2). Hence, since length instabilities are believed to arise from unrepaired bulges during DNA polymerase slippage over the repeats (Sammalkorpi et al. 2007), R:Y tracts appear to selectively escape repair. However, the underlying mechanisms remain speculative (Yang 2006).

Herein, we show that the relative abundances of triNR and tetraNR sequences in nine vertebrate genomes are inversely proportional to the capacity of their single strands to fold back into hairpin or quadruplex structures of varying thermodynamic stabilities. The sequences that form the longest tracts comprise R:Y-rich sequences, and their length distributions in the human genome correlate directly with the strength of base-stacking interactions within the R-rich strand. Hence, certain DNA secondary structures have prevented the accumulation of specific repeating sequences in genomes over evolutionary time, whereas strong base-stacking interactions have favored their expansion. These results are discussed in the context of human disease-associated repeat polymorphism and genome-wide analyses, which suggest the recruitment of micro/minisatellites by transcription factors and other regulatory genes to perform specific functions, particularly in the nervous system.

Results

Certain long tetraNRs are absent from the human genome

The sequence representation of tetraNR sequences in the human genome was analyzed by comparing tracts comprising ≥8 identical units (Table 1). For some nucleotide combinations, the number of tracts exceeded 5000 copies, whereas for others, no tracts were found (Table 1). This diversity, spanning nearly four orders of magnitude, was intriguing. Based on molecular modeling (Supplemental material), we postulated that the range of tetraNR abundance values might be related to the capacity of certain sequences to fold back upon themselves in the single-stranded state to form quasi-stable non-B DNA conformations, either during DNA replication or transcription. The more stable folded conformations could serve as substrates for DNA repair and might therefore be excised (Wojciechowska et al. 2006; Mirkin 2007). Alternatively, these structures could be bypassed by the DNA replication complex (Iyer et al. 2000; Wells et al. 2005; Mirkin 2007; Zahra et al. 2007). Hence, those sequences that adopted the more stable non-B DNA conformations would tend to be lost over evolutionary time and consequently display reduced length distributions. In contrast, those sequences unable to adopt stable folded conformations would tend to survive, giving rise to extended length distributions.

Table 1.
TetraNR abundance in the human genome for tracts with ≥8 units, Ta values, and DNA structure

Annealing temperature determinations

To assess whether hairpin stability correlated with tetraNR abundance in the human genome, the annealing temperature (Ta) values of 62 single-stranded synthetic oligonucleotides, 36 nt in length and corresponding to all tetraNR duplex DNAs (Table 1), were obtained by temperature-dependent absorption spectroscopy (TDAS). Thirty-six oligonucleotides displayed Ta values ranging from 86.7°C to 16.7°C. The highest Ta values were observed for the self-complementary sequences, that is, d(ACGT)9, d(ATCG)9, d(ATGC)9, and d(AGCT)9, with the exception of d(CCGG)9, for which no cooperative transition was observed. These genomic tetraNRs were either extremely rare (≤2 occurrences) or absent. In contrast, the self-complementary d(AATT)9 molecule displayed a much lower Ta value (53.3°C), which was also lower than for molecules whose hairpins were stabilized by Watson-Crick CG:CG or GC:GC doublets, such as d(AGCG)9, d(GCGT)9, d(ACGG)9, d(GGCT)9, and d(AGGC)9. Hence, adjacent CG:CG and GC:GC base pairs contributed the most stability to the tetraNR hairpins.

For the d(CCCT)9 oligonucleotide, a strong hysteresis effect was observed, with melting temperature (Tm) values of ∼80.0°C and Ta values of ∼30.0°C. Additional TDAS determinations (Supplemental material) indicated that this behavior was due to slow hairpin formation by partially protonated cytosine pairs (C+:C). Therefore, only the Ta value at pH 7.0 was considered for further analyses.

Six oligonucleotides (Table 1, S) displayed a nearly linear decrease in absorbance with decreasing temperature, indicative of base-stacking within single-stranded helices, but little or no hydrogen bonding. The remaining 20 oligonucleotides manifested no temperature-dependent changes in absorbance (Table 1, N), implying that these sequences existed as either random, single-stranded coils or very stable hydrogen-bonded structures with Ta and Tm values >94°C. To resolve these ambiguities, further TDAS and CD investigations were conducted (Supplemental material; Supplemental Fig. 1), which revealed that the d(CCGG)9 and d(XGGG)9 (X = C, T, or A) oligonucleotides formed highly stable hairpin and quadruplex structures under physiologic K+ and Mg2+ concentrations, respectively. These four sequences were rare in the human genome, and the numbers of quadruplex-forming repeats correlated inversely with structure stability (Rachwal et al. 2007). In contrast, the 22 sequences for which no folded-back structures could be revealed were among the most abundant genomic tetraNRs (Table 1).

In summary, the highest Ta values were found for genomic tracts that were only rarely represented, whereas low or absent Ta values were observed for the tetraNRs that were most abundant in the reference human genome sequence, thereby confirming the predictions made through modeling. Hence, hairpin and quadruplex stabilities are a robust predictor of tetraNR abundance in the human genome.

Ta and tetraNR abundance are inversely correlated

In order to determine whether a correlation existed between tetraNR abundance and hairpin/quadruplex formation, we plotted the highest Ta value for each pair of forward/reverse tetraNR sequences versus the log of the tetraNR abundance in the human genome (Table 1; see Supplemental material for a rationale of this analysis). A linear and inverse correlation was found (r = −0.607, P = 0.028), a result that established a clear relationship between genomic tetraNR abundance and non-B-DNA-structure formation. To verify whether the correlation was solely due to the low number of CpG-containing tetraNRs, we estimated the numbers of CpG-containing tetraNRs ≥8 units as if methylation-mediated and deamination-dependent C:G → T:A transitions had not occurred (Supplemental material). The correlation remained significant (P = 0.007).
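The correlation analysis described above can be illustrated with a minimal Python sketch. It assumes that the higher of the two strand Ta values represents each forward/reverse pair (as stated in the text), that pairs with no measurable Ta are skipped, and that a pseudocount of 1 guards against log(0). The last two choices, the function names, and the toy numbers are illustrative assumptions; none of the values come from Table 1 or the Supplemental material.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ta_vs_log_abundance(records):
    """Correlate the highest Ta of each strand pair with log10 abundance.

    records: iterable of (ta_forward, ta_reverse, tract_count);
    a Ta of None means no cooperative transition was observed.
    """
    tas, logs = [], []
    for ta_fwd, ta_rev, count in records:
        measured = [t for t in (ta_fwd, ta_rev) if t is not None]
        if not measured:
            continue  # assumption: pairs without any Ta are left out
        tas.append(max(measured))
        logs.append(math.log10(count + 1))  # assumption: pseudocount of 1
    return pearson_r(tas, logs)

# Hypothetical toy records (NOT data from Table 1).
toy_records = [
    (80.0, 75.0, 9),
    (60.0, None, 99),
    (None, 40.0, 999),
    (20.0, 18.0, 9999),
    (None, None, 5000),  # skipped: no Ta on either strand
]
toy_r = ta_vs_log_abundance(toy_records)
print(round(toy_r, 3))
```

A negative coefficient in such an analysis corresponds to the inverse relationship reported here: the more stable the folded structure, the fewer long tracts in the genome.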

Next, we reasoned that if the Ta versus tetraNR abundance relationship were to be dictated solely by DNA structural properties, analogous trends would be evident in other genomes despite differences in DNA repair systems. Analyses of eight additional vertebrate genomes indicated that in spite of large variations in the absolute numbers of tetraNRs of ≥8 units (144–126,473), both the rank order and the relative abundance of the different sequences were similar to those exhibited by the human genome. Moreover, the relative abundances for any given sequence remained unchanged irrespective of whether the results for one or the other chromosomal strand were analyzed. This suggested that genomic tetraNR abundance was critically dependent on DNA sequence composition. The relative tetraNR abundances for the nine combined vertebrate genomes correlated strongly with the Ta values (r = −0.706, P = 0.0001, α0.05 = 0.98) (Fig. 1). Significantly, no tracts ≥8 units were found for CCGG, the most thermostable hairpin-forming tetraNR sequence, in any of the nine genomes. The r2 values decreased hyperbolically when shorter tetraNR lengths were considered (Supplemental Fig. 2); nevertheless, all regressions remained significant (P = 0.03 for tract lengths ≥3 units). This trend is in agreement with the dependence of base-paired stem–loop stability upon DNA length (Supplemental Fig. 3) and suggested that (1) longer tracts were deleted more efficiently than shorter tracts; and (2) some shorter tracts might have arisen from longer tracts as a result of DNA replication errors. Finally, no correlation (r = −0.21, P = 0.32) was found for tetraNRs ≥4 units in four nonvertebrate genomes that exhibited shorter length distributions than those noted in vertebrate genomes (Methods).

Figure 1.
Correlation between Ta and tetraNR abundance in nine vertebrate genomes. Each of the 33 symbols represents one of the unique tetraNR sequences listed in Table 1 (column 2). For each sequence, the Ta (x-axis) is given by the highest value found for either ...

In summary, the stability of DNA secondary structures was a key determinant for the abundance of tetraNRs in vertebrate genomes and acted as an important modulator of sequence instability over evolutionary time.

Ta determinations in triNRs

Does hairpin formation determine the abundance of other simple repeats? To address this question, we considered the distribution of triNRs ≥10 units in the nine vertebrate genomes (Table 2). The numbers varied, with the ACG sequence being the rarest and the AAT sequence being the most abundant, hence displaying the same characteristics observed for the tetraNRs. The Ta values for the 20 single-stranded triNR oligonucleotides revealed that the 10 genomic triNRs can be divided into two groups: group 1, comprising the ACG, CCG, and AGC sequences, for which at least one strand displayed a high Ta value (58.5°C–76.7°C); and group 2, comprising the other seven repeats, for which Ta values were low (≤27°C) (Supplemental material).

Table 2.
TriNR abundances and Ta values

The group 1 triNRs were generally of lower abundance than the group 2 triNRs. One notable exception was the AGC sequence, which occurred about 10 times more frequently than expected. This discrepancy was intriguing given the expansion of CAG:CTG repeats (the AGC triNR contains the CAG:CTG sequence) in human genes as a frequent cause of neurological diseases (for review, see Wells and Ashizawa 2006; Mirkin 2007; Orr and Zoghbi 2007; Kovtun and McMurray 2008). Since CUG repeat-containing RNAs participate in splicing regulation and CAG-encoded polyglutamine tracts play a role in protein structure and function (Orr and Zoghbi 2007), the high proportion of the AGC triNR across all vertebrate genomes may reflect the action of positive selection over evolutionary time.

The highest Ta values for each of the triNR sequences correlated negatively with triNR abundance (r = −0.585 for occurrences in the human genome and −0.685 for the percentages in the vertebrate genomes). In summary, considering the composite data from the tetraNRs and triNRs, we conclude that hairpin-forming capacity rather than primary DNA sequence per se determined the relative abundance of simple repeating sequences over evolutionary time.

Intragenic micro/minisatellites

To test the prediction that the AGC triNR has been under positive selection, we compared the intergenic versus intragenic distributions of triNRs and tetraNRs in the human genome. At lengths ≥30 bp, both the AGC and CCG triNRs were highly (∼30%) over-represented within coding regions (Supplemental Figs. 4A, 5A), indicating selection for both sequences. At lengths ≥12 bp, the group 1 triNRs and the CCCG, AGCG, CCGG, and ACCG tetraNRs were also over-represented (>10%) within coding regions (Supplemental Figs. 4B, 5B). Since all these repeats are rare at lengths ≥30 bp (Tables 1, 2) and are associated with high (>50°C) Ta values, these results support the notion that selection acted to preserve inherently unstable sequences within coding regions.

To assess the functional relevance of this conclusion, we searched for and analyzed all micro/minisatellite-containing (diNRs–11-mer repeats) cDNAs in the human genome. Of the approximately 2300 nonredundant genes, strong enrichment (P-values down to 10^−40) was found for genes involved in the regulation of transcription and cellular functions, synaptic activity, axon guidance, and the MAPK and WNT signaling pathways, with no differences with respect to location (5′-UTR, ORF, or 3′-UTR) (Supplemental Table 3). Hence, genes involved in cell regulatory/signaling functions, particularly in the nervous system, actively recruited simple repeat sequences. diNRs and triNRs were the most abundant (>1000 copies) (Supplemental Fig. 6A). However, diNR tracts localized preferentially to the 3′-UTR region, whereas triNRs localized preferentially in the ORF (Supplemental Fig. 6B). In addition, poly(Q) and poly(E) were the most commonly encoded amino acids, whereas no poly-aromatic amino acids (F, Y, and W) were found (Supplemental Fig. 6C). These results support the hypothesis that codons used for 3-C unbranched amino acids (Supplemental Fig. 6D) were preferentially recruited to perform protein functions, perhaps disordered regions involved in allosteric interactions (Perutz et al. 2002; Liu et al. 2006; Minezaki et al. 2006; Hilser and Thompson 2007; Friedman et al. 2008). Poly(Q) and poly(E) runs were encoded predominantly by the first translated exon, whereas the poly(A), poly(L), poly(G), and poly(P) amino acids were encoded mostly by subsequent exons (Supplemental Fig. 6C, inset). This partitioning suggests a differential localization within the folded native protein. Finally, to assess the extent of evolutionary conservation of homopolymeric amino acid runs, we analyzed three genes (TBP, MEF2A, and POU4F2); strong conservation (Supplemental Fig. 7) of two CAG:CTG and one GGC:GCC repeat tracts encoding poly(Q) and poly(G), respectively, was found.

In summary, simple repeats have been recruited by a large network of regulatory genes, mostly related to nervous system activity, to affect both gene and protein function.

Repeat length distributions and base-stacking

In addition to the disparity in relative abundance (Tables 1, 2), genomic tetraNRs and triNRs also manifest variable length distributions in the human genome (Supplemental Fig. 8). A compilation of tetraNR length distributions in the nine vertebrate genomes examined revealed that only a few sequences (e.g., AGAT, ATCC, AAGG, and AAAG) display extended length distributions (Supplemental Fig. 9). Inspection of the tetraNR (Supplemental Figs. 8A, 9) and triNR (Supplemental Fig. 8B; Clark et al. 2006) sequences that formed the longest distributions in the primate lineage (human and chimpanzee) reveals their R:Y-rich nature, suggestive of a role for R:Y asymmetry in supporting extended repeat lengths over evolutionary time.

Analyses of the TDAS profiles indicated a consistent and pronounced “S-behavior” (Tables 1, 2) for the R-rich single-stranded oligonucleotides that corresponded to the longest tetraNR and triNR distributions in the human genome (Supplemental Fig. 8). Because the “S-behavior” is elicited by base-stacking interactions (Applequist and Damie 1966; Powell et al. 1972; Cantor and Schimmel 1980; Friedman and Honig 1995), we considered whether base-stacking interactions might facilitate genomic expansion. Analysis of the TDAS hypochromicity curves (Table 3) yielded qualitative evidence for a direct relationship between the TDAS slope values and the genomic length distributions.

Table 3.
Intrinsic base-stacking and repeat tract length

To verify that the slope values faithfully reflected the energetics of base-stacking, the average theoretical free energy contributions (Friedman and Honig 1995) to the nearest-neighbor base-stacking (ΔGν) were evaluated (Table 3). For both the tetraNR (r2 = 0.76, P = 0.02) (data not shown) and triNR oligonucleotides, (ΔGν) correlated negatively with the slope values (Table 3; Supplemental material), supporting the view that the dynamics of base-stacking are accurately represented by the TDAS slope values (Applequist and Damie 1966). In conclusion, both the TDAS curves and the theoretical ΔGν calculations indicated a role for base-stacking in promoting long triNR and tetraNR lengths in the human genome.

To investigate these relationships quantitatively, we determined the average number of base pairs for the 10 longest tracts for each of the six genomic tetraNR and three triNR sequences (Table 3). With the sole exception of the AAAG sequence, both the TDAS slopes and the (ΔGν) values followed the rank order of genomic mean repeat lengths. The mean values for the tetraNR tract lengths correlated positively with both the TDAS slopes (r2 = 0.96, P = 0.007) and the (ΔGν) values (r2 = 0.95, P = 0.005 without AAAG; r2 = 0.62, P = 0.06 with AAAG). In contrast, no correlations were found when the stacking energies in double-stranded (Cantor and Schimmel 1980; Hunter and Lu 1997; Isaksson et al. 2004; SantaLucia and Hicks 2004; Sponer et al. 2006), rather than single-stranded, DNA were analyzed (r2 ≤ 0.02, P = not significant [NS]). In summary, nearest-neighbor base-stacking interactions within the R-rich strand of genomic simple repeat sequences played a key role in acquiring (and subsequently maintaining) considerable lengths over the course of vertebrate evolution.

Discussion

We previously documented the variable abundances of simple repeating sequences in vertebrate genomes (Bacolla et al. 2006). Herein, model-building studies suggested that non-B DNA structures could be responsible for this behavior, and subsequent coil-to-helix transition analyses on repeating tetraNR and triNR oligonucleotides revealed an inverse relationship between the capacity of these sequences to form DNA secondary structures and their abundance in nine vertebrate genomes. These relationships also revealed the action of positive selection in maintaining unstable repeats within coding regions in the human genome and the recruitment of simple repeats by genes involved in regulatory and signaling pathways, particularly of the nervous system. Furthermore, nearest-neighbor base-stacking interactions correlated directly with the repeat sequences that manifested a bias toward expansion. These remarkably simple findings crystallize the concept that intrinsic structural features of DNA played a fundamental role as cellular recognition targets over evolutionary time.

Figure 2 shows a model for the relationship between stable hairpin formation and the “evolutionary fitness” of DNA motifs. If a repeating sequence can form a stable hairpin during the process of DNA replication or transcription, this self base-paired tract may be bypassed by DNA replication (left side, a) or induce DSBs (left side, b), which then trigger deletions resulting, over evolutionary time, in the loss of the underlying duplex DNA. Alternatively, if the sequence forms a less stable hairpin, it may generate longer repeat-containing alleles as a consequence of DNA slippage (Bowater et al. 1997; Lin et al. 2006; Wang and Vasquez 2006; Wells and Ashizawa 2006; Mirkin 2007; Wells 2007). These longer alleles will be maintained in the population and will go on to generate further length polymorphisms (Fig. 2, right side, alleles 1–3). Functional repeat polymorphisms within human genes may be associated with variable phenotypic traits (filled bars), such as blood pressure, heart rate, and muscular tension (Supplemental Table 1). Such traits have the potential to confer either increased fitness (allele 2) or allele-specific susceptibility to complex diseases (allele 3) or repeat expansion disorders (Supplemental Table 1).

Figure 2.
Model for a relationship among repeat abundance, DNA structure, repeat polymorphism, and variable phenotype. (Orange) triNRs or tetraNRs. Note that overall repeat lengths are not drawn to scale and that the choice of decreased gene expression with increasing ...

Relationships with gene function and disease

An analysis of genes for which associations between triNR and tetraNR length polymorphisms and phenotypic variation and/or susceptibility to inherited disease in the general population have been reported (Supplemental Table 1) revealed enrichment in development, transcriptional regulation, cell signaling, and nervous system activities (Supplemental Table 4). Hence, the ability of gene-associated simple repeats to expand and contract within relatively short evolutionary time frames could have contributed to gene and protein structure/function in vertebrate genomes (Supplemental Table 3; Legendre et al. 2007). In the process, however, repeat polymorphism may also have generated new risk factors for disease susceptibility. Gene classes involved in cell adhesion and cell–cell communication are also characterized by long intronic R:Y tracts (Bacolla et al. 2006) and are associated with high meiotic recombination rates (Frazer et al. 2007; Freudenberg et al. 2007). Hence, repeating DNA and meiotic recombination may have facilitated selective pressure over evolutionary time.

In the context of trinucleotide repeat expansion disorders, the CAG:CTG and CCG:CGG repeats have been preserved in coding regions in the human genome in spite of their inherent instability; the mechanisms involved remain speculative. In the case of FA, the frequency of carriers with expanded GAA:TTC repeats is estimated at ∼1:500 in the general population (De Biase et al. 2006); to our knowledge, this is the only example of a triNR/tetraNR of such length to be stably maintained in the human population. The unique base-stacking behavior of the GAA:TTC repeat is likely to contribute to its maintenance over the generations. Similarly, strong base-stacking interactions may also be responsible for the preferential instability at AAAG and AAGG repeats in specific malignancies (Supplemental Table 2; Ahrendt et al. 2000; Xu et al. 2001).

Role of base-stacking in repeat expansion

Base-stacking is emerging as a key component of DNA repair since the efficiency of this activity by diverse enzyme systems is inversely proportional to the target stacking strength (for review, see Yang 2006). Base-stacking may potentiate the expansion of repeat sequences by promoting replication slippage and at the same time protecting any ensuing secondary structures from repair (Jucker et al. 1996). These properties may contribute to avoidance of DNA repair and could be responsible for the unique lengths and high mutation rates (Kelkar et al. 2008) observed for the AAG, AAAG, and AAGG repeating sequences in the human genome.

Mechanisms of sequence loss

The nature of the mechanisms that lead to DNA structure-dependent sequence loss is unclear. However, hairpin formation is an obligatory step in programmed V(D)J recombination, where the ARTEMIS/DNA-PKcs (ARTEMIS also known as DCLRE1C) complex cleaves the RAG-induced hairpin structures in human B- and T-cells (Ma et al. 2002). In other tissues, such as mouse liver, muscle, heart, and kidney, the ARTEMIS/DNA-PKcs activity cleaves the terminal hairpins of recombinant adeno-associated virus (rAAV) particles used in gene therapy, thereby contributing to viral replication (Inagaki et al. 2007). An alternative pathway to process hairpins is the Holliday-junction resolvase activity (Inagaki et al. 2007), which comprises RAD51C in humans (for review, see Sharan and Kuznetsov 2007). Hence, at least two enzymatic activities are known to process hairpin structures in mammals.

Cleaved hairpins/cruciforms may trigger loss of the underlying DNA sequences by two mechanisms. First, opening of the single-stranded loops by the ARTEMIS/DNA-PKcs complex or incisions at the 3-/4-way junctions by the Holliday junction resolvase may initiate DNA repair and exonucleolytic cleavage, followed by single-strand annealing (Al-Minawi et al. 2008) or nonhomologous end-joining (Inagaki et al. 2007), respectively. Second, cleaved hairpins may impose a mutational burden with an ensuing growth disadvantage for the organism (Tanaka et al. 2007). Analogously, an excessive reservoir of hairpin-forming sequences may overwhelm cellular defense mechanisms and lead to deleterious rearrangements (Bacolla and Wells 2004; Wang and Vasquez 2006; Wells 2007).

In summary, DNA secondary structures played a key role in determining the number and length distributions of microsatellite repeats in vertebrate genomes over evolutionary time. Concomitantly, microsatellite length polymorphism may have served to modulate both gene expression and protein function, thereby contributing to, but at times also hampering, cellular regulatory circuitries.

Methods

Oligonucleotides

The sequences of the 82 single-stranded oligonucleotides used in this study are given in Tables 1 and 2. HPLC-purified synthetic oligonucleotides containing nine copies of each tetraNR sequence or 12 copies of each triNR sequence (to give 36-base molecules in all cases) were purchased from Sigma Genosys.

Temperature-dependent absorption spectroscopy (TDAS)

Standard assay conditions

Oligonucleotides (0.6–0.8 OD260/mL, 1–3 μM) were dissolved in buffer 1 (50 mM KCl, 0.5 mM MgCl2, 0.4 mM Na-phosphate at pH 7.0) and equilibrated overnight at 25°C. The optical absorbance at 260 nm was measured on a Cary 3 Bio UV-Vis spectrophotometer equipped with a Cary temperature controller with heating (melting curve) from 10°C to 94°C and cooling (annealing curve) from 94°C to 10°C (the temperature range was extended to 4°C for the triNR oligos) at a rate of 0.75°C/min. Minimal hysteresis effects were observed, except when noted. The melting curves often showed small peaks not present during the cooling step, suggesting the slow formation of multiple DNA conformations during the overnight incubation. For these reasons, and because the annealing step simulated more closely the biologically relevant folding of single strands into hairpins than the melting step, only the cooling curves were used to determine the midpoint of transitions. We defined as Tm the temperature at which the midpoint of transition occurred during the melting step (helix-to-coil transition) and Ta the temperature at which the midpoint of transition occurred during the annealing step (coil-to-helix transition).

Other pH assay conditions

For the d(CCCT)9 and d(CCT)12 oligonucleotides, for which strong hysteresis effects were observed, TDAS measurements were performed at three different pH values in the following buffers: 0.5 mM Tris-acetate (pH 4.5), 50 mM KCl, 0.5 mM MgCl2; 0.5 mM Tris-acetate (pH 7.0), 50 mM KCl, 0.5 mM MgCl2; 0.5 mM Tris-HCl (pH 8.0), 50 mM KCl, 0.5 mM MgCl2, and both the Tm and Ta values were determined.

Salt concentration assay conditions

Variable salt concentration assay conditions were used to distinguish between nonhelical single-stranded coils and hydrogen-bonded, highly structured, helices with Tm and Ta values >94°C. These TDAS measurements were performed in buffer 2 (10 mM Tris-HCl, 10 μM EDTA at pH 7.4) with or without KCl (1, 5, 10, 50, 100, 500 mM), NaCl (1, 10, 50, 100 mM), LiCl (1, 10, 50 mM), MgCl2 (0.5 and 10 mM), and ±50% formamide.

Tm and Ta determinations

The raw absorbance curves were smoothed with a Lowess function (f = 0.10–0.20) and used to obtain the first derivative with cubic spline functions. The first derivative curves were interpolated with peak curves (SigmaPlot 8.02; SPSS Inc.), and the best fits were chosen to obtain the Tm and Ta values from the peak parameter (r2 > 0.95 in most cases).
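The smooth, differentiate, and locate-the-peak procedure described above can be sketched in Python as follows. This is a simplified stand-in and not the SigmaPlot workflow: a moving average replaces the Lowess smoother, central finite differences replace the cubic-spline derivative, and the grid point of steepest slope replaces the fitted peak parameter. The function names and the synthetic sigmoidal cooling curve (midpoint placed at 53.3°C purely for illustration) are assumptions.

```python
import math

def moving_average(values, window=5):
    """Centered moving average; a simple substitute for Lowess smoothing."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def transition_midpoint(temps, absorbances, window=5):
    """Temperature at which |dA/dT| of the smoothed curve is largest."""
    smooth = moving_average(absorbances, window)
    best_temp, best_slope = None, 0.0
    for i in range(1, len(temps) - 1):
        # Central difference approximates the first derivative.
        slope = (smooth[i + 1] - smooth[i - 1]) / (temps[i + 1] - temps[i - 1])
        if abs(slope) > abs(best_slope):
            best_temp, best_slope = temps[i], slope
    return best_temp

# Synthetic cooling curve from 94 to 10 degrees C in 0.5-degree steps
# (hypothetical data, not a measured TDAS profile).
cooling_temps = [94.0 - 0.5 * i for i in range(169)]
cooling_abs = [0.60 + 0.15 / (1.0 + math.exp(-(t - 53.3) / 3.0))
               for t in cooling_temps]
ta_estimate = transition_midpoint(cooling_temps, cooling_abs)
print(ta_estimate)
```

Applied to a cooling curve, the returned value plays the role of Ta; applied to a heating curve, it plays the role of Tm. The resolution of this sketch is limited to the temperature grid, whereas fitting a peak function to the derivative interpolates between grid points.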

Slopes of hypochromicity curves

For the d(AAAT)9, d(AAAG)9, d(AAAC)9, d(AAC)12, d(AAT)12, and d(AAG)12 oligonucleotides, which displayed near-linear TDAS hypochromicity curves, the slopes of the cooling curves were taken over the entire temperature range (10°C–94°C). For the d(AAGG)9 and d(AGAT)9 oligonucleotides, the slopes of the cooling curves were taken from 94°C to the temperature that preceded the annealing transitions; these portions of the curves were near-linear. For the d(AAGG)9 oligonucleotide, two near-linear segments were observed: one less steep from 94°C to ∼60°C, and a second, steeper one from ∼60°C to ∼30°C, which was used to obtain the slope value. In all cases, r2 ≥ 0.94 for the near-linear segments.
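The slope determination over a near-linear segment can be sketched as an ordinary least-squares fit restricted to a temperature window. The helper names, the window bounds, and the synthetic two-segment curve below are hypothetical and only mimic the qualitative shape described for d(AAGG)9; they are not measured values.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

def segment_slope(temps, absorbances, t_min, t_max):
    """Fit only the points whose temperature lies in [t_min, t_max]."""
    points = [(t, a) for t, a in zip(temps, absorbances) if t_min <= t <= t_max]
    xs, ys = zip(*points)
    return linear_fit(xs, ys)

def synthetic_absorbance(t):
    """Hypothetical curve: shallow above 60 degrees C, steeper below."""
    rate = 0.0005 if t >= 60.0 else 0.002
    return 0.70 + rate * (t - 60.0)

curve_temps = [94.0 - 1.0 * i for i in range(85)]  # 94 down to 10 degrees C
curve_abs = [synthetic_absorbance(t) for t in curve_temps]

upper_slope, _, upper_r2 = segment_slope(curve_temps, curve_abs, 60.0, 94.0)
lower_slope, _, lower_r2 = segment_slope(curve_temps, curve_abs, 30.0, 60.0)
print(upper_slope, lower_slope)
```

Reporting r2 alongside each slope makes it possible to check that a chosen window is in fact near-linear, in the spirit of the r2 ≥ 0.94 criterion stated above.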

Circular dichroism (CD)

CD studies were performed to monitor quadruplex formation by the d(XGGG)9 oligonucleotides (where X = A, C, or T). The ellipticity was monitored on a Jasco J-720 spectropolarimeter over the 220–320-nm range for oligonucleotide solutions (0.6–0.8 OD260/mL, 1–3 μM) at 25°C by using the variable salt concentration assay conditions (see above) with and without K+, Na+, Li+ (1–100 mM), and Mg2+ (0.5 and 10 mM) ions.

Repeat searches

Computer searches were performed on a variety of eukaryotic genome sequences to retrieve all genomic triNR and tetraNR tracts comprising at least three tandem units (e.g., ACGACGACG or ACGGACGGACGG). All tracts with 3, 4, 5 units, and so on (no upper limits were set) were binned, and within each bin the tract counts were combined across all possible reading frames and both complementary strands to yield the 33 unique tetraNR sequences and the 10 unique triNR sequences (Tables 1 and 2, respectively, “Unique genomic sequence”). Computer searches (Collins et al. 2003) were performed on the following genomes: nine vertebrate genomes, namely human (Homo sapiens, hg18, 18 March 2006, NCBI Build 36.1, database version 44.36f), chimpanzee (Pan troglodytes, panTro2), mouse (Mus musculus, mm8), rat (Rattus norvegicus, rn4), dog (Canis familiaris, canFam2), cow (Bos taurus, bosTau2), chicken (Gallus gallus, galGal3), zebrafish (Danio rerio, danRer4), and Fugu (Takifugu rubripes, fr2); and four nonvertebrate genomes, namely the plants Arabidopsis thaliana (tair7) and rice (Oryza sativa ssp. japonica, release 5), the nematode worm (Caenorhabditis elegans, ce2), and yeast (Saccharomyces cerevisiae, sacCer1).
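The collapsing of reading frames and complementary strands into unique sequences can be sketched as follows. This is not the search software of Collins et al. (2003); it is a small illustration in which every repeat unit is mapped to a canonical label, taken here (as an assumption) to be the alphabetically smallest string among all rotations of the unit and of its reverse complement. The function names are hypothetical, and the regular-expression search reports only non-overlapping tracts of whole units.

```python
import re
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def is_primitive(unit):
    """False if the unit is itself a repeat of a shorter unit (e.g. AAA, ACAC)."""
    k = len(unit)
    return not any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k))

def canonical_motif(unit):
    """Smallest string among all reading frames (rotations) on both strands."""
    strands = (unit, reverse_complement(unit))
    return min(s[i:] + s[:i] for s in strands for i in range(len(unit)))

def unique_motifs(k):
    """All canonical repeat units of length k."""
    units = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({canonical_motif(u) for u in units if is_primitive(u)})

def find_tracts(sequence, k, min_units=3):
    """Return (start, canonical motif, number of units) for each tandem tract."""
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
    tracts = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        if is_primitive(unit):
            tracts.append((match.start(), canonical_motif(unit), len(match.group(0)) // k))
    return tracts

tri = unique_motifs(3)
tetra = unique_motifs(4)
hits = find_tracts("TTACGACGACGTT", 3)
```

Under this convention `unique_motifs(3)` and `unique_motifs(4)` contain 10 and 33 entries, in agreement with the numbers of unique triNR and tetraNR sequences given above, and both CAG and CTG map to the label AGC.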

Database searches

Studies describing associations between human intragenic tetraNR and triNR length polymorphisms and inherited disease or the occurrence of tetraNR instability in cancer were retrieved from the Human Gene Mutation Database (http://www.hgmd.org; Stenson et al. 2003) and from manual PubMed searches.

Gene enrichment analyses

Functional category enrichment analyses were performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) at http://david.abcc.ncifcrf.gov. Only the gene categories most enriched in each proteomic/genomic database were reported.
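For readers unfamiliar with over-representation testing, the idea behind a functional category enrichment can be sketched with a one-sided hypergeometric test. This is a generic illustration only: the statistic and background sets used internally by DAVID are not described in this section, and the function name and the gene counts below are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_hits, n_list, n_category, n_background):
    """P(X >= n_hits) when n_list genes are drawn without replacement from
    n_background genes, of which n_category belong to the category."""
    upper = min(n_list, n_category)
    total = comb(n_background, n_list)
    tail = sum(
        comb(n_category, x) * comb(n_background - n_category, n_list - x)
        for x in range(n_hits, upper + 1)
    )
    return tail / total

# Hypothetical example: 12 of 100 repeat-containing genes fall in a category
# that holds 400 of 20,000 background genes (2 expected by chance).
p_value = hypergeom_enrichment_p(12, 100, 400, 20000)
```

A small p-value indicates that the category is represented in the gene list more often than expected from the background frequency; in practice a multiple-testing correction across categories would also be applied.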

Acknowledgments

This work was supported by the NIH (ES11347), Friedreich’s Ataxia Research Alliance, Seek-a-Miracle Foundation, and the Robert A. Welch Foundation to R.D.W. and in part by the Intramural Research Program of the NIH, NCI, and Federal funds from the NCI, NIH to J.R.C. (contract no. N01-CO-12400), and financial support from BIOBASE GmbH to D.N.C. We thank J.E.L. for 40 years of research on non-B DNA structures. All research materials from the R.D.W. laboratory have been transferred to S. Mirkin (Sergei.Mirkin@tufts.edu). We thank Xiaolian Gao of the University of Houston for the use of her facilities and Jin Jen of the NCI for helpful discussions.

Footnotes

[Supplemental material is available online at www.genome.org.]

Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.078303.108.

6 Abbreviations: Concerning nucleic acid nomenclature, we designated a double-stranded genomic triNR or tetraNR by its unique sequence (Tables 1, 2) with no specification as to the reading frame or strand composition. Accordingly, AGC includes all genomic tracts composed of AGC:GCT, GCA:TGC, CAG:CTG, GCT:AGC, TGC:GCA, and CTG:CAG duplex DNA, where the colon separates the complementary strands. In contrast, we specify single-stranded DNA oligonucleotides and their reading frame by d(AGC)n, for example. A subscript (n) indicates the number of repeating units. Hydrogen-bonded nucleotides are also indicated by a colon, that is, A:T.

References

  • Ahrendt S.A., Decker P.A., Doffek K., Wang B., Xu L., Demeure M.J., Jen J., Sidransky D. Microsatellite instability at selected tetranucleotide repeats is associated with p53 mutations in non-small cell lung cancer. Cancer Res. 2000;60:2488–2491.
  • Al-Minawi A.Z., Saleh-Gohari N., Helleday T. The ERCC1/XPF endonuclease is required for efficient single-strand annealing and gene conversion in mammalian cells. Nucleic Acids Res. 2008;36:1–9.
  • Applequist J., Damie V. Thermodynamics of the one-stranded helix-coil equilibrium in polyadenylic acid. J. Am. Chem. Soc. 1966;88:3895–3900.
  • Bacolla A., Wells R.D. Non-B DNA conformations, genomic rearrangements, and human disease. J. Biol. Chem. 2004;279:47411–47414.
  • Bacolla A., Pradhan S., Larson J.E., Roberts R.J., Wells R.D. Recombinant human DNA (cytosine-5) methyltransferase. III. Allosteric control, reaction order, and influence of plasmid topology and triplet repeat length on methylation of the fragile X CGG.CCG sequence. J. Biol. Chem. 2001;276:18605–18613.
  • Bacolla A., Collins J.R., Gold B., Chuzhanova N., Yi M., Stephens R.M., Stefanov S., Olsh A., Jakupciak J.P., Dean M., et al. Long homopurine*homopyrimidine sequences are characteristic of genes expressed in brain and the pseudoautosomal region. Nucleic Acids Res. 2006;34:2663–2675.
  • Bowater R.P., Jaworski A., Larson J.E., Parniewski P., Wells R.D. Transcription increases the deletion frequency of long CTG.CAG triplet repeats from plasmids in Escherichia coli. Nucleic Acids Res. 1997;25:2861–2868.
  • Cantor C.R., Schimmel P.R. Biophysical chemistry. W.H. Freeman & Co.; New York: 1980.
  • Clark R.M., Bhaskar S.S., Miyahara M., Dalgliesh G.L., Bidichandani S.I. Expansion of GAA trinucleotide repeats in mammals. Genomics. 2006;87:57–67.
  • Collins J.R., Stephens R.M., Gold B., Long B., Dean M., Burt S.K. An exhaustive DNA micro-satellite map of the human genome using high performance computing. Genomics. 2003;82:10–19.
  • De Biase I., Rasmussen A., Bidichandani S.I. Evolution and instability of the GAA triplet-repeat sequence in Friedreich’s Ataxia. In: Wells R.D., Ashizawa T., editors. Genetic instabilities and neurological diseases. Elsevier/Academic Press; San Diego: 2006. pp. 305–319.
  • Elango N., Kim S.H., Vigoda E., Yi S.V. Mutations of different molecular origins exhibit contrasting patterns of regional substitution rate variation. PLoS Comput. Biol. 2008;4:e1000015. doi: 10.1371/journal.pcbi.1000015.
  • Frazer K.A., Ballinger D.G., Cox D.R., Hinds D.A., Stuve L.L., Gibbs R.A., Belmont J.W., Boudreau A., Hardenbol P., Leal S.M., et al. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–861.
  • Frederico L.A., Kunkel T.A., Shaw B.R. Cytosine deamination in mismatched base pairs. Biochemistry. 1993;32:6523–6530.
  • Freudenberg J., Fu Y.H., Ptacek L.J. Enrichment of HapMap recombination hotspot predictions around human nervous system genes: Evidence for positive selection? Eur. J. Hum. Genet. 2007;15:1071–1078.
  • Friedman R.A., Honig B. A free energy analysis of nucleic acid base stacking in aqueous solution. Biophys. J. 1995;69:1528–1535.
  • Friedman M.J., Wang C.E., Li X.J., Li S. Polyglutamine expansion reduces the association of TATA-binding protein with DNA and induces DNA binding-independent neurotoxicity. J. Biol. Chem. 2008;283:8283–8290.
  • Hilser V.J., Thompson E.B. Intrinsic disorder as a mechanism to optimize allosteric coupling in proteins. Proc. Natl. Acad. Sci. 2007;104:8311–8315.
  • Hunter C.A., Lu X.J. DNA base-stacking interactions: A comparison of theoretical calculations with oligonucleotide X-ray crystal structures. J. Mol. Biol. 1997;265:603–619.
  • Inagaki K., Ma C., Storm T.A., Kay M.A., Nakai H. The role of DNA-PKcs and Artemis in opening viral DNA hairpin termini in various tissues in mice. J. Virol. 2007;81:11304–11321.
  • Isaksson J., Acharya S., Barman J., Cheruku P., Chattopadhyaya J. Single-stranded adenine-rich DNA and RNA retain structural characteristics of their respective double-stranded conformations and show directional differences in stacking pattern. Biochemistry. 2004;43:15996–16010.
  • Iyer R.R., Pluciennik A., Rosche W.A., Sinden R.R., Wells R.D. DNA polymerase III proofreading mutants enhance the expansion and deletion of triplet repeat sequences in Escherichia coli. J. Biol. Chem. 2000;275:2174–2184.
  • Jucker F.M., Heus H.A., Yip P.F., Moors E.H., Pardi A. A network of heterogeneous hydrogen bonds in GNRA tetraloops. J. Mol. Biol. 1996;264:968–980.
  • Kelkar Y.D., Tyekucheva S., Chiaromonte F., Makova K.D. The genome-wide determinants of human and chimpanzee microsatellite evolution. Genome Res. 2008;18:30–38.
  • Kovtun I.V., McMurray C.T. Features of trinucleotide repeat instability in vivo. Cell Res. 2008;18:198–213.
  • Lander E.S., Linton L.M., Birren B., Nusbaum C., Zody M.C., Baldwin J., Devon K., Dewar K., Doyle M., FitzHugh W., et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.
  • Legendre M., Pochet N., Pak T., Verstrepen K.J. Sequence-based estimation of minisatellite and microsatellite repeat variability. Genome Res. 2007;17:1787–1796.
  • Lin Y., Dion V., Wilson J.H. Transcription promotes contraction of CAG repeat tracts in human cells. Nat. Struct. Mol. Biol. 2006;13:179–180.
  • Lindahl T., Nyberg B. Heat-induced deamination of cytosine residues in deoxyribonucleic acid. Biochemistry. 1974;13:3405–3410.
  • Liu J., Perumal N.B., Oldfield C.J., Su E.W., Uversky V.N., Dunker A.K. Intrinsic disorder in transcription factors. Biochemistry. 2006;45:6873–6888.
  • Ma Y., Pannicke U., Schwarz K., Lieber M.R. Hairpin opening and overhang processing by an Artemis/DNA-dependent protein kinase complex in nonhomologous end joining and V(D)J recombination. Cell. 2002;108:781–794.
  • Minezaki Y., Homma K., Kinjo A.R., Nishikawa K. Human transcription factors contain a high fraction of intrinsically disordered regions essential for transcriptional regulation. J. Mol. Biol. 2006;359:1137–1149.
  • Mirkin S.M. Expandable DNA repeats and human disease. Nature. 2007;447:932–940.
  • Mort M., Ivanov D., Cooper D.N., Chuzhanova N.A. A meta-analysis of nonsense mutations causing human genetic disease. Hum. Mutat. 2008;29:1037–1047.
  • Orr H.T., Zoghbi H.Y. Trinucleotide repeat disorders. Annu. Rev. Neurosci. 2007;30:575–621.
  • Pandolfo M. Friedreich’s ataxia. In: Wells R.D., Ashizawa T., editors. Genetic instabilities and neurological diseases. Elsevier/Academic Press; San Diego: 2006. pp. 277–296.
  • Perutz M.F., Pope B.J., Owen D., Wanker E.E., Scherzinger E. Aggregation of proteins with expanded glutamine and alanine repeats of the glutamine-rich and asparagine-rich domains of Sup35 and of the amyloid beta-peptide of amyloid plaques. Proc. Natl. Acad. Sci. 2002;99:5596–5600.
  • Powell J.T., Richards E.G., Gratzer W.B. The nature of stacking equilibria in polynucleotides. Biopolymers. 1972;11:235–250.
  • Rachwal P.A., Brown T., Fox K.R. Sequence effects of single base loops in intramolecular quadruplex DNA. FEBS Lett. 2007;581:1657–1660.
  • Sammalkorpi H., Alhopuro P., Lehtonen R., Tuimala J., Mecklin J.P., Jarvinen H.J., Jiricny J., Karhu A., Aaltonen L.A. Background mutation frequency in microsatellite-unstable colorectal cancer. Cancer Res. 2007;67:5691–5698.
  • SantaLucia J., Hicks D. The thermodynamics of DNA structural motifs. Annu. Rev. Biophys. Biomol. Struct. 2004;33:415–440.
  • Sharan S.K., Kuznetsov S.G. Resolving RAD51C function in late stages of homologous recombination. Cell Div. 2007;2:15. doi: 10.1186/1747-1028-2-15.
  • Sharma P.C., Grover A., Kahl G. Mining microsatellites in eukaryotic genomes. Trends Biotechnol. 2007;25:490–498.
  • Sponer J., Jurecka P., Marchan I., Luque F.J., Orozco M., Hobza P. Nature of base stacking: Reference quantum-chemical stacking energies in ten unique B-DNA base-pair steps. Chemistry. 2006;12:2854–2865.
  • Stenson P.D., Ball E.V., Mort M., Phillips A.D., Shiel J.A., Thomas N.S., Abeysinghe S., Krawczak M., Cooper D.N. Human Gene Mutation Database (HGMD): 2003 Update. Hum. Mutat. 2003;21:577–581.
  • Subramanian S., Mishra R.K., Singh L. Genome-wide analysis of microsatellite repeats in humans: Their abundance and density in specific genomic regions. Genome Biol. 2003;4:R13. doi: 10.1186/gb-2003-4-2-r13.
  • Tanaka H., Cao Y., Bergstrom D.A., Kooperberg C., Tapscott S.J., Yao M.C. Intrastrand annealing leads to the formation of a large DNA palindrome and determines the boundaries of genomic amplification in human cancer. Mol. Cell. Biol. 2007;27:1993–2002.
  • Walsh C.P., Xu G.L. Cytosine methylation and DNA repair. Curr. Top. Microbiol. Immunol. 2006;301:283–315.
  • Wang G., Vasquez K.M. Non-B DNA structure-induced genetic instability. Mutat. Res. 2006;598:103–119.
  • Waterston R.H., Lindblad-Toh K., Birney E., Rogers J., Abril J.F., Agarwal P., Agarwala R., Ainscough R., Alexandersson M., An P., et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–562.
  • Wells R.D. Non-B DNA conformations, mutagenesis and disease. Trends Biochem. Sci. 2007;32:271–278.
  • Wells R.D., Ashizawa T. Genetic instabilities and neurological diseases. Elsevier/Academic Press; San Diego: 2006.
  • Wells R.D., Dere R., Hebert M.L., Napierala M., Son L.S. Advances in mechanisms of genetic instability related to hereditary neurological diseases. Nucleic Acids Res. 2005;33:3785–3798.
  • Wojciechowska M., Napierala M., Larson J.E., Wells R.D. Non-B DNA conformations formed by long repeating tracts of DM1, DM2 and FRDA genes, not the sequences per se, promote mutagenesis in flanking regions. J. Biol. Chem. 2006;281:24531–24543.
  • Xu L., Chow J., Bonacum J., Eisenberger C., Ahrendt S.A., Spafford M., Wu L., Lee S.M., Piantadosi S., Tockman M.S., et al. Microsatellite instability at AAAG repeat sequences in respiratory tract cancers. Int. J. Cancer. 2001;91:200–204.
  • Yang W. Poor base stacking at DNA lesions may initiate recognition by many repair proteins. DNA Repair. 2006;5:654–666.
  • Zahra R., Blackwood J.K., Sales J., Leach D.R. Proofreading and secondary structure processing determine the orientation dependence of CAG⋅CTG trinucleotide repeat instability in Escherichia coli. Genetics. 2007;176:27–41.

Articles from Genome Research are provided here courtesy of Cold Spring Harbor Laboratory Press