Mol Biol Evol. Mar 2009; 26(3): 659–669.
Published online Dec 23, 2008. doi:  10.1093/molbev/msn287
PMCID: PMC2767092

Spontaneous Mutational and Standing Genetic (Co)variation at Dinucleotide Microsatellites in Caenorhabditis briggsae and Caenorhabditis elegans

Abstract

Understanding the evolutionary processes responsible for shaping genetic variation within and between species requires separating the effects of mutation and selection. Differences between the patterns of genetic variation observed in nature and when mutations are allowed to accumulate in the relative absence of selection can reveal biases imposed by selection. We characterize the genetic variation at dinucleotide microsatellite repeats in four sets of 250-generation mutation accumulation (MA) lines, two in the species Caenorhabditis briggsae and two in Caenorhabditis elegans, and compare the mutational variation with the standing variation in those species. We also compare the mutational properties of microsatellites with the cumulative effects of mutations on fitness in the same lines. Integrated over the whole genome, we infer that the mutation rate of C. briggsae is about twice that of C. elegans, consistent with the cumulative mutational effects on fitness. The mutational spectrum (ratio of insertions to deletions) differs between repeat types and, in some cases, between species. The per-locus mutation rate is significantly positively correlated with the standing genetic variation at the same locus in both species, providing justification for the common practice of using the standing genetic variance as a surrogate for the mutation rate.

Keywords: heterozygosity, microsatellite, mutation accumulation, mutational variance, spontaneous mutation, tandem repeat

Introduction

For many reasons, understanding evolution requires understanding mutation: the rate at which mutations occur, the molecular spectrum, and their effects on fitness. First, the standing genetic variation within a population (H) is a composite function of the effective population size (Ne) and the mutation rate (μ): at equilibrium, H ≈ 4Neμ (Hartl and Clark 2007). Differences in the standing genetic variation between populations or species, or between regions of the genome in the same species, may be due to differences in Ne, in mutation rate, or both; Ne itself depends on many factors, including natural selection (Hill and Robertson 1966). Similarly, the rate of neutral divergence (k) between taxa is equal to the neutral mutation rate (μ0) (Kimura 1968). Differences between lineages in the rate of molecular evolution may be due either to differences in the absolute mutation rate (μ) or to differences in the fraction of mutations that are effectively neutral, which is a function of Ne.

Second, the evolution of genome size and/or composition may depend both on natural selection (e.g., small genomes may be favored due to the increased speed of replication) and on the mutational process (e.g., a deletion bias will lead to a reduction in genome size). Differences among taxa in genome size and/or composition, or in the properties of particular genomic components (such as introns) may result from the effects of selection, mutation, or both.

Third, the cumulative effect on fitness of deleterious mutations depends on both the rate (U) and the distribution of fitness effects (g[a]) (Lynch and Walsh 1998), and those parameters are of utmost importance in evolutionary theory. However, the rate and distribution of effects are confounded both statistically (Begin and Schoen 2006) and conceptually (Baer et al. 2007), and taxa that differ in the degree to which they suffer the effects of deleterious mutation may have different mutation rates, different distributions of mutational effects, or both.

The common theme of these examples is that the manifestation of mutational processes is almost always confounded with natural selection and/or population size, from which it follows that an unambiguous characterization of mutational processes can greatly facilitate understanding of evolution. The most effective way to dissociate mutational processes from selection is to allow mutations to accumulate at very small Ne, thereby minimizing the efficiency of selection (Kimura 1962; Kondrashov et al. 2006). Here we report the results of a comparative study of the mutational properties of dinucleotide short tandem repeat—STR, or microsatellite—loci in two species of Rhabditid nematode, Caenorhabditis briggsae and Caenorhabditis elegans. Mutations were allowed to accumulate at Ne ≈ 1 for 250 generations; selection has thus been “turned off” as much as is possible.

We target STRs for scrutiny for three reasons. First, we are interested in the relationship between the mutation rate at a well-defined class of molecular loci and the overall impact of new mutations on fitness. We have previously documented significant variation in the cumulative effects of new mutations on fitness and body size between and potentially within C. briggsae and C. elegans (Baer et al. 2005, 2006, 2008; Ostrow et al. 2007). The weight of the evidence suggests a scenario in which C. briggsae declines in fitness at about twice the rate of C. elegans. However, the mutational decay of fitness may differ between groups due to a difference in the distribution of mutational effects rather than in mutation rate. To date, there is very limited information on the relationship between the rate and spectrum of new mutations at the molecular level (μ) and the genomic mutation rate for fitness (U). It would be of considerable interest to identify an easily screened class of marker loci whose mutational properties vary with U in a consistent way.

Second, we are interested in the relationship between standing variation and mutation rate. Almost all studies that report variation in “mutation rate” infer mutation rate indirectly, from standing genetic variation and/or divergence among taxa at a class of loci assumed to be evolving neutrally. Any inference derived from such a study is only as robust as the underlying assumptions. There have been remarkably few studies (we know of only one in a eukaryote) in which the standing variance at a set of loci has been directly compared with the demonstrated mutation rate at the same set of loci. A strong positive association between the standing variance and the demonstrated mutation rate provides the best possible justification for inferring mutation rate from standing variation. STR loci are ideal for this purpose because their high mutation rate provides much greater power than could be obtained from other classes of loci (e.g., base substitutions at single nucleotides).

Third, an influential model of genome evolution invokes a general mutational bias in favor of small deletions relative to insertions as a driving force behind the evolution of genome size (Petrov 2001, 2002). A survey of random nuclear mutations in the C. elegans genome found a significant excess of short insertions relative to deletions (Denver et al. 2004), contrary to the phylogenetic pattern observed in nuclear pseudogenes (Witherspoon and Robertson 2003). Similarly, a survey of STR mutations in the N2 strain of C. elegans showed a significant insertion bias (Seyfert et al. 2008), although repeat motifs were not uniformly represented, nor was an effect of repeat motif tested in that study.

Finally, our most fundamental motivation is “genomic natural history.” The study is part of an ongoing effort to understand the factors underlying taxonomic variation in the genomic mutation rate. Evolutionary theory provides clear guidance regarding the evolution of mutation rate with respect to mating system and chromosomal context (reviewed in Drake et al. 1998; Sniegowski et al. 2000; Baer et al. 2007), but there have been very few empirical studies conducted in a systematic comparative context. The two species studied here have very similar life histories; thus there is no a priori reason to expect systematic differences in the strength of selection on mutation rate.

Materials and Methods

Mutation Accumulation (MA) Lines

The MA protocol has been described in detail elsewhere (Vassilieva and Lynch 1999; Baer et al. 2005). Briefly, highly inbred stocks of each strain were replicated 100 times and perpetuated by single-hermaphrodite transfer for 250 generations. This protocol results in a genetic effective population size of Ne ≈ 1, minimizing the efficiency of natural selection and ensuring that all but the most deleterious mutations evolve according to neutral dynamics.

Choice of Loci and Primer Design

We initially identified all dinucleotide repeat loci of ≥5 perfect repeats in the draft C. briggsae genome (strain AF16) by Blast search against the NCBI nr database (ca. October 2002) for the oligonucleotide sequence XX(5), where XX is the dinucleotide sequence AC, AG, AT, and CG, using the “short, nearly exact match” algorithm. The dinucleotide repeat and 200 bp of upstream and downstream flanking sequence were saved and screened for duplicates by pairwise Blast search of the flanking sequence. We chose at random 96 loci of at least five repeats, allowing at most one nucleotide indel over the entire repeat sequence. Primers were designed using Primer3 software (Rozen and Skaletsky 2000) using the default parameters with an optimum fragment length of 250, a minimum(maximum) allowable fragment length of 150(350) bases, and a minimum distance of 10 bases between the repeat and the primer termini. Loci for C. elegans were chosen from those previously published by Sivasundar and Hey (2003) and Frisse (1999) and supplemented with loci chosen randomly from the C. elegans genome (WS137).
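The locus-identification step can be illustrated with a minimal sketch. It is not the Blast-based search the authors ran; it is a plain regular-expression scan for perfect runs of at least five repeats of the four motifs named above. The function name and the toy sequence are hypothetical, and reverse complements and phase-shifted forms of a motif are deliberately not handled.

```python
import re

# Motifs and minimum repeat count as stated in the text.
# Assumption of this sketch: only the forward-strand motif as written is
# matched; reverse complements and phase-shifted forms are ignored.
MOTIFS = ("AC", "AG", "AT", "CG")
MIN_REPEATS = 5


def find_perfect_dinucleotide_repeats(sequence, motifs=MOTIFS, min_repeats=MIN_REPEATS):
    """Return (motif, start, repeat_count) for each perfect run of >= min_repeats."""
    sequence = sequence.upper()
    hits = []
    for motif in motifs:
        pattern = re.compile("(?:%s){%d,}" % (re.escape(motif), min_repeats))
        for match in pattern.finditer(sequence):
            hits.append((motif, match.start(), len(match.group()) // 2))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy sequence: six AC repeats, four AG repeats (too short), five AT repeats.
toy_sequence = "GGT" + "AC" * 6 + "TTG" + "AG" * 4 + "CCC" + "AT" * 5 + "G"
hits = find_perfect_dinucleotide_repeats(toy_sequence)
for motif, start, count in hits:
    print(motif, start, count)
```

In this toy case the AG run is skipped because it has only four repeats, which mirrors the ≥5 threshold in the text.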

Presence and length of microsatellite loci were confirmed by direct sequencing of cloned polymerase chain reaction (PCR) products for most loci in both species (supplemental table S1, Supplementary Material online). PCR products were cloned into TOPO TA cloning vectors (Invitrogen, Carlsbad, CA) and sequenced on an ABI 3730 automated capillary sequencer at the University of Florida's Interdisciplinary Center for Biotechnology Research (ICBR). Sequences were aligned in Sequence Viewer 4 (CLC Bio, Cambridge, MA) using default parameters. Repeat number was determined by taking the average of at least two sequencing reads per locus and searching for dinucleotide repeat motifs with the PHOBOS algorithm (Christoph Mayer, Ruhr-Universität Bochum).

Genotyping

Genomic DNA was extracted from two replicate cultures of each MA line using a modified protocol of Williams et al. (1992). We employed a nested PCR strategy with fluorescently tagged primers, via a modification of the “three-primer” method of Schuelke (2000). Reactions of 15 μl were done in 96-well plates, using 1 μl of DNA sample, 60 pmol of selective primer, 6 pmol of M13-tail primer, 60 pmol of labeled M13 primer, 1.5 mM MgCl2, 10 mM of dNTPs, and 0.375 units of Eppendorf Taq polymerase. Reactions were initially run for 10 cycles of 40 s at 94 °C, 40 s at 60 °C, and 40 s at 72 °C; the annealing temperature was then decreased to 48 °C, and the reaction was continued for an additional 30 cycles.

Three fluorescent labels were used (fam, hex, and ned), with only one label used for a given reaction (= locus). Products with different fluorescent labels were pooled and analyzed using an Applied Biosystems (Foster City, CA, ABI) 3730XL DNA analyzer (The Center for Applied Genomics, Hospital for Sick Children, Toronto, ON, Canada). Data were analyzed using Genemapper 3.0 software (ABI) and GeneMarker (SoftGenetics). Fragment length was established relative to a known size-standard ladder (GeneScan 500, ABI). We employed an iterative binning procedure to identify putative mutants. In the first iteration, we calculated the mean fragment length for all replicates. Fragments that deviated by >1.5 bases from the mean were removed from the data set and the mean recalculated. Alleles that differed by >1.5 bases from the recalculated mean were scored as putative mutants. Putative mutants were then reamplified from an independent DNA extraction and regenotyped as above. Putative mutants that had the same fragment length in both reactions were scored as mutant alleles; putative mutants that did not have the same fragment length in both reactions were considered false positives and scored as nonmutant. Averaged over all loci, the initial false-positive rate was approximately 40%, with a final false-positive rate equal to (p/L)(0.40), where p is the probability of observing a mutant allele at a locus and L is the number of MA lines genotyped at that locus, assuming a complete mutational bias to the observed allele. Averaged over all loci, the final false-positive rate is on the order of 0.5%.
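The two-pass binning rule described above can be sketched as follows. This is a minimal illustration of the stated rule only (mean, trim at 1.5 bases, recompute, flag); the function name, line identifiers, and fragment lengths are hypothetical, and the reamplification step that confirms putative mutants is not modeled.

```python
from statistics import mean

THRESHOLD_BASES = 1.5  # deviation cutoff stated in the text


def putative_mutants(fragment_lengths, threshold=THRESHOLD_BASES):
    """Flag putative mutants at one locus.

    fragment_lengths maps a line identifier to its fragment length in bases.
    """
    # First iteration: mean over all replicates.
    first_mean = mean(fragment_lengths.values())
    retained = [
        length for length in fragment_lengths.values()
        if abs(length - first_mean) <= threshold
    ]
    # Assumption of this sketch: if the first pass removes every fragment,
    # fall back to the first mean rather than failing.
    second_mean = mean(retained) if retained else first_mean
    # Second iteration: score against the recalculated mean.
    return sorted(
        line for line, length in fragment_lengths.items()
        if abs(length - second_mean) > threshold
    )


# Hypothetical data: four lines near 250 bases, one +2 and one -2 base variant.
toy_locus = {
    "MA01": 250.1, "MA02": 249.9, "MA03": 250.0,
    "MA04": 250.2, "MA05": 252.1, "MA06": 248.0,
}
flagged = putative_mutants(toy_locus)
print(flagged)
```

Recomputing the mean after trimming keeps a few large variants from shifting the reference length, which is presumably why the procedure uses two passes.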

A conservative estimate of the false-negative rate is to inflate the observed fraction of mutant alleles by the initial false-positive rate (≈40%). Thus, it is possible that we have underestimated the true mutation rate by 20% if there is no mutational bias and as much as 40% if there is a complete mutational bias to the wild-type allele. The only solution to the false-negative problem is to genotype each line twice at each locus, which would have been prohibitively expensive. Reported values of mutation rates are uncorrected values.

The possibility of contamination is an inherent problem in MA experiments; the most likely cause is mislabeling of plates or tubes. We found no lines that had high frequencies of mutations (≥3 in C. briggsae, >2 in C. elegans) that shared the identical set of mutations. As an additional test, we fit the distribution of the number of mutations among lines to a Poisson distribution; nonindependence of mutations in lines would increase the variance above the Poisson expectation. The number of lines differed among loci, so we used the weighted mean number of lines for each repeat type as the sample size. In three of the four strains, the distribution of numbers of mutations was an excellent fit to the Poisson (goodness-of-fit chi-square, P > 0.18 in all cases). In PB306, there was a marginally significant excess of lines with one mutation and a deficit of lines with two or more mutations (obs/exp 0 = 45/49; obs/exp 1 = 21/29; obs/exp ≥2 = 2/5; goodness-of-fit χ² (df = 1) = 4.59, P > 0.03). Undetected contamination would have the effect of reducing the actual number of lines surveyed, thus potentially biasing the estimated mutation rate downward.
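A Poisson goodness-of-fit test of this kind can be sketched in a few lines. The sketch assumes three bins (0, 1, and ≥2 mutations per line), estimates the Poisson mean from the data, and uses 1 degree of freedom (three bins minus one, minus one estimated parameter). The per-line counts are hypothetical, and the authors' weighting by mean number of lines is not reproduced.

```python
import math
from collections import Counter


def chi_square_sf_df1(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0


def poisson_gof(mutations_per_line):
    """Chi-square goodness of fit to a Poisson, binned as 0, 1, and >=2."""
    n = len(mutations_per_line)
    lam = sum(mutations_per_line) / n
    if lam == 0:
        raise ValueError("no mutations observed; the test is undefined")
    counts = Counter(min(k, 2) for k in mutations_per_line)
    observed = [counts.get(0, 0), counts.get(1, 0), counts.get(2, 0)]
    p0 = math.exp(-lam)
    p1 = lam * p0
    expected = [n * p0, n * p1, n * (1.0 - p0 - p1)]
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return observed, expected, chi_sq, chi_square_sf_df1(chi_sq)


# Hypothetical counts of mutations in 100 lines.
toy_counts = [0] * 50 + [1] * 30 + [2] * 15 + [3] * 5
observed, expected, chi_sq, p_value = poisson_gof(toy_counts)
print(observed, [round(e, 1) for e in expected], round(chi_sq, 2), round(p_value, 3))
```

As a check on the tail function, `chi_square_sf_df1(4.59)` falls between 0.03 and 0.04, in line with the value reported for PB306.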

Data Analysis

Mutation Rate

Allelic state was initially modeled as a binomially distributed random variable X with state 0 = wild-type and state 1 = variant (length differences among variant alleles are not considered); each locus is assumed to provide an independent manifestation of the mutational process. The null hypothesis is no difference among groups in the binomial parameter p = Pr(X = 1). The mutation rate μ can be approximated by p/t, where t is the number of generations of MA. Because different loci were examined in the two species, loci are nested within species. Differences among groups were assessed via generalized linear mixed model as implemented in SAS v.9.1 PROC GLIMMIX, using a logit link function (http://support.sas.com/rnd/app/papers/glimmix.pdf). Significance of approximate F-tests for fixed effects was determined by the residual pseudolikelihood method (Wolfinger and O'Connell 1993); degrees of freedom were calculated by the Kenward–Rogers method. Length-specific mutation rates were calculated by estimating the least-squares mean of the binomial parameter for each strain/repeat type combination and dividing by the number of generations of MA. SAS code is presented in supplementary table S2, Supplementary Material online.
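The approximation μ ≈ p/t and the logit link can be illustrated with a small sketch. It computes a raw per-locus estimate only; it does not fit the mixed model the authors ran in PROC GLIMMIX. The function names and the example counts are hypothetical.

```python
import math

GENERATIONS_OF_MA = 250  # number of generations of MA, from the text


def per_generation_mutation_rate(n_mutant_lines, n_lines_genotyped,
                                 generations=GENERATIONS_OF_MA):
    """Approximate mu as p / t, where p is the fraction of lines with a variant allele."""
    p = n_mutant_lines / n_lines_genotyped
    return p / generations


def logit(p):
    """Logit link: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def inverse_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical locus: 2 variant alleles among 80 genotyped lines.
rate = per_generation_mutation_rate(2, 80)
print(rate)              # about 1e-4 per generation
print(logit(2 / 80))     # the scale on which the linear model operates
```

The approximation treats each line as carrying at most one detectable mutation at a locus, which is reasonable when p is small.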

We first tested for variation between strains within each species. Two strains cannot provide a meaningful estimate of the within-species variance, so strain is treated as a fixed effect. The initial model was p = rep_num + rep_type + strain + locus(rep_type(strain)) + all interactions, where p is the binomial parameter, rep_num is the number of dinucleotide repeats, and rep_type is the dinucleotide motif (AC, AG, and AT). Strain, rep_type, and their interactions are categorical fixed effects; locus is a categorical random effect; and rep_num is a continuous covariate. To account for overdispersion of the data, the among-locus (residual) component of variance was estimated separately for each rep_type/strain combination. Among-locus variance in mutation rate was assessed by likelihood-ratio test (LRT), comparing the (pseudo) likelihoods of the models with and without the residual variance term. Twice the difference in the negative log-likelihood of the two models is chi-square distributed with degrees of freedom equal to the difference in the number of parameters between the two models.
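The likelihood-ratio test described above reduces to a chi-square tail probability. The sketch below implements that tail for integer degrees of freedom using only the standard library, with the recurrence Q(x; k+2) = Q(x; k) + (x/2)^(k/2) e^(−x/2) / Γ(k/2 + 1). The function names and the example log-likelihood values are hypothetical; the pseudolikelihoods themselves would come from the fitted models.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability of a chi-square variate for integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 1   # closed form for df = 1
    else:
        tail, k = math.exp(-half), 2              # closed form for df = 2
    while k < df:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail


def likelihood_ratio_test(neg_loglik_reduced, neg_loglik_full, df_difference):
    """Return (statistic, p_value) for nested models."""
    statistic = 2.0 * (neg_loglik_reduced - neg_loglik_full)
    return statistic, chi_square_sf(statistic, df_difference)


# Hypothetical negative log-likelihoods differing by 2.7 with one extra parameter.
statistic, p_value = likelihood_ratio_test(102.7, 100.0, 1)
print(round(statistic, 2), round(p_value, 4))
```

Applied to the statistics reported in the Results (for example 5.4 with 1 df, or 26.1 with 6 df), this function gives tail probabilities consistent with the stated P-value bounds.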

We next tested for differences among species using the model p = rep_num + rep_type + species + strain(species) + locus(rep_type(species)) + all interaction terms; the among-locus component of variance is estimated separately for each rep_type/strain combination as described previously. A complication is that the within-species analysis revealed significant interactions with strain in C. elegans but not in C. briggsae (see Results) and the fixed-effect model fits a single effect of strain nested within species. To account for the variation in the effects of strain between the two species, we pooled the two strains of C. briggsae and compared the pooled data with C. elegans, using the same model as above.

Mutational Spectrum

Mutations were characterized as insertions or deletions (without respect to length), and the proportion of deletions q was calculated for each locus in each strain, where q = #deletions/total # of mutations. Allelic state (insertion or deletion) is modeled as a binomially distributed random variable, with the null hypothesis of no difference among groups in the binomial parameter q. Differences among groups were analyzed using PROC GLIMMIX with the logit link function. We initially tested for differences between strains within species, using the model q = rep_num + rep_type + strain + all interactions + locus(rep_type). The analysis failed to converge, so we removed repeat number from the model, subsuming repeat number in the among-locus variance. That model also failed to converge, so we considered differences among strains for each repeat type separately, using the model q = strain + locus, employing a Bonferroni-corrected global α = 0.05 with three tests.

We next tested for differences between species. The full model did not converge, so we considered differences among species for each repeat type separately, using the model q = species + strain(species) + locus(species), employing a Bonferroni-corrected global α = 0.05 with three tests. Species and strain are fixed effects; locus is a random effect.
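The two quantities used in these tests, the proportion of deletions q and the Bonferroni-corrected per-test threshold, are simple enough to write down directly. The sketch below is illustrative arithmetic only; the function names and the mutation counts are hypothetical.

```python
def deletion_proportion(n_deletions, n_insertions):
    """q = number of deletions / total number of mutations at a locus."""
    total = n_deletions + n_insertions
    if total == 0:
        raise ValueError("q is undefined for a locus with no mutations")
    return n_deletions / total


def bonferroni_threshold(global_alpha, n_tests):
    """Per-test significance threshold for a given experimentwide alpha."""
    return global_alpha / n_tests


# Hypothetical locus with 1 deletion and 4 insertions.
q = deletion_proportion(1, 4)
# Three repeat types tested separately at a global alpha of 0.05.
per_test_alpha = bonferroni_threshold(0.05, 3)
print(q, round(per_test_alpha, 4))
```

With three tests, a repeat type is called significant only if its P-value falls below roughly 0.0167.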

Whole-Genome Distributions of STR Loci

To examine the genomewide distribution of dinucleotide STRs, we performed in silico searches of whole-genome sequences of C. elegans (Wormbase version WS189) and C. briggsae (Wormbase version WS191). Because the current build of the C. briggsae genome (WS191) contains “random” reads that share similarity but cannot be placed exactly on a particular chromosome assembly, we omitted these sequences from the analysis. The inclusion of these random sequences increases the overall number of STR loci, but does not change the relative abundances of each repeat type (data not shown). This strategy provides a conservative estimate of the genomewide distribution of STR loci in C. briggsae. All dinucleotide STRs of ≥5 perfect repeat units were identified using the PHOBOS algorithm (Christoph Mayer, Ruhr-Universität Bochum, http://www.ruhr-uni-bochum.de/spezzoo/cm/cm_phobos.htm). The PHOBOS parameters for all searches were: search method = imperfect, minimum unit length = 2, maximum unit length = 4, indel score = −2, recursion depth = 7, and minimum score = 8. Here, we present only perfect repeats of ≥5 repeat units. The expected genomewide mutation rate μG was calculated as: μG = μAC·pAC + μAG·pAG + μAT·pAT, where μi is the expected mutation rate of repeat type i and pi is the proportion of repeat type i in the genome. To estimate μi, we determined the average repeat number n for repeat type i and determined the expected mutation rate for a repeat of length n from the linear regression of the per-locus mutation rate on repeat number, averaged over the two strains of each species. Regressions were done using the MIXED procedure in SAS v. 9.1, including locus as a random effect.
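The genomewide calculation is a weighted sum of regression predictions, which the following sketch makes explicit. All regression intercepts and slopes below are hypothetical placeholders, not estimates from the study; only the structure (predict μi at the mean repeat number, weight by pi, and sum) follows the text.

```python
def expected_rate_at(mean_repeats, intercept, slope):
    """Predicted per-generation mutation rate for a repeat of the given length."""
    return intercept + slope * mean_repeats


def genomewide_rate(repeat_types):
    """mu_G = sum over repeat types of p_i * mu_i."""
    return sum(
        params["proportion"]
        * expected_rate_at(params["mean_repeats"], params["intercept"], params["slope"])
        for params in repeat_types.values()
    )


# Hypothetical inputs for one species. Proportions and mean repeat numbers are
# illustrative; intercepts and slopes are made up for the example.
hypothetical_species = {
    "AC": {"proportion": 0.35, "mean_repeats": 7, "intercept": 0.0, "slope": 2.0e-6},
    "AG": {"proportion": 0.35, "mean_repeats": 8, "intercept": 0.0, "slope": 3.0e-6},
    "AT": {"proportion": 0.26, "mean_repeats": 8, "intercept": 0.0, "slope": 1.0e-6},
}
mu_g = genomewide_rate(hypothetical_species)
print(mu_g)
```

Because CG repeats are left out of the sum, the proportions need not add to one. Multiplying μG by the number of loci in the genome would give an expected number of mutations per generation.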

Comparison of Mutational and Standing Variation

Six natural isolates (“strains”) of C. briggsae were genotyped at 32 loci; these strains were the only wild strains of C. briggsae that were publicly available at the time. Strains (supplementary table S4, Supplementary Material online) were obtained from the Caenorhabditis Genetic Center stock collection at the University of Minnesota and cryopreserved upon receipt. Genotyping was performed as described above for the MA lines. We calculated the locus-specific effective number of alleles ne,i at locus i under both the stepwise mutation model (SMM) (Kimura and Ohta 1975) and the infinite alleles model (IAM) (Kimura and Crow 1964) using the Microsatellite Analysis software package (Dieringer and Schlotterer 2003). We used published values of ne for 19 loci in 23 strains of C. elegans (Sivasundar and Hey 2003).
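For orientation, the effective number of alleles at a locus is commonly defined as the reciprocal of the expected homozygosity, ne = 1/Σpi², and the two mutation models relate it to θ = 4Neμ differently: ne = 1 + θ under the IAM and ne = √(1 + 2θ) under the SMM. The sketch below applies those textbook relationships; it is an assumption that this matches what the Microsatellite Analysis package reports, and no small-sample correction is applied. The allele sizes are hypothetical, and each inbred strain is assumed to contribute one allele.

```python
from collections import Counter


def effective_number_of_alleles(alleles):
    """n_e = 1 / sum(p_i^2), with p_i the sample frequency of allele i."""
    counts = Counter(alleles)
    n = len(alleles)
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    return 1.0 / homozygosity


def theta_iam(n_e):
    """Infinite alleles model: n_e = 1 + theta."""
    return n_e - 1.0


def theta_smm(n_e):
    """Stepwise mutation model: n_e = sqrt(1 + 2 * theta)."""
    return (n_e ** 2 - 1.0) / 2.0


# Hypothetical fragment lengths (bases) at one locus in six wild isolates.
toy_alleles = [250, 250, 252, 252, 254, 250]
n_e = effective_number_of_alleles(toy_alleles)
print(round(n_e, 3), round(theta_iam(n_e), 3), round(theta_smm(n_e), 3))
```

For the same observed ne, the SMM implies a larger θ than the IAM whenever ne > 1, because stepwise mutations can regenerate alleles that already exist.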

To assess the relationship between standing genetic variation and mutation rate, we first calculated the correlation between the per-locus mutation rate in the two strains within each species using SAS v.9.1 PROC MIXED with the TYPE = UNR covariance structure. The per-locus mutation rate was significantly positively correlated between the two strains in each species (C. briggsae r = 0.41, P < 0.002; C. elegans r = 0.65, P < 0.0001), so we used the average of the two strains. We then calculated the Spearman's correlation between ne,i in the wild isolates and μi, using SAS PROC CORR. To account for the effect of repeat number on the correlation, we regressed the dependent variables (ne,i and μi) against the mean repeat number of the relevant group (wild isolates or MA lines) and then calculated the Spearman correlation of the residuals of the two regressions.
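The residual-correlation step can be sketched with standard-library Python. The sketch regresses each variable on its own repeat-number covariate by ordinary least squares and then computes Spearman's correlation (Pearson's correlation of average ranks) on the residuals. It stands in for SAS PROC CORR only conceptually; all function names and data values are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def ols_residuals(y, x):
    """Residuals of the least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Hypothetical per-locus values.
repeats_wild = [6, 8, 10, 12, 15, 20]      # mean repeat number in wild isolates
repeats_ma = [6, 8, 11, 12, 14, 21]        # repeat number in the MA progenitor
n_e = [1.0, 1.8, 1.5, 2.9, 2.4, 4.1]       # effective number of alleles
mu = [1e-5, 3e-5, 2e-5, 6e-5, 4e-5, 9e-5]  # per-generation mutation rate

raw_rho = spearman(n_e, mu)
residual_rho = spearman(ols_residuals(n_e, repeats_wild), ols_residuals(mu, repeats_ma))
print(round(raw_rho, 3), round(residual_rho, 3))
```

Working with residuals removes the part of the correlation that arises simply because longer repeats both mutate faster and are more variable.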

Results

Mutation Rate

Locus-by-locus statistics of the mutational properties (rate and spectrum) are presented in supplementary table S3, Supplementary Material online, and among-locus averages are presented in table 1; the distribution of mutation rates for each repeat type–strain combination are shown in figure 1. SAS code and tables of tests of fixed effects are presented in supplementary table S2, Supplementary Material online.

Table 1
Summary of Per-Generation Mutation Rate μ for all Loci Assayed in Caenorhabditis briggsae and Caenorhabditis elegans
FIG. 1.
Relationship between observed mutation rate (μOBS; dependent variable) and number of perfect repeats (independent variable) for each repeat type in the two strains each of Caenorhabditis briggsae and Caenorhabditis elegans. Panels A–C are of AC, ...

i) Variation within species: Overall mutation rate does not differ between strains in either species. The large residual among-locus component of variance in both species (C. briggsae, LRT chi-square = 146.3, df = 6, P < 0.0001; C. elegans, LRT chi-square = 26.1, df = 6, P < 0.0003) is biologically relevant, because it means that there are locus-specific effects on mutation rate beyond the simple ones of repeat number and repeat type. Further, there is significant variation between strains in the among-locus variance in C. elegans but not in C. briggsae (C. elegans LRT chi-square = 5.4, df = 1, P < 0.03; C. briggsae LRT chi-square = 1.8, df = 1, P > 0.17). Mutation rate increases with repeat number in both species, although the effect is more pronounced in C. elegans than in C. briggsae (C. elegans, average slope of the regression of μ on repeat number = 3.84 × 10⁻⁶, P < 0.0001; C. briggsae, average slope = 3.43 × 10⁻⁶, P < 0.03). In addition, in C. elegans, there is a marginally significant interaction between repeat number and repeat type (P < 0.04), indicating that the quantitative effect of repeat number on mutation rate varies depending on the particular repeat type (depicted in fig. 1).

There are marginally significant main effects of repeat type in both species (0.03 < P < 0.05), but the rank order differs between species (table 1). In C. briggsae, the rank order averaged over strains is AG > AC > AT. In C. elegans, the rank order averaged over strains is AC > AG > AT, but there is a significant interaction between strain and repeat type (P < 0.02). The interaction between strain and repeat type results from the difference between strains in the average length-corrected AG mutation rate (4.3 × 10⁻⁵ in N2 vs. 0.82 × 10⁻⁵ in PB306). The qualitative rank order for N2 is AG > AC > AT; for PB306 it is AC > AT ≈ AG. Moreover, in C. elegans, there is a marginally significant (P > 0.02) three-way interaction between repeat number, repeat type, and strain.

ii) Variation between species: There is a highly significant difference in overall mutation rate between the two species (P < 0.002), with C. briggsae having an average length-corrected mutation rate almost 3-fold higher than that of C. elegans (5.64 × 10⁻⁵/generation vs. 1.98 × 10⁻⁵/generation). Further, there is a significant (P < 0.01) interaction of repeat number with species. However, the slope of the regression of mutation rate on repeat number differs by only about 10% between the two species, which suggests that the effect of repeat length is probably not qualitatively different in the two species. There is a marginally significant (P < 0.03) interaction between repeat type and species, which can primarily be attributed to the much higher AG mutation rate in C. briggsae than in C. elegans. Finally, there is a marginally significant (P < 0.03) three-way interaction between repeat number, repeat type, and species.

The preceding between-species analysis in which strain is considered a fixed effect fits an average effect of strain nested within species, but in reality some of the effects of strain differ between the two species, there being several significant interactions with strain in C. elegans but not in C. briggsae. When the two strains of C. briggsae are pooled and compared with (unpooled) C. elegans, the results are nearly identical; in particular, the main effect of species remains highly significant (P < 0.001).

Mutational Spectrum

The indel spectrum differs between species for two of the three repeat types (AC, P < 0.015; AG, P < 0.001; experimentwide α = 0.05 with three tests = 0.05/3 = 0.0167), with the bias being toward deletions in C. briggsae and toward insertions in C. elegans (table 2, last column). In neither case is there a significant difference between strains within either species (P > 0.11 in all cases). The pattern differs for AT dinucleotides; there is a consistent insertion bias in C. briggsae (HK104, q = 0.2; PB800, q = 0), whereas the two strains of C. elegans have opposite indel biases (N2, q = 0; PB306, q = 1). However, there are fewer AT loci than AC or AG in the data set for C. elegans, and we were unable to assess the significance of the differences among groups at AT repeat loci.

Table 2
Summary of the Indel Spectrum

Overall, the data provide a relatively poor fit to the strict SMM (table 2). Averaged over strains and repeat types, the fraction of mutations that are insertions or deletions of a single repeat is very similar in the two species (73% in C. briggsae, 71% in C. elegans). This result is quite consistent with those reported by Seyfert et al. (2008) from the N2 strain of C. elegans. For loci that are comparable between the two studies (AC dinucleotides of <100 repeats), the fraction of single-step mutations is 56% in our study (supplementary table S3, Supplementary Material online) and 65% in theirs.

Genomic Mutational Properties

We can extrapolate from our experimental results to make inferences about the genomewide mutational properties of the two species (table 3). The complete distributions of perfect dinucleotide STR loci are presented in figure 2; the estimated total for C. briggsae is probably an underestimate (see Methods). In the C. elegans genome, we found 5,586 STR loci of ≥5 perfect repeats, of which (approximately) 35% are AC with an average length of ~7 repeats, 35% are AG (~8 repeats), 26% are AT (~8 repeats), and 5% are CG. In C. briggsae, we found 4,408 STR loci, of which (approximately) 26% are AC with an average length of ~7 repeats, 58% are AG (~10 repeats), 14% are AT (8 repeats), and 3% are CG. Using the species-average point estimates of the linear regression parameters calculated from the mutation data, the expected number of mutations at dinucleotide STRs per generation in C. briggsae is about twice that of C. elegans.

Table 3
Inferred Genomewide Mutational Properties
FIG. 2.
Genomewide distribution of dinucleotide STR loci in (A) the Caenorhabditis elegans and (B) the Caenorhabditis briggsae genomes. Only perfect repeats are reported. The Y-axis is the frequency in the genome. The X-axis is divided into bins of five repeats, ...

The Relationship between Mutational and Standing Genetic Variation

Standing genetic variation in the six wild strains of C. briggsae is summarized in supplementary table S4, Supplementary Material online. The correlation between the locus-specific mutation rate, μi, and the standing genetic variance, ne, is significantly positive in both species, but almost twice as great in C. elegans (Spearman's r = 0.88, n = 19, P < 0.0001) as in C. briggsae (Spearman's r = 0.51, P < 0.003). The correlation between the residuals of the regressions of ne and μi on repeat number is somewhat weaker, as expected (C. elegans, Spearman's r = 0.60, P < 0.01; C. briggsae, r = 0.30, P > 0.11). If the mutation rate of C. briggsae is in fact twice that of C. elegans, the expectation is that there will be twice as much standing genetic variance in C. briggsae as in C. elegans. However, because of the geographic disparity between our sample of wild C. briggsae strains and the sample of wild C. elegans of Sivasundar and Hey (2003) and because the global population genetic structure differs between the two species (Cutter et al. 2006; Dolgin et al. 2008), comparison of standing genetic variation between the two data sets is not meaningful.

Discussion

Comparison of Mutation Rate among Species/Strains

The primary motivation of this study is to compare the mutational properties (rate and spectrum) between strains and species, toward the end of understanding the factors underlying variation in those properties. Results from several experiments lead to the conclusion that, on average, the cumulative effects of new mutations on fitness (and body size) accrue about twice as fast in these two strains of C. briggsae as in these two strains of C. elegans (Baer et al. 2006; Ostrow et al. 2007). “Cumulative effects” in this context refers to both the change in mean phenotype over time (ΔM) and in the increase in genetic variance due to the input of new mutations, VM. ΔM and VM are functions of both the rate and distribution of phenotypic effects of new mutations (Lynch and Walsh 1998, pp. 328–335), and statistically separating the effects of the rate and distribution of effects is notoriously difficult because the sampling variances of the two are negatively correlated (Begin and Schoen 2006). The ~2-fold difference between the two species in mutation rate supports the conclusion that it is a difference in mutation rate per se that underlies the greater cumulative mutational effects in these strains of C. briggsae (although the distribution of effects may differ as well). More generally, our results support the intuitive conclusion that there is a close correspondence between the genomic mutation rate for fitness, U, and the molecular mutation rate, μ.

There is a caveat to the conclusion that the mutation rate is higher in C. briggsae. In an MA experiment, mutational variance (VM and ne) is proportional to the effective population size, Ne. Our experiment was designed to have Ne = 1, by transferring a single hermaphrodite worm every generation. However, when the focal individual failed to reproduce, we picked a “backup” worm from the previous generation (occasionally from two generations previous), thereby increasing the census size above 1. When census size fluctuates over time, Ne is a function of the harmonic mean population size (Hartl and Clark 2007, p. 121). In the case where some generations have a census size of 1 and other generations are large, the harmonic mean is insensitive to differences of many orders of magnitude in the size of the large generations and depends only on the ratio t(1):t(large), where t represents the number of generations of a given size. From this calculation, it turns out that the HK104 strain of C. briggsae had a larger Ne (≈1.5) than did the other three strains in our experiment (≈1.1 in all cases), because we had to go to backup more often in that strain than in the other three strains. A caveat to this caveat, however, is that selection acts to reduce Ne (Hill and Robertson 1966), and if selection were stronger in HK104 than in the other strains (which seems likely), then Ne would actually be closer to one than we infer from census size. The fact that the mutational properties of the HK104 and PB800 strains of C. briggsae are very similar, both in this study and in our previous studies, suggests that the potential difference in Ne between HK104 and the other three strains is not a major factor.
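The harmonic-mean argument can be checked numerically with a short sketch. The schedule below (225 generations at census size 1 and 25 "large" generations out of 250) is a hypothetical illustration, not the recorded transfer history of any line, and the function name is an assumption made for this example.

```python
def harmonic_mean_ne(census_sizes):
    """Harmonic mean of per-generation census sizes.

    Used here as a rough stand-in for Ne under fluctuating population
    size; it ignores selection and the details of the backup procedure.
    """
    return len(census_sizes) / sum(1.0 / n for n in census_sizes)

# Hypothetical schedule: 225 generations at N = 1, 25 "large" generations.
n_single, n_large = 225, 25
for large_size in (100, 10_000, 1_000_000):
    sizes = [1] * n_single + [large_size] * n_large
    print(large_size, round(harmonic_mean_ne(sizes), 4))

# Limiting value as the large generations become arbitrarily big:
# total generations divided by the number of generations at N = 1.
print("limit:", (n_single + n_large) / n_single)
```

Because each large generation contributes almost nothing to the sum of reciprocals, the result is set by how often the census size was 1, not by how large the other generations were.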

Comparison of Indel Spectrum among Species/Strains

Many, but by no means all, studies of the STR mutational spectrum report a bias toward insertions (summarized in Ellegren 2004; Paun and Horandl 2006). For two of the three repeat types (AC, AG), the direction of the indel spectrum differed significantly between the two species: the bias was toward deletions in C. briggsae ([symbol]AC = 0.80, [symbol]AG = 0.77) and toward insertions in C. elegans ([symbol]AC = 0.42, [symbol]AG = 0.21). The pattern at AT repeats was more variable (table 2, last column), but the combination of fewer loci sampled in C. elegans and a substantially lower observed mutation rate for AT versus AC and AG in C. briggsae limits the strength of inference about AT repeats. The overall insertion bias observed in C. elegans, and particularly N2, is consistent with previous findings of a mutational insertion bias in the N2 strain, both at STR loci (Frisse 1999; Seyfert et al. 2008) and for random nuclear sequence (Denver et al. 2004). Denver et al. speculated that the qualitative difference between the indel spectrum of mutations accumulated in an MA experiment and those observed over evolutionary time (Witherspoon and Robertson 2003) may be due to the effects of selection. A possible alternative explanation, given the results of this study, is that the ancestral mutational bias was toward deletion and that the C. elegans lineage evolved a bias toward insertion in recent time. If so, the apparent discrepancy between the MA and evolutionary patterns can be resolved without invoking natural selection. However, the median size of insertion substitutions observed by Witherspoon and Robertson (+2 nucleotides) was smaller than the median deletion (−7 nucleotides) and the means were even more disparate, so conclusions drawn from these STR data may not be relevant to the genome at large.

Relationship between Mutational and Standing Genetic Variation

At mutation-drift equilibrium, the standing genetic variance at a locus is proportional to the mutation rate under both the IAM ([symbol]e ≈ 4Neμ) and the SMM ([equation image]). In the absence of perturbing forces (i.e., natural selection), Ne is the same at all loci, so differences in standing genetic variance among loci must be due either to differences in mutation rate or to natural selection (sampling variance notwithstanding). The mutation rate is usually known only imprecisely and is usually assumed to be uniform across loci, and differences among loci or classes of loci in the standing genetic variance are typically attributed to differences in the strength or efficiency of selection (e.g., Sabeti et al. 2006). Of the large body of studies of the mutational properties of STR loci (reviewed in Ellegren 2004), few of those that measure mutations directly (i.e., from the distribution over a pedigree or from reporter constructs) have attempted formal comparisons of the relative mutation rates of different repeat types, and those that draw inferences indirectly from the standing genetic variance or from comparisons of substitutions between species cannot unambiguously partition the effects of selection from those of mutation.
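As an illustration of how equilibrium standing variation scales with the mutation rate when Ne is held constant across loci, the sketch below uses the standard textbook expressions for the effective number of alleles in a diploid, randomly mating population: 1 + 4Neμ under the IAM and √(1 + 8Neμ) under the SMM. These forms are assumed here for illustration and may not match the exact expressions in the original equation images; the Ne and μ values are hypothetical, and a largely selfing species would need an adjusted Ne.

```python
import math

def effective_alleles_iam(ne, mu):
    """Equilibrium effective number of alleles, infinite-alleles model."""
    return 1.0 + 4.0 * ne * mu

def effective_alleles_smm(ne, mu):
    """Equilibrium effective number of alleles, stepwise mutation model."""
    return math.sqrt(1.0 + 8.0 * ne * mu)

def heterozygosity(effective_alleles):
    """Expected heterozygosity implied by an effective number of alleles."""
    return 1.0 - 1.0 / effective_alleles

# Hypothetical values: one shared Ne, loci differing only in mutation rate.
shared_ne = 10_000
for mu in (1e-6, 1e-5, 1e-4):
    iam = effective_alleles_iam(shared_ne, mu)
    smm = effective_alleles_smm(shared_ne, mu)
    print(mu, round(iam, 3), round(smm, 3),
          round(heterozygosity(iam), 3), round(heterozygosity(smm), 3))
```

Under either model the expected variation increases monotonically with μ, so with Ne shared across loci any among-locus differences must come from mutation rate, selection, or sampling.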

Our data show a (variably) strong positive correlation between the per-locus mutation rate and the standing variance at the locus, but the relationship is stronger in C. elegans (Spearman's r = 0.88) than in C. briggsae (r = 0.51). After accounting for the expected effects of repeat length, the correlations are weaker but still positive (r = 0.60 in C. elegans; r = 0.30 in C. briggsae). A possible source of the discrepancy between the two species is that we only included six strains of C. briggsae, whereas Sivasundar and Hey (2003) had 23 strains of C. elegans. Alternatively, it is known that the population structure differs between C. elegans and C. briggsae, and our sample of strains from C. briggsae incorporates samples from two major clades (Cutter et al. 2006). If the migration rate (4Nem) differed among regions of the genome (there is no reason to expect it would), it could potentially reduce the correlation between standing genetic variance and mutation rate. Nevertheless, this is an important result because it demonstrates that, on average, the standing genetic variation at a locus does substantially reflect the underlying mutation rate. We know of one other study in which direct estimates of mutation rate were compared with standing variation at the same set of loci. Vigouroux et al. (2002) reported a Spearman's r of 0.42 of heterozygosity with mutation rate at a set of 98 STR loci in a collection of 193 maize plants (Matsuoka et al. 2002). Thus, studies that employ indirect methods of inference of mutation rate are justified. This was not a foregone conclusion—for example, recall the discrepancy between the indel spectra inferred from direct and indirect studies of C. elegans—and it remains to be verified for classes of loci other than STRs.

Taxon-Specific Differences in Mutability of Different Repeat Types

The most comprehensive study of the mutational properties of STRs is that of Kelkar et al. (2008), who indirectly inferred the mutational properties of a very large number (>950,000 loci of four or more repeats) of orthologous dinucleotide STR loci in the human and chimpanzee genomes. Comparison of our study with their vastly larger one is instructive in several respects. First, the observed positive relationship between mutation rate and standing variation validates their inferences—there is no evidence that their data are positively misleading because of hidden biases. Second, they observe substantial (order of magnitude) residual variation among loci in the mutation rate, after accounting for the effects of repeat number and motif. Thus, the among-locus variance in mutation rate that we observe experimentally appears to be an honest manifestation of biologically relevant properties that operate over long evolutionary time scales. Numerous studies have found both direct (i.e., position effects; Lichtenauer-Kaligis et al. 1993) and (more often) indirect evidence that the mutation rate varies with local genomic context (e.g., Hardison et al. 2003; Lercher et al. 2004; Arndt et al. 2005). Various explanations have been offered; one that appears particularly convincing is that mutation rates are higher in regions of closed chromatin (Prendergast et al. 2007), although the specific mechanism is not known.

An intriguing disparity between our results and the findings of Kelkar et al. (2008) is the difference in the relative magnitudes of mutation rates of the different repeat types. In the human–chimp genome, AT dinucleotides mutate significantly faster than AC or AG repeats. Our data from C. briggsae show that AT repeats have a significantly lower mutation rate than AC and AG repeats, and the standing variance at AT repeats in C. briggsae is lower than for AG repeats, which have the highest observed mutation rate, although nothing like the predicted 3-fold difference (supplementary table S4, Supplementary Material online). Kelkar et al. posit that the higher mutation rate at AT repeats results from the smaller number of hydrogen bonds in double-stranded AT repeats relative to AC or AG repeats. If we provisionally accept that the difference between our data and the human/chimp is real and not Type-I error on our part resulting from our vastly smaller sample size, it leads to the conclusion that “DNA is not just DNA,” that is, that DNA with the same sequence mutates in different ways in different taxa.

Extrapolation to Genomewide Mutation Rate

Two properties of the genomic distribution of dinucleotide STRs differ qualitatively between (the AF16 strain of) C. briggsae and (the N2 strain of) C. elegans (fig. 2). First, the fraction of AG repeats is much greater in C. briggsae than in C. elegans (58% vs. 35%), and the fraction of AT repeats in C. briggsae is only about half that in C. elegans (14% vs. 26%). Second, the average AG locus is over two repeats longer in C. briggsae than in C. elegans (10.1 vs. 7.6 repeats; table 3). Taking these together and extrapolating over the genomic distribution using the linear regression parameters inferred from our MA study, we infer that the per-locus dinucleotide STR mutation rate is about twice as great in C. briggsae as in C. elegans, as is the expected total number of dinucleotide STR mutations (0.24/gen in C. briggsae vs. 0.12/gen in C. elegans). There are obviously many sources of uncertainty in those calculations. Nevertheless, the results are remarkably consistent with the body of evidence inferred indirectly from the cumulative phenotypic effects of MA.
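The extrapolation described above amounts to predicting a per-locus rate from repeat number with a motif-specific linear regression and summing over all loci in the genome. A minimal sketch follows; the locus counts and regression coefficients are hypothetical placeholders rather than the estimates from this study, and the function name and data layout are assumptions made for the example.

```python
def expected_mutations_per_generation(locus_counts, regression):
    """Sum predicted per-locus mutation rates over a genomic distribution.

    locus_counts: {motif: {repeat_number: number_of_loci}}
    regression:   {motif: (intercept, slope)} for rate = intercept + slope * repeats
    Negative predicted rates are truncated at zero.
    """
    total = 0.0
    for motif, by_length in locus_counts.items():
        intercept, slope = regression[motif]
        for repeats, n_loci in by_length.items():
            rate = max(0.0, intercept + slope * repeats)
            total += n_loci * rate
    return total

# Hypothetical genomic distribution and regression parameters.
locus_counts = {
    "AC": {6: 400, 8: 150, 12: 30},
    "AG": {6: 900, 10: 300, 15: 60},
    "AT": {6: 250, 8: 80},
}
regression = {
    "AC": (-2e-5, 5e-6),
    "AG": (-3e-5, 8e-6),
    "AT": (-1e-5, 2e-6),
}

total = expected_mutations_per_generation(locus_counts, regression)
n_loci = sum(sum(by_length.values()) for by_length in locus_counts.values())
print("expected mutations per generation:", total)
print("mean per-locus rate:", total / n_loci)
```

The total depends on both the motif composition and the length distribution, which is why a genome richer in long AG loci can have a higher expected number of mutations even if the regression parameters were identical between species.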

Given the short average length of perfect dinucleotide STRs and their relative paucity in the Caenorhabditis genome (compare with the ~950,000 orthologous dinucleotide STR loci in the human–chimp genome), it is unlikely that differing properties of perfect dinucleotide STRs in the two species are the sole cause of the different cumulative effects of MA. Rather, the difference in the mutational properties of STR loci is probably a byproduct of some more fundamental difference in the mutational input and/or output. One obvious possibility is that some property of the DNA repair machinery differs consistently between the two species. The two species are believed to have diverged on the order of 100 million generations ago, roughly on the timescale of the divergence of humans and rodents (Cutter 2008). Significant differences in various aspects of the DNA repair process are known to exist between humans and rodents, and between more closely related taxa (Eisen and Hanawalt 1999), and it is certainly possible that differences exist between the two Caenorhabditis species. However, there is no concrete evidence for any such difference.

A second possibility, for which there is some evidence, albeit indirect, is that some aspect of oxygen free-radical metabolism differs between the two species, or at least between the strains included in this experiment. Reactive oxygen species (ROS) are normal byproducts of cellular metabolism, and oxidative stress has been implicated in microsatellite instability (Jackson and Loeb 2000; Lee et al. 2006). Howe and Denver (2008) have documented the presence of heteroplasmy for a deletion in the mitochondrial NADH-dehydrogenase 5 (ND5) gene. ND5 functions in ROS metabolism, and there is evidence that individuals with defective ND5 suffer increased ROS damage. The Caenorhabditis Genetics Center stocks of HK104 and PB800 used by Howe and Denver both have low frequencies of the ND5 deletion, but the immediate ancestor of our HK104 MA lines apparently evolved a high frequency of the ND5 deletion during the inbreeding leading up to the MA experiment (D. Denver, personal communication).

Conclusions

It is estimated that the genomic mutation rate at dinucleotide STRs in C. briggsae is roughly twice that of C. elegans. This finding is entirely consistent with the body of evidence from the cumulative mutational effects on the phenotype, and provides one of the first direct demonstrations of a relationship between molecular and phenotypic mutational properties for a nonmutator genotype. Further, the indel spectrum differs between repeat types and species, in contrast to many (but certainly not all) previous studies that have found a general insertion bias at STR loci. Finally, the per-locus mutation rate is significantly positively correlated with the standing genetic variation in both species. This result, which was also found in maize by Vigouroux et al. (2002), provides empirical justification for using the standing genetic variance as a proxy for the mutation rate.

Supplementary Material

Supplementary tables S1–S4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This project had its origins when C.F.B. was a postdoc in the laboratory of Mike Lynch at Indiana University. We thank the IU and UF worm crews for 250 generations of hard labor and especially A. Houppert for screening the nascent C. briggsae genome for microsatellites by eye, and T. Keller for assistance in the laboratory. A. Cutter, E. Dolgin, M-A. Félix, and the CGC generously provided worm stocks. G. Clark, A. Cutter, D. Denver, S. Estes, J. Joyner-Matos, M. Lynch, C. Matsuba, M. L. Wayne, and the anonymous reviewers provided helpful advice and/or comments. Support was provided by a National Institutes of Health (NIH)/National Research Service Award postdoctoral fellowship to C.F.B., NIH/National Institute of General Medical Sciences award 1 R01GM072639-01A2 to C.F.B. and D.R. Denver, and University of Florida start-up funds to C.F.B.

References

  • Arndt PF, Hwa T, Petrov DA. Substantial regional variation in substitution rates in the human genome: importance of GC content, gene density, and telomere-specific effects. J Mol Evol. 2005;60:748–763.
  • Baer CF. Quantifying the decanalizing effects of spontaneous mutations in rhabditid nematodes. Am Nat. 2008;172:272–281.
  • Baer CF, Miyamoto MM, Denver DR. Mutation rate variation in multicellular eukaryotes: causes and consequences. Nat Rev Genet. 2007;8:619–631.
  • Baer CF, Phillips N, Ostrow D, Avalos A, Blanton D, Boggs A, Keller T, Levy L, Mezerhane E. Cumulative effects of spontaneous mutations for fitness in Caenorhabditis: role of genotype, environment and stress. Genetics. 2006;174:1387–1395.
  • Baer CF, Shaw F, Steding C, et al. (11 co-authors) Comparative evolutionary genetics of spontaneous mutations affecting fitness in rhabditid nematodes. Proc Natl Acad Sci USA. 2005;102:5785–5790.
  • Begin M, Schoen DJ. Low impact of germline transposition on the rate of mildly deleterious mutation in Caenorhabditis elegans. Genetics. 2006;174:2129–2136.
  • Cutter AD. Divergence times in Caenorhabditis and Drosophila inferred from direct estimates of the neutral mutation rate. Mol Biol Evol. 2008;25:778–786.
  • Cutter AD, Felix MA, Barriere A, Charlesworth D. Patterns of nucleotide polymorphism distinguish temperate and tropical wild isolates of Caenorhabditis briggsae. Genetics. 2006;173:2021–2031.
  • Denver DR, Morris K, Lynch M, Thomas WK. High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature. 2004;430:679–682.
  • Dieringer D, Schlotterer C. Microsatellite Analyser (MSA): a platform independent analysis tool for large microsatellite data sets. Mol Ecol Notes. 2003;3:167–169.
  • Dolgin ES, Felix MA, Cutter AD. Hakuna Nematoda: genetic and phenotypic diversity in African isolates of Caenorhabditis elegans and C. briggsae. Heredity. 2008;100:304–315.
  • Drake JW, Charlesworth B, Charlesworth D, Crow JF. Rates of spontaneous mutation. Genetics. 1998;148:1667–1686.
  • Eisen JA, Hanawalt PC. A phylogenomic study of DNA repair genes, proteins, and processes. Mut Res-DNA Repair. 1999;435:171–213.
  • Ellegren H. Microsatellites: simple sequences with complex evolution. Nat Rev Genet. 2004;5:435–445.
  • Frisse L. Understanding the mechanisms of microsatellite formation and mutation using the model organism Caenorhabditis elegans. 1999. Ph.D. Dissertation. University of Missouri-Kansas City.
  • Hardison RC, Roskin KM, Yang S, Diekhans M, Kent WJ, Weber R, Elnitski L, Li J, O'Connor M, Kolbe D. (17 co-authors) Covariation in frequencies of substitution, deletion, transposition, and recombination during eutherian evolution. Genome Res. 2003;13:13–26.
  • Hartl DL, Clark AG. Principles of population genetics. Sunderland (MA): Sinauer Associates; 2007.
  • Hill WG, Robertson A. Effect of linkage on limits to artificial selection. Genet Res. 1966;8:269–294.
  • Howe DK, Denver DR. Muller's Ratchet and compensatory mutation in Caenorhabditis briggsae mitochondrial genome evolution. BMC Evol Biol. 2008;8:62.
  • Jackson AL, Loeb LA. Microsatellite instability induced by hydrogen peroxide in Escherichia coli. Mutat Res-Fundam Mol Mech Mutagen. 2000;447:187–198.
  • Kelkar YD, Tyekucheva S, Chiaromonte F, Makova KD. The genome-wide determinants of human and chimpanzee microsatellite evolution. Genome Res. 2008;18:30–38.
  • Kimura M. On probability of fixation of mutant genes in a population. Genetics. 1962;47:713–719.
  • Kimura M. Evolutionary rate at the molecular level. Nature. 1968;217:624–626.
  • Kimura M, Crow JF. Number of alleles that can be maintained in a finite population. Genetics. 1964;49:725–738.
  • Kimura M, Ohta T. Distribution of allelic frequencies in a finite population under stepwise production of neutral alleles. Proc Natl Acad Sci USA. 1975;72:2761–2764.
  • Kondrashov FA, Ogurtsov AY, Kondrashov AS. Selection in favor of nucleotides G and C diversifies evolution rates and levels of polymorphism at mammalian synonymous sites. J Theor Biol. 2006;240:616–626.
  • Lee DH, Esworthy RS, Chu C, Pfeifer GP, Chu FF. Mutation accumulation in the intestine and colon of mice deficient in two intracellular glutathione peroxidases. Cancer Res. 2006;66:9845–9851.
  • Lercher MJ, Chamary JV, Hurst LD. Genomic regionality in rates of evolution is not explained by clustering of genes of comparable expression profile. Genome Res. 2004;14:1002–1013.
  • Lichtenauer-Kaligis EGR, van der Velde-van Dijke T, den Dulk H, van de Putte P, Giphart-Gassler M, Tasseron-de Jong JG. Genomic position influences spontaneous mutagenesis of an integrated retroviral vector containing the HPRT cDNA as target for mutagenesis. Hum Mol Genet. 1993;2:173–182.
  • Lynch M, Walsh B. Genetics and analysis of quantitative traits. Sunderland (MA): Sinauer; 1998.
  • Matsuoka Y, Vigouroux Y, Goodman MM, Sanchez GJ, Buckler E, Doebley J. A single domestication for maize shown by multilocus microsatellite genotyping. Proc Natl Acad Sci USA. 2002;99:6080–6084.
  • Ostrow D, Phillips N, Avalos A, Blanton D, Boggs A, Keller T, Levy L, Rosenbloom J, Baer CF. Mutational bias for body size in Rhabditid nematodes. Genetics. 2007;176:1653–1661.
  • Paun O, Horandl E. Evolution of hypervariable microsatellites in apomictic polyploid lineages of Ranunculus carpaticola: directional bias at dinucleotide loci. Genetics. 2006;174:387–398.
  • Petrov DA. Evolution of genome size: new approaches to an old problem. Trends Genet. 2001;17:23–28.
  • Petrov DA. Mutational equilibrium model of genome size evolution. Theor Popul Biol. 2002;61:531–544.
  • Prendergast JGD, Campbell H, Gilbert N, Dunlop MG, Bickmore WA, Semple CAM. Chromatin structure and evolution in the human genome. BMC Evol Biol. 2007;7
  • Rozen S, Skaletsky HJ. Primer3 on the WWW for general users and for biologist programmers. In: Krawetz S, Misener S, editors. Bioinformatics methods and protocols: methods in molecular biology. Totowa (NJ): Humana Press; 2000. pp. 365–386.
  • Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. Positive natural selection in the human lineage. Science. 2006;312:1614–1620.
  • Schuelke M. An economic method for the fluorescent labeling of PCR fragments. Nat Biotechnol. 2000;18:233–234.
  • Seyfert AL, Cristescu MEA, Frisse L, Schaack S, Thomas WK, Lynch M. The rate and spectrum of microsatellite mutation in Caenorhabditis elegans and Daphnia pulex. Genetics. 2008;178:2113–2121.
  • Sivasundar A, Hey J. Population genetics of Caenorhabditis elegans: the paradox of low polymorphism in a widespread species. Genetics. 2003;163:147–157.
  • Sniegowski PD, Gerrish PJ, Johnson T, Shaver A. The evolution of mutation rates: separating causes from consequences. Bioessays. 2000;22:1057–1066.
  • Vassilieva LL, Lynch M. The rate of spontaneous mutation for life-history traits in Caenorhabditis elegans. Genetics. 1999;151:119–129.
  • Vigouroux Y, Jaqueth JS, Matsuoka Y, Smith OS, Beavis WF, Smith JSC, Doebley J. Rate and pattern of mutation at microsatellite loci in maize. Mol Biol Evol. 2002;19:1251–1260.
  • Williams BD, Schrank B, Huynh C, Shownkeen R, Waterston RH. A genetic mapping system in Caenorhabditis elegans based on polymorphic sequence-tagged-sites. Genetics. 1992;131:609–624.
  • Witherspoon DJ, Robertson HM. Neutral evolution of ten types of mariner transposons in the genomes of Caenorhabditis elegans and Caenorhabditis briggsae. J Mol Evol. 2003;56:751–769.
  • Wolfinger R, O'Connell M. Generalized linear mixed models: a pseudo-likelihood approach. J Stat Comput Simul. 1993;4:233–243.

Articles from Molecular Biology and Evolution are provided here courtesy of Oxford University Press