Cytogenet Genome Res. Apr 2010; 127(2-4): 112–127.
Published online Mar 8, 2010. doi:  10.1159/000295342
PMCID: PMC2872679

Dynamic Nucleotide Mutation Gradients and Control Region Usage in Squamate Reptile Mitochondrial Genomes

Abstract

Gradients of nucleotide bias and substitution rates occur in vertebrate mitochondrial genomes due to the asymmetric nature of the replication process. The evolution of these gradients has previously been studied in detail in primates, but not in other vertebrate groups. From the primate study, the strengths of these gradients are known to evolve in ways that can substantially alter the substitution process, but it is unclear how rapidly they evolve over evolutionary time or how different they may be in different lineages or groups of vertebrates. Given the importance of mitochondrial genomes in phylogenetics and molecular evolutionary research, a better understanding of how asymmetric mitochondrial substitution gradients evolve would contribute key insights into how this gradient evolution may mislead evolutionary inferences, and how it may also be incorporated into new evolutionary models. Most snake mitochondrial genomes have an additional interesting feature, 2 nearly identical control regions (CRs), which vary among species in the extent to which they are used as origins of replication. Given the expanded sampling of complete snake genomes currently available, together with 2 additional snakes sequenced in this study, we reexamined gradient strength and CR usage in alethinophidian snakes as well as several lizards that possess dual CRs. Our results suggest that nucleotide substitution gradients (and corresponding nucleotide bias) and CR usage are highly labile over the ~200 m.y. of squamate evolution, and demonstrate greater overall variability than previously shown in primates. The evidence for the existence of such gradients, and their ability to evolve rapidly and converge among unrelated species, suggests that gradient dynamics could easily mislead phylogenetic and molecular evolutionary inferences, and argues strongly that these dynamics should be incorporated into phylogenetic models.

Key Words: D-loop, Genome replication, Genome structure-function, Snakes, Substitution gradients

Introduction

In vertebrate mitochondrial (mt) genomes there exists an intriguing link between genome structure and genome-wide nucleotide evolution due to the asymmetrical replication of the mitochondrial genome [Clayton, 1982; Bielawski and Gold, 2002; Faith and Pollock, 2003; Krishnan et al., 2004a; Raina et al., 2005; Jiang et al., 2007]. This asymmetry leads to gradients of mutational bias across the genome that are governed predominantly by the distance between any given nucleotide site and the origins of genome replication. Variation in this strand-asymmetric replication process appears to have contributed substantially to variation in substitution rates and patterns across the mitochondrial genome [Bielawski and Gold, 2002; Faith and Pollock, 2003; Raina et al., 2005].

The strand-asymmetric replication mechanism has been thought to expose different regions of the parental heavy strand to varying amounts of time in the single-stranded state during replication (DssH) [Tanaka and Ozawa, 1994], depending on the distances of the regions from the origins of heavy strand (OH) and light strand (OL) synthesis. There is some controversy over the classical mt genome replication mechanism based on the research of Holt and colleagues, mostly concerning the asymmetry of the process, the role of the putative origin of light strand replication, and whether the replicating DNA spends substantial amounts of time single-stranded [Yang et al., 2002; Reyes et al., 2005; Yasukawa et al., 2005]. Therefore, to take a neutral position, we refer to the time that a gene or nucleotide is predicted to spend in an asymmetric mutagenic state (TAMS), rather than the predicted duration of time that the heavy strand spends single-stranded (DssH); the calculations of the two statistics are identical [Tanaka and Ozawa, 1994; Reyes et al., 1998; Faith and Pollock, 2003]. We note that the evolutionary genetic evidence is highly compatible with and provides extremely strong evidence for the classic replication model (or something like it), while there is no known mechanism by which the Holt models could have produced the asymmetric substitution patterns observed in vertebrate mitochondria.

Single-stranded DNA is particularly prone to deaminations, especially deaminations of cytosine (C) and adenine (A), which cause respective transitions to thymine (T) and guanine (G) on the heavy strand [Tanaka and Ozawa, 1994; Reyes et al., 1998; Faith and Pollock, 2003]. Since transition rates are much greater than transversion rates, these excess transitions lead to elevated G/A and T/C ratios, which accounts for most of the asymmetry in synonymous substitutions in vertebrate mt genomes [Bielawski and Gold, 1996; Rand and Kann, 1998; Reyes et al., 1998; Frank and Lobry, 1999; Faith and Pollock, 2003; Krishnan et al., 2004a, 2004b; Raina et al., 2005]. On the heavy strand, C→T substitutions occur at a very high rate [Frederico et al., 1990], and the T/C ratio quickly plateaus at low TAMS [Faith and Pollock, 2003]. The deamination of A→hypoxanthine (leading to A→G mutations after replication) occurs at a slower rate, and in vertebrates the A→G heavy strand substitutions at 4-fold and 2-fold redundant 3rd codon positions increase linearly with increasing TAMS [Faith and Pollock, 2003; Krishnan et al., 2004a]. Consequently, A→G substitutions and the resultant A/G nucleotide frequency gradient are good predictors of TAMS [Faith and Pollock, 2003; Raina et al., 2005; Jiang et al., 2007].

What is known about the evolution of this gradient in vertebrates comes from an analysis of primate mitochondria [Raina et al., 2005]. In the primates, the gradients are ancestrally weak, and have convergently evolved from weak to strong in at least 2 lineages [Raina et al., 2005]. It is notable, however, that mitochondrial genome size and gene order are highly conserved across primates (and mammals in general), whereas other vertebrate groups demonstrate more diversity of mitochondrial genome size and structure. Thus, primate gradient evolution may not be fully representative of gradient evolution in other groups with more extensive mitochondrial genome structural diversity. Further study of diverse vertebrate mitochondria would provide a more inclusive perspective on the degree to which gradients vary among vertebrate species, particularly when changes in mitochondrial genome structure and gene order occur. Although mitochondrial gene sequences are extensively used for phylogenetic inference and studies of molecular evolution, mitochondrial nucleotide gradients and their ability to change through time have never been incorporated into phylogenetic models (partly due to a lack of understanding about how such gradients evolve through time). Thus, an in-depth understanding of mitochondrial gradients and their evolutionary dynamics is important for appreciating how current evolutionary inferences may be biased by not accounting for such mutational dynamics, and also how future development of nucleotide models may appropriately account for the existence and evolution of gradients.

In this study we focus on the mitochondrial genomes of squamate reptiles (snakes and lizards). In contrast to primates, squamate mitochondrial genomes possess numerous examples of structural rearrangements, including the duplication of the mitochondrial control region (CR) thought to contain the OH. In most snakes, and some lizards, the control region is duplicated and the 2 copies evolve by concerted evolution, resulting in the 2 identical or nearly identical duplicates in a given mitochondrial genome. Additionally, compared to the relatively shallow evolutionary divergence of the primate clade, the squamates in this study span a much greater depth of evolutionary divergence (~200 m.y.). Previously, we analysed the mitochondrial genomes of several snake species with dual CRs and found evidence, based on nucleotide gradients, that both CRs may act as heavy strand origins in many species, although the actual gradients themselves were never characterized and compared. Recently, the number of squamate mitochondrial genomes available has increased substantially, including many new snake species with dual CRs, and including lizard species with dual CRs.

We also analyze the diversity of mutational gradients in squamate mitochondrial genomes, including 2 new snake species added in this study, to develop a greater understanding of the evolution of these gradients in a clade of vertebrates that demonstrates substantial diversity in mitochondrial genome structure. Using this example, we address the following questions: (1) What is the overall diversity of gradients observed in squamates and how does this compare to that previously reported for the primates? (2) How rapidly do these gradients appear to evolve? (3) Could convergent evolution of gradients potentially mislead phylogenetic inference? And (4), how rapidly does the preference for one CR over the other evolve in mitochondria with dual control regions?

Material and Methods

Mitochondrial Genome Sequencing and Annotation

Complete mitochondrial genomes of 2 new snake species, Micrurus fulvius (‘eastern coral snake’; Elapidae) and Causus defilippii (‘snouted night adder’; Viperidae), were sequenced to complement existing sampling of alethinophidian snakes. Laboratory and annotation protocols follow Jiang et al. [2007], but briefly, total DNA was isolated from frozen liver tissue and the mitochondrial genomes were amplified in 6 large overlapping fragments [Jiang et al., 2007]. These fragments were cloned and the clones were sequenced by primer walking on a Beckman CEQ8000 automated sequencer. Contigs were assembled using the commercial program Sequencher, and tRNAs were identified using tRNAscan [Lowe and Eddy, 1997] and homology was verified manually based on sequence similarity and anticodon. The tRNAs were used to identify approximate boundaries of protein coding genes, control regions, and ribosomal RNAs, which were then confirmed and fine tuned by manual inspection of homology with previously annotated snake mitochondrial genomes [Slack et al., 2003]. The 2 genomes were submitted to GenBank under accession numbers GU045452 (Causus) and GU044553 (Micrurus).

Gradient Analyses

Complete mitochondrial genome sequences from squamate species available in GenBank (September, 2008) were retrieved and combined with the 2 newly sequenced snake species to form our starting data set of 51 species (table 1). For analyses of gradients, the sequences and genome coordinates for all 13 mitochondrial protein-coding genes were extracted. The heavy strand origins of replication (OH) were estimated as the center of the annotated control regions (recall that many snakes have 2 control regions), and the light strand origin of replication (OL) was estimated as the center of the annotated OL sequence. In cases where no OL was identified in the GenBank annotation, the center of the region between the tRNA-Asn and tRNA-Trp gene annotations, the normal OL location, was used. Two additional snake species of special interest (representatives of the poorly sampled blind snakes) with nearly complete mitochondrial genomes were also included in this study (bringing the total to 53 taxa): Ramphotyphlops australis and Typhlops mirus. For these 2 species, complete mitochondrial genomes of congeneric species are known, and genome size, gene coordinates, and origin locations were estimated by aligning these incomplete genomes to the complete genomes of congeneric species, and then substituting the missing sequence data from the completed genomes to obtain estimates of genome length and gene coordinates.
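The origin estimates described above reduce to taking the midpoint of an annotated interval, with a fallback when no OL is annotated. The following Python sketch illustrates that rule only; the function names, the 1-based inclusive coordinate convention, and the assumption that the features do not wrap around the end of the circular genome are ours for illustration, not details taken from the annotation pipeline.

```python
from typing import Optional, Tuple

# (start, end) genome coordinates; 1-based and inclusive is an assumption here.
Interval = Tuple[int, int]


def midpoint(interval: Interval) -> float:
    """Center of an annotated feature, used as the estimated origin position."""
    start, end = interval
    return (start + end) / 2.0


def estimate_ol(ol: Optional[Interval],
                trna_asn: Interval,
                trna_trp: Interval) -> float:
    """Center of the annotated OL, or of the tRNA-Asn/tRNA-Trp gap if OL is absent."""
    if ol is not None:
        return midpoint(ol)
    # Fallback: the region lying between the two tRNA annotations.
    first, second = sorted([trna_asn, trna_trp])
    gap = (first[1] + 1, second[0] - 1)
    return midpoint(gap)


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    print(midpoint((15000, 16000)))                         # estimated OH
    print(estimate_ol(None, (5200, 5270), (5000, 5070)))    # fallback OL
```

For a genome with 2 control regions, `midpoint` would simply be applied to each annotated CR to give the two candidate OH positions.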

Table 1
GenBank accession numbers for species used

We implemented a slightly modified version of the MCMC approach in Raina et al. [2005] to estimate the likelihoods of the slope and intercept of the G/A ratio gradient depending on the calculated TAMS at every site. These analyses were run on 4-fold degenerate sites. The calculation of TAMS differs depending on whether CR1 or CR2 is functional, but only for the genes that are in between the 2 control regions [Jiang et al., 2007]; for the alethinophidian snakes with 2 control regions, these genes include the 2 rRNAs and ND1. The G/A ratio in the ND1 gene was used in such species to predict activity of CR1 or CR2 in initiating heavy strand replication. To do this, slope and intercept calculations were made based on the TAMS from CR1 and CR2 separately, and together as a weighted average in a Markov chain Monte Carlo (MCMC) analysis [Raina et al., 2005]. Other than the addition of the weighting parameter, all details of the Markov chain were as in Raina et al. [2005]. Relative support levels for alternative control region usage hypotheses were determined using the Akaike Information Criterion (AIC) and Akaike weights [Akaike, 1973, 1983], as in Jiang et al. [2007]. The Akaike weights for the alternative individual models provide a measure of the degree to which a control region is exclusively functional, while the weight parameter in the mixed model represents the time-averaged effect of mixed control region usage on the G/A ratios.
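Two pieces of this procedure have simple closed forms: the weighted-average TAMS used in the mixed model, and the Akaike weights used to compare the CR1, CR2, and mixed models. The Python sketch below shows both under stated assumptions: the weight `w` is taken here to be the weight on CR1 (the direction is our choice for illustration), and the log-likelihoods and parameter counts in the usage example are hypothetical, not values from table 2.

```python
import math
from typing import List


def mixed_tams(tams_cr1: float, tams_cr2: float, w: float) -> float:
    """Weighted-average TAMS for a site; w is the (assumed) weight on CR1."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return w * tams_cr1 + (1.0 - w) * tams_cr2


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def akaike_weights(aic_values: List[float]) -> List[float]:
    """Relative support for each model, normalized to sum to 1."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


if __name__ == "__main__":
    # Hypothetical log-likelihoods for CR1-only, CR2-only, and mixed models.
    # Slope + intercept = 2 parameters; the mixed model adds the weight.
    scores = [aic(-512.4, 2), aic(-509.8, 2), aic(-509.1, 3)]
    print(akaike_weights(scores))
```

Because only the genes between the 2 control regions have different TAMS under the two origins, `mixed_tams` returns the same value as either single-CR calculation for all other genes.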

Quantitative Analyses of the Evolutionary Rate and Phylogenetic Consistency of the G/A Gradient

To estimate the approximate rate of change of gradient slope and intercept over time, we reconstructed the ancestral G/A gradient slope and intercepts at all nodes by treating the gradient parameter MLEs from the tip sequences as continuously-valued characters and reconstructing the most likely states by ML under a Brownian motion model. This was done using the R package APE [Paradis et al., 2004]. We measured the relationship between changes in slope and intercept (as estimated via the Brownian motion ancestral reconstruction) versus temporal divergence (as assessed via linear regression analyses). To statistically estimate the degree of phylogenetic association of the slope and intercept estimates, we conducted randomization tests in which the slope and intercept assignments were randomly permuted among different tips to assess the correspondence of gradient parameters with the phylogeny. The distributions of likelihood scores were examined across 1000 such permutations, after maximizing the parameters of the model by ML, to estimate the significance of the phylogenetic association.
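The logic of the randomization test can be sketched in a few lines of Python. This is an illustration of the permutation scheme only: the scoring function below (`sister_pair_score`, which rewards similar values in adjacent tips) is a hypothetical stand-in for the maximized Brownian motion likelihood used in the actual analysis, and the tip values are invented.

```python
import random
from typing import Callable, List, Sequence


def sister_pair_score(values: Sequence[float]) -> float:
    """Stand-in score (higher is better): tips are assumed listed as sister pairs."""
    return -sum((values[i] - values[i + 1]) ** 2
                for i in range(0, len(values) - 1, 2))


def permutation_p_value(values: Sequence[float],
                        score: Callable[[Sequence[float]], float],
                        n_perm: int = 1000,
                        seed: int = 0) -> float:
    """Fraction of tip permutations scoring at least as well as the observed data."""
    rng = random.Random(seed)
    observed = score(values)
    shuffled: List[float] = list(values)
    n_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign parameter values among tips
        if score(shuffled) >= observed:
            n_as_good += 1
    return n_as_good / n_perm


if __name__ == "__main__":
    # Hypothetical slope MLEs for 8 tips arranged as 4 sister pairs.
    slopes = [0.10, 0.12, 0.55, 0.50, 0.90, 0.95, 0.30, 0.28]
    print(permutation_p_value(slopes, sister_pair_score))
```

Under this convention, an observed score that beats 998 of 1000 permutations corresponds to 2 permutations scoring at least as well, hence p = 0.002.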

Clustering of Gradients Using Mixture Models

To identify groupings of mt genomes with similar nucleotide gradients, we applied MCMC-based mixture models, as described in Raina et al. [2005]. In brief, a Markov chain was run on all squamate mt genomes (4-fold degenerate sites only) simultaneously using a series of mixture models with different numbers of mixture classes (from 2 to a total of 20 classes in the mixtures). Genome membership in classes was determined according to full Bayesian posterior probabilities. To determine whether larger numbers of classes were justified, we considered the relative likelihood maxima between runs with different numbers of mixture classes (the likelihood ratio test), as well as the degree of mixed membership in different classes according to full Bayesian posterior probabilities [Raina et al., 2005]. Previous experience has shown that the p < 0.05 cutoff in the likelihood ratio test is a good predictor of when genomes will have mixed membership according to empirical Bayesian posteriors, whereas the 0.001 cutoff is a better predictor of when membership will be mixed according to full Bayesian posteriors [Raina et al., 2005]; interestingly, the results in this study were consistent with this observation, although there is no guarantee that this will always be so.
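For a fixed set of class parameters, the posterior probability that a genome belongs to each mixture class follows directly from Bayes' rule. The Python sketch below shows that calculation; it is a plug-in computation at a single parameter setting, and the function name and input values are hypothetical. One plausible way to obtain a full Bayesian posterior would be to average these probabilities over the sampled states of the Markov chain, but that is our reading rather than a description of the published implementation.

```python
import math
from typing import List


def class_posteriors(log_liks: List[float], weights: List[float]) -> List[float]:
    """Posterior class membership for one genome.

    log_liks[k] is the log-likelihood of the genome's sites under class k;
    weights[k] is the mixture proportion of class k (must be > 0).
    """
    log_terms = [math.log(w) + ll for w, ll in zip(weights, log_liks)]
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_terms)
    unnorm = [math.exp(t - m) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


if __name__ == "__main__":
    # Hypothetical log-likelihoods of one genome under 3 gradient classes.
    print(class_posteriors([-1520.3, -1518.9, -1531.0], [0.5, 0.3, 0.2]))
```

A genome whose largest posterior is well below 1 has mixed membership in the sense used above.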

Phylogeny and Divergence Time Estimates

Phylogenetic inference or divergence dating, per se, is not a main goal of this study, although we did utilize a reasonable estimate of the phylogeny and divergence times to place aspects of gradient evolution into a coarse temporal and phylogenetic framework. To infer phylogenetic relationships among the species of squamates used in the gradient analysis, we analyzed the nucleotide alignments of the 12 protein-coding mitochondrial genes encoded on the heavy strand of each mt genome. These sequences were aligned based on their amino acid sequence, using ClustalX [Thompson et al., 1997], reverse-translated back to their nucleotide sequences using a perl script, and concatenated. This automated alignment was verified manually, and modified only slightly to exclude a small number of ambiguously aligned positions (mostly at the 5′ and 3′ end of genes).
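The reverse-translation step maps each aligned amino acid back to its original codon and each alignment gap to 3 nucleotide gaps. The original analysis used a perl script; the Python function below is a minimal sketch of the same idea, not a port of that script, and it assumes the coding sequence starts in frame at the first residue.

```python
def back_translate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto a gapped protein alignment row."""
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for the aligned protein")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)          # one protein gap = three nucleotide gaps
        else:
            codons.append(cds[pos:pos + 3])  # next codon from the original sequence
            pos += 3
    return "".join(codons)


if __name__ == "__main__":
    print(back_translate("M-K", "ATGAAA"))
```

Any nucleotides beyond the last aligned residue (for example a terminal stop codon) are ignored, so every row of the resulting nucleotide alignment has the same length as long as the protein rows do.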

We simultaneously inferred phylogenetic relationships and divergence times using Beast 1.4.8 [Drummond and Rambaut, 2007]. Previous studies have shown evidence of strong selection and molecular convergence in the mitochondrial genome that can lead to misleading estimates of phylogenetic relationships [Castoe et al., 2008, 2009]. Because of this potential problem we constrained several nodes that represent a better consensus of the phylogenetic relationships among squamates [Vidal and Hedges, 2005; Castoe et al., 2009]. Thus, we constrained the monophyly of Iguania (iguanids, chamaeleonids and agamids), the monophyly of Toxicofera [Vidal and Hedges, 2005] and the monophyly of Scolecophidia. We partitioned the entire dataset by gene and codon position, implemented a different GTRΓI model for each partition, and unlinked parameter estimation across partitions. We used the relaxed clock method assuming uncorrelated lognormal rates among branches and a birth-death process of speciation. For the treeModel.rootHeight parameter we used a normal distribution prior with mean = 250 and SD = 15 based on the suggested origin of squamates (online supplementary table S1, www.karger.com/doi/10.1159/000295342). We used 5 fossil calibrations across the entire tree and implemented lognormal priors with SD = 0.2 for all of them (online supplementary table S1).

We initiated 4 independent runs in Beast with random starting trees, and ran each for 10 million generations. Chains were sampled every 1000 generations, and convergence and stationarity were verified by examining the ESS values for parameter estimates using the program Tracer 1.4. Based on examination of trial runs we discarded the first 3 million generations as burn-in period. The posterior probabilities for nodal support were obtained after combining the post burn-in samples from the 4 independent runs.
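As a quick check on the size of the combined posterior sample, the run settings above imply the following arithmetic. The Python helper is an illustrative sketch; its name is hypothetical, and it ignores whether the initial state of each chain is counted as a sample.

```python
def retained_samples(generations: int, sample_every: int,
                     burn_in: int, n_runs: int) -> int:
    """Approximate number of post burn-in samples pooled across independent runs."""
    per_run = (generations - burn_in) // sample_every
    return per_run * n_runs


if __name__ == "__main__":
    # 4 runs of 10 million generations, sampled every 1000, 3 million burn-in.
    print(retained_samples(10_000_000, 1000, 3_000_000, 4))
```

With these settings each run contributes roughly 7,000 samples, for roughly 28,000 in the combined set.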

Results

Results of Genome Annotation and Phylogeny Estimation

The mitochondrial genome of M. fulvius contained 17,506 bp, and that of C. defilippii 17,342 bp. As expected for alethinophidian snakes, the genomes of both newly sequenced species contain dual control regions that are nearly identical to one another within each genome. In both cases, the location of the control regions follows the typical alethinophidian snake pattern, with CR1 in the standard vertebrate placement, between CYTB and 12S rRNA (excluding the intervening tRNAs), and the duplicate CR2 located between ND1 and ND2 (again excluding intervening tRNAs). In Causus, CR1 is 1,144 bp in length, whereas CR2 is 1,134 bp, missing 10 bp from the 3′ end of CR1. In Micrurus, CR1 is 1,194 bp and CR2 is 40 bp shorter, missing these last 40 bp from the 3′ end. These genomes also possess the translocated tRNA-Leu common to all alethinophidian snakes yet sequenced. The only notable structural difference between the 2 genomes is that Causus has its tRNA-Pro located upstream of CR2 (as in other viperid species) compared to Micrurus, in which it is located upstream of CR1 (as in most colubroid species).

Our estimates of phylogeny and divergence times, based on analyses of the mitochondrial genome data, are shown in figure 1. We note that there is some controversy over the taxonomic classification of the fossil we use as calibration point number 1 [fig. 1; Gao, 1997], but we observed no notable differences in divergence time estimates with or without this calibration point. Thus, regardless of the resolution of this controversy, it does not appear to have any appreciable effect on our estimates of divergence times, especially for the purposes we employ them in this study. Our estimates of divergence times are broadly consistent with other studies [Vidal and Hedges, 2005; Castoe et al., 2009; Hedges and Vidal, 2009], and place the 2 new species with other members of their respective families (Causus: Viperidae; Micrurus: Elapidae) in the tree (fig. 1).

Fig. 1
Phylogenetic tree of squamate species used in this study, with divergence times for nodes estimated. This Bayesian tree was estimated using nucleotide sequences of 12 mitochondrial protein-coding genes, estimated simultaneously with divergence times. ...

Estimate of CR Usage in Dual CR Genomes

To test for mutational evidence that one CR has been preferred over another, or for dual CR usage, we applied our MCMC analysis [Raina et al., 2005] to fit alternative models of exclusive CR1 or CR2 usage, or mixed control region effect. This included analysis of dual CR mt genomes of all 17 alethinophidian snakes, as well as the analyses of 5 lizard mt genomes with dual CRs that appear to be homogenized by some type of concerted evolution (table 2).

Table 2
Results of likelihood-based hypothesis tests for CR usage in dual CR squamates

Among the dual CR lizards, 3 appear to have almost no preference among the individual CR or dual CR models: Gekko vittatus, Takydromus tachydromoides, and Varanus niloticus. In contrast, the 2 agamid lizards, Pogona vitticeps and Chlamydosaurus kingii, show a stronger preference for individual CR2 usage or dual CR models, but not strong enough to significantly reject either of the individual CR models (table 2). Only in 2 snake species, Acrochordus granulatus and Python regius, was the CR1 model significantly rejected (at p < 0.05) in favor of the individual CR2 usage model. To determine the overall significance of the gradient within these models tested, we compared the likelihood of the G/A gradient models with the same model but with a slope of zero (but freely varying intercept). In the case of mt genomes with dual CRs, we used the dual CR model as the alternative hypothesis. Of the 5 dual CR lizard species, only one, G. vittatus, had a significant slope, as did 7 of the 17 alethinophidian snakes (table 2; fig. 2).
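The comparison of a free-slope model against a zero-slope null is a nested likelihood ratio test in which the models differ by one parameter. The Python sketch below shows how such a test could be evaluated under the usual chi-square approximation with 1 degree of freedom; that approximation is our assumption for illustration, the function name is hypothetical, and the log-likelihoods in the usage example are invented.

```python
import math


def lrt_p_value_df1(lnl_null: float, lnl_alt: float) -> float:
    """P value of a likelihood ratio test with 1 degree of freedom.

    Uses the chi-square(1) survival function, P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        return 1.0  # the alternative fits no better than the null
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods: zero-slope null vs. free slope.
    print(lrt_p_value_df1(-1523.7, -1520.1))
```

Under this approximation, a gain of about 1.92 log-likelihood units corresponds to the p = 0.05 threshold.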

Fig. 2
Weighted estimates of the time-averaged effect of differential control region usage in species with 2 control regions (maintained in concerted evolution) from a linear G/A gradient mixture model of dual control region usage. An asterisk indicates species ...

Since the nucleotide sequence of duplicate control regions is nearly identical within each genome, it is also reasonable to assume that both control regions are probably functional, and that a mixed CR usage model is most appropriate as a hypothetical null model. From this perspective, the weight parameter in the mixed model represents the time-averaged effect of mixed control region usage on the G/A ratios (table 2; fig. 2). Based on the weighting parameter of CR usage from the dual CR model, CR2 usage was strongly preferred over CR1 as an explanation of the data in the lizard Chlamydosaurus, and the Python, Acrochordus, Deinagkistrodon, Agkistrodon, Enhydris, and Micrurus snakes. Only the Tropidophis, Cylindrophis, and Naja snakes show strong preference for CR1 (fig. 2). The remaining lizards and snakes show intermediate preference for both control regions. In the mixed dual CR model, the average control region effect tends to mirror the preferences for one CR over another in the individual CR models. The results of the dual CR model analyses (together with the individual CR2 analyses) provide evidence that CR2 plays a functional role in replication in some and perhaps most species. The patterns of CR preference, however, appear to be highly species-specific, and change rapidly over time (fig. 2).

Gradient Analysis Results from Single CR Species

The results of the G/A gradient model analyses are provided in table 3, along with the results of hypothesis tests for the existence of a gradient (based on a null zero slope gradient model). Considering all squamate mt genomes analysed (including the dual CR mt genomes), the null gradient model was significantly rejected in favor of the free-parameter models (based on LRTs at p < 0.05) in 17 out of the total 52 tested mt genomes. None of the gradients from the 5 scolecophidian (one CR) snakes were significant. Among the 24 lizards with a single CR, 9 were able to reject the null gradient model. There was considerable variation among genomes in both slope and intercept, and values for many pairs of species were apparently meaningfully different from one another in that they lay outside their respective 95% credible intervals.

Table 3
Results of G/A gradient analyses, including MLE and confidence intervals for slope and intercept, and the p value of the likelihood ratio test comparing the null (zero slope) model with the free-slope models

Comparisons and Phylogenetic Trends of G/A Gradients

To demonstrate the broad taxonomic trends in G/A gradient characteristics, we first plotted the MLEs of the slopes and intercepts by taxonomic grouping; gradients previously calculated for the primates [Raina et al., 2005] were included for comparative purposes (fig. 3). The primates tend to have on average higher G/A slopes than squamate groups, although the variation observed in some squamate species far exceeds the range of the primate gradients. As a group, the lizard gradients tend to have lower slopes than the primate and alethinophidian snake gradients. Of the 4 taxonomic groupings (fig. 3), the alethinophidian snakes contain the least overall variation and tightest clustering of gradients, followed by the primates. Although the lizard and snake mitochondrial gradients tended to have lower slopes and often lower intercepts than those estimated for the primates, several species within the squamates stand out as having particularly extreme outlying values for gradient slopes and/or intercepts. The iguana (Iguana iguana) mitochondrial genome is estimated to have an extremely high slope (MLE = 2.96), and the amphisbaenian lizard Bipes canaliculatus had an extremely high intercept (MLE = 3.57). The blind snake Leptotyphlops also has an extremely high intercept (MLE = 3.97) and slope (MLE = 1.19), especially in comparison to the other 4 scolecophidian snake mt genomes included.

Fig. 3
Plots of mitochondrial G/A gradient slope and intercept estimates, separated by taxonomic group. Slope and intercepts are shown as the maximum likelihood estimates (MLEs), with the shaded ovals representing the 95% confidence interval around the MLEs. ...

Visualization of the G/A gradient slope and intercept estimates in the context of the inferred phylogeny (and divergence timescale) provides further insight into the evolutionary dynamics of the G/A gradient characteristics (fig. 4). Overall, the estimated intercept values are more highly variable across the tree than the slope. In terms of phylogenetic conservatism, the intercept is also less phylogenetically consistent than the slope (fig. 4). From this visualization it is clear that substantial increases and decreases in the slope and intercept have occurred multiple times over the course of squamate evolution, and even within the more evolutionarily shallow snake clade (fig. 4).

Fig. 4
Among-species variation in the G/A gradient slope and intercept presented in the context of the dated phylogeny of species analyzed. Slope and intercept values are presented as the MLE value (dot) and 95% confidence interval (lines).

To quantitatively assess how rapidly the slope and intercept evolved, and statistically evaluate how consistent changes in the G/A gradient were with the phylogeny, we analysed the relationship between evolutionary distance (in time) and change in gradient characteristics. We reconstructed the ancestral G/A gradient slope and intercepts at all nodes, and regressed the change in gradient parameters versus evolutionary divergence (in millions of years). From these estimates, we infer that the gradient slope changed on average approximately 0.0028 units per million years, whereas the intercept parameter evolved more rapidly (0.0036 units per million years; fig. 5). In each plot, there is a single major outlier observed (fig. 5); this outlier is Iguana, which is extreme in both slope and intercept compared to the other squamates (e.g. fig. 3).
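The rate estimates above are the slopes of ordinary least-squares regressions of parameter change on divergence time. The Python sketch below shows that calculation together with the coefficient of determination; the function name and the data points are hypothetical, and whether the published regressions included a free intercept is not stated here, so the free-intercept form is an assumption.

```python
from typing import Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares slope, intercept, and R-squared for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical branch data: divergence (m.y.) and absolute change in slope.
    divergence = [10.0, 25.0, 40.0, 80.0, 120.0, 160.0]
    change = [0.05, 0.04, 0.15, 0.20, 0.41, 0.40]
    rate, offset, r2 = ols_fit(divergence, change)
    print(rate, offset, r2)
```

The fitted slope has units of gradient-parameter change per million years, which is how the 0.0028 and 0.0036 figures should be read.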

Fig. 5
Plots of inferred change in G/A gradient slope (a) and intercept (b) versus divergence time (in millions of years), based on ancestral reconstructions of gradient characteristics from a Brownian motion model of G/A gradient slope and intercept evolution ...

Although the regressions in figure 5 clearly indicate a linear relationship between the evolution of mutation gradient parameters and evolutionary time, a substantial amount of variation remains in the data (R² = 0.42 for slope; R² = 0.50 for intercept). We therefore conducted a randomization test in which the slope/intercept assignments were randomly permuted among different tips to assess the correspondence of gradient parameters with the phylogeny. The observed distribution of slope assignments was more likely than 998/1000 permutations, giving a p value of 0.002 for phylogenetic association of the gradient data. The observed distribution of intercept data was also significantly associated with the phylogeny, but had a larger p value (p = 0.021).

Clustering of Gradients and Relation to Phylogeny

Based on likelihood ratio tests of significance for adding model classes to the mixture models, a total of 11 model classes were highly justified (p < 0.01; table 4). After 7 or 8 classes, however, the posterior assignments of mt genome gradients to a particular class became progressively less certain, and the ability to visualize the clustering of similar gradients became more difficult. Because our primary goal was to generally explore the groupings of different gradients in relation to phylogeny, we decided to show the detailed results of the 6-class mixture model because with more classes, class assignments became weaker (based on posterior probability estimates) and more difficult to meaningfully interpret. The 6-class mixture model also was the model with the highest number of classes that was significant at p < 0.001 (table 4).

Table 4
Results of mixture model clustering of squamate mitochondrial genome G/A gradients under different numbers of classes

The clusters identified by the 6-class mixture model (fig. 6) are readily separated, and visualized as discrete groupings that tend to separate out as layers of increasing slope and intercept. When the mixture class assignments are visualized in the context of the squamate phylogeny, a striking number of examples of distantly related lineages with similar gradients are observed (fig. 7). These results highlight the degree to which gradient characteristics (represented by classes in this case) can converge among distantly related lineages, and also diverge substantially between some relatively closely related species.

Fig. 6
Plot of the MLE slope and intercept estimates for squamate mitochondrial G/A gradients, clustered according to the results of a 6-class mixture model. Slope MLEs that were negative (and were not significant based on the null, zero slope model) were set ...
Fig. 7
Phylogenetic distribution of G/A gradient classes, based on gradient class assignments from a 6-class mixture model, demonstrating widespread convergence of gradient characteristics across the phylogeny.

Discussion

Vertebrate mitochondrial genome sequences are important and heavily utilized systems for molecular evolutionary studies, phylogenetics, and taxonomy. A thorough understanding of nucleotide substitution gradients in vertebrate mitochondrial genomes is thus important for advancing evolutionary model construction, making accurate evolutionary inferences, and gaining basic insight into the biology of mitochondria. The results of this study provide new details on the evolution of the response of the G/A mutational gradient to the asymmetrical process of mitochondrial replication. We refer to the evolution of this response as ‘gradient evolution’ and to the combined slope and intercept as the ‘response curve’ [Faith and Pollock, 2003]. Building on previous work in the evolutionarily shallow primate clade [Raina et al., 2005], our results here provide a novel perspective on gradient evolution and diversity over a much broader, evolutionarily deep scale in the squamate reptiles.

In the squamates, as with the primates, there is evidence that components of the response curve (i.e., slope and intercept) are for the most part phylogenetically consistent for closely related species and groups. In squamates, the family taxonomic level, for example, often seems to contain groupings of fairly similar gradients. We have also found, however, several extreme cases in which the response curve changed substantially between fairly close relatives; examples of this include comparisons between the extreme response curves of Iguana, Leptotyphlops, and Bipes, and their sister taxa. Thus, despite the tendency for related species to have similar response curves, we find many examples in which close relatives, including sister taxa in our tree, have response curves that do not cluster together.

As with the evolutionary variation in gradient characteristics, we found further evidence of substantial variation in the inferred usage of multiple control regions in snake and lizard species that possess dual CRs. Duplicate control regions, homogenized via concerted evolution, have evolved at least 4 times in squamates, with no evidence of being lost after their origin in any of these lineages; alethinophidian snakes, for example, have stably maintained dual CRs since they evolved in the ancestral alethinophidian lineage ~100 million years ago (fig. 1). This is consistent with dual CRs conveying a selective functional advantage, possibly in genome replication, transcriptional decoupling, or both [Jiang et al., 2007]. Although the molecular details of how dual CRs may behave are not yet clear, our mixture model results suggest that many species may utilize both CRs to varying extents to initiate mitochondrial genome replication. In one species of lizard (Chlamydosaurus) and several lineages of snakes (Python, Acrochordus, Deinagkistrodon, Agkistrodon, Enhydris, Micrurus) there is evidence of a strong preference for the duplicate CR copy (CR2) in genome replication (fig. 2), implying that this duplicate copy does play an important functional role in at least some species.

To an even greater extent than previously documented for the primates [Raina et al., 2005], we observed substantial convergence in gradient characteristics between distantly related lineages of squamate reptiles. Since changes in equilibrium base frequencies are the necessary outcome of evolution of the mutation spectrum, and because evolution of base frequencies can dramatically mislead phylogenetic analyses, these results may explain some difficulties encountered in inferring phylogenies of squamates using mt genomic data. In addition to interfering with phylogenetic inference, other effects of these gradients and their evolution should be considered: the potential effect they have had on amino acid substitutions, whether they can be incorporated into codon-based models, and whether they substantially affect our ability to detect selection and adaptation in mitochondria using ratios of synonymous to nonsynonymous substitutions. Gradients, and their evolutionary dynamics, may also affect how synonymous and nonsynonymous ratios are used in population genetics to understand how selection affects polymorphism levels.

Evidence for mitochondrial substitution gradients is extensive, and their ability to evolve rapidly through time is now even clearer; however, accounting for these gradients in the context of phylogenetic models of DNA evolution has not yet been accomplished. Our results demonstrate a degree of phylogenetic conservatism in gradient evolution, while also providing many examples of the exact opposite, including rapid radical changes and widespread convergence of the response curve. Thus, there is evidence that a gradient-based phylogenetic model that allowed for modifications of the gradient across branches would be reasonable. Based on our current analysis, incorporation of a gradient evolution model directly into phylogeny-based likelihood analysis would seem necessary to obtain accurate estimates and variances for topology, divergence time, and molecular evolutionary inferences.

This study provides important baseline information about how diverse gradients may be on a broad taxonomic scale, and how rapidly aspects of the gradient may evolve through time. Our evaluations of how gradients change through time and among species provide an important first step towards developing new phylogenetic evolutionary models that would reasonably account for the existence and evolutionary dynamics of gradients. Despite their evolutionary lability, our findings also show that there is some phylogenetic component (i.e., heritability of gradients) of gradient characteristics; collectively these results suggest gradient evolution could feasibly be incorporated into a dynamic phylogenetic model. Although challenging, the development of such models would be expected to substantially increase the power and accuracy of evolutionary inferences for mitochondrial genomic data.

Supplementary Material

Supplemental Table S1

Fossil constraints utilized in this study. All priors were implemented. The Amphisbaenian calibration point is somewhat controversial, but it does not have any appreciable effect on our estimates of divergence times.

Acknowledgements

We acknowledge the support of the National Institutes of Health (NIH; GM065612-01, GM065580-01) to D.D.P., an NIH training grant (LM009451) to T.A.C., and a National Science Foundation Collaborative Research grant to C.L.P. (DEB-0416000).

References

  • Akaike H. Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csake F, editors. Second International Symposium on Information Theory. Budapest: Akademia Kiado; 1973. pp. 673–681.
  • Akaike H. Information measures and model selection. Intl Stat Inst. 1983;22:277–291.
  • Bielawski JP, Gold JR. Unequal synonymous substitution rates within and between two protein-coding mitochondrial genes. Mol Biol Evol. 1996;13:889–892.
  • Bielawski JP, Gold JR. Mutation patterns of mitochondrial H- and L-strand DNA in closely related Cyprinid fishes. Genetics. 2002;161:1589–1597.
  • Castoe TA, Jiang ZJ, Gu W, Wang ZO, Pollock DD. Adaptive evolution and functional redesign of core metabolic proteins in snakes. PLoS ONE. 2008;3:e2201.
  • Castoe TA, de Koning AP, Kim HM, Gu W, Noonan BP, et al. Evidence for an ancient adaptive episode of convergent molecular evolution. Proc Natl Acad Sci USA. 2009;106:8986–8991.
  • Clayton DA. Replication of animal mitochondrial DNA. Cell. 1982;28:693–705.
  • Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007;7:214.
  • Faith JJ, Pollock DD. Likelihood analysis of asymmetrical mutation bias gradients in vertebrate mitochondrial genomes. Genetics. 2003;165:735–745.
  • Frank AC, Lobry JR. Asymmetric substitution patterns: A review of possible underlying mutational or selective mechanisms. Gene. 1999;238:65–77.
  • Frederico LA, Kunkel TA, Shaw BR. A sensitive genetic assay for the detection of cytosine deamination – determination of rate constants and the activation energy. Biochemistry. 1990;29:2532–2537.
  • Gao K. Sineoamphisbaena phylogenetic relationships discussed: reply. Can J Earth Sci. 1997;34:886–889.
  • Hedges SB, Vidal N. Lizards, snakes, and amphisbaenians (Squamata). In: Hedges SB, Kumar S, editors. The Timetree of Life. Oxford: Oxford University Press; 2009. pp. 383–389.
  • Jiang ZJ, Castoe TA, Austin CC, Burbrink FT, Herron MD, et al. Comparative mitochondrial genomics of snakes: Extraordinary substitution rate dynamics and functionality of the duplicate control region. BMC Evol Biol. 2007;7:123.
  • Krishnan NM, Seligmann H, Raina SZ, Pollock DD. Detecting gradients of asymmetry in site-specific substitutions in mitochondrial genomes. DNA Cell Biol. 2004a;23:707–714.
  • Krishnan NM, Seligmann H, Stewart CB, De Koning AP, Pollock DD. Ancestral sequence reconstruction in primate mitochondrial DNA: compositional bias and effect on functional inference. Mol Biol Evol. 2004b;21:1871–1883.
  • Lowe TM, Eddy SR. tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25:955–964.
  • Paradis E, Claude J, Strimmer K. APE: Analyses of Phylogenetics and Evolution in R Language. Bioinformatics. 2004;20:289–290.
  • Raina SZ, Faith JJ, Disotell TR, Seligmann H, Stewart CB, Pollock DD. Evolution of base-substitution gradients in primate mitochondrial genomes. Genome Res. 2005;15:665–673.
  • Rand DM, Kann LM. Mutation and selection at silent and replacement sites in the evolution of animal mitochondrial DNA. Genetica. 1998;103:393–407.
  • Reyes A, Gissi C, Pesole G, Saccone C. Asymmetrical directional mutation pressure in the mitochondrial genome of mammals. Mol Biol Evol. 1998;15:957–966.
  • Reyes A, Yang MY, Bowmaker M, Holt IJ. Bidirectional replication initiates at sites throughout the mitochondrial genome of birds. J Biol Chem. 2005;280:3242–3250.
  • Slack KE, Janke A, Penny D, Arnason U. Two new avian mitochondrial genomes (penguin and goose) and a summary of bird and reptile mitogenomic features. Gene. 2003;302:43–52.
  • Tanaka M, Ozawa T. Strand asymmetry in human mitochondrial DNA mutations. Genomics. 1994;22:327–335.
  • Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The CLUSTAL_X windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997;25:4876–4882.
  • Vidal N, Hedges SB. The phylogeny of squamate reptiles (lizards, snakes, and amphisbaenians) inferred from nine nuclear protein-coding genes. CR Biol. 2005;328:1000–1008.
  • Yang MY, Bowmaker M, Reyes A, Vergani L, Angeli P, et al. Biased incorporation of ribonucleotides on the mitochondrial L-strand accounts for apparent strand-asymmetric DNA replication. Cell. 2002;111:495–505.
  • Yasukawa T, Yang MY, Jacobs HT, Holt IJ. A bidirectional origin of replication maps to the major noncoding region of human mitochondrial DNA. Mol Cell. 2005;18:651–662.

Articles from Cytogenetic and Genome Research are provided here courtesy of Karger Publishers