Proc Natl Acad Sci U S A. Sep 30, 1997; 94(20): 10739–10744.
PMCID: PMC23469
Evolution

Intron “sliding” and the diversity of intron positions

Abstract

Alignments of homologous genes typically reveal a great diversity of intron locations, far more than could fit comfortably in a single gene. Thus, a minority of these intron positions could be inherited from a single ancestral gene, but the larger share must be attributed to subsequent events of intron gain or intron “sliding” (movement from one position to another within a gene). Intron sliding has been argued from cases of discordant introns and from putative spatial clustering of intron positions. A list of 32 cases of discordant introns is presented here. Most of these cases are found to be artefactual. The spatial and phylogenetic distributions of intron positions from five published compilations of gene data, comprising 205 intron positions, have been examined systematically for evidence of intron sliding. The results suggest that sliding, if it occurs at all, has contributed little to the diversity of intron positions.

The Problem of Intron Position Diversity

The locations of introns in homologous genes do not always coincide, the proportion of shared intron positions decreasing with increased evolutionary distance. In early comparisons of eukaryotic protein-coding genes (e.g., ref. 1), it seemed possible to attribute all such differences to loss of introns inherited from an intron-rich ancestral gene. Such a view has become problematic, due to the greatly increased numbers of intron positions now known, and to the increasing recognition that individual intron positions typically show a restricted phylogenetic distribution indicative of a recent origin (2–5). If all 205 different intron positions documented in published compilations of gene data for actins (6), glyceraldehyde-3-phosphate dehydrogenase (GAPDH; ref. 7), small G proteins (3), triose-phosphate isomerase (TPI; ref. 5), and tubulins (8) are packed into hypothetical ancestral genes (with a combined length of 1,603 codons), they would break up the genes into exons with a mean length of only 23 bp and a median length of only 14 bp, with many minuscule exons (e.g., 26% would be 1–6 bp in length). If only half of these 205 introns occupy ancestral positions, this still would imply a mean ancestral exon size of just 15 codons, three times smaller than the mean exon size observed for the most intron-rich extant genomes known (9, 10).
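The mean exon sizes quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the totals come from the text, but the exon count (one more exon than introns in each of the five genes) is an assumption made here, and the function name is hypothetical. The median and the share of 1–6 bp exons depend on the individual intron positions and are not reproduced.

```python
# Back-of-envelope check of the exon sizes implied by packing every known
# intron position into hypothetical ancestral genes. Totals are taken from
# the text; the exon count (introns + one per gene) is an assumption.

N_GENES = 5            # actin, GAPDH, small G proteins, TPI, tubulin
TOTAL_CODONS = 1603    # combined length of the hypothetical ancestral genes
N_INTRONS = 205        # distinct intron positions in the five compilations


def mean_exon_length(total_codons, n_introns, n_genes):
    """Return (mean exon length in codons, mean exon length in bp)."""
    n_exons = n_introns + n_genes  # each gene has one more exon than introns
    codons = total_codons / n_exons
    return codons, 3 * codons


all_codons, all_bp = mean_exon_length(TOTAL_CODONS, N_INTRONS, N_GENES)
half_codons, half_bp = mean_exon_length(TOTAL_CODONS, N_INTRONS / 2, N_GENES)

print(f"all 205 positions ancestral: {all_bp:.1f} bp per exon")
print(f"half of them ancestral:      {half_codons:.1f} codons per exon")
```

Under these assumptions the two means round to the 23 bp and 15 codons given in the text.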

Thus, the vast diversity of (typically phylogenetically restricted) intron positions suggests that the majority of intron locations in extant eukaryotic genes do not represent divisions present in genes of a eukaryotic common ancestor, much less the spacers between mini-genes in an even more ancient hypothetical progenitor (11). A modest proportion of intron positions could represent ancient features, as required minimally in an introns-early view (12), but most extant divisions in split genes are more recent in origin, owing to one or more processes that have operated during the divergence of eukaryotes. Two candidate processes, sliding of old introns to new positions (13, 14), and addition of introns to genes [by insertion (15) or by duplication of splice signals (16)], have been proposed and discussed in relation to several sets of intron data (2, 4, 5, 8, 17–20).

Two Hypotheses of Intron Sliding

The term “sliding” and its apparent synonyms (“migration,” “frameshifting,” “shifting,” “slippage,” “displacement,” and “drift”) appear frequently in the literature on intron evolution (7, 8, 13, 21–26), but the nature of this process, the evidence supporting it, its underlying molecular mechanism, and its significance for gene evolution, are often unclear. These issues can be clarified briefly as follows. The term “junctional sliding” originally was used to refer to the reassignment of a single upstream or downstream splice junction so as to produce an indel (insertion or deletion) in the encoded mRNA and protein (27). Currently, sliding and its various synonyms are used ambiguously to refer to this process of junctional sliding, as well as to the distinct phenomenon of apparent shifts of an entire intron (which do not produce an indel), referred to here as “intron sliding.” Junctional sliding is relevant here only in that it is invoked as a component process in some models of intron sliding.

In spite of suggestive evidence, intron sliding has not been demonstrated to occur. A diagnosis of intron sliding would be nearly unavoidable for a reliable case in which demonstrably homologous introns occur at slightly different positions in closely related genes. To our knowledge, no such case has yet been found. A claim of homology has been made for introns in two histone genes of Volvox carteri (28), but the introns are different in length and the sequence similarity is largely due to biased nucleotide composition: the alignment of the native sequences is not significantly better than that of the randomly scrambled sequences (R. F. Doolittle, personal communication). The difficulty is not in finding closely spaced introns, which are common, but instead may lie in detecting their homology: spliceosomal introns diverge so rapidly that sequence similarity indicative of homology quickly vanishes (e.g., ref. 3). In the absence of sequence evidence, close spacing itself has been interpreted as evidence of homology of introns. This argument takes two forms (described in more detail below), the discordant introns argument and the clustering argument, neither of which has been evaluated systematically.

Several intron sliding mechanisms have been proposed. The two most commonly invoked mechanisms, shown in Fig. 1 A and B, are here referred to collectively as “DNA-based sliding.” The mechanism in Fig. 1A calls for a double event of junctional sliding in which nucleotide changes alter splicing signals so as to effect balanced reassignments of the upstream and downstream splice junctions (13, 29). The mechanism in Fig. 1B invokes balanced indels (26, 28). Martinez et al. (14) propose an RNA-mediated mechanism, in which a spliced intron is inserted by the splicing machinery (reverse-spliced) into a nearby site, reverse-transcribed, and incorporated into DNA by recombination. More generally, retropositional movement of introns can be expected to create the appearance of sliding to the extent that the RNA substrate is a spliced (rather than unspliced) mRNA, because the recombination event that incorporates the intron will tend to convert flanking sites to their intron-lacking states (Fig. 1C). Given this, in the comments below, we do not distinguish between the model of Martinez et al. (14) and the coupled insertion/deletion model (Fig. 1C), because the only relevant difference in their implications is that the former implies homology of closely spaced introns, a phenomenon for which no evidence currently exists.

Figure 1
Mechanisms to account for closely spaced introns. (A) Intron sliding by balanced junction reassignments. Upstream and downstream splice junctions each are reassigned by junctional sliding (curved arrows) to new positions offset in the same direction, ...

Alternatively, if introns do not slide, instances of apparent sliding would be due to separate events of loss and gain of introns, or to separate events of gain. As an explanation for a pair of closely spaced intron positions, the hypothesis of separate gain is usually a reasonable alternative to intron sliding, because usually no evidence exists that either intron position was present in a common ancestor. Even when one of two closely spaced introns appears to be ancestral (based on an outgroup comparison), an apparent slide could be due to loss of the ancestral intron followed by a separate event of gain, the separate loss and gain model (Fig. 1D).

What is the relevance of intron sliding for the origin and evolution of intron-containing genes? Based on the assumption that intron sliding is widespread, some authors advocate an introns-early view in which all (or nearly all) differences in intron positions are attributable to sliding and loss of primordial introns, with no need to invoke widespread gain of introns (8, 14, 20, 24). Others advocate an introns-late view in which sliding is insignificant, and all (or nearly all) differences in intron positions are attributable to recent gain and loss (2, 5). One of these contrasting interpretations may be correct, but the dichotomy is rhetorical in origin and does not reflect an underlying logical necessity: though an introns-late view clearly requires extensive intron gain, it is not incompatible with sliding subsequent to gain; nor is intron sliding the only available means to address the problem of intron diversity from an introns-early perspective.

The issue of intron sliding may be separated from polemics by considering two hypotheses independently of any view on the ultimate origin of introns: the strong intron sliding hypothesis would be that a substantial proportion of observed introns are shifted from their original locations (regardless of how and when those original locations were established), whereas the weak intron sliding hypothesis is merely that intron sliding has occurred, if but rarely. These proposals are addressed below, using case studies of discordant introns and an extensive set of data on the phylogenetic and spatial distribution of introns.

Case Studies of Discordant Introns

For some gene comparisons, it has been proposed that the numbers of introns per gene, or the general locations of introns, are conserved in spite of differences in the exact positions of one or more introns, which are called “discordant” or “quasi-conserved” (13, 14, 19, 20, 26, 29). For example, Brenner and Corrochano (29) report that the histidyl-tRNA synthetase genes of pufferfish and hamster each have 12 introns, exactly matching in position except for the eighth introns, which differ in position by a mere 5 bp. Similarly, Jellie et al. (26) present the intron/exon structure of a gene for a nine-domain globin in Artemia salina, noting that three of the nine domain-encoding regions lack an ancestral intron position, yet each such region has an intron at a different position. For several years, we have been cataloguing reports of discordant introns in concordant contexts as they appear in the literature and attempting to verify the physical evidence for them, resulting in a database of information on 32 cases (available from the authors).

Most of these discordances are artefacts. Table 1 lists the 12 of 32 cases of apparently discordant introns in concordant contexts that are now known (on the basis of information described in Table 1) to arise from errors in a published sequence. This list includes the discordant introns reported recently by Brenner and Corrochano (29). In an additional eight cases, apparently discordant introns with the same phase (location relative to the triplet reading frame) occur in regions of alignment ambiguity, such that alternative alignments place the introns at exactly the same position (data not shown). For any isolated case, it is impossible to judge conclusively that the putatively discordant introns are actually concordant (i.e., that one alignment is true and the other false). However, because the vast majority (95%) of other introns in these same genes are concordant, the same is likely to be true of the majority of introns in poorly aligned regions, the appearance of discordance arising from alignment errors.

Table 1
Apparently discordant introns attributable to errors

The remaining 12 cases of discordant introns, shown in Table 2, do not arise from alignment ambiguities and are not known to be attributable to errors. In the absence of confirmatory sequencing and crucial evidence for the homology of introns, these cases remain open to multiple interpretations.

Table 2
Additional apparently discordant introns

Intron Sliding and Phylogeny

The phylogenetic hallmark of the mechanisms shown in Fig. 1 would be a distribution in which one intron position is nested within the distribution of another, as illustrated in Fig. 2A. This pattern of nesting is evident in some isolated instances in which intron sliding has been suggested (e.g., ref. 26), but its occurrence in more extensive sets of data has not been considered. For example, Liaud et al. (8) have argued that intron positions found among diverse tubulin genes fall into 16 regularly spaced clusters, as though each cluster represented the descendants of an ancestral intron that has slid, in localized fashion, to neighboring positions. Yet, the phylogenetic distribution of tubulin intron positions compiled by Liaud et al. (8) shows little sign of the nesting expected from such extensive intron sliding (Fig. 2B): nested distributions are absent from 15 of the 16 putative clusters.

Figure 2
Intron sliding and phylogeny. (A) Hypothetical examples of nested and nonnested distributions. (Left) A phylogenetic tree. (Center) Maps of intron positions. (Right) A table of occurrences for the six different intron positions. Brackets indicate pairs ...

To evaluate phylogenetic evidence for intron sliding more systematically, the phylogenetic distributions of introns for the five sets of intron data discussed above were examined. [Nexus files combining intron data and phylogenetic trees taken from analyses of the corresponding protein sequences (3, 5, 33–35) are available from the authors.]

Of the total set of 205 intron positions, 157 (76.6%) show a distribution that is consistent with a single origin followed by faithful inheritance (i.e., no events of loss); 24 positions (11.7%) show a distribution consistent with a single origin followed by 1–3 apparent events of loss; and 24 positions (11.7%) exhibit various complex patterns suggesting multiple (≥2) origins or many (≥4) losses. Note that an event of origin may be an event of either sliding or gain, and that an event of loss may represent either actual loss or sliding to a different position. Thus, the intron positions with patchy distributions are candidates for ancestral introns that may have slid elsewhere, to closely spaced positions that can be identified readily by a pattern of phylogenetic nesting.

In the five sets of intron data, 40 pairs of intron positions are closely spaced (1–30 bp apart) and show a nested phylogenetic distribution (a database describing the 40 cases of closely spaced nested pairs is available from the authors). For 25 of the 40 nested pairs, both introns are absent in an ingroup so as to suggest (as explained in Fig. 2A) the model of separate loss and gain (Fig. 1D). Nevertheless, because such patterns also might arise by intron sliding (or coupled insertion/deletion; Fig. 1 A–C) followed by separate loss events, it is not possible to draw a conclusion from these numbers in the absence of a reference standard or an exact quantitative model.

A standard of comparison can be established by considering that events of sliding (and, to a lesser degree, coupled insertion/deletion) are spatially localized, whereas separate events of loss and gain are not. Closely spaced, phylogenetically nested pairs of introns may be due to either cause, whereas distantly spaced pairs must be attributed to whatever nonsliding processes are operative (inheritance of ancestral introns, and separate events of gain and loss). If nested distributions are mainly due to localized intron sliding, then with increasing distance between introns, nested phylogenetic distributions will be rare; but if sliding never occurs, one expects similar phylogenetic patterns regardless of the distance between the introns. To test these implications, nested pairs of introns 31–60 bp apart and 61–90 bp apart were identified by the same criteria used to identify the closely spaced (1–30 bp) cases.

The comparison between these three subsets yields a simple result. The numbers of cases (40, 44, and 27 for the short, medium, and long distances, respectively) are not significantly different (χ² = 4.3, two degrees of freedom, P > 0.05), indicating that separate loss and gain is sufficient to explain the observed number of closely spaced nested pairs. Furthermore, the proportions of cases in which the phylogenetic distribution of the introns suggests an intron-lacking intermediate (25 of 40, 28 of 44, and 13 of 27, for the short, medium, and long classes, respectively) are not significantly different (χ² = 1.9 for the 3 × 2 contingency test, two degrees of freedom, P > 0.05). Closely spaced nested pairs of introns, which represent the best candidates for intron sliding, are neither more frequent nor more suggestive of sliding (as opposed to separate loss and gain) than more distantly spaced pairs, which are not candidates for intron sliding. Thus, a phylogenetic signal that might justify invoking a special process to explain closely spaced introns is not detected.
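Both χ² values can be recomputed from the counts given above. The following sketch assumes that the first test is a goodness-of-fit test against equal expected counts in the three distance classes (the text does not state the null expectation explicitly) and that the second is an ordinary Pearson test on the 3 × 2 table. The function names are hypothetical. For two degrees of freedom the upper-tail probability of χ² has the closed form exp(−x/2), so no statistics library is needed.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variate with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def chi2_goodness_of_fit(counts):
    """Pearson statistic against equal expected counts (assumed null)."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def chi2_contingency(successes, totals):
    """Pearson statistic for a k x 2 table of successes and failures."""
    p = sum(successes) / sum(totals)  # pooled proportion under the null
    stat = 0.0
    for s, n in zip(successes, totals):
        for observed, expected in ((s, n * p), (n - s, n * (1 - p))):
            stat += (observed - expected) ** 2 / expected
    return stat


nested_pairs = [40, 44, 27]    # nested pairs 1-30, 31-60, 61-90 bp apart
intron_lacking = [25, 28, 13]  # pairs suggesting an intron-lacking intermediate

x_counts = chi2_goodness_of_fit(nested_pairs)
x_props = chi2_contingency(intron_lacking, nested_pairs)
print(f"counts:      chi2 = {x_counts:.1f}, P = {chi2_sf_df2(x_counts):.2f}")
print(f"proportions: chi2 = {x_props:.1f}, P = {chi2_sf_df2(x_props):.2f}")
```

Under these assumptions the statistics round to the reported 4.3 and 1.9, and both tail probabilities exceed 0.05.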

Intron Sliding and the Spatial Distribution of Intron Positions

Apparent clustering and excess closeness have been mentioned in regard to gene data for tubulins (8), as mentioned above, and also for TPI (5, 25, 36), GAPDH (4, 7, 19, 20), and globins (17, 18, 24, 37). However, a nonrandom degree of closeness or clustering has not been demonstrated.

The spatial distributions of intron positions for the five sets of data discussed previously, along with sample random distributions, are shown in Fig. 3. The observed distributions do not seem to exhibit clusters more prominent or more regularly spaced than those generated at random. The possibility of excess closeness can be evaluated more rigorously by comparing the nearest-neighbor distances to an exponential distribution (5), and the possibility of clustering (under-dispersion) can be evaluated by applying the covariance test of Goss and Lewontin (38).
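The idea behind the exponential test can be sketched as follows. This is a simplified illustration and not the procedure of ref. 5: it uses the gaps between adjacent positions rather than true nearest-neighbor distances, fits the exponential mean from the same data, and reports only the Kolmogorov–Smirnov distance. The function name and the example positions are hypothetical. Because the mean is estimated from the sample, standard Kolmogorov–Smirnov critical values would be only approximate.

```python
import math


def ks_exponential(positions):
    """KS distance between adjacent-gap lengths and a fitted exponential.

    positions: intron positions (bp) along one gene, in any order.
    Returns the largest gap between the empirical distribution of the
    inter-intron distances and an exponential CDF with the same mean.
    """
    pts = sorted(positions)
    gaps = sorted(b - a for a, b in zip(pts, pts[1:]))
    n = len(gaps)
    mean_gap = sum(gaps) / n
    d = 0.0
    for i, g in enumerate(gaps, start=1):
        cdf = 1.0 - math.exp(-g / mean_gap)
        # compare the fitted CDF with the empirical step just above and below g
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Hypothetical positions, for illustration only (not data from the paper).
example_positions = [12, 40, 45, 130, 210, 214, 300, 480]
print(f"KS distance: {ks_exponential(example_positions):.3f}")
```

A perfectly regular spacing gives a large distance, whereas positions scattered at random along the gene should give a small one on average.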

Figure 3
Spatial distribution of intron positions for five genes. Positions of introns are shown as vertical hatches on a horizontal bar (the scale at the bottom is in bp). Beneath each observed distribution are two random distributions (drawn at random, without ...

Using the exponential test and the covariance test, we find no significant deviation from randomness for four of the five data sets, and a significant deviation for both tests for the case of tubulin (P < 0.05 for the Kolmogorov–Smirnov exponential test; P < 0.005 for the Goss–Lewontin covariance test). This deviation appears to be due, not to regularly spaced clusters, but to a bias in intron density: half of the tubulin introns map to the first 20% of the gene (Fig. 3E). Such a bias in intron density (strongest for the tubulin data, but also seen for actin and GAPDH, Fig. 3) is a corollary of the well known tendency for exons to be shorter toward the 5′ end of a gene (9). If the tubulin data are split into 5′ and 3′ partitions (the region containing the first 20 intron positions and the region containing the last 20) to compensate crudely for the observed bias in intron density, no significant deviation from randomness (P > 0.1) is found for either partition, for either statistical test.

A final test can be made with specific reference to DNA-based sliding, for which it is possible to predict something of the distribution of sliding distances based on (i) the implication of these models that the exonic sequences between the two slid introns are not homologous (Fig. 1 A and B); and (ii) the apparently contradictory empirical result that the aligned exonic sequences between closely spaced introns are typically so similar as to give the appearance of homology (14), as illustrated by the example shown in Fig. 4A. If closely spaced intron positions result from DNA-based sliding, it must be the case that successful slides are limited to the rare instances in which, by chance, high sequence identity is achieved. Longer slides will be less likely to succeed, thus will be increasingly rare, the steepness of the drop in frequency being a function of the required sequence identity (Fig. 4B). The sequence identity in the exonic interval for the set of 40 closely spaced nested pairs of introns is 71%, thus, based on Fig. 4B, the distribution of sliding distances is expected to drop precipitously, with nearly all slides being 1–5 bp.
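The expected fall-off with distance can be illustrated with a simple binomial calculation. The sketch below is not the model behind Fig. 4B, whose details are not given here. It assumes that nonhomologous sites match independently with probability 1/4 (equal base frequencies) and that a slide succeeds only if at least 71% of the intervening sites happen to match. The function name and both default parameters are assumptions of this illustration.

```python
import math


def p_chance_identity(distance_bp, identity=0.71, p_match=0.25):
    """P(at least `identity` of `distance_bp` unrelated sites match by chance).

    Assumes independent sites, each matching with probability `p_match`.
    """
    # smallest number of matching sites that reaches the required identity;
    # the small offset guards against floating-point error in the product
    needed = math.ceil(identity * distance_bp - 1e-9)
    return sum(
        math.comb(distance_bp, k) * p_match**k * (1 - p_match) ** (distance_bp - k)
        for k in range(needed, distance_bp + 1)
    )


for d in (1, 3, 5, 10, 20, 30):
    print(f"{d:2d} bp: {p_chance_identity(d):.2e}")
```

Under these assumptions the probability is 0.25 for a 1-bp slide and falls by several orders of magnitude by 30 bp, which is the qualitative behavior the argument relies on. The decline is not strictly monotonic at every step, because the required number of matches is rounded up to a whole number.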

Figure 4
Distance test for DNA-based sliding. (A) An example of the high sequence identity (in this case, 6/7 or 86%) observed for the aligned exonic sequences between closely spaced intron positions. The locations of introns are marked by *; ...

However, the observed distribution of pairwise inter-intron-position distances (Fig. 4C) is flat, not sharply decreasing. In particular, distances for the nested subset of cases (the best candidates for sliding) are no less flat than for the complete set. The conclusion that DNA-based intron sliding is negligible would seem difficult to avoid. Indeed, Martinez et al. (14) previously made what is essentially a nonquantitative version of this same argument, concluding that DNA-based intron sliding cannot account for intron positions separated by more than a few codons of highly conserved sequence.

More generally, the lack of evidence for excess closeness or for spatial clustering suggests that localized intron sliding—of any type—must be either so infrequent as to be negligible, or so rampant as to disperse clusters beyond recognition. The lack of a significant phylogenetic signal indicative of sliding favors the former alternative.

Summary

Because the introns observed in extant genes occur at far too many positions to have been present all together in a common ancestral gene, most intron positions must have arisen more recently, by intron sliding, intron gain, or both. Case studies of discordant intron positions do not resolve the question of whether intron sliding occurs, because most such cases are artefactual and the remaining cases are ambiguous in the absence of crucial evidence for the homology of the discordant introns. The weak intron sliding hypothesis (i.e., that intron sliding occurs) remains viable in the absence of clear evidence for or against it.

The strong intron sliding hypothesis has been evaluated on the basis of implications with respect to the spatial and phylogenetic distribution of intron positions, using data from five sets of genes comprising 205 distinct intron positions. The phylogenetic distributions of introns suggest that closely spaced nested pairs of introns, which are consistent with intron sliding, are no more common than expected from a comparison with distantly spaced nested pairs, which are not. The spatial distribution of intron positions reveals no sign of the excess closeness or clustering expected from sliding. These results suggest that the influence of intron sliding is negligible, intron position diversity arising primarily by the addition of introns to genes during eukaryotic evolution.

Acknowledgments

We thank B. Diaz and E. Raff for providing data on the tubulin β4 gene of Drosophila melanogaster, and L. Corrochano, H. Domdey, M. Durkin, H. Edenberg, M. Gautam, D. Hui, D. MacLennan, R. Michelmore, A. Odermatt, F. Schuren, J. Sherwood, R. Stick, U. Wewer, R. Wu, Y. Xie, and especially F. Tsui for providing corrections or confirmations of published sequence information. This work was supported by the Program in Evolutionary Biology of the Canadian Institute for Advanced Research (A.S., J.M.L., and W.F.D.), Medical Research Council Grant MT4467 (to W.F.D.), National Science Foundation Grant MCB-9318858 (to J.D.P.), and an Alfred P. Sloan Foundation/National Science Foundation Fellowship in Molecular Evolution (to J.M.L.).

Footnotes

This paper was submitted directly (Track II) to the Proceedings Office.

Abbreviations: GAPDH, glyceraldehyde-3-phosphate dehydrogenase; TPI, triose-phosphate isomerase.

References

1. Crabtree G R, Comeau C M, Fowlkes D M, Fornace A J, Malley J D, Kant J A. J Mol Biol. 1985;185:1–19.
2. Dibb N J, Newman A J. EMBO J. 1989;8:2015–2021.
3. Dietmaier W, Fabry S. Curr Genet. 1994;26:497–505.
4. Logsdon J M, Jr, Palmer J D. Nature (London) 1994;369:526.
5. Logsdon J M, Jr, Tyshenko M G, Dixon C, D-Jafari J, Walker V K, Palmer J D. Proc Natl Acad Sci USA. 1995;92:8507–8511.
6. Weber K, Kabsch W. EMBO J. 1994;13:1280–1286.
7. Kersanach R, Brinkmann H, Liaud M, Zhang D, Martin W, Cerff R. Nature (London) 1994;367:387–389.
8. Liaud M-F, Brinkmann H, Cerff R. Plant Mol Biol. 1992;18:639–651.
9. Smith M W. J Mol Evol. 1988;27:45–55.
10. Palmer J D, Logsdon J M, Jr. Curr Opin Genet Dev. 1991;1:470–477.
11. Gilbert W. Cold Spring Harbor Symp Quant Biol. 1987;52:901–905.
12. Doolittle W F. Am Nat. 1987;130:915–928.
13. Rogers J. Trends Genet. 1986;12:223.
14. Martinez P, Martin W, Cerff R. J Mol Biol. 1989;208:551–565.
15. Cavalier-Smith T. Trends Genet. 1991;7:145–148.
16. Rogers J H. Trends Genet. 1989;5:213–216.
17. Moens L, Vanfleteren J, De Baere I, Jellie A M, Tate W, Trotman C N. FEBS Lett. 1992;312:105–109.
18. Pohajdak B, Dixon B. FEBS Lett. 1993;320:281–283.
19. Cerff R, Martin W, Brinkmann H. Nature (London) 1994;369:527–528.
20. Cerff R. In: Tracing Biological Evolution in Protein and Gene Structures. Go M, Schimmel P, editors. New York: Elsevier; 1995. pp. 205–227.
21. Holland S K, Blake C C F. BioSystems. 1987;20:181–206.
22. Yoshihara C M, Lee J D, Dodgson J B. Nucleic Acids Res. 1987;15:753–770.
23. Nata K, Sugimoto T, Kohri K, Hidaka H, Hattori E, Yamamoto H, Yonekura H, Okamoto H. Gene. 1993;130:183–189.
24. Moens L, Vanfleteren J, De Baere I, Jellie A M, Tate W, Trotman C N A. FEBS Lett. 1993;320:284–287.
25. Gilbert W, Glynias M. Gene. 1993;135:137–144.
26. Jellie A M, Tate W P, Trotman C N A. J Mol Evol. 1996;42:641–647.
27. Craik C S, Rutter W J, Fletterick R. Science. 1983;220:1125–1129.
28. Muller K, Schmitt R. Nucleic Acids Res. 1988;16:4121–4136.
29. Brenner S, Corrochano L M. Proc Natl Acad Sci USA. 1996;93:8485–8489.
30. Hewett-Emmett D, Tashian R E. In: The Carbonic Anhydrases: Cellular Physiology and Molecular Genetics. Dodgson S J, Tashian R E, Gros G, Carter N D, editors. New York: Plenum; 1991. pp. 15–32.
31. Chinery R, Poulsom R, Cox H M. Gene. 1996;171:249–253.
32. Wen G, Leeb T, Reinhart B, Schmoelzl S, Brenig B. Anim Genet. 1996;27:297–304.
33. Baldauf S L, Palmer J D. Proc Natl Acad Sci USA. 1993;90:11558.
34. Drouin G, de Sa M M, Zuker M. J Mol Evol. 1995;41:841–849.
35. Roger A. Ph.D. thesis. Halifax, Nova Scotia, Canada: Dalhousie University; 1996.
36. Stoltzfus A, Spencer D, Doolittle W F. Comput Appl Biosci. 1995;11:509–515.
37. Stoltzfus A, Doolittle W F. Curr Biol. 1993;3:215–217.
38. Goss P J E, Lewontin R C. Genetics. 1996;143:589–602.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences