Copyright © 2006 Ermakova et al; licensee BioMed Central Ltd.

Fast rate of evolution in alternatively spliced coding regions of mammalian genes

1Department of Bioengineering and Bioinformatics, Moscow State University, Vorob'evy gory, 1-73, 119992, Moscow, Russia
2Research and Training Center "Bioinformatics", Institute for Information Transmission Problems, Russian Academy of Sciences, Bolshoi Karetny per. 19, 127994, Moscow, Russia

Corresponding author.
Ekaterina O Ermakova: ermakova/at/iitp.ru; Ramil N Nurtdinov: n_ramil/at/mail.ru; Mikhail S Gelfand: gelfand/at/iitp.ru

Received December 21, 2005; Accepted April 18, 2006.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article has been cited by other articles in PMC.

Abstract

Background

At least half of mammalian genes are alternatively spliced. Alternative isoforms are often genome-specific and it has been suggested that alternative splicing is one of the major mechanisms for generating protein diversity in the course of evolution. Another way of looking at alternative splicing is to consider sequence evolution of constitutive and alternative regions of protein-coding genes. Indeed, it turns out that constitutive and alternative regions evolve in different ways.

Results

A set of 3029 orthologous pairs of human and mouse alternatively spliced genes was considered. The rate of nonsynonymous substitutions (dN), the rate of synonymous substitutions (dS), and their ratio (ω = dN/dS) appear to be significantly higher in alternatively spliced coding regions compared to constitutive regions. When N-terminal, internal and C-terminal alternatives are analysed separately, C-terminal alternatives appear to make the main contribution to the observed difference.
The effects become even more pronounced in a subset of fast evolving genes.

Conclusion

These results provide evidence of weaker purifying selection and/or stronger positive selection in alternative regions and thus one more confirmation of accelerated evolution in alternative regions. This study corroborates the theory that alternative splicing serves as a testing ground for molecular evolution.

Background

Alternative splicing is a major mechanism for generating functional and evolutionary diversity of proteins in mammals [1,2]; for a review see [3]. Indeed, alternative splicing allows for generation of novel proteins without sacrificing old ones [2]. If a new isoform proves to be beneficial, its fraction increases by subtle regulatory changes. On the other hand, unlike gene duplication, alternative splicing does not lead to dramatic changes in protein concentrations. Moreover, it has been demonstrated that duplicated genes are rarely alternatively spliced compared to singletons [4,5].

There are good reasons to believe that some key mutational events driving evolution might reside in introns, untranslated regions (UTRs) and/or nontranscribed regulatory regions [6-8]. A large fraction of alternative splicing events occur in untranslated regions [9]. Nevertheless, most studies of molecular evolution have focused on the analysis of protein coding regions, as these data are simpler to obtain and are more amenable to functional interpretation. From this point of view, alternative regions of genes occupy an intermediate position. Alternatively spliced regions are often evolutionarily young: indeed, about half of the genes in human-mouse orthologous pairs have species-specific isoforms [2,10].

In many respects, constitutive and alternative regions are organized in different ways. Alternative human splice sites are on average weaker than constitutive ones [11]. Non-canonical GC-AG introns tend to be alternative [12].
Among human exons conserved in mouse, about 77% of alternative cassette exons are flanked on both sides by long conserved intronic sequences, compared to only 17% of the constitutive exons [13]. Overall, statistical and evolutionary features of constitutive and alternative exons are sufficiently different to provide for computational recognition of these exons [14-16].

In several early studies it has been observed that patterns of nucleotide substitutions are different in alternative and constitutive coding regions. Iida and Akashi [17] analysed 26 pairs of alternatively spliced human genes and their non-human mammalian orthologs and demonstrated that synonymous divergence was lower and the nonsynonymous divergence was higher in alternative regions compared to constitutive regions. Evidence for diversifying selection was observed in alternative regions of CD45 [18], whereas the reduced rate of synonymous substitutions in an alternative region of BRCA1 [19] was assigned to purifying selection due to exonic splicing enhancer sites [20]. Recently, lower synonymous divergence in alternative exons compared to constitutive ones was demonstrated in a large-scale study of human, chimpanzee, mouse and rat genes [21].

Here we analyze evolutionary patterns in a set of 3029 pairs of orthologous human and mouse genes. We consider all types of alternative splicing and analyze separately 5'-, internal, and 3'-regions of genes, as well as faster and slower evolving genes.

Results

We considered 3029 alternatively spliced human genes and their mouse orthologs (Figure 1).
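The split into slower and faster evolving genes mentioned above can be pictured with a short sketch. The Methods state only that genes with comparable average substitution rates were grouped into three bins of equal size; the single per-gene rate value, the function name `split_into_terciles`, and the gene identifiers and numbers below are illustrative assumptions, not data from the study.

```python
def split_into_terciles(rates):
    """Split genes into three equal-size bins by ascending rate.

    `rates` maps a gene identifier to one per-gene substitution rate
    (hypothetical input; the exact rate measure is an assumption).
    """
    ordered = sorted(rates, key=rates.get)   # slowest gene first
    n = len(ordered)
    cut1, cut2 = n // 3, (2 * n) // 3        # boundaries of the three bins
    return {
        "slow": ordered[:cut1],
        "medium": ordered[cut1:cut2],
        "fast": ordered[cut2:],
    }


# Made-up rates for six hypothetical genes.
example_rates = {"g1": 0.02, "g2": 0.15, "g3": 0.07,
                 "g4": 0.30, "g5": 0.01, "g6": 0.11}
bins = split_into_terciles(example_rates)
print(bins)
```

When the number of genes is not divisible by three, this version puts the remainder into the later bins; any other tie-breaking rule would serve equally well for illustration.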
It turned out that Id(A) < Id(C), dN(A) > dN(C), and ω(A) > ω(C) for alternatively spliced genes irrespective of the rate of evolution. These results show that negative selection is weaker and/or positive selection is stronger in alternative regions and thus confirm that alternatively spliced coding regions are hotspots of molecular evolution. Unexpectedly, dN and ω rise dramatically at the C-terminal alternative regions (Figures 3, ...). The pattern of synonymous substitutions is more complicated, as it depends on the rate of evolution (Figure 4).

For control, we considered N-terminal and C-terminal constitutively spliced regions and performed similar analysis. All computed evolutionary parameters were the same as in the constitutive regions in general (data not shown). Thus the observed difference cannot be explained simply by faster evolution at gene termini.

Next, we analyzed individual gene pairs. For each of 2358 genes with the total lengths of constitutive and of alternative regions both exceeding 80 bp, ω for constitutive (ωC) and alternative (ωA) regions was calculated separately. Figure 6a ...
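The per-gene comparison described above can be sketched as follows. This is a minimal illustration that assumes dN and dS have already been estimated for each region; the class `RegionEstimate`, the function `compare_gene`, the handling of dS = 0, and all numeric values are hypothetical and are not taken from the study. Only the 80 bp threshold comes from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MIN_LENGTH_BP = 80  # both region types must exceed this length (from the text)


@dataclass
class RegionEstimate:
    length_bp: int   # aligned length of the region type in this gene
    dn: float        # nonsynonymous substitutions per site
    ds: float        # synonymous substitutions per site

    def omega(self) -> Optional[float]:
        """Return dN/dS, or None when dS is zero (assumed handling)."""
        if self.ds <= 0:
            return None
        return self.dn / self.ds


def compare_gene(constitutive: RegionEstimate,
                 alternative: RegionEstimate) -> Optional[Tuple[float, float]]:
    """Return (omega_C, omega_A), or None if the gene is not usable."""
    if (constitutive.length_bp <= MIN_LENGTH_BP
            or alternative.length_bp <= MIN_LENGTH_BP):
        return None
    omega_c, omega_a = constitutive.omega(), alternative.omega()
    if omega_c is None or omega_a is None:
        return None
    return omega_c, omega_a


# Made-up estimates for three hypothetical genes.
genes = {
    "geneA": (RegionEstimate(900, 0.05, 0.50), RegionEstimate(150, 0.12, 0.60)),
    "geneB": (RegionEstimate(1200, 0.02, 0.40), RegionEstimate(60, 0.10, 0.50)),
    "geneC": (RegionEstimate(600, 0.08, 0.40), RegionEstimate(300, 0.03, 0.30)),
}

results = {}
for name, (const_region, alt_region) in genes.items():
    pair = compare_gene(const_region, alt_region)
    if pair is not None:          # geneB is dropped: alternative region too short
        results[name] = pair

higher_in_alternative = [n for n, (oc, oa) in results.items() if oa > oc]
print(results, higher_in_alternative)
```

Keeping the length filter separate from the ω calculation makes it easy to see which genes were excluded for being too short and which for having an undefined ratio.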
Discussion

Evolutionary patterns in different functional regions are known to be significantly different. Conserved genes are duplicated relatively more often [22], although shortly after duplication the evolutionary rate might increase, as the purifying selection is weaker [23,24], and the selection pattern in the two copies may be different [25]. Duret and Mouchiroud [26] observed lower nonsynonymous divergence in genes expressed in multiple tissues when compared to genes with more limited expression patterns, whereas the synonymous substitution rate was roughly the same. Similarly, Pál, Papp and Hurst [27] demonstrated that highly expressed genes tend to be more conserved than genes expressed at a lower level. Our results are consistent with these observations if one assumes that constitutive regions are expressed in more tissues and at a higher rate than alternative ones. Indeed, the former assumption holds for genes with isoforms having a clear tissue-specific expression pattern, whereas the latter holds for genes with all isoforms expressed ubiquitously.

Young gene regions tend to evolve faster. Several studies [23-25,28] demonstrated post-duplicational relaxation of purifying selection in paralogs. Our results provided evidence of stronger positive selection and/or weaker purifying selection in alternative gene regions.

One possible explanation for our observations could be that the data are contaminated by non-functional isoforms (hence relaxation of purifying selection takes place). We do not believe that to be the case for the following reasons. Firstly, these regions were conserved between human and mouse at a sufficiently high similarity level of 70% nucleotide identity. Secondly, the observed pattern of increased dN level in alternative regions was the most pronounced in 3' (C-terminal) regions, which are the most reliable as regards gene recognition and have higher EST coverage due to polyA-primed ESTs.
As we considered only alternatives derived from RefSeq proteins, we could miss some alternatives and thus label a fraction of the alternative regions as constitutive. However, that could only contaminate the constitutive sample with alternative regions and thus blur the observed differences, but not create any spurious effect.

Recently Xing and Lee [21] observed a similar rate of non-synonymous substitutions in alternative and constitutive regions, whereas the rate of synonymous substitutions was lower in alternative regions, especially in tissue-specific exons [29]. One possible explanation for that was based on the assumption that conserved alternative exons contain more candidate splicing enhancer sites than constitutive ones [15]. As such sites could be expected to be conserved, as in BRCA1 [19,20], that could lead to higher conservation of synonymous codon positions in alternative regions compared to constitutive ones. However, this explanation seems to be incorrect: although dS is indeed lower in splicing enhancers, the fractions of constitutive and alternative regions covered by splicing enhancers are the same [30], and if the RNA selection pressure is the same in alternative and constitutive regions, it cannot distort the measurement of ω [31]. However, this effect has not been observed in our study, and the substitution rates differ from those in [21].

As our results are consistent and statistically significant for all classes of genes (fast, medium and slow evolving) and all gene regions (N-terminal, internal, C-terminal), and do not seem to be caused by contamination, there should be other reasons for this discrepancy. One of them could be the fact that we considered all types of alternatives, as opposed to only cassette exons in other studies. We also considered short alternative regions, whose skipping or inclusion might be regulated "outside". Another one could be the use of different methods to calculate the rates of evolution.
We used our own implementation of the first method of Ina [32] here, as we needed a tool for very long alignments (~3·10^6 bp), whereas Xing and Lee [21,29] used a maximum likelihood method implemented in the PAML package [33]. On the other hand, we considered only RefSeq isoforms and did not distinguish between the minor and major isoform alternatives. An explanation for our finding could be that the total length of regulatory sites experiencing purifying selection is still small compared to the total length of alternative regions.

The pattern of substitutions in insects is less consistent [34]: in N-terminal alternatives, the synonymous rate is higher than in constant regions, whereas in internal alternatives, there are more amino acid substitutions, similar to our observations here.

Conclusion

Overall, this study corroborates the idea that alternative splicing serves as a testing ground for molecular evolution. Several lines of evidence confirm this hypothesis: (i) alternatively spliced isoforms are often evolutionarily young both in mammals [2,10] and in insects [35]; (ii) the rate of nonsynonymous substitutions is higher in alternative regions compared to constitutive ones (this study); (iii) constitutive exons in genes with genome-specific alternative splicing evolve faster than constitutive regions in genes with conserved structure [36] (cf. a similar observation for duplicated genes [23-25]); (iv) many young (rodent-specific, missing in human and pig as an outgroup) exons are alternatively spliced and tend to have ω > 1 in the mouse-rat comparison [30]; and (v) the frequency of nonsynonymous SNPs in human genes is higher in alternative regions than in constitutive regions [37].

Methods

Definitions

In an alternatively spliced gene, constitutive regions are defined as the ones that are always exonic and coding, and alternative regions as the ones that are either coding or spliced out.
An exon can be either completely constitutive, or completely alternative, or non-coding, or consist of constitutive, alternative and non-coding regions. A local meta-alignment is a concatenate of all alignment fragments of a fixed type (for example, coding alternative regions) for one particular gene. A global meta-alignment is the concatenate of local meta-alignments of a fixed type for all genes of a fixed group (for example, for all fast-evolving genes).

Data

Human and mouse mRNA sequences were taken from the NCBI RefSeq database [38] and orthologs were identified and aligned as described previously [39]. Overall, 12356 pairs of orthologous human and mouse genes were considered. The data flow through the analysis pipeline is shown in Figure 1.

For the individual substitution rate analysis, 2358 genes were selected: those for which the human-mouse alignment lengths of the alternative regions and of the constitutive regions both exceeded 80 base pairs.

Data classification

We grouped genes with comparable average substitution rates and formed three bins of equal size: slow, medium-speed, and fast-evolving genes. We also considered alternative coding regions corresponding to protein N-terminal, middle, and C-terminal parts separately.

Estimation of substitution rates

The transitional to transversional substitution rate ratio R, as well as the numbers of synonymous (dS) and nonsynonymous (dN) substitutions per site, were estimated by the Ina method I [32]. Unlike maximum likelihood methods, it could be applied to very long alignments (~3·10^6 bp) and it proved to be fast enough to allow bootstrap resampling. We used our own implementation of this method (a set of Perl scripts).
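The bootstrap resampling mentioned above can be sketched as follows. This is not the authors' Perl implementation; it is a minimal Python illustration that assumes the resampling unit is an aligned codon pair and uses the fraction of identical codons as a simple stand-in statistic (the study itself estimated amino-acid identity, dN, dS and ω). The function names, the toy sequences and the percentile summary are all assumptions made for illustration; only the default of 2000 replicates follows the number given in the Bootstrapping section.

```python
import random


def split_codons(seq):
    """Cut an in-frame, gap-free sequence into codons, dropping any tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def codon_identity(pairs):
    """Fraction of aligned codon pairs that are identical (stand-in statistic)."""
    if not pairs:
        return float("nan")
    return sum(1 for a, b in pairs if a == b) / len(pairs)


def bootstrap(human, mouse, statistic, n_replicates=2000, seed=0):
    """Resample codon pairs with replacement and recompute the statistic.

    Each replicate has the same number of codon pairs as the original
    alignment, mirroring "alignments of the same length" in the text.
    """
    rng = random.Random(seed)
    pairs = list(zip(split_codons(human), split_codons(mouse)))
    estimates = []
    for _ in range(n_replicates):
        sample = [pairs[rng.randrange(len(pairs))] for _ in pairs]
        estimates.append(statistic(sample))
    return estimates


def percentile_interval(estimates, lower=0.025, upper=0.975):
    """Simple percentile interval; one common way to summarise replicates."""
    ordered = sorted(estimates)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


# Toy aligned coding sequences (hypothetical, 4 codons, 2 identical).
human_seq = "ATGGCCAAGTTT"
mouse_seq = "ATGGCTAAGTTC"
replicates = bootstrap(human_seq, mouse_seq, codon_identity, n_replicates=200)
print(percentile_interval(replicates))
```

Passing the statistic as a function keeps the resampling loop independent of how the evolutionary parameter is computed, so a dN/dS estimator could be plugged in without changing `bootstrap`.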
Bootstrapping

To evaluate the robustness of the estimates for evolutionary parameters of the global meta-alignments, we used bootstrapping to form 2000 alignments of the same length for each global meta-alignment and estimated amino-acid identity, dN, dS, and ω = dN/dS.

Authors' contributions

RN provided the EDAS database. EE and RN analysed the data. MG designed the project. EE and MG wrote the paper. All authors read and approved the final manuscript.

Acknowledgements

We are grateful to I. King Jordan for alignments of human and mouse mRNAs and to Georgii Bazykin, Alexey Kondrashov, Dmitry Malko, Andrei Mironov, Dmitri Petrov, and Vasily Ramensky for useful discussions. This study was partially supported by grants from the Ludwig Institute of Cancer Research (CDRF RBO-1268), the Howard Hughes Medical Institute (55000309), the Russian Fund of Basic Research (04-04-49440), the Russian Academy of Sciences (program "Molecular and Cellular Biology"), and the Russian Science Support Fund.