Copyright © 2007 RNA Society
Spliceosomal small nuclear RNA genes in 11 insect genomes
1 Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742-5815, USA
2 Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA
3 Institute of Molecular Evolutionary Genetics and Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
4 Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
5 These authors contributed equally to this work.
Reprint requests to: Wojciech Makalowski, 514 Mueller Laboratory, The Pennsylvania State University, University Park, PA 16802, USA; e-mail: wojtek/at/psu.edu; fax: (814) 865-9366. Received August 8, 2006; Accepted September 26, 2006.

Abstract
The removal of introns from the primary transcripts of protein-coding genes is accomplished by the spliceosome, a large macromolecular complex of which small nuclear RNAs (snRNAs) are crucial components. Following the recent sequencing of the honeybee (Apis mellifera) genome, we used various computational methods, ranging from sequence similarity search to RNA secondary structure prediction, to search for putative snRNA genes (including their promoters) and to examine their pattern of conservation among 11 available insect genomes (A. mellifera, Tribolium castaneum, Bombyx mori, Anopheles gambiae, Aedes aegypti, and six Drosophila species). We identified candidates for all nine spliceosomal snRNA genes in all the analyzed genomes. All the species contain a similar number of snRNA genes, with the exception of A. aegypti, whose genome contains more U1, U2, and U5 genes, and A. mellifera, whose genome contains fewer U2 and U5 genes. We found that snRNA genes are generally more closely related to homologs within the same genus than to those in other genera. Promoter regions for all spliceosomal snRNA genes within each insect species share similar sequence motifs that are likely to correspond to the PSEA (proximal sequence element A), the binding site for snRNA activating protein complex, but these promoter elements vary in sequence among the five insect families surveyed here. In contrast to the other insect species investigated, Dipteran genomes are characterized by rapid evolution (or loss) of components of the U12 spliceosome and a striking loss of U12-type introns.
Keywords: snRNA, spliceosomal snRNA, secondary structure, U12 introns, honeybee, Insecta INTRODUCTION Protein-coding genes in most eukaryotes are interrupted by introns that must be removed in a process known as RNA splicing (Berget et al. 1977; Chow et al. 1977; Gilbert 1978). This complex process is carried out by the spliceosome, a large macromolecular assembly consisting of perhaps as many as 100 proteins (Jurica and Moore 2003; Hochleitner et al. 2005) and a set of five small nuclear RNAs (snRNAs) that are found complexed with proteins in the form of small nuclear ribonucleoproteins (snRNPs). Most introns are removed by the major spliceosome, which includes the U1, U2, U4, U5, and U6 snRNAs. Spliceosomal components, including the snRNAs, are highly conserved throughout eukaryotes (Mount and Salz 2000; Barbosa-Morais et al. 2006). The snRNAs are encoded by moderately repeated genes that show some variation within a species. Here we take advantage of the recently sequenced honeybee (Apis mellifera) (Honeybee Genome Sequencing Consortium 2006) and 10 other insect genomes to examine the pattern of conservation of spliceosomal snRNA genes within the class Insecta. snRNAs and snRNA variants Past work in Drosophila melanogaster has described variant U1 (Lo and Mount 1990) and U5 (Chen et al. 2005) snRNAs and their distinct patterns of expression throughout development. In each case, individual variants with distinct sequences showed unique patterns of developmental and tissue-specific expression. Similar observations have been made in pea (Hanley and Schuler 1991) and silk moth (Sierra-Montes et al. 2005). The differential expression of snRNA variants could be due to functional differences. 
However, differential expression could also be explained if sequence variants without functional significance are associated by chance with genes showing distinct patterns of expression driven by other factors (most likely, transcriptional regulation of the amount of snRNA produced during different developmental stages). Here we have addressed this question by examining the evolutionary stability of snRNA variants. The minor spliceosome In the case of ~0.2% of mammalian introns (Burge et al. 1998; Levine and Durbin 2001), splicing is carried out by a minor (U12) spliceosome, which contains U11, U12, U4atac, and U6atac in place of U1, U2, U4, and U6 snRNAs, respectively (Tarn and Steitz 1996). The U5 snRNP is shared between the major and minor spliceosomes (Luo et al. 1999; Schneider et al. 2002). The snRNPs that are unique to this U12 spliceosome contain, in addition to common core snRNP proteins, a number of unique proteins, most of which are homologous to proteins in the U2 spliceosome (Will et al. 2004). The U12 spliceosome was originally identified as required for the removal of introns carrying the noncanonical AT-AC terminal dinucleotides in place of GT-AG (Hall and Padgett 1994; Tarn and Steitz 1996). It has since become clear that most U12 introns carry standard GT and AG dinucleotides (Dietrich et al. 1997; Levine and Durbin 2001), but that the U12 introns can nevertheless be recognized by distinct splice signal sequences (Sharp and Burge 1997). The U12 spliceosome appears to be ancient since U12 introns are found, together with genes for components of the minor spliceosome, in plants (Wu et al. 1996), insects, and chordates (Burge et al. 1998). However, many species (including Caenorhabditis elegans and most, if not all, unicellular eukaryotes) have lost the U12 spliceosomal machinery altogether. The genome of D. melanogaster has a divergent U12 spliceosome, and was originally thought to lack U11 snRNA (Adams et al. 
2000), but subsequent analysis including affinity purification revealed a highly divergent U11 snRNA (Schneider et al. 2004). We find that the divergence of the minor spliceosome appears to be limited to Diptera. snRNA promoters With the exception of U6 and U6atac, the spliceosomal snRNA genes are transcribed by RNA polymerase II and have a cap structure resembling that found on messenger RNAs but with a characteristic trimethylation of the cap guanosine. U6 and U6atac genes are transcribed by RNA polymerase III and have a distinct monomethyl phosphate cap. However, all snRNAs are abundant and are transcribed with the help of a common multisubunit transcription factor known as the snRNA activating protein complex (SNAPc) (Hernandez 2001; Lai et al. 2005) that binds to the proximal sequence element A (PSEA). We have identified conserved sequences upstream of snRNAs in each of the insect species that are likely to correspond to the PSEA. RESULTS AND DISCUSSION Putative spliceosomal snRNA genes in the honeybee genome We searched for snRNA genes in the honeybee genome using a combination of BLAST with customized parameters and covariance model approaches (see Materials and Methods). The honeybee genome has five putative U1 snRNA genes, three U2, two U4, three U5, three U6, and one putative gene for each of the minor spliceosomal snRNAs (Table 1). A putative U1 pseudogene located on Chromosome 1 (NCBI gi 63053347, position 4437-4276) carries two mutations in the essential conserved 5′ terminus and lacks the conserved U1-specific start-site motif AAGC (which is present immediately upstream of all other putative U1 snRNA genes in the insect species examined here). However, this gene has the PSEA signal and relatively few other substitutions. We conclude that this gene was functional until recently and cannot exclude the possibility that it is still expressed and functional. 
In addition, there are four tandemly repeated U2 pseudogenes adjacent to the U2 snRNA gene on Chromosome 8. Each of them is missing 25 nucleotides (nt) from the 5′ end, which affects formation of the first loop in the U2 snRNA secondary structure (see Supplemental Fig. S1 at http://warta.bio.psu.edu/htt_doc/Projects/snRNA/). They also lack the PSEA signal upstream, which likely affects their transcription.
The 20 snRNA genes in A. mellifera are spread across 11 chromosomes, two of them being located, however, in contigs yet unmapped. Because of this, there is little clustering of the snRNA genes, apart from two U1 genes that lie ~1 kb apart and within 100 kb of a third U1 gene on Chromosome 16. This is in contrast to D. melanogaster, which has four clusters of snRNA genes (Mount and Salz 2000), including one with two U2, one U4, and two U5 genes spread over 6 kb of Chromosome 2L. A comparison of the number of snRNA genes among the 11 insect species analyzed (Table 1) reveals that U1, U2, and U5 genes tend to be the most abundant in each species, while U4 and U6 genes are intermediate in abundance. A. mellifera appears to be an exception to this rule, because it only has three copies for each of the U2 and U5 genes, just as many as there are U6 genes. The genes for snRNAs found in the minor spliceosome, U4atac, U6atac, U11, and U12, are single copy, with the exception of Aedes aegypti U4atac and Anopheles gambiae U11, which have two copies each. Copy number polymorphism for U1 snRNA (five to seven genes) is documented in D. melanogaster (Lo and Mount 1990), and such minor variations in gene number are plausibly common for insect snRNA genes. These relative abundances of gene numbers reflect the overall abundance of snRNAs within cells (Mount and Steitz 1981). Evolution of spliceosomal snRNA genes As mentioned above, it has been shown that different U1 and U5 variants have tissue-specific expression patterns in D. melanogaster. We investigated the relationship between different variants by phylogenetic analysis to see if the pattern of conservation is consistent with expression differentiation. We found that neither the phylogenetic tree of U1 genes nor that of U5 genes clearly supports the functional differentiation of different variants. 
In fact, the trees of all snRNA genes reveal a pattern that is more consistent with a concerted mode of evolution or extreme purifying selection (Piontkivska et al. 2002; Nei and Rooney 2005). This is because different variants are more similar to variants within a genus than between genera (see a neighbor-joining tree of U5 genes in Fig. 1).
Even though the concerted mode of evolution appears as a convenient explanation at first, especially because it applies to other RNA gene families as well, it would require clustering of the genes that are being homogenized by gene conversion or unequal crossover (Nei and Rooney 2005). In the case of the spliceosomal snRNA genes, we observe only partial clustering, mostly in the eight Dipteran species that have a small number of chromosomes as compared to the other three insect species included in this analysis. Therefore, the concerted mode of evolution can only partially explain the snRNA phylogenies. We investigated additional forces acting on these gene families by looking at their functional constraints. The case of U6 snRNA genes is particularly interesting due to the high level of conservation observed across all species. All U6 genes are 108 nt long, of which only 13 are variable sites, the rest (95 nt; 88%) being perfectly conserved across all 33 U6 genes detected. It is remarkable that all Drosophila U6 genes are identical, with the exception of the Drosophila yakuba gene Dyak|U6|84681803|81773-81666, which differs only by 2 nt (singleton mutations) in the 5′-loop region and is otherwise perfectly conserved. Had we looked only at Drosophila, where the three U6 genes are found within a 1.5-kb region, the concerted evolution scenario would have seemed highly plausible, in spite of protein-coding genes CG6643 and CG13624 flanking the triplet (U6:96Aa is, in fact, located in the last intron of CG6643). This scenario is contradicted, however, by low sequence conservation outside of genes and by the lack of clustering in non-Dipteran species, possibly due to the higher fragmentation of their genomes. And yet, the four Bombyx mori U6 genes are identical to each other, as are two of the three Tribolium castaneum genes, and two of the three A. mellifera genes (in the case of both T. castaneum and A. 
mellifera, only the first nucleotide is different in the third copy). Out of the 13 variable sites, four more are singletons that all belong to one A. aegypti gene (Aaeg|U6|78152160|82903-82796). Twelve of the 13 variable sites are concentrated in the 18-nt segment of the 5′ end of the gene, the rest being almost invariant (one of the A. aegypti singletons is found at position 43). This conservation pattern agrees well with expected stringent functional constraints of the U6 gene: it binds Lsm proteins and base pairs with U4 (Supplemental Fig. S2 at http://warta.bio.psu.edu/htt_doc/Projects/snRNA/), U2, and the donor splice sites of introns. Based on these facts, one can say that purifying selection rather than concerted evolution is the more likely explanation for the high conservation of U6 genes. Within the spliceosome, U6 snRNA pairs with U4 snRNA (Supplemental Fig. S2 at http://warta.bio.psu.edu/htt_doc/Projects/snRNA/), which, however, does not interact as intimately with the other components of the spliceosome as U6 does. Consequently, U4 genes are subject to more relaxed functional constraints. Interestingly, the U4 phylogeny reveals a mixed evolutionary pattern. Within the genus Drosophila, U4 genes appear to be subject to divergent evolution, as the orthology relationship between genes located in regions of conserved synteny is unambiguously resolved by the phylogenetic tree. This indicates that U4 genes are not homogenized by gene conversion or unequal crossover in spite of their being located on the same chromosome (2L in the case of D. melanogaster). For the rest of the species, genes are more similar within a genus than among genera, which is more consistent with a mechanism of concerted evolution. The lack of conservation outside the genic region and the multichromosomal distribution of U4 genes (e.g., A. mellifera U4 genes are located on Chromosomes 1 and 15) indicate that gene conversion or unequal crossover is unlikely to homogenize U4 genes. 
Instead, it could be that a common functional constraint, such as interacting with U6 genes that are highly conserved (see above), causes all genes to evolve in a concerted manner. The same mechanism could very well apply to functional elements in the promoter region, such as the PSEA (see below). In the case of the other major spliceosomal components, U1, U2, and U5, a complex of evolutionary forces appears to have generated the spectrum of different snRNA variants that we were able to detect (see U5 snRNA example in Fig. 2). Higher clustering in Diptera allows gene conversion and unequal crossover to occur. For example, nine of the 18 A. aegypti U1 genes are separated by 1–3 kb, and six of the seven U1 genes in Drosophila pseudoobscura form two three-gene clusters that are 17 kb apart. The potential of U1 for recombination events is demonstrated by a U1-mediated translocation event in Drosophila (Gonzalez et al. 2004). An increased number of U1 genes in A. aegypti (Table 1) and identification of pseudogenes for all snRNA types (data not shown) both agree with the birth-and-death model of evolution (Nei and Rooney 2005). So does duplication of genes, which can be observed in the case of head-to-head U2-U5 gene pairs found in the genus Drosophila. In the case of D. melanogaster, three such pairs are found on Chromosome 2L (two in region 38AB and one in region 34A) and one on Chromosome X (region 14B). Since no such pair appears to exist outside of the genus Drosophila, it is reasonable to believe they are the result of segmental duplication events rather than independent association of U2 and U5 genes in four genomic places. In spite of these apparent duplication events, the number of U2 and U5 genes in Drosophila is not significantly greater than in other species (Table 1), indicating that other genes were lost, in agreement with the birth-and-death model of evolution. In Drosophila we also observe conservation of certain variants across the genus. 
A good example is a U5 variant (U5:63BC in D. melanogaster) that has conserved the 3′ end in all Drosophila species. It is not clear, however, if the conservation of this variant is due to functional constraints or because it is located in a region that does not favor recombination events, making it impossible for its sequence to be homogenized. In conclusion, we should re-emphasize that the evolution of spliceosomal snRNA is governed by several concurrent forces. Purifying selection is certainly one major force. In Dipteran species, which have a small number of chromosomes, concerted evolution by gene conversion or unequal crossover may play an important role as well. In non-Dipteran species, where snRNA genes do not form clusters (perhaps because of the higher fragmentation of genomes), concerted evolution by recombination is likely to be extremely rare. It is thus possible that apparently concerted evolution results from coordinated changes in gene sequences as a result of selection and birth–death processes. This scenario could also explain why different species have slightly different PSEA consensus sequences (see below), but it requires further investigation. Small nuclear RNA promoters The PSEA motifs upstream of snRNA genes are remarkably similar within species (Hernandez 2001). This is true for species as diverse as Arabidopsis thaliana, D. melanogaster, and human. We investigated the pattern of conservation of the PSEA elements within the 11 insect genomes available at the time of this study (see Materials and Methods and Fig. 3).
Divergence of the minor spliceosome within Diptera The secondary structure of Drosophila U11 snRNA, a component of the minor spliceosome, was described as highly diverged from other known U11 sequences (Schneider et al. 2004), and we expected that the honeybee U11 would have a structure similar to that of Drosophila. To our surprise, we found that the structure of honeybee U11 resembles more closely that of human or plant U11 genes (Fig. 4).
This observation is reinforced by another feature of the divergent Drosophila U11 snRNP. Some of the U11/U12 snRNP proteins first characterized in the human U11/U12 snRNP (Will et al. 2004) and conserved broadly through eukaryotes (Lorkovic et al. 2005) are absent from Drosophila (Schneider et al. 2004). We searched for genes encoding these U11/U12 proteins in the 11 insect genomes using translated BLAST searches (Table 2). All six Drosophila species seem to lack the 31K and 35K proteins, yet these same genes are present in the other five insect species, including the two mosquito genomes. The 25K protein is also likely to be missing specifically from the dipteran species. Homologs can be identified in Apis, Tribolium, and Bombyx but not in Drosophila or mosquito species. However, because the 25K protein is not well conserved, this particular negative result is not compelling.
U12-dependent introns To investigate whether structural divergence of the U11/U12 snRNP is coupled with a loss of U12 introns, we examined the A. mellifera and 15 other metazoan genomes for the distribution of U12 introns. Using the statistical criteria described in Materials and Methods, we found that 57 out of 103,861 A. mellifera introns belong to the minor-type category, 49 of which are of the GT-AG type. These 57 introns are present in 56 genes, as one of the genes (ENSAPMT00000031526; a member of voltage-sensitive calcium channels) contains two U12 introns (introns 1 and 15) (Supplemental Table S1 at http://warta.bio.psu.edu/htt_doc/Projects/snRNA/). This large gene family is known to harbor many U12 introns (Wu and Krainer 1999), and our preliminary analysis suggests that the two introns present in the Apis gene are well conserved in different metazoan lineages (data not shown). The list of all the honeybee genes containing U12 introns is presented in Supplemental Table S2 at http://warta.bio.psu.edu/htt_doc/Projects/snRNA/. This count is fourfold higher than the number observed in Diptera (D. melanogaster and A. gambiae), but similar to that of the basal chordate Ciona intestinalis (Table 3).
Although the number of U12 introns in vertebrates is higher, this most likely reflects extensive gene duplications in the vertebrate lineage. The honeybee U12 introns are present in 55 unique gene families. Interestingly, while 41 of these gene families have at least one U12-intron-containing homolog in a vertebrate genome, only six of them have a homolog with a U12 intron in D. melanogaster (see Table 4). While these data are subject to minor errors due to inconsistencies in the annotation of genes between species, they indicate that >63% of the honeybee U12 introns are present in vertebrates but absent from Drosophila. Altogether, these data imply that most U12 introns present in the last common ancestor of flies, bees, and vertebrates have been lost in the Dipteran lineage.
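The ">63%" figure above can be reproduced from the family counts given in the same paragraph. The following is a minimal worked sketch, not the authors' calculation: it assumes that the six families sharing a U12 intron with D. melanogaster are a subset of the 41 families sharing one with a vertebrate, and the variable names are illustrative only.

```python
# Counts as stated in the text.
families_with_u12 = 55        # honeybee gene families containing a U12 intron
shared_with_vertebrate = 41   # families with a U12-containing vertebrate homolog
shared_with_drosophila = 6    # families with a U12-containing D. melanogaster homolog

# Assumption (not stated explicitly in the text): the six Drosophila-sharing
# families are all among the 41 vertebrate-sharing families.
vertebrate_not_drosophila = shared_with_vertebrate - shared_with_drosophila
fraction = vertebrate_not_drosophila / families_with_u12

print(f"{vertebrate_not_drosophila}/{families_with_u12} = {fraction:.1%}")
# prints "35/55 = 63.6%"
```

Under that assumption, 35 of 55 families (63.6%) retain a U12 intron in vertebrates but not in Drosophila, which matches the stated lower bound.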
One of the most interesting metazoan introns resides between exons 2 and 3 of the prospero gene of D. melanogaster. This U12 intron contains active splice sites for a U2 spliceosome that are alternatively used, an arrangement referred to as a “twintron.” The alternative splicing of the prospero gene is temporally regulated during fly development (Scamborova et al. 2004). The U12 intron appears to be ancestral and is present in vertebrates, where it is not alternatively spliced (Oliver et al. 1993), and the U2 intron signals are not present. We examined the A. mellifera gene in order to explore the origins of the twintron arrangement. The Apis gene perfectly conserves the U12 splice sites, but not the U2 sites. Although it is possible that nonorthologous cryptic sites are used for U2 splicing in the bee (and ancestral insects), we hypothesize that the twintron arrangement is a recent development in the Dipteran lineage consistent with the switch from U12 to U2 splicing that must be occurring for other introns. Coordinated divergence of the U12 spliceosome and loss of U12-dependent introns Representative Lepidopteran (Bombyx), Coleopteran (Tribolium), and Hymenopteran (Apis) genomes all conserve genes for components of the minor spliceosome that are missing in Drosophila. In addition, the honeybee genome has many more U12 introns than do Dipteran (fly or mosquito) species, indicating that divergence of the U12 spliceosome in Diptera is associated with the loss of U12 introns. These coordinated changes indicate that U12 introns are being lost from genomes that remove them inefficiently, as has been described for Drosophila (Patel et al. 2002). For a number of other genes, we have noted that bee U12 introns are either cleanly lost or replaced by U2 introns. In the case of prospero, the twintron arrangement may represent an evolutionary intermediate where a poorly spliced U12 intron serves as an alternative to a poorly placed U2 intron. 
The Drosophila genome has a total of only 14 U12 introns, and the genes for three U11/U12 spliceosomal proteins have been lost. It is tempting to speculate that this state represents an intermediate on the path toward the complete loss of U12 introns observed for species such as C. elegans. MATERIALS AND METHODS Sequence data The genome assembly Amel_3.0 provided by the Human Genome Sequencing Center at Baylor College of Medicine (ftp://ftp.hgsc.bcm.tmc.edu/pub/data/Amellifera/) was used for finding and annotating the honeybee snRNA genes and their promoters. For interspecies comparison, NCBI's “BLAST with arthropoda genomes” server was used (http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=insects). At the time of this analysis (January 2006), 11 insect genomes were available: Aedes aegypti, Anopheles gambiae str. PEST*, Apis mellifera*, Bombyx mori, Drosophila melanogaster*, Drosophila persimilis, Drosophila pseudoobscura, Drosophila sechellia, Drosophila simulans, Drosophila yakuba, and Tribolium castaneum (*denotes completed genomic sequence). Annotation of the snRNA genes The annotation of the honeybee snRNA genes involved two steps. First, sequences of all snRNAs (U1, U2, U4, U4atac, U5, U6, U6atac, U11, and U12) of D. melanogaster were used as queries against the A. mellifera genome assembly 3.0. NCBI's BLAST was used with the following parameters: -r 5 -q -4 -G 10 -E 6 -W 7 -FF -X 40 -y 20 -Z 100 -e 0.1. In the next step, the nucleotide sequences around each hit were extracted and used as input for the INFERNAL software (Eddy 2002). The snRNA gene number in 11 insect genomes was estimated based on BLAST hits using Drosophila snRNAs as queries and NCBI's insect genomes server (http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=insects). The BLAST parameters used for finding Apis genes were used again here. In the case of very short hits or no hit at all, other insect snRNA sequences were used as queries. 
Functionality of the determined genes was assessed by the integrity of the gene and by the presence of the PSEA element at the expected distance from the transcription start site of a gene. Phylogenetic analysis We investigated the mode of evolution of spliceosomal snRNA genes by constructing a phylogenetic tree of genes specifying each of the U snRNAs found in the major spliceosome. For that purpose, we aligned all genes assessed to be functional (see above) using ClustalW (Thompson et al. 1994) and adjusted the alignment manually where necessary. In the case of U5 genes, we used the secondary structure of the second stem–loop as a guide for aligning the variable 3′ end for Drosophila sequences. The neighbor-joining method with 1000 bootstrap replicates was then used to construct the trees. snRNA secondary structure prediction snRNA secondary structures were drawn based on the INFERNAL alignment with manual adjustment when necessary. In some cases, for example, the Anopheles U11 snRNA, the structure was drawn with the aid of Mfold modeling software (http://www.bioinfo.rpi.edu/applications/mfold/old/rna/; Zuker 2003) run on fragmented sequence. Simply, the sequence was divided into known functional domains (defined by conserved segments), and domains were folded separately. Detection of proximal sequence element A (PSEA) One hundred nucleotides upstream of each snRNA gene were subjected to an initial motif search with MEME software (http://meme.sdsc.edu/meme/intro.html; Bailey and Elkan 1994). Next, every promoter sequence was analyzed individually to overcome the limitations of MEME software, given that the expected conserved motifs are short. Sequences of the promoters from genes that are believed to be functional (Table 1) were used to construct the PSEA profiles using WebLogo (Crooks et al. 2004). 
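The first step of the PSEA search described above, taking the 100 nt upstream of each snRNA gene, can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline actually used: coordinates are assumed to be 1-based and inclusive, a start larger than the end is assumed to mean the gene lies on the minus strand (as in identifiers such as "position 4437-4276"), and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def upstream_region(contig: str, start: int, end: int, length: int = 100) -> str:
    """Return up to `length` nt immediately upstream of a gene.

    `start` and `end` are 1-based inclusive contig coordinates given in the
    gene's own 5'-to-3' order, so start > end denotes a minus-strand gene
    (an assumption about the coordinate convention). The result is always
    reported 5'-to-3' relative to the gene.
    """
    if start <= end:
        # Plus strand: upstream bases lie to the left of the gene start.
        left = max(0, start - 1 - length)
        return contig[left:start - 1]
    # Minus strand: upstream bases lie to the right of `start` on the
    # plus strand and must be reverse-complemented.
    return reverse_complement(contig[start:start + length])


# Toy contig, not real sequence data.
toy = "ACGTTTGACCATGA"
print(upstream_region(toy, 7, 10, length=3))  # plus-strand gene: "TTT"
print(upstream_region(toy, 6, 3, length=3))   # minus-strand gene: "GTC"
```

The extracted regions would then be passed to a motif finder such as MEME; that step is not shown because it depends on the external tool.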
U12-dependent introns Intron position information for 16 metazoan genomes (Anopheles gambiae, Apis mellifera, Bos taurus, Canis familiaris, Ciona intestinalis, Drosophila melanogaster, Danio rerio, Fugu rubripes, Gallus gallus, Homo sapiens, Monodelphis domestica, Mus musculus, Pan troglodytes, Rattus norvegicus, Tetraodon nigroviridis, and Xenopus tropicalis) was downloaded from Ensembl database version 36, December 2005 (http://www.ensembl.org/index.html). Each intron was represented by 200-nt sequence windows centered at the 5′- and 3′-splice junctions, respectively. All introns containing an ATCC string at position +3 from the 5′-splice junction were selected as U12 intron candidates. Noncandidate introns with R (A or G) at +1 and G at +5 were used as a U2 intron training set. The U12 intron training set was composed of 46 U12 introns from a variety of higher eukaryotes (Burge et al. 1998). Weight matrices of the 5′-splice site and branch point site (BPS) for both U12 and U2 introns were created from the two training sets, and were used to calculate log-odds ratios for all candidates and introns in the two training sets. The mean and standard deviation of log-odds ratio values were calculated over all training sequences for the 5′-splice site and BPS, respectively, and then were used to normalize the log-odds ratios of all introns that have been evaluated. Normalized BPS scores of the training sequences were plotted, and since they followed a normal distribution, we chose a Z value of 2.31, which corresponds to a 99% confidence interval, as a threshold BPS score to classify U12 intron candidates. Two additional criteria were applied: at least one adenine must be present at the branch point site and the distance from the branch point to the 3′ junction has to be between 9 and 36 nt. Homology information of U12-containing genes was inferred based on the Ensembl gene family assignment, which is defined by the Markov clustering algorithm described by Enright et al. 
(2002).
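The weight-matrix scoring of U12 intron candidates described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the pseudocount, the choice of which training scores to normalize against, the direction of the threshold comparison, the toy sequences, and all function names are assumptions. Only the Z threshold of 2.31, the requirement for an adenine at the branch point site, and the 9 to 36 nt distance window come from the text.

```python
import math
from statistics import mean, stdev

BASES = "ACGT"


def weight_matrix(seqs, pseudocount=1.0):
    """Per-position base frequencies from equal-length training sequences."""
    matrix = []
    for i in range(len(seqs[0])):
        counts = {b: pseudocount for b in BASES}
        for s in seqs:
            counts[s[i]] += 1
        total = sum(counts.values())
        matrix.append({b: counts[b] / total for b in BASES})
    return matrix


def log_odds(seq, u12_matrix, u2_matrix):
    """Log2 odds that `seq` fits the U12 model rather than the U2 model."""
    return sum(
        math.log2(u12_matrix[i][b] / u2_matrix[i][b]) for i, b in enumerate(seq)
    )


def z_score(score, training_scores):
    """Normalize a score by the mean and SD of a set of training scores."""
    return (score - mean(training_scores)) / stdev(training_scores)


def passes_extra_criteria(bps_site, distance_to_3prime):
    """At least one adenine in the site, and 9-36 nt to the 3' junction."""
    return "A" in bps_site and 9 <= distance_to_3prime <= 36


# Toy branch-point-like training windows (illustrative, not real data).
u12_train = ["TCCTTAAC", "TCCTTGAC", "TCCTTAAC"]
u2_train = ["CTAATCTA", "TTGACCAT", "ACTGATGC", "GCTAACGT"]

u12_m = weight_matrix(u12_train)
u2_m = weight_matrix(u2_train)

# Assumption: normalize against the U2 training scores only.
u2_scores = [log_odds(s, u12_m, u2_m) for s in u2_train]

candidate = "TCCTTAAC"
z = z_score(log_odds(candidate, u12_m, u2_m), u2_scores)
is_u12 = z >= 2.31 and passes_extra_criteria(candidate, distance_to_3prime=12)
print(is_u12)
```

In a real analysis the same scoring would be applied to the 5′-splice site windows as well, and the training sets would be the 46 published U12 introns and the noncandidate U2 introns mentioned in the text.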
ACKNOWLEDGMENTS C.-F.L. and V.G. were partially supported by the Center for Comparative Genomics and Bioinformatics, part of Penn State's Huck Institutes of the Life Sciences. S.M.M. was partially supported by NSF award 0544309. Footnotes Article published online ahead of print. Article and publication date are at http://www.rnajournal.org/cgi/doi/10.1261/rna.259207.