Copyright Sammeth et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

A General Definition and Nomenclature for Alternative Splicing Events

Centre de Regulació Genòmica, Barcelona, Spain

Michael R. Brent, Editor
Washington University, United States of America

#Contributed equally.
* E-mail: micha/at/sammeth.net

Conceived and designed the experiments: MS SF RG. Performed the experiments: MS. Analyzed the data: MS SF RG. Contributed reagents/materials/analysis tools: MS SF. Wrote the paper: MS SF RG.

Received December 26, 2007; Accepted July 1, 2008.

Abstract

Understanding the molecular mechanisms responsible for the regulation of the transcriptome present in eukaryotic cells is one of the most challenging tasks in the postgenomic era. In this regard, alternative splicing (AS) is a key phenomenon contributing to the production of different mature transcripts from the same primary RNA sequence. As a plethora of different transcript forms is available in databases, a first step to uncover the biology that drives AS is to identify the different types of reflected splicing variation. In this work, we present a general definition of the AS event along with a notation system that involves the relative positions of the splice sites. This nomenclature univocally and dynamically assigns a specific “AS code” to every possible pattern of splicing variation. On the basis of this definition and the corresponding codes, we have developed a computational tool (AStalavista) that automatically characterizes the complete landscape of AS events in a given transcript annotation of a genome, thus providing a platform to investigate the transcriptome diversity across genes, chromosomes, and species.
Our analysis reveals that a substantial part (in human, more than a quarter) of the observed splicing variations is ignored in common classification pipelines. We have used AStalavista to investigate and to compare the AS landscape of different reference annotation sets in human and in other metazoan species and found that proportions of AS events change substantially depending on the annotation protocol, species-specific attributes, and coding constraints acting on the transcripts. The AStalavista system therefore provides a general framework to conduct specific studies investigating the occurrence, impact, and regulation of AS.

Author Summary

The genome sequence is said to be an organism's blueprint, a set of instructions driving the organism's biology. The unfolding of these instructions (the so-called genes) is initiated by the transcription of DNA into RNA molecules, which subsequently are processed before they can take their functional role. During this processing step, initially identical RNA molecules may result in different products through a process known as alternative splicing (AS). AS therefore allows for widening the diversity from the limited repertoire of genes, and it is often postulated as an explanation for the apparent paradox that complex and simple organisms resemble each other in their number of genes; it characterizes species, individuals, and developmental and cellular conditions. Comparing the differences of AS products between cells may help to reveal the broad molecular basis underlying phenotypic differences, for instance between a cancer and a normal cell. An obstacle for such comparisons has been that, so far, no paradigm existed to delineate each single quantum of AS, the so-called AS events. Here, we describe a way of exhaustively decomposing AS complements into qualitatively different groups of events and a nomenclature to unequivocally denote them.
This typological catalogue of AS events, along with their observed frequencies, represents the AS landscape, and we propose a procedure to automatically identify such landscapes. We use it to describe the human AS landscape and to investigate how it has changed throughout evolution.

Introduction

Alternative splicing (AS) is a fundamental molecular process regulating eukaryotic gene expression and involved in numerous human diseases [1]–[3]. It is usually postulated as the main mechanism to augment protein diversity from a somewhat limited set of protein coding genes [4]. Consequently, over recent years various large-scale studies have been undertaken aiming at the exhaustive identification and analysis of AS events (for recent reviews, see [5]–[7]). Current estimates suggest that around 60–75% of human multi-exonic genes undergo AS [4],[8],[9]. Somewhat surprisingly, the rigorous formalization of the concept of AS event and its categorization has received relatively little attention. Traditionally, terms for only five kinds of AS events have been coined: exon skipping (ES), mutually exclusive exons (ME), intron retention (IR), alternative donor (AD) and acceptor (AA) sites [10]. However, currently available transcript evidence shows a plethora of variations in splicing patterns that involve multiple instances of these classical events in various combinations [11].

Figure 1
Concerning challenge (i), Malko and co-workers proposed to combine the classical terms for each exon observed in a given annotation [26]. While variations of each exon across the compared transcripts can be sufficiently described by this procedure, it does not permit an easy extension for splicing variations across the adjacent introns. However, some splicing evidence (e.g., the structures depicted in Figure 1A and 1C

Addressing problem (ii), only a few attempts have been undertaken to univocally denote AS events. Malko et al. [26] proposed strings composed of 5 letters identifying each classical event to redundantly describe the variability separately for each exon observed in a certain annotation (e.g., “—AD” for combined variable acceptor and donor sites, Figure 1B

With respect to issue (iii), splicing graphs as a non-redundant data structure have gained popularity in AS over recent years, but definitions vary across the literature. Capturing the 5′→3′ directionality of transcription, they naturally all form directed acyclic graphs (DAGs). Going back to [33], matching (parts of) ESTs [22],[34],[35] have been used as nodes connected by edges representing the EST evidence, in order to cluster them and/or to allow the analysis of AS. Heber and co-workers [35] subsequently collapse (remove) vertices with indegree (i.e., the number of inedges) = outdegree (the number of outedges) = 1. Later on, two works from the same year proposed a graph structure where every vertex corresponds to a splice site and the connecting edges represent the intermediate exon/intron [36],[37], labelled according to the mRNA or EST evidence. Another kind of graph uses exons as nodes instead of splice sites [38]. Whereas intuitive for visualization, the graph structure may redundantly contain common exon flanks. Other graph-based approaches on exon–intron structures described in the literature use similar techniques [25], [39]–[41].
However, all these analyses focus exclusively on the four types of traditional AS events, and thus capture only a limited fraction of the splicing variation encompassed in the transcriptome, probably a main consequence of problem (i). Indeed, without a universal definition of AS event, the retrieval of a single type of splicing variation requires defining its corresponding sub-graph pattern and locating all occurrences of this pattern in the whole splicing graph. Consequently, a comprehensive characterization of AS needs an exhaustive set of such ad hoc patterns, which explains why usually only 4–6 types of events are considered.

In this work, we propose a general definition of “AS event” and we present a novel notation based on the relative position of alternative exon boundaries to flexibly describe such events. Unlike traditional nomenclatures, this generic notation system allows the assignment of a univocal “AS code” to identify any possible variation of the exon–intron structure between two or more transcripts, and thus provides a platform for the automatic and exhaustive extraction of such variations from a dataset of annotated genes. Here, we also describe in detail the method implemented in AStalavista (Alternative Splicing transcriptional landscape visualization tool) for the dynamic characterization of AS events in splicing graphs. AStalavista is accessible as a web server at http://genome.imim.es/astalavista [42]. We have used AStalavista to characterize and compare the “landscape” of AS in different human reference annotations as well as in annotations of other metazoan species, i.e., chimp (Pan troglodytes), mouse (Mus musculus), rat (Rattus norvegicus), dog (Canis familiaris), cow (Bos taurus), chicken (Gallus gallus), frog (Xenopus tropicalis), zebrafish (Danio rerio), honeybee (Apis mellifera), fruitfly (Drosophila melanogaster), and worm (Caenorhabditis elegans).
In contrast to previous large-scale studies, our approach focuses on splicing structure variations rather than on (sequence) attributes of alternative exons/introns [43],[44]. Results indicate that while most AS events can be assigned to a few categories, the categorization of AS events in different structures is quite complex, with a plethora of minor AS configurations. Relative frequencies of particular patterns change with respect to the corresponding annotation protocol, species-specific attributes and coding constraints of the respective locus, and we present computational studies that investigate the reasons behind these fluctuations.

Results

A General Definition of AS Event

The concurrent and regulated molecular mechanisms of exon and intron definition are generally responsible for the splicing structure in a certain transcript variant. Although case studies for the mechanics of intron and exon recognition are given in the literature [27],[28],[30], no general rule could (yet) be deduced. Therefore, neither of the mechanisms can be excluded from occurring during the splicing process and both are to be considered in a generally robust definition of AS event that is applicable to any organism without being a priori restricted to exon or intron definition. In order to allow for possible interactions of parts of the splicing machinery across all exons and introns when delimiting AS events in exon–intron variations, our definition of AS events is based on sites: given an annotation, i.e., transcript sequences aligned to the genome, we use the term “site” to describe genomic locations of aligned exon boundaries (Definition 1).

Definition 1 (Site) A site s is an exon boundary as characterized by its genomic position pos(s) and its type type(s), which distinguishes between transcription start sites (TSS; type(s) = σ), splice donors (type(s) = δ), splice acceptors (type(s) = α) and polyadenylation sites (PAS; type(s) = ω).
Each site is supported by a set of transcripts transcripts(s) that all show evidence for s in the annotated exon–intron structure.

A transcript can be described by a sequence of sites, ordered by their genomic positions. A locus comprises k≥1 transcripts that align to a common genomic region (see Materials and Methods, Figure 2 C in order to make our results comparable with previous reports. However, we want to stress that pairwise comparisons do not necessarily provide the complete picture of a polymorphic splicing locus, and that the definitions presented in this work can straightforwardly be applied to the comparison of more than two (up to k) transcripts in a transcriptional locus C.
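Definition 1 can be mirrored in a small data structure. The following is a minimal sketch rather than the AStalavista implementation: the names `SiteType`, `Site`, `sites_of` and `site_support`, as well as the representation of a transcript as a 5′→3′ sorted list of `(start, end)` exon coordinates, are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class SiteType(Enum):
    # Symbols follow Definition 1; the enum itself is a hypothetical helper.
    TSS = "σ"       # transcription start site
    DONOR = "δ"     # splice donor
    ACCEPTOR = "α"  # splice acceptor
    PAS = "ω"       # polyadenylation site


@dataclass(frozen=True)
class Site:
    pos: int        # genomic position pos(s)
    type: SiteType  # type(s)


def sites_of(exons):
    """Turn a 5'->3' sorted list of (start, end) exons into a site sequence."""
    sites = []
    last = len(exons) - 1
    for i, (start, end) in enumerate(exons):
        # The first exon starts at the TSS, every later exon at an acceptor.
        sites.append(Site(start, SiteType.TSS if i == 0 else SiteType.ACCEPTOR))
        # The last exon ends at the PAS, every earlier exon at a donor.
        sites.append(Site(end, SiteType.PAS if i == last else SiteType.DONOR))
    return sites


def site_support(transcripts):
    """Map each site s to transcripts(s), the ids of the transcripts using it."""
    support = {}
    for tid, exons in transcripts.items():
        for site in sites_of(exons):
            support.setdefault(site, set()).add(tid)
    return support
```

With two hypothetical transcripts `{"t": [(100, 200), (300, 400), (500, 600)], "u": [(100, 200), (500, 600)]}`, the acceptor at position 300 is supported only by `t`, while the donor at position 200 is supported by both.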
Definition 2 (Variable Site) Comparing the exon–intron structure of two transcripts {St,Su}, variable sites can be distinguished from sites that are used in both transcripts (“common sites”). A site s is said to be “variable” with respect to {St,Su} if one and only one of the transcripts exhibits an exon boundary aligning at the genomic position pos(s), that is |{St,Su} ∩ transcripts(s)| = 1, where |X| is the cardinality (the number of elements) of set X.

Definition 2 characterizes sites of St as variable if they are missing in Su (and vice versa), regardless of whether they map within the genomic region of the primary transcript of Su or not. Variable sites can thus arise either from alternative transcription initiation (e.g., sites s1 through s8 in Figure 2I–K

Definition 3 (Alternative Splice Site) Comparing two transcripts {St,Su}, an alternative splice site s is a variable site (Definition 2) that (i) is a splice site, type(s) ∈ {α,δ}, and (ii) is contained within the common genomic region of both transcripts.

Alternative splice sites consequently are a subset of variable sites, and all splice sites that do not comply with Definition 3 are either used in both transcripts (common sites), or missing in some of them due to alternative TSSs and/or PASs. Note that the same site can be classified differently with respect to the pair of compared transcripts. For instance, the sites flanking the 4th exon in the transcript NM_020553 are alternative splice sites when comparing with transcript NM_020554 (s1 and s2 in Figure 2H

Definition 4 (AS event) Comparing two transcripts {St,Su}, an AS event delimited by the common sites (beginning) and (end) describes a sequence of variable sites satisfying the following conditions:
By this, Definition 4 delimits AS events as g consecutive variable sites (with at least one alternative splice site) between common sites of both transcripts St and Su. In Figure 2

A Flexible Code for Alternative Splicing Events

We propose a novel notation system to allow a complete classification of AS events. The general idea is to assign to any AS event a string-based “AS code” that describes the structure of the splicing variation in a concise and univocal manner. AS events of the same type (e.g., exon skipping) are given an identical code and thus can be classified in the same structural group. The codes are built dynamically with respect to each observed splicing variation without the requirement of an a priori defined catalogue of putative AS events.

Our notation system is based on the relative position of the variable sites that are involved in the AS event and proceeds as follows: first, all the variable sites of an AS event (see Definition 4) are considered in the order of their genomic position from 5′ to 3′. The indices i ∈ ℕ+ defined by this relative order are assigned to the corresponding variable sites. In addition, a symbol is attributed to each site depending on its type. We use the alphabet Σ = {[, ^, -, ]}, where “[” denotes a TSS (σ), “^” a splice donor (δ), “-” an acceptor (α), and “]” a PAS (ω). Therefore, each site is represented by a number (the relative position i) and a symbol (identifying the type). To describe one of the splicing structures resulting from an AS event, the number and the symbol of all of the sites that are used by the corresponding mRNA within the event are concatenated into a string. The digit “0” is used if the transcript does not use any variable site (for instance by skipping an exon). The AS code of the event corresponds to the concatenation of these strings, separating the descriptions of the variants by a comma. We order the strings according to the relative position of their first site.
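The construction of an AS code from the variable sites of one event can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the code used in AStalavista: `variable_sites` and `as_code` are hypothetical names, sites are written as `(genomic_position, symbol)` pairs using the alphabet Σ, and the caller is assumed to pass in the variable sites of a single event (the delimitation by common sites from Definition 4 and the check of Definition 3 are not performed here).

```python
def variable_sites(transcript_t, transcript_u):
    """Split the sites of two transcripts into those used by only one of them.

    Each transcript is a list of (genomic_position, symbol) pairs with symbol
    in "[", "^", "-", "]" (TSS, donor, acceptor, PAS).
    """
    t, u = set(transcript_t), set(transcript_u)
    return sorted(t - u), sorted(u - t)


def as_code(variant_a, variant_b):
    """Build the AS code of one event from the variable sites of each variant."""
    # Number all variable sites of the event by genomic order, starting at 1.
    ordered = sorted(set(variant_a) | set(variant_b))
    rank = {site: i + 1 for i, site in enumerate(ordered)}

    def describe(variant):
        # "0" marks a variant that uses none of the variable sites.
        if not variant:
            return "0"
        return "".join(f"{rank[site]}{site[1]}" for site in sorted(variant))

    def first_rank(variant):
        # Variants are ordered by their first site; an empty variant sorts first.
        return min((rank[site] for site in variant), default=0)

    variants = sorted([variant_a, variant_b], key=first_rank)
    return ",".join(describe(v) for v in variants)


# Hypothetical coordinates: transcript t includes an exon that u skips.
t = [(100, "["), (200, "^"), (300, "-"), (400, "^"), (500, "-"), (600, "]")]
u = [(100, "["), (200, "^"), (500, "-"), (600, "]")]
only_t, only_u = variable_sites(t, u)
print(as_code(only_t, only_u))  # prints 0,1-2^
```

Under the same assumptions, two competing donors such as `as_code([(220, "^")], [(200, "^")])` yield `1^,2^`, so structurally equivalent events receive the same code regardless of their genomic coordinates.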
Examples are presented in Figures 1

Using this notation, AS events with identical codes are structurally equivalent, e.g., all exon skipping or all alternative donor events. Moreover, a specific AS code can always be defined for any splicing variation, which guarantees the exhaustiveness of the notation system. For instance, the nonconventional events observed in Figure 1

Implementation

AStalavista is a Java-based tool designed to extract and visualize the structural landscape of AS events as reflected by a given annotation. The input is provided in GTF format, containing the genomic coordinates of exons in the transcripts (and, optionally, the coordinates of the coding regions). AStalavista can be applied to any species for delineating the AS landscape from a whole genome annotation, or to a subset of genes composed according to custom criteria. The output depicts the AS landscape by giving a summary of all pairwise AS events grouped into structurally equal classes, which are ranked according to their observed abundances. The web server [42] (http://genome.imim.es/astalavista) has been upgraded and depicts the spectrum of AS structures as described in this manuscript, including variable TSSs/PASs as pointed out by Definition 3 and Definition 4. This means that it is now possible to investigate, for instance, potential correlations between AS and alternative transcription initiation. Also, the number of species and reference annotations that are supported has been increased.

To assess the agreement of AS events predicted according to our definition with data available from public sources, we compared the output of AStalavista for 5 well-studied genes with the events classified for these in recently published or updated databases (Table 1). Since AStalavista is a method rather than a fixed database, the number of AS events that are predicted crucially depends on the transcript annotation(s) under consideration.
Therefore, we conducted a first comparison of events extracted by AStalavista from mRNA annotations in GenBank [46] with the EuSplice database that is based on gene annotations. In another run, we enriched the input data by ESTs from dbEST [47] and compared the corresponding results to the EST-based databases ASD, ATD and Hollywood. In order to make the number of events in AStalavista quantitatively comparable with the number of events from public databases, we disregarded in either case AS events predicted in correlation with alternative transcription initiation or polyadenylation. Table 1 shows that AStalavista clearly finds more bona fide events in either dataset than are available from public databases.
We additionally set out to investigate the overlap of the events in a case study (Figure S2) and found that in the FOXP2 gene AStalavista (Figure S2A) finds 5 out of 6 events reported by Hollywood (Figure S2B) and 2 out of 3 events in EuSplice (Figure S2C): in one instance Hollywood marked an alternative splice donor with a very untypical sequence that is supported exclusively by 2 ESTs (Figure S2B), and in the other case EuSplice predicted a cryptic exon based on the alignment of 2 nt in an intronic stretch which subsequently is tagged with the warning “short exon” and excluded from the analysis on splice site sequences (Figure S2C). For those AStalavista events that are not retrieved from both reference databases (8 out of 10 for EuSplice and 19 out of 24 for Hollywood), we found in total 4 cases that, although the evidence is present in the reference database, have not been reported, probably due to a limitation of the applied classification scheme. These cases are: 0,1-2^3-4^ (i.e., the skipping of two consecutive exons in events 14 and 15), 1-2^,3-4^ (the mutually exclusive exons in event 23) and 1-2^3-,4- (the skipping of an exon when an alternative downstream acceptor is used, event 24).

Assessing the Landscape of AS Patterns in Human Reference Annotations

We ran AStalavista on three popular human annotation datasets, namely RefSeq [12], EnsEmbl [48] and Gencode [49]. With our clustering method (see Materials and Methods), the 25,170 RefSeq transcripts clustered into 18,334 loci, the 43,102 EnsEmbl transcripts into 22,303 loci, and the 1,352 coding transcripts of Gencode into 381 loci (Table 2). The differences in the average number of coding transcripts per locus between these annotations (1.4 for RefSeq, 1.9 for EnsEmbl, and 3.6 for Gencode) reflect the differences in exhaustiveness among them. We extracted all variations of the exon–intron structures according to Definition 4.
To compensate for artefacts that may occur in automatic annotation pipelines, we omitted AS events that involved introns with no canonical splice site dinucleotides (i.e., not GT/AG). Note that this filtering step consumes a considerable part of the observed running time (Table 2), since for each intron the splice site nucleotides are extracted from the genomic sequence. As expected, the observed running times reflect the number and distribution of transcripts in each input annotation and the longest run (for EnsEmbl) took a bit more than a minute (Table 2).
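The filtering step can be illustrated with a short sketch. It is not taken from AStalavista: the function names are hypothetical, and the sketch assumes that the genomic sequence is given as a forward-strand string and that each intron is described by 0-based, half-open coordinates `(intron_start, intron_end)`; introns on the negative strand would need to be reverse-complemented first.

```python
def is_canonical_intron(genome_seq, intron_start, intron_end):
    """Return True if the intron starts with GT and ends with AG."""
    intron = genome_seq[intron_start:intron_end].upper()
    # At least four nucleotides are needed so that GT and AG do not overlap.
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def keep_event(genome_seq, introns):
    """Keep an AS event only if every intron involved in it is canonical."""
    return all(is_canonical_intron(genome_seq, start, end) for start, end in introns)


# Hypothetical toy sequence: the intron GTCCCCAG occupies positions 3..10.
sequence = "AAAGTCCCCAGTTT"
print(keep_event(sequence, [(3, 11)]))  # prints True
print(keep_event(sequence, [(2, 11)]))  # prints False
```

Because each check requires slicing the genomic sequence, the cost of this step grows with the number of introns inspected, which is in line with the remark on running time above.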
Next, we analyzed the transcript diversity by characterizing the AS landscapes produced by AStalavista from the different annotations (Figure 3
Obviously, there are differences in the AS landscape between the different reference annotations. This probably reflects the differences in biological data and in the annotation process: manually reviewed full-length cDNA sequences in RefSeq, automatically annotated proteins/cDNAs in EnsEmbl and manually annotated transcripts including EST evidence augmented by experimentally verified computational predictions in Gencode. Nevertheless, the different proportions of events agree with previous results (e.g., [29],[37]) and their ranking is consistent across the sets, which illustrates the general consistency in the AS taxonomies reflected by these annotation systems. Particularly relevant is, in our opinion, the consistency in the AS landscape between the RefSeq and the much richer Gencode annotation. Even though Gencode contains 2.5-fold the number of alternative transcripts per locus, it includes only a marginally larger proportion of the “other” complex AS events than the conservative RefSeq, indicating that while only a fraction of the protein coding transcripts in the human genome may be currently known, the broad AS landscape characterizing the RefSeq annotation is also likely to characterize the entire human transcript complement.

Differences of the AS Landscapes between 5′ UTR and CDS

We have investigated the differences in the type of AS events occurring in the CDS (coding sequence) from those occurring only in the 5′ UTR (5′ untranslated region).

Figure 4
The bias against AAs in 5′ UTRs can be explained by the shorter sequence span where alternative acceptor sites can appear without disrupting the downstream protein sequence. Indeed, if we consider the 5′ UTRs that contain exactly one intron (75% of the spliced 5′ UTRs), the length of the potential target for alternative upstream donor site creation, that is the first exon, is significantly larger than the length of the potential target for alternative downstream acceptor site creation in the 5′ UTR, that is from the acceptor site to the ATG codon (260 vs. 47 nucleotides on average). In order to confirm that the bias against AAs in the 5′ UTR is mainly due to constraints of the start codon, we considered in multi-intronic 5′ UTRs the AS events that do not affect the last intron. Then, the AD/AA ratio drops from a factor of >1.64 to a factor of 1.2 (30 AD events compared to 25 AA events in RefSeq). In our opinion, the remaining polarity stems from the fact that the first exon is significantly longer than the second (median 149 vs. 137, p-value ~3e-6, Kolmogorov–Smirnov test), probably resulting from differences in the mechanism for exon definition [27]. On the other hand, the observed asymmetry against ADs in the CDS can be explained by the propensity towards the creation of stop codons when considering alternative downstream donor sites, due to the peculiar composition of the donor site consensus sequence. As already reported in the past [53], splicing consensus sequences harbor a high content of intrinsic stop codons (shaded grey in Figure 5A and 5B
AS in Noncoding Transcripts

Additional evidence of the strong effects of the protein coding constraints in shaping the AS landscape comes from the comparison of AS in protein coding and noncoding transcripts. For this comparison, the Gencode annotation is particularly appropriate: it contains many non-protein-coding transcripts (2,247 vs. 1,332 coding transcripts), most of them actually occurring also in protein coding loci. In other words, protein coding loci seem to be able to encode both protein coding and noncoding transcripts.

Figure 6
Distribution of AS Events throughout Metazoan Genomes

To investigate the evolution of the AS landscape, we have applied AStalavista to the annotation of 12 different metazoan genomes: human (Homo sapiens), chimp (Pan troglodytes), mouse (Mus musculus), rat (Rattus norvegicus), dog (Canis familiaris), cow (Bos taurus), chicken (Gallus gallus), frog (Xenopus tropicalis), zebrafish (Danio rerio), honeybee (Apis mellifera), fruitfly (Drosophila melanogaster), and worm (Caenorhabditis elegans). While many of the fluctuations observed are likely due to the species-specific differences in amount and quality of the transcriptional data from which the annotations have been derived, our study reveals some interesting trends, suggesting overall that AS patterns did not change gradually but rather abruptly during metazoan evolution (Figure 7
Discussion

Alternative splicing enormously increases the encoding capacity of the genome of the higher eukaryotic organisms. Its differential regulation is likely to play a substantial role in defining the phenotype of a given cell type, or cell state. We have developed a method to automatically catalogue the patterns of AS events occurring in a given gene/transcript annotation. The method (and the resulting taxonomy) relies on a precise definition of AS event. We have implemented the method in a publicly available software system, named AStalavista. As a proof of concept, the application of AStalavista to a number of popular annotations of the human genome has revealed the existence of a plethora of AS types that are usually ignored in published analyses. Indeed, about one quarter of all AS events in these collections belong to this category. Some of these complex AS events, like double exon skipping or mutually exclusive exons, are likely to be under specific regulation. In addition, we report notable differences in the AS landscape between coding and noncoding regions and transcripts, with the landscape in coding regions being largely modelled by protein coding constraints and the landscape in noncoding transcripts suggesting a relaxation of selective constraints. Our comparison of the AS landscape across 12 metazoan genomes reveals strong differences between vertebrate and non-vertebrate genomes. We observe a higher fraction of intron retention events in invertebrates, while in contrast exon skipping and complex splicing events are more prevalent in vertebrates.
While the latter could simply reflect the richer transcript data available for vertebrate, and specifically mammalian, genomes, we think that the data is overall suggestive that AS is both more complex and more regulated there, a hypothesis which is compatible with recent studies, according to which there was a substantial increase in AS in the lineage leading to vertebrates, after the separation from invertebrates [55]. Our studies, which we have performed here as a proof of concept of our method, illustrate the potential of the AStalavista system to globally characterize the AS landscape of transcriptomes. One could think of many other scenarios (in addition to the basal characterization of the AS landscape in the genome of newly sequenced species) where the characterization of the AS landscape by our system could be of interest. For instance, the AS landscape could be compared across genes clustered in different functional classes, as defined for example by the Gene Ontology project [56], or according to their level or their pattern of expression, or to their conservation across evolution, or to the analyzed tissue or cell type, etc.; in general, modulo any biologically relevant partition of the genes from a given species that one can possibly delineate. With the generalization of the new generation of high-throughput sequencing instruments, our capacity of effectively surveying various transcriptomes will be greatly enhanced. Differences in such AS landscapes may help to reveal the underlying biological mechanisms responsible for specific phenotypes of the cell (for instance in cancer cells), by pinpointing general splicing de-regulation accidents leading to an alteration of the splicing patterns.

One issue that may remain controversial is the grouping of transcripts into loci, within which the transcripts will be compared in order to identify the occurring AS events. Different groupings may indeed lead to different sets of AS events.
Intuitively, one would expect AS to be investigated by comparing transcripts from the same gene. However, recent in-depth annotation projects have had the effect of blurring gene boundaries, up to challenging the definition of a gene [57],[58]. Also, since cases of overlapping transcripts from hitherto distinctly annotated genes are increasingly reported [59],[60], genes can no longer be regarded as isolated units of transcription. Transcription Induced Chimeras [60]–[62], i.e., genes that are fused by a transcript sharing at least one splice site with either one of them, must be taken into account when investigating the phenomenon of AS. Therefore, AStalavista includes its own clustering schema in order to ensure an exhaustive detection of AS events, by pooling in a single transcriptional locus all transcripts that overlap on the same strand of the genome sequence. Using these loci instead of the native gene names, we can objectively compare AS classifications across gene sets that involve different criteria for assigning transcripts to genes. In any case, we believe that the introduction of a consistent and rigorous definition of alternative splicing event, which allows in particular a standard characterization of the AS landscape of a given transcriptome, will certainly contribute to a better understanding of the phenomenon of alternative splicing.

Materials and Methods

Datasets

Annotated transcripts for RefSeq and Gencode (March 2007 freeze) have been downloaded from the UCSC genome browser (http://genome.ucsc.edu) and the annotations for 12 metazoan genomes from EnsEmbl (build 43, http://www.ensembl.org). RefSeq is a nonredundant dataset of gene annotations generated by human-supervised alignments of cDNA sequences to the genome [12]. EnsEmbl is a semi-automatic annotation system relying mainly on protein-to-genome sequence alignments [48].
Gencode (http://genome.imim.es/gencode/) is based on the human-supervised mapping of all available ESTs, cDNAs and protein sequences onto the Encode regions of the genome [63], which is augmented with computational predictions, and subsequently verified experimentally by RT-PCR and RACE [49]. Additional data in the comparison of metazoan genomes has been obtained from the EnsEmbl web server, containing the version 43 (February 2007) of the EnsEmbl annotation [48] for most of the species, the currently discontinued version 38 (April 2006) of the EnsEmbl annotation for A. mellifera, the FlyBase (March 2006) annotation for D. melanogaster [64], and the WormBase (May 2006) annotation for C. elegans [65].

In each annotation dataset, transcripts that align to genomic regions overlapping on the same strand are clustered into common loci. To avoid some alignment/annotation errors in the datasets, we applied a filtering step discarding all subsequently extracted AS events which contain intron(s) that do not exhibit the consensus dinucleotides GT/AG at their extremities. To assign AS events to a certain region of a gene (e.g., 5′ UTR or CDS), we required that all of the variable sites of the event are located in the respective region. Events spanning more than one region are therefore excluded from the respective analysis. For the analysis of AS in noncoding transcripts, transcripts with an annotated reading frame have been filtered out of the dataset before extracting AS events.

A Graph Theoretical Approach To Extract Pairwise AS Events

In this section we present the method used in AStalavista to (1) build a splicing graph from a set of transcripts mapped to the genome and (2) efficiently process this graph to extract all pairwise AS events. To infer a splicing graph (see Introduction), the first step is to retrieve the exon boundaries si from all transcripts in a locus C.
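Before the graph is built, transcripts have to be grouped into the loci C described under Datasets. The grouping can be sketched as a sweep over sorted intervals. This is only an illustration: the function name `cluster_loci` and the input tuples are hypothetical, overlap is assumed to be judged on the full transcript span with inclusive coordinates, and transitive (single-linkage) grouping is an assumption about how chains of overlapping transcripts are handled.

```python
def cluster_loci(transcripts):
    """Group transcripts whose genomic spans overlap on the same strand.

    transcripts: iterable of (transcript_id, strand, start, end) tuples with
    inclusive coordinates. Returns a list of loci, each a list of ids.
    """
    by_strand = {}
    for tid, strand, start, end in transcripts:
        by_strand.setdefault(strand, []).append((start, end, tid))

    loci = []
    for strand in sorted(by_strand):
        current, current_end = [], None
        # Sweep left to right; a gap before the next start closes the locus.
        for start, end, tid in sorted(by_strand[strand]):
            if current and start > current_end:
                loci.append(current)
                current, current_end = [], None
            current.append(tid)
            current_end = end if current_end is None else max(current_end, end)
        if current:
            loci.append(current)
    return loci


# Hypothetical annotation: a and b overlap, c is separate, d is on the other strand.
annotation = [
    ("a", "+", 100, 500),
    ("b", "+", 400, 900),
    ("c", "+", 1000, 1200),
    ("d", "-", 450, 800),
]
print(cluster_loci(annotation))  # prints [['a', 'b'], ['c'], ['d']]
```

Grouping by overlap rather than by gene name means that two differently named genes joined by an overlapping transcript end up in the same locus, which matches the motivation given in the Discussion.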
To ensure that the sites of a transcript si ∈ St preserve the usual 5′→3′ directionality in the order given by pos(si), we artificially invert the genomic coordinates of sites that align to the negative strand. Therefore, splicing graphs G = (V,E) herein are directed acyclic graphs with each node s ∈ V representing nonredundantly a site of the transcripts in C. Each edge (si→sj) ∈ E corresponds to an exon (type(si) ∈ {α,σ}) or intron (otherwise) delimited by pos(si) and pos(sj) and supported by the transcripts transcripts(si) ∩ transcripts(sj) ≠ {}. Note that G is non-redundant, i.e., each splice site si and each exon/intron (si→sj) is stored once, regardless of the number of transcripts that support it. In order to include AS events associated with variable TSSs and PASs (Definition 4), the graph is completed by the addition of two terminal nodes: a root node root (pos(root) = −∞, type(root) = Α, transcripts(root) = C) that connects to all TSSs and a leaf node leaf (pos(leaf) = ∞, type(leaf) = Ω, transcripts(leaf) = C) that connects from all PASs, where Α and Ω are unique types to identify the root/leaf.

Definition 5 (Variants) In G, variants are paths that exhibit a nonempty intersection of transcript evidence. The latter property prevents paths from connecting freely throughout the graph and creating “hybrid” splicing structures that are not observed in the annotation. By Definition 5, each variant represents an exonic structure that is supported by at least one transcript evidence.

Lemma 1 (Subgraphs Described by Pairwise AS Events) A pairwise AS event between the transcripts {St,Su} is reflected in G by two variants that intersect exactly twice, in their start and end vertices.

Proof All sites with form a variant Sp (condition of consecutiveness in Definition 4) and correspondingly do all sites with . Consequently, the corresponding vertices are connected by edges with at least one common transcript (i.e., St, respectively, Su).
The paths Sp and Sq intersect in the common sites flanking the event, (Definition 4). Furthermore, because of the minimality criterion for common flanks in an AS event, G cannot contain any vertex .

To exhaustively extract pairwise AS events, G has to be decomposed into all of the possible subgraphs that satisfy Lemma 1. Since the graph structures described in Lemma 1 are necessary but not sufficient for all criteria of Definition 4, Sp and Sq additionally have to be checked for the presence of an alternative splice site. To this end, for each possible transcript pair (St,Su) in a locus C, AS events are retrieved by the iteration sketched in Figure 8.
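As a rough illustration of this pairwise iteration, the following Python sketch walks the sites of two transcripts from 5′ to 3′ and collects stretches of sites used by only one of them, flanked by common sites. It is a minimal sketch under stated assumptions, not the AStalavista implementation: the names `Site`, `sites_of`, `pairwise_events` and `as_code` are hypothetical, coordinates are assumed to be already strand-normalized, `sorted()` stands in for a priority queue, and the symbols `[` and `]` for TSS and PAS are assumed labels. The code string follows the patterns visible in the example codes of this article (such as 0,1-2^ and 1-,2-), with the ordering of the two variants inferred from those examples.

```python
from collections import namedtuple

# kind: "[" = TSS, "^" = donor, "-" = acceptor, "]" = PAS (TSS/PAS symbols are assumed labels)
Site = namedtuple("Site", ["pos", "kind"])

ROOT = Site(float("-inf"), "root")   # artificial common start node
LEAF = Site(float("inf"), "leaf")    # artificial common end node
SPLICE_KINDS = {"^", "-"}

def sites_of(exons):
    """Turn a 5'->3' sorted list of (start, end) exons into typed sites."""
    sites, last = [], len(exons) - 1
    for i, (start, end) in enumerate(exons):
        sites.append(Site(start, "[" if i == 0 else "-"))
        sites.append(Site(end, "]" if i == last else "^"))
    return sites

def pairwise_events(exons_t, exons_u):
    """Collect stretches of sites used by exactly one transcript, flanked by common sites."""
    st = set(sites_of(exons_t)) | {ROOT, LEAF}
    su = set(sites_of(exons_u)) | {ROOT, LEAF}
    events, alt_t, alt_u = [], [], []
    for site in sorted(st | su):              # 5'->3' order, root first, leaf last
        in_t, in_u = site in st, site in su
        if in_t and in_u:                     # common site closes the current stretch
            # keep the stretch only if it contains at least one splice site
            if any(s.kind in SPLICE_KINDS for s in alt_t + alt_u):
                events.append((tuple(alt_t), tuple(alt_u)))
            alt_t, alt_u = [], []
        elif in_t:
            alt_t.append(site)
        else:
            alt_u.append(site)
    return events

def as_code(alt_t, alt_u):
    """Number the variable sites by relative position and join both variants."""
    rank = {s: i + 1 for i, s in enumerate(sorted(alt_t + alt_u))}
    def variant(sites):
        return "".join(f"{rank[s]}{s.kind}" for s in sites) or "0"
    # assumed ordering: the empty variant first, then by first site number
    ordered = sorted([alt_t, alt_u], key=lambda v: rank[v[0]] if v else 0)
    return ",".join(variant(v) for v in ordered)

# Exon skipping: the middle exon is present in one transcript only.
skipped = pairwise_events([(100, 200), (300, 400), (500, 600)],
                          [(100, 200), (500, 600)])
print([as_code(t, u) for t, u in skipped])    # prints ['0,1-2^']
```

Running the sketch over every transcript pair of a locus would reproduce the quadratic number of comparisons mentioned in the text; removing redundant events found by several pairs is deliberately left out here.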
The algorithm proceeds as follows: In a priority queue W, all nodes si of G that are supported by at least one of the compared transcripts (St or Su) are iterated according to their genomic position pos(si), from 5′ to 3′ starting with root and ending at leaf. Following Lemma 1, the algorithm successively collects sequences Xt,u of sites alternatively used in one of the transcripts (|{St,Su} ∩ transcripts(si)| = 1) flanked by common sites (|{St,Su} ∩ transcripts(si)| = 2, intrinsic to the else condition since 1 ≤ |{St,Su} ∩ transcripts(si)| ≤ 2 ∀ si ∈ W). In order to satisfy Definition 4, these sequences are additionally checked for the presence of an AS site (boolean c) before the event is added to L, the list of AS events. Because all transcript pairs (St,Su) in C are iterated, the main loop of the algorithm in Figure 8

Complexity Estimation

AStalavista implements the graph-theoretical approach as sketched in the previous section for extraction of pairwise AS events from a given annotation. In this approach initially time is required to build up G for each locus C by adding each site annotated in the input to V and checking a preceding exonic/intronic edge for eventual creation. Once completely constructed, G consumes memory. Assuming that appropriate data structures make the operation {St,Su} ∩ transcripts(si) feasible in constant time, and disregarding the overhead of the operations extract() and insert() in Figure 8 , where k is the number of transcript variants in C, |W| the number of nodes that are supported by one of the transcripts in St and/or Su, outdegree(si) counting the number of outgoing edges for a node si ∈ V, and L denoting the set of redundant AS events found in C. Obviously, ~k² pairwise transcript comparisons are to be performed in a locus; for each one, the nodes that describe a site of the transcripts are to be iterated and their out-edges have to be checked for overlap with {St,Su}.
Finally, all pairwise events found are to be checked for redundancy in an all-against-all comparison that additionally costs |L|². Both quadratic factors, k² and |L|², grow naturally with the transcript diversity that is investigated. Reference annotations, even on the complete human genome, are computed in not much more than a minute (Table 2), but the time effort increases when including loci that are annotated extensively with mRNA/EST sequences.

Table S1 The landscape of AS in different human reference annotations. Complete landscape of coding transcripts annotated in RefSeq (A), Gencode (B), and EnsEmbl (C). For each different structure, the number of events, their relative abundance (in percent) and the AS code is shown. The 1,070 AS events detected in RefSeq correspond to 85 structurally distinct classes, whereas the 4,321 events in EnsEmbl show 388 classes. (0.99 MB PDF)

Table S2 Median exon/intron length in 12 metazoan species. The EnsEmbl annotations for the genomes of the 12 metazoan species have been used to determine the median exon and intron length (in nt). Introns with non-canonical splice site dinucleotides (i.e., not GT/AG) and exons that are flanked by such have been disregarded for the analysis. The median exon and intron lengths estimated on this basis confirm current estimates: whereas there is not much fluctuation in the median exon length, introns are substantially longer in mammals than in other vertebrates, and even shorter in invertebrates. (0.10 MB PDF)

Table S3 Attributes of the transcriptome in 12 metazoan species. For each of the 12 species under analysis, this table shows the number of loci (according to the transcript clustering described herein) and the number of transcripts in the corresponding EnsEmbl annotation.
Next, the number of variations in the exon-intron structure detected by our method is reported, together with the subgroup of them that conforms to the requirements for an AS event (Definition 4), exhibits canonical GT/AG splice site dinucleotides, and does not involve alternative transcription start/poly-adenylation sites. Finally, the average number of exons per locus that are flanked by canonical GT/AG splice sites is given with the respective standard deviation across the genome. (0.24 MB PDF)

Figure S1 UCSC genome browser screenshots for 5 AS events. Screenshots of the UCSC genome browser depicting the AS events discussed in Figure 1. (0.31 MB PDF)

Figure S2 AS events in the FOXP2 gene. Exploded assembly drawing of the AS events found by AStalavista (A), Hollywood (B), and EuSplice (C) in the FOXP2 gene. The region of events is outlined by a rectangle and double arrows indicate the variants compared pairwise. The events are numbered consecutively and colors mark different structures: 0,1–2^ is blue (events 1–12 and 26), 0,1–2^3–4^ is purple (events 13–17), 1-,2- is red (events 18–20), 0,1–2^3–4^5–6^ is pink (event 22), 1–2^,3–4^ is electric blue (event 23), 1–2^3-,4- is orange (event 24). Hollywood shows a splice donor variation (event 25) that is not found by AStalavista since it exhibits the unusual splice donor sequence AAAAT. EuSplice additionally predicts event 26, a cryptic exon that has been inferred from a 2 nt alignment of the mRNA sequence to the genome. In contrast, AStalavista finds 8 more bona fide events with mRNA support than EuSplice and 19 more events in ESTs than Hollywood. (0.26 MB PDF)

Figure S3 Formed by AS events overlapping the 5′ UTR/CDS. Pie diagrams depicting the landscape of AS events in the RefSeq annotation that overlap the respective 5′ UTR (A) or the CDS (B) of coding transcripts.
Qualitatively the same trends can be observed as in Figure 4. (0.27 MB PDF)

Figure S4 Cumulative exon truncation at the splice donor/acceptor. The plot shows the cumulative curve for the data presented in Figure 5. (0.15 MB PDF)

Figure S5 AS landscape in random subsets of noncoding transcripts. In order to compare the landscape of AS events located in CDSs of coding transcripts (A) with the landscape formed by events in non-coding transcripts (B) in equally sized sets (see Figure 6 (0.30 MB PDF)

Acknowledgments

We would like to thank our former group member F. Denoeud and J. Mudge of the HAVANA annotation team for sharing their expertise on Gencode, as well as T. Alioto, A. Kedzierska, T. Kisiel, H. Tilgner, V. Lacroix, J. Lagarde, and O. Gonzalez from our Genome Bioinformatics Lab (GBL) for many fruitful discussions.

Footnotes

The authors have declared that no competing interests exist.

This work has been funded by a DAAD (German Academic Exchange Service) postdoctoral fellowship to MS. Further support has been provided by grants from the NHGRI Encode project, the European Union ATD project, and the Spanish Plan Nacional de I+D.

References

1. Black DL. Mechanisms of alternative pre-messenger RNA splicing. Annu Rev Biochem. 2003;72:291–336.
2. Lopez AJ. Alternative splicing of pre-mRNA: developmental consequences and mechanisms of regulation. Annu Rev Genet. 1998;32:279–305.
3. Smith CW, Valcarcel J. Alternative pre-mRNA splicing: the logic of combinatorial control. Trends Biochem Sci. 2000;25:381–388.
4. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.
5. Florea L. Bioinformatics of alternative splicing and its regulation. Brief Bioinform. 2006;7:55–69.
6. Xing Y, Lee C.
Alternative splicing and RNA selection pressure - evolutionary consequences for eukaryotic genomes. Nat Rev Genet. 2006;7:499–509.
7. Zavolan M, van Nimwegen E. The types and prevalence of alternative splice forms. Curr Opin Struct Biol. 2006;16:362–367.
8. Johnson JM, Castle J, Garrett-Engele P, Kan Z, Loerch PM, et al. Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science. 2003;302:2141–2144.
9. Kim H, Klein R, Majewski J, Ott J. Estimating rates of alternative splicing in mammals and invertebrates. Nat Genet. 2004;36:915–916; author reply 916–917.
10. Breitbart RE, Andreadis A, Nadal-Ginard B. Alternative splicing: a ubiquitous mechanism for the generation of multiple protein isoforms from single genes. Annu Rev Biochem. 1987;56:467–495.
11. The ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007;447:799–816.
12. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35:D61–D65.
13. Stamm S, Zhu J, Nakai K, Stoilov P, Stoss O, et al. An alternative-exon database and its statistical analysis. DNA Cell Biol. 2000;19:739–756.
14. Stamm S, Riethoven JJ, Le Texier V, Gopalakrishnan C, Kumanduri V, et al. ASD: a bioinformatics resource on alternative splicing. Nucleic Acids Res. 2006;34:D46–D55.
15. Le Texier V, Riethoven JJ, Kumanduri V, Gopalakrishnan C, Lopez F, et al. AltTrans: transcript pattern variants annotated for both alternative splicing and alternative polyadenylation. BMC Bioinformatics. 2006;7:169.
16. Holste D, Huo G, Tung V, Burge CB. HOLLYWOOD: a comparative relational database of alternative splicing. Nucleic Acids Res. 2006;34:D56–D62.
17. Zhou Y, Zhou C, Ye L, Dong J, Xu H, et al.
Database and analyses of known alternatively spliced genes in plants. Genomics. 2003;82:584–595.
18. Coward E, Haas SA, Vingron M. SpliceNest: visualization of gene structure and alternative splicing based on EST clusters. Trends Genet. 2002;18:53–55.
19. Huang YH, Chen YT, Lai JJ, Yang ST, Yang UC. PALS db: Putative Alternative Splicing database. Nucleic Acids Res. 2002;30:186–190.
20. Burset M, Seledtsov IA, Solovyev VV. SpliceDB: database of canonical and non-canonical mammalian splice sites. Nucleic Acids Res. 2001;29:255–259.
21. Ji H, Zhou Q, Wen F, Xia H, Lu X, et al. AsMamDB: an alternative splice database of mammals. Nucleic Acids Res. 2001;29:260–263.
22. Modrek B, Resch A, Grasso C, Lee C. Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res. 2001;29:2850–2859.
23. Huang HD, Horng JT, Lee CC, Liu BJ. ProSplicer: a database of putative alternative splicing information derived from protein, mRNA and expressed sequence tag sequence data. Genome Biol. 2003;4:R29.
24. Bhasi A, Pandey RV, Utharasamy SP, Senapathy P. EuSplice: a unified resource for the analysis of splice signals and alternative splicing in eukaryotic genes. Bioinformatics. 2007;23:1815–1823.
25. Kim N, Alekseyenko AV, Roy M, Lee C. The ASAP II database: analysis and comparative genomics of alternative splicing in 15 animal species. Nucleic Acids Res. 2007;35:D93–D98.
26. Malko DB, Makeev VJ, Mironov AA, Gelfand MS. Evolution of exon–intron structure and alternative splicing in fruit flies and malarial mosquito genomes. Genome Res. 2006;16:505–509.
27. Collins L, Penny D. Proceedings of the SMBE Tri-National Young Investigators' Workshop 2005. Investigating the intron recognition mechanism in eukaryotes. Mol Biol Evol. 2006;23:901–910.
28. Fox-Walsh KL, Dou Y, Lam BJ, Hung SP, Baldi PF, et al.
The architecture of pre-mRNAs affects mechanisms of splice-site pairing. Proc Natl Acad Sci U S A. 2005;102:16176–16181.
29. Nagasaki H, Arita M, Nishizawa T, Suwa M, Gotoh O. Species-specific variation of alternative splicing and transcriptional initiation in six eukaryotes. Gene. 2005;364:53–62.
30. Buratti E, Baralle M, Baralle FE. Defective splicing, disease and therapy: searching for master checkpoints in exon definition. Nucleic Acids Res. 2006;34:3494–3510.
31. Kornblihtt AR, de la Mata M, Fededa JP, Munoz MJ, Nogues G. Multiple links between transcription and splicing. RNA. 2004;10:1489–1498.
32. Listerman I, Sapra AK, Neugebauer KM. Cotranscriptional coupling of splicing factor recruitment and precursor messenger RNA splicing in mammalian cells. Nat Struct Mol Biol. 2006;13:815–822.
33. Mironov AA, Fickett JW, Gelfand MS. Frequent alternative splicing of human genes. Genome Res. 1999;9:1288–1293.
34. Eyras E, Reymond A, Castelo R, Bye JM, Camara F, et al. Gene finding in the chicken genome. BMC Bioinformatics. 2005;6:131.
35. Heber S, Alekseyev M, Sze SH, Tang H, Pevzner PA. Splicing graphs and EST assembly problem. Bioinformatics. 2002;18:S181–S188.
36. Sperisen P, Iseli C, Pagni M, Stevenson BJ, Bucher P, et al. trome, trEST and trGEN: databases of predicted protein sequences. Nucleic Acids Res. 2004;32:D509–D511.
37. Sugnet CW, Kent WJ, Ares M, Jr, Haussler D. Transcriptome and genome conservation of alternative splicing events in humans and mice. Pac Symp Biocomput. 2004:66–77.
38. Bollina D, Lee BT, Tan TW, Ranganathan S. ASGS: an alternative splicing graph web service. Nucleic Acids Res. 2006;34:W444–W447.
39. Gupta S, Zink D, Korn B, Vingron M, Haas SA. Genome wide identification and classification of alternative splicing based on EST data. Bioinformatics. 2004;20:2579–2585.
40. Kim N, Shin S, Lee S.
ECgene: genome-based EST clustering and gene modeling for alternative splicing. Genome Res. 2005;15:566–576.
41. Lee BT, Tan TW, Ranganathan S. DEDB: a database of Drosophila melanogaster exons in splicing graph form. BMC Bioinformatics. 2004;5:189.
42. Foissac S, Sammeth M. ASTALAVISTA: dynamic and flexible analysis of alternative splicing events in custom gene datasets. Nucleic Acids Res. 2007;35:W297–W299.
43. Kondrashov FA, Koonin EV. Evolution of alternative splicing: deletions, insertions and origin of functional parts of proteins from intron sequences. Trends Genet. 2003;19:115–119.
44. Yandell M, Mungall CJ, Smith C, Prochnik S, Kaminker J, et al. Large-scale trends in the evolution of gene structures within 11 animal genomes. PLoS Comput Biol. 2006;2:e15. doi:10.1371/journal.pcbi.0020015.
45. Swinburne IA, Meyer CA, Liu XS, Silver PA, Brodsky AS. Genomic localization of RNA binding proteins reveals links between pre-mRNA processing and transcription. Genome Res. 2006;16:912–921.
46. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank. Nucleic Acids Res. 2008;36:D25–D30.
47. Boguski MS, Lowe TM, Tolstoshev CM. dbEST - database for “expressed sequence tags”. Nat Genet. 1993;4:332–333.
48. Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, et al. Ensembl 2007. Nucleic Acids Res. 2007;35:D610–D617.
49. Harrow J, Denoeud F, et al. GENCODE: Producing a reference annotation for ENCODE. Genome Biol. 2006;7:S4.1–S4.9.
50. Akerman M, Mandel-Gutfreund Y. Alternative splicing regulation at tandem 3′ splice sites. Nucleic Acids Res. 2006;34:23–31.
51. Chern TM, van Nimwegen E, Kai C, Kawai J, Carninci P, et al. A simple physical model predicts small exon length variations. PLoS Genet. 2006;2:e45. doi:10.1371/journal.pgen.0020045.
52. Hiller M, Huse K, Szafranski K, Jahn N, Hampe J, et al.
Widespread occurrence of alternative splicing at NAGNAG acceptors contributes to proteome plasticity. Nat Genet. 2004;36:1255–1257.
53. Senapathy P. Possible evolution of splice-junction signals in eukaryotic genes from stop codons. Proc Natl Acad Sci U S A. 1988;85:1129–1133.
54. Kaessmann H, Zollner S, Nekrutenko A, Li WH. Signatures of domain shuffling in the human genome. Genome Res. 2002;12:1642–1650.
55. Kim E, Magen A, Ast G. Different levels of alternative splicing among eukaryotes. Nucleic Acids Res. 2007;35:125–131.
56. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–29.
57. Gerstein MB, Bruce C, Rozowsky JS, Zheng D, Du J, et al. What is a gene, post-ENCODE? History and updated definition. Genome Res. 2007;17:669–681.
58. Pearson H. Genetics: what is a gene? Nature. 2006;441:398–401.
59. Akiva P, Toporik A, Edelheit S, Peretz Y, Diber A, et al. Transcription-mediated gene fusion in the human genome. Genome Res. 2006;16:30–36.
60. Parra G, Reymond A, Dabbouseh N, Dermitzakis ET, Castelo R, et al. Tandem chimerism as a means to increase protein complexity in the human genome. Genome Res. 2006;16:37–44.
61. Denoeud F, Kapranov P, Ucla C, Frankish A, Castelo R, et al. Prominent use of distal 5′ transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res. 2007;17:746–759.
62. Takeda J, Suzuki Y, Nakao M, Barrero RA, Koyanagi KO, et al. Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56,419 completely sequenced and manually annotated full-length cDNAs. Nucleic Acids Res. 2006;34:3917–3928.
63. Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, et al.
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007;447:799–816.
64. The FlyBase Consortium. FlyBase - the Drosophila database. Nucleic Acids Res. 1994;22:3456–3458.
65. Stein L, Sternberg P, Durbin R, Thierry-Mieg J, Spieth J. WormBase: network access to the genome and biology of Caenorhabditis elegans. Nucleic Acids Res. 2001;29:82–86.
66. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–1190.
67. Hedges SB. The origin and evolution of model organisms. Nat Rev Genet. 2002;3:838–849.