Nucleic Acids Res. Jul 2007; 35(Web Server issue): W297–W299.
Published online May 7, 2007. doi:  10.1093/nar/gkm311
PMCID: PMC1933205

ASTALAVISTA: dynamic and flexible analysis of alternative splicing events in custom gene datasets

Abstract

In the process of establishing more and more complete annotations of eukaryotic genomes, a constantly growing number of alternative splicing (AS) events has been reported over the last decade. Consequently, the increasing transcript coverage also revealed the real complexity of some variations in the exon–intron structure between transcript variants and the need for computational tools to address ‘complex’ AS events. ASTALAVISTA (alternative splicing transcriptional landscape visualization tool) employs an intuitive and complete notation system to univocally identify such events. The method extracts AS events dynamically from custom gene annotations, classifies them into groups of common types and visualizes a comprehensive picture of the resulting AS landscape. Thus, ASTALAVISTA can characterize AS for whole transcriptome data from reference annotations (GENCODE, REFSEQ, ENSEMBL) as well as for genes selected by the user according to common functional/structural attributes of interest: http://genome.imim.es/astalavista

INTRODUCTION

Alternative splicing (AS) is a fundamental cellular process involved in eukaryotic gene expression (1–3). To decipher the molecular mechanisms responsible for AS, several computational studies have been presented in recent years, producing a considerable number of dedicated analyses and databases that represent the transcript diversity resulting from AS events (4,5).

In contrast to the global transcript diversity, it is also possible to identify local variations observed in the exon–intron structures amongst the transcripts. For the investigation of the molecular mechanisms giving rise to AS, the separation of events into different classes has shown promise. Historically, four main AS events have been reported in the literature: exon skipping, intron retention, and alternative donor and acceptor splice sites. The frequent observation of these ‘simple’ events is correlated with the fact that they involve the minimum number of variable splice sites (i.e. two).

However, current annotation datasets show a plethora of more complex variations that can be seen as variable combinations of simple events overlapping each other, indicating connections in regulation and function. Only recently have efforts been undertaken to properly identify and describe such ‘complex’ AS events. In previous work, ‘bit matrices’ identifying exonic and intronic segments of transcripts have been used to describe AS events in general (6). Alternatively, we have developed an intuitive notation system based on the relative position of alternatively used splice sites in order to univocally identify any possible AS event (Sammeth et al., submitted).

In the process of investigating the phenomenon of AS, many web resources have already been made available (7–12). In general, these tools can be considered AS-dedicated gene or genome browsers, suitable for accessing detailed information about each gene of interest but not convenient for comprehensive analyses of AS across genes. Moreover, none of them offers an exhaustive identification of AS events from custom input data.

Herein, we describe the ASTALAVISTA web server (alternative splicing transcriptional landscape visualization tool), which dynamically identifies, extracts and displays complex AS events from annotated genes. ASTALAVISTA makes it possible to investigate and compare the types and distributions of the different AS events found in the input, be it whole genome annotations or user-provided gene sets. To our knowledge, this is the first time a tool for the exhaustive extraction of AS events from custom datasets has been made publicly available.

METHODS

ASTALAVISTA adopts a generic definition of AS events and a flexible notation system assigning a code based on the relative position of alternatively used splice sites. In brief, given a set of annotated transcripts, the method first considers all pairwise comparisons between overlapping transcripts. A variation of the splicing structure is detected if some splice sites are not used in both transcripts. Then, according to the genomic coordinates, the relative order of the splice sites that are included in such variations is used to build a code describing the corresponding AS event. This approach overcomes limitations of methods focusing exclusively on simple events and circumvents the problem of choosing a reference transcript, i.e. defining a ‘main’ splice form against which all others are compared. The intrinsic transcript clustering makes the method independent of the assignment of transcripts to a certain gene name or locus. Furthermore, redundancy filtering is applied in order to identify the list of unique AS events, regardless of how many transcript comparisons exhibit the same splicing variation. Because the notation is generic and based on relative splice site positions, AS events can be compared across different genes, chromosomes or genomes. In this way, events describing equal variations in the exon–intron structures are pooled in a common group. Finally, the distribution of AS events across these groups is used to depict the AS landscape in a dataset. More details about the method and the resulting ‘AS code’ are provided on the web site.
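The pairwise comparison described above can be illustrated with a minimal Python sketch. It is not the ASTALAVISTA implementation: the function names, the forward-strand-only coordinates, the symbols (‘^’ for a donor, ‘-’ for an acceptor, ‘0’ for a transcript using none of the variable sites) and the decision to pool all variable sites of a transcript pair into a single code are assumptions made for illustration. The exact ‘AS code’ format is defined on the web site and may differ.

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int]   # (start, end) on the forward strand (assumption)
Site = Tuple[int, str]   # (genomic position, '^' for donor or '-' for acceptor)


def splice_sites(exons: List[Exon]) -> Set[Site]:
    """Collect the splice sites implied by a transcript's exon boundaries."""
    exons = sorted(exons)
    sites = set()
    for i, (start, end) in enumerate(exons):
        if i > 0:
            sites.add((start, "-"))   # acceptor: start of every exon but the first
        if i < len(exons) - 1:
            sites.add((end, "^"))     # donor: end of every exon but the last
    return sites


def event_code(tx_a: List[Exon], tx_b: List[Exon]) -> str:
    """Illustrative code for the splice sites not shared by two transcripts."""
    sites_a, sites_b = splice_sites(tx_a), splice_sites(tx_b)
    # Only look at the genomic region covered by both transcripts.
    lo = max(min(s for s, _ in tx_a), min(s for s, _ in tx_b))
    hi = min(max(e for _, e in tx_a), max(e for _, e in tx_b))
    # Variable sites are those used by exactly one of the two transcripts.
    variable = sorted(s for s in sites_a ^ sites_b if lo <= s[0] <= hi)
    if not variable:
        return ""                     # no variation between the two structures
    # Number the variable sites by their relative genomic order.
    rank = {site: i + 1 for i, site in enumerate(variable)}

    def part(sites: Set[Site]) -> str:
        used = [f"{rank[s]}{s[1]}" for s in variable if s in sites]
        return "".join(used) or "0"

    # Sorting makes the code independent of the order of the two transcripts.
    return ",".join(sorted([part(sites_a), part(sites_b)]))


# Skipped middle exon versus its inclusion
print(event_code([(100, 200), (300, 400), (500, 600)], [(100, 200), (500, 600)]))  # 0,1-2^
# Two alternative donor sites for the same intron
print(event_code([(100, 200), (300, 400)], [(100, 220), (300, 400)]))              # 1^,2^
```

Because the code depends only on the relative order and type of the variable sites, two events at different loci that share the same structure receive the same string, which is what allows them to be pooled in a common group.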

WEB SITE DESCRIPTION

Input: annotation datasets

ASTALAVISTA requires a set of transcripts with known exon–intron structure (e.g. from mRNAs or ESTs). As the primary data source for custom transcript sets, the genomic positions of the exon boundaries for each transcript are provided in the gene transfer format (GTF). Each GTF line has nine required fields, of which the following are used: the feature (e.g. ‘exon’, ‘CDS’, etc.), the start and end coordinates on a chromosome (or a contig), the strand, and an identifier for the corresponding transcript. Note that no gene identifiers are necessary due to the intrinsic clustering of transcripts into loci. To check the GTF format requirements, an example input is available on the web server. Optionally, if any protein coding region information is provided within the input (feature ‘CDS’), transcripts and/or extracted AS events may be filtered according to the annotated CDSs. This makes it straightforward to compare the AS landscapes of coding versus non-coding transcripts, or of events localized in CDSs versus UTRs.
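As a rough illustration of the input described above, the following Python sketch groups ‘exon’ features by transcript. The function name `read_exons` is hypothetical, and the sketch assumes the common GTF convention of nine tab-separated fields with a `transcript_id "..."` entry in the attribute field. The example input on the web server remains the reference for what ASTALAVISTA actually accepts.

```python
import re
from collections import defaultdict

# Assumed attribute syntax: transcript_id "some_identifier";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def read_exons(gtf_lines):
    """Map (seqname, strand, transcript_id) to a sorted list of (start, end) exons."""
    transcripts = defaultdict(list)
    for line in gtf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 9:
            continue                      # not a well-formed GTF record
        seqname, _source, feature, start, end, _score, strand, _frame, attributes = fields
        if feature != "exon":
            continue                      # CDS and other features are ignored in this sketch
        match = TRANSCRIPT_ID.search(attributes)
        if match is None:
            continue                      # a transcript identifier is required
        key = (seqname, strand, match.group(1))
        transcripts[key].append((int(start), int(end)))
    for exons in transcripts.values():
        exons.sort()
    return dict(transcripts)
```

Note that the gene identifier is never read, in line with the statement that transcripts are clustered into loci from their coordinates alone.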

Alternatively, the user may analyze the anatomy of AS as characterized by popular human genome annotations (GENCODE, REFSEQ, ENSEMBL), or provide any set of human genes. In the latter case, the list of genes can be specified by identifiers from various nomenclature systems, such as REFSEQ mRNA IDs, SWISSPROT IDs, HUGO gene symbols, ENSEMBL transcript/gene/protein identifiers, etc. In this way, the AS topology can be differentially assessed for custom datasets containing the respective genes of interest.

Output: landscape of AS events

From the provided annotation and the specified options, the ASTALAVISTA protocol dynamically extracts AS events. As a summary, the main result page shows a list in which each event type is depicted together with its unique code in the relative-position notation. The list is ranked according to the occurrence (number or proportion) of the events. A graphical overview is provided in the form of a pie chart that displays the distribution of events across the groups, showing each type of simple event separately and pooling the others in one group (Figure 1, left).
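The ranked summary can be mimicked in a few lines of Python. This is only a sketch: the function name `landscape` is hypothetical, the input is assumed to be one code string per unique (already redundancy-filtered) event, and the code strings in the usage note are illustrative placeholders.

```python
from collections import Counter


def landscape(event_codes):
    """Rank event groups by count and report each group's proportion."""
    counts = Counter(event_codes)
    total = sum(counts.values())
    # most_common() orders groups from the most to the least frequent.
    return [(code, n, n / total) for code, n in counts.most_common()]
```

For example, `landscape(["0,1-2^", "1^,2^", "0,1-2^"])` ranks the group `"0,1-2^"` first with two of the three events, followed by `"1^,2^"` with one.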

Figure 1.
Analysis of the AS landscape in a sample dataset. The AS landscape is described by a list of AS events grouped according to equal variations in the exon–intron structure between transcripts (left). A schematic picture illustrates every type of ...

From the result summary page, the genomic coordinates of all AS events counted in a group are accessible by clicking on the corresponding list entry (Figure 1, top-right). For each AS event, the transcript identifiers and the variable splice sites giving rise to it are specified. Finally, each event is linked to the UCSC Genome Browser for further comparative analyses (Figure 1, bottom-right).

CONCLUSION AND PERSPECTIVE

ASTALAVISTA is an exploratory tool to exhaustively extract the AS events reflected by a given input dataset, and to compare and group them according to equal exon–intron structure variations. As a key feature, arbitrarily complex combinations of hitherto described AS events can be distinguished, either visually or by representation in a univocal notation system. The event-based model of AS makes it easy to identify genes that involve the same type of event, e.g. alternative donors or double exon skipping. In addition, the comprehensive analysis of observed AS events provides a powerful means of investigating correlations between differences in AS patterns and functional/structural features of genes, gene sets or complete genomes. In this regard, ASTALAVISTA can handle custom inputs selected according to any discriminatory criteria, e.g. common evolutionary conservation, pattern or intensity of expression, function/cellular localization of the gene product, etc.

Although the reference datasets currently provided on the server are dedicated to reference organisms, the generic ASTALAVISTA protocol is applicable to any genome, even if the sequencing/annotation process has not been completed. In the future, the web resource will be extended with reference annotations for more species.

ACKNOWLEDGEMENTS

The authors thank R. Guigò, E. Eyras and other members of the PRBB for inspiring and fruitful discussions. This work has been supported by a post-doctoral fellowship to M.S. Funding to pay the Open Access publication charge was provided by Centre de Regulacio Genomica, Catalunya.

Conflict of interest statement. None declared.

REFERENCES

1. Black DL. Mechanisms of alternative pre-messenger RNA splicing. Annu. Rev. Biochem. 2003;72:291–336.
2. Lopez AJ. Alternative splicing of pre-mRNA: developmental consequences and mechanisms of regulation. Annu. Rev. Genet. 1998;32:279–305.
3. Smith CW, Valcarcel J. Alternative pre-mRNA splicing: the logic of combinatorial control. Trends Biochem. Sci. 2000;25:381–388.
4. Florea L. Bioinformatics of alternative splicing and its regulation. Brief Bioinformatics. 2006;7:55–69.
5. Xing Y, Lee C. Alternative splicing and RNA selection pressure - evolutionary consequences for eukaryotic genomes. Nat. Rev. Genet. 2006;7:499–509.
6. Nagasaki H, Arita M, Nishizawa T, Suwa M, Gotoh O. Automated classification of alternative splicing and transcriptional initiation and construction of visual database of classified patterns. Bioinformatics. 2006;22:1211–1216.
7. Holste D, Huo G, Tung V, Burge CB. HOLLYWOOD: a comparative relational database of alternative splicing. Nucleic Acids Res. 2006;34:D56–D62.
8. Bollina D, Lee BT, Tan TW, Ranganathan S. ASGS: an alternative splicing graph web service. Nucleic Acids Res. 2006;34:W444–W447.
9. Castrignano T, Rizzi R, Talamo IG, De Meo PD, Anselmo A, Bonizzoni P, Pesole G. ASPIC: a web resource for alternative splicing prediction and transcript isoforms characterization. Nucleic Acids Res. 2006;34:W440–W443.
10. Krause A, Haas SA, Coward E, Vingron M. SYSTERS, GeneNest, SpliceNest: exploring sequence space from genome to protein. Nucleic Acids Res. 2002;30:299–300.
11. Leipzig J, Pevzner P, Heber S. The alternative splicing gallery (ASG): bridging the gap between genome and transcriptome. Nucleic Acids Res. 2004;32:3977–3983.
12. Stamm S, Riethoven JJ, Le Texier V, Gopalakrishnan C, Kumanduri V, Tang Y, Barbosa-Morais NL, Thanaraj TA. ASD: a bioinformatics resource on alternative splicing. Nucleic Acids Res. 2006;34:D46–D55.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press