Nucleic Acids Res. Jan 2007; 35(Database issue): D93–D98.
Published online Nov 15, 2006. doi:  10.1093/nar/gkl884
PMCID: PMC1669709

The ASAP II database: analysis and comparative genomics of alternative splicing in 15 animal species

Abstract

We have greatly expanded the Alternative Splicing Annotation Project (ASAP) database: (i) its human alternative splicing data are expanded ~3-fold over the previous ASAP database, to nearly 90 000 distinct alternative splicing events; (ii) it now provides genome-wide alternative splicing analyses for 15 vertebrate, insect and other animal species; (iii) it provides comprehensive comparative genomics information for comparing alternative splicing and splice site conservation across 17 aligned genomes, based on UCSC multigenome alignments; (iv) it provides an ~2- to 3-fold expansion in detection of tissue-specific alternative splicing events, and of cancer versus normal specific alternative splicing events. We have also constructed a novel database linking orthologous exons and orthologous introns between genomes, based on multigenome alignment of 17 animal species. It can be a valuable resource for studies of gene structure evolution. ASAP II provides a new web interface enabling more detailed exploration of the data, and integrating comparative genomics information with alternative splicing data. We provide a set of tools for advanced data-mining of ASAP II with Pygr (the Python Graph Database Framework for Bioinformatics) including powerful features such as graph query, multigenome alignment query, etc. ASAP II is available at http://www.bioinformatics.ucla.edu/ASAP2.

INTRODUCTION

Alternative splicing plays an important role in protein diversity and gene regulation (1–3). Recent studies on alternative splicing estimate that 40–70% of human genes are alternatively spliced (4–6). Moreover, many splice variants alter the function of the protein product, and are involved in human diseases (7). Thus, alternative splicing is an important medical target for development of novel diagnostics and therapeutic drugs (8).

Genome-wide analyses of alternative splicing are mainly based on publicly available sequence databases such as GenBank (9) and Swiss-Prot/TrEMBL. HOLLYWOOD (10) and ASD (11) give comprehensive analyses of alternative splicing for human and mouse. Notably, those two databases provide comparative studies between human and mouse. Lee et al. (12) constructed DEDB for genome-wide analysis of alternative splicing in Drosophila melanogaster. In addition to alternative splicing analysis, ECgene (13,14) gives comprehensive results for functional annotation of proteins and expression analysis. Furthermore, it has recently been expanded to nine species.

The Alternative Splicing Annotation Project (ASAP) database (15) is a widely used resource providing a genome-wide analysis of human alternative splicing and tissue-specific splicing (4,16–20) based on expressed sequence tag (EST), messenger RNA (mRNA) and genome sequences. It has served as the basis for a wide variety of studies (21–28).

Here we describe a major expansion of the ASAP database, designed to make it a good resource for analyzing and comparing alternative splicing between a wide range of animal genomes. Whereas the original release of ASAP focused entirely on human data, we have now included genome-wide analyses of alternative splicing for 15 animal species from human to nematodes. Furthermore, we have added a new dimension of comparative genomics tools, for comparing alternative splicing patterns, conservation of splice sites, exons and introns, across 17 animal genomes.

MATERIALS AND METHODS

We downloaded UniGene (29), GenBank (9) and Entrez Gene (30) from the NCBI FTP site (UniGene: ftp://ftp.ncbi.nih.gov/repository/UniGene/; GenBank: ftp://ftp.ncbi.nih.gov/genbank/; Entrez Gene: ftp://ftp.ncbi.nih.gov/gene/) in January 2006. Genome assembly sequences, RefSeq (31)/mRNA alignments and RepeatMasker tracks were downloaded from the UCSC genome browser, except for the yellow fever mosquito genome, which was obtained from the Ensembl genome browser (32). Multigenome alignments for human (hg17), mouse (mm7), chicken (galGal2), fruit fly (dm2), zebrafish (danRer3) and western clawed frog (xenTro1) were downloaded from the UCSC genome browser.

In order to update the lists of tissue-specific and cancer versus normal specific genes for human, we downloaded EST library information from UniLib (ftp://ftp.ncbi.nih.gov/repository/UniLib/). A total of 2895 new human EST libraries were classified and added to the existing 47 tissue categories and normal/tumor types. In total, 8828 human EST libraries were classified into 47 tissues and normal/tumor. We used the same method as Xu and Lee (19) to calculate LOD values for tissue specificity and normal versus cancer specificity.

Orthologous exons, introns and splice site sequences were extracted using Pygr, which gives sub-millisecond access to any location of any genome in the multigenome alignments. Moreover, Pygr can be easily integrated with the ASAP II database; more detailed information is available at the ASAP II website.

We defined exons and introns from two species as orthologous if at least one of their splice sites (for introns, those of the flanking exons) is exactly aligned in the multigenome alignments. This strategy increases the chance of finding orthologous exons, because the exons can lie within well-conserved blocks of the multigenome alignments. Conventional protein similarity-based methods can identify orthologous genes only if protein sequences are available. Moreover, the multigenome alignment-based method enables us to interpret how alternatively spliced exons and introns have evolved across distant species.
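The orthology rule described in Materials and Methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ASAP II implementation: the `aligned` dictionary is a hypothetical stand-in for a multigenome alignment lookup (it is not the Pygr API), the `Exon` type and the `PARTIAL` label are invented here, and only the `EXACT` label (both splice sites aligned) comes from the Figure 1 legend.

```python
from typing import Dict, NamedTuple, Optional


class Exon(NamedTuple):
    """An exon described by the coordinates of its two splice-site boundaries."""
    start: int  # upstream boundary
    end: int    # downstream boundary


def classify_exon_pair(
    exon_a: Exon, exon_b: Exon, aligned: Dict[int, int]
) -> Optional[str]:
    """Classify a candidate orthologous exon pair from two species.

    `aligned` maps a coordinate in species A to the coordinate it is
    aligned to in species B (hypothetical stand-in for an alignment query).
    """
    start_match = aligned.get(exon_a.start) == exon_b.start
    end_match = aligned.get(exon_a.end) == exon_b.end
    if start_match and end_match:
        return "EXACT"    # both splice sites exactly aligned
    if start_match or end_match:
        return "PARTIAL"  # at least one splice site aligned: still orthologous
    return None           # no shared splice site: not called orthologous


# Toy alignment: two positions in species A map to species B.
toy_alignment = {100: 5100, 200: 5200}
print(classify_exon_pair(Exon(100, 200), Exon(5100, 5200), toy_alignment))
print(classify_exon_pair(Exon(100, 200), Exon(5100, 5230), toy_alignment))
```

Requiring only one shared splice site keeps pairs whose other boundary has shifted, which is why this criterion can recover more orthologous exons than requiring both boundaries to match.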

RESULTS AND DISCUSSION

Alternative splicing analyses

Compared with the previous release of ASAP (15), ASAP II provides an ~3-fold expansion in human alternative splicing events, to a total of 89 078 distinct alternative splicing relationships in human, detected within 11 717 genes (UniGene clusters). Out of the total set of multi-exon genes (22 220), 53% were detected to contain alternative splicing (Table 1). Focusing on genes with at least one mRNA sequence (for which our gene model is therefore likely to be full-length, and which generally have higher EST coverage), 75% (10 202 out of 13 690) were detected to contain alternative splicing. The continuing rapid growth in alternative splicing detection as a function of increased EST and mRNA counts suggests that the field is still far from saturation, and that far more experimental data will be required to obtain a complete catalog of human alternative splicing.
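As a quick arithmetic check, the proportions quoted above can be reproduced from the raw counts. The sketch assumes that the 11 717 genes with detected alternative splicing are the numerator for the 22 220 multi-exon genes; the helper name `percent` is introduced here for illustration only.

```python
def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Counts taken from the text above.
multi_exon_pct = percent(11717, 22220)  # alternatively spliced multi-exon genes
with_mrna_pct = percent(10202, 13690)   # genes with at least one mRNA sequence

print(multi_exon_pct, with_mrna_pct)
```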

Table 1
Statistics for ASAP II database

Another major change is the addition of alternative splicing analyses for 14 new animal genomes (Table 1), ranging from mammals, birds and fish, to insects, C. elegans and Ciona. This provides a very large dataset of non-human alternative splicing events (a total of 67 095 alternative splicing relationships, over three-quarters the size of the human alternative splicing dataset). However, due to the limited EST coverage for many animal genomes (e.g. Fugu, honeybee), these data cannot be considered comprehensive. Numbers of mapped UniGene clusters can be lower than expected for Ciona, Fugu and yellow fever mosquito due to their incomplete genome assemblies. For mouse, 8711 (53%) out of 16 404 multi-exon genes were detected to contain alternative splicing, as were 60% (8203 out of 13 626) of genes with at least one mRNA. Twenty-five percent of rat, 22% of western clawed frog, 22% of chicken, 26% of cow and 19% of fruit fly multi-exon genes were detected to contain alternative splicing. Proportions of alternatively spliced multi-exon genes for C. elegans (6%) and African malaria mosquito (8%) were lower than those for mammals. Alternative splicing analyses of the 15 most heavily sequenced species extend the scope of research from human to nematodes, and enable comparative and evolutionary studies between distantly related species.

As an illustration of ASAP II's value for biological discovery, we performed analyses of tissue-specificity and cancer versus normal specificity of human alternative splice forms. ASAP II yielded ~2- to 3-fold larger identification of tissue-specific splice forms than the previous ASAP release (19,20). We added 2895 new EST libraries to our tissue classification database (Materials and Methods): each library source was classified as one of 47 tissue types, and also as tumor versus normal in origin. We found 1709 high-confidence (LOD ≥ 3) tissue-specific alternative splicing relationships from 960 genes, and 273 high-confidence (LOD ≥ 3) cancer-specific relationships from 198 genes. The largest categories of tissue-specific splice forms were identified from brain/nerve, testis, skin, muscle and lymph. Users can download all EST library classifications and log-odds (LOD) calculation results from the ASAP II download page and mine them for their own experimental candidates.

Comparative genomics analyses

To help researchers easily compare alternative splicing data between species, we performed a comprehensive comparative genomics analysis across 17 genomes (Table 2), identifying orthologous exons, introns and alternative splice events between these genomes. As a separate analysis that is valid even when the target genome has little or no alternative splicing data, we also analyzed the conservation of alternative exons and splice sites across 17 genomes. To do this, we used the well-established and characterized multigenome alignments (33) constructed for the UCSC genome browser (34). Orthologous exons and introns were defined by sharing at least one splice site in the multigenome alignments (Materials and Methods). Out of 129 981 human internal exons, 85 673 (66%) have at least one orthologous exon identified by the hg17-referenced 17-species multigenome alignment. Total numbers of orthologous exons found by five different multigenome alignments are summarized in Table 2. This method can give a more comprehensive database of orthologous genes than conventional protein similarity-based methods. Furthermore, we constructed a multigenome splice site database from the UCSC multigenome alignments (Figure 1D). These data give users the ability not only to compare observed splicing patterns between experimental data for different species, but also to study the evolution of alternative exons and splice sites (by looking at their conservation) even in genomes for which no splicing data are available.

Figure 1
Popup page for orthologous exons, introns and splice sites. (A) List of orthologous genes are described in UniGene summary section. (B) Orthologous Exons. ‘EXACT’ means both splice sites are exactly aligned in multigenome alignments. Change ...
Table 2
Statistics for orthologous exons and introns

Database mining and tools

Users can mine ASAP II in several ways:

  1. by using the web interface (below);
  2. by downloading it as MySQL tables and performing SQL queries;
  3. by using Python tools that work directly with the ASAP II schema, for graph query of alternative splicing graphs and comparative genomics query of multigenome alignments.

Although there's no space to discuss the latter tools (Pygr, the Python Graph Database Framework for Bioinformatics) here, extensive documentation is available on the web (http://www.bioinformatics.ucla.edu/pygr), including many tutorial examples about mining ASAP II.
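To make the SQL route concrete, here is a minimal sketch of the kind of query a user might run once the tables are loaded. Everything specific in it is an assumption: the table `splice_event`, its columns `cluster_id` and `event_type`, and the rows are toy placeholders rather than the real ASAP II schema or data, and Python's built-in `sqlite3` module stands in for a MySQL server so that the example is self-contained.

```python
import sqlite3


def events_per_cluster(connection: sqlite3.Connection):
    """Count alternative splicing events per gene cluster, largest first."""
    query = """
        SELECT cluster_id, COUNT(*) AS n_events
        FROM splice_event
        GROUP BY cluster_id
        ORDER BY n_events DESC, cluster_id
    """
    return connection.execute(query).fetchall()


# Toy in-memory table; names and rows are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE splice_event (cluster_id TEXT, event_type TEXT)")
conn.executemany(
    "INSERT INTO splice_event VALUES (?, ?)",
    [
        ("cluster_A", "exon_skip"),
        ("cluster_A", "alt_5prime"),
        ("cluster_B", "exon_skip"),
    ],
)

print(events_per_cluster(conn))
```

Against a real download, the same `GROUP BY` pattern would be pointed at whatever table and column names the ASAP II schema documentation specifies.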

Web interface

ASAP II can be searched by several different criteria such as gene symbol, gene name and ID [UniGene (29), GenBank (9), etc.]. The web interface provides seven different kinds of views:

  1. user query, UniGene annotation, orthologous genes and genome browsers;
  2. genome alignment;
  3. exons & orthologous exons;
  4. introns & orthologous introns;
  5. alternative splicing;
  6. isoform and protein sequences;
  7. tissue & cancer versus normal specificity.

ASAP II shows genome alignments of isoforms, exons and introns in a UCSC-like genome browser. Users can easily navigate among all the views by clicking links of interest. Alternative and constitutive exons are highlighted in red and blue, respectively. All alternative splicing relationships, with supporting evidence, types of alternative splicing patterns, and inclusion rates for skipped exons, are listed in separate tables. Users can also search human data for tissue- and cancer-specific splice forms at the bottom of the gene summary page. We report P-values for tissue-specificity as LOD scores, and highlight the results with LOD ≥ 3 and at least three EST sequences (19,20). A short introduction to the web interface and a comprehensive user guide are available at the ASAP II website, http://www.bioinformatics.ucla.edu/ASAP2.
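Users who download the LOD results can apply the same highlighting rule themselves. The sketch below assumes a simple record layout (`SpliceCall` and its field names are hypothetical, as are the toy rows); only the two thresholds, LOD ≥ 3 and at least three EST sequences, come from the text.

```python
from typing import List, NamedTuple


class SpliceCall(NamedTuple):
    """One tissue-specificity result (hypothetical record layout)."""
    gene: str
    tissue: str
    lod: float
    est_count: int


def high_confidence(
    calls: List[SpliceCall], min_lod: float = 3.0, min_ests: int = 3
) -> List[SpliceCall]:
    """Keep results meeting both the LOD and the EST-support thresholds."""
    return [c for c in calls if c.lod >= min_lod and c.est_count >= min_ests]


# Toy rows for illustration only.
toy_calls = [
    SpliceCall("gene1", "brain", 4.2, 5),   # passes both thresholds
    SpliceCall("gene2", "testis", 3.5, 2),  # too few ESTs
    SpliceCall("gene3", "muscle", 1.8, 9),  # LOD too low
]
print([c.gene for c in high_confidence(toy_calls)])
```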

Comparative genomics is a major focus of the ASAP II web interface, which displays results from its new orthologous exons and introns database. For example, it displays the multiple alignments of splice site sequences as a phylogenetic tree (Figure 1D), enabling users to infer the evolutionary history of introns at a glance. In Figure 1D, one can easily see that this pair of splice sites appears to have evolved in an early mammalian ancestor, but not before. Many applications are possible. For example, researchers could identify ‘recently evolved splice sites’ by selecting introns whose canonical splice site sequences (GT/AG) are conserved only within closely related species, but not in distant species. ASAP II includes links to comparative genomics information from all views. All orthologous genes identified by multigenome alignments are listed in its annotation summary (Figure 1A). If the user clicks ‘Show Orthologous Exons/Introns’ on any page, detailed information will be shown in a new window (Figure 1B and C).
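The ‘recently evolved splice sites’ selection mentioned above could be sketched as follows. This is one possible reading of the criterion, not a method taken from ASAP II: the function names, the input layout (aligned donor/acceptor dinucleotides keyed by assembly name) and the choice of which species count as closely related are all assumptions, and the sequences are toy values.

```python
from typing import Dict, Set, Tuple


def is_canonical(donor: str, acceptor: str) -> bool:
    """True for the canonical GT/AG intron boundary dinucleotides."""
    return donor.upper() == "GT" and acceptor.upper() == "AG"


def recently_evolved(
    sites: Dict[str, Tuple[str, str]], close_species: Set[str]
) -> bool:
    """Canonical GT/AG in every aligned close species and in no distant one."""
    close = [pair for sp, pair in sites.items() if sp in close_species]
    distant = [pair for sp, pair in sites.items() if sp not in close_species]
    if not close:
        return False  # nothing aligned among the close species
    close_all_canonical = all(is_canonical(d, a) for d, a in close)
    distant_any_canonical = any(is_canonical(d, a) for d, a in distant)
    return close_all_canonical and not distant_any_canonical


# Toy aligned dinucleotides (donor, acceptor) per assembly.
toy_sites = {"hg17": ("GT", "AG"), "mm7": ("GT", "AG"), "galGal2": ("GC", "AG")}
print(recently_evolved(toy_sites, close_species={"hg17", "mm7"}))
```

Where to draw the line between close and distant species is left to the user; moving it changes which introns are reported.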

Comparison with other alternative splicing databases

Alternative splicing analysis results can differ significantly between databases because each database uses different sequence databases, genome assemblies, sequence alignment methods, alignment filtering and stringency, etc. Total numbers of alternatively spliced genes and exons for other databases are summarized in Table 3. ASAP II has more alternatively spliced genes than ASD for human (11 717 versus 9929) and mouse (8711 versus 8211). However, DEDB has more spliced genes than ASAP II (13 222 versus 9683). ECgene has twice as many spliced genes as the other databases, suggesting the use of different stringency criteria for alignment filtering. HOLLYWOOD has more human internal exons than ASAP II (151 199 versus 129 981), but its percentage of alternative exons is significantly lower for human (25% versus 36%) and mouse (13% versus 21%). Presumably, this is because the sequence database used for HOLLYWOOD (January 2004) is older than that used for ASAP II (January 2006).

Table 3
Comparison of alternative splicing analyses with other databases

Update and future directions

ASAP II gives alternative splicing analysis of UniGene data released in January 2006 (Version JAN06). In order to provide up-to-date alternative splicing analysis, the ASAP II database will be updated within 2 years if the total number of available sequences increases significantly. Availability of a genome assembly is essential for supporting a new species; we will add new species to ASAP II, as well as to the orthologous exon/intron database, once their genome assemblies are publicly available.

We will also develop novel analysis methods for alternative splicing, such as reconstructing the evolutionary history of exons and introns, and make them available in ASAP II. We hope that ASAP II can become a useful resource for comparative genomics studies in the post-genome era.

Acknowledgments

The authors wish to thank Calvin Pan, Qi Wang and Dr Yi Xing for valuable comments on this work. This work has been supported by NIH grant U54 RR021813, Department of Energy grant DE-FC02-02ER63421, and by a Dreyfus Foundation Teacher-Scholar Award to C.J.L. Funding to pay the Open Access publication charges for this article was provided by NIH grant U54 RR021813.

Conflict of interest statement. None declared.

REFERENCES

1. Black D.L. Mechanisms of alternative pre-messenger RNA splicing. Annu. Rev. Biochem. 2003;72:291–336. [PubMed]
2. Graveley B.R. Alternative splicing: increasing diversity in the proteomic world. Trends Genet. 2001;17:100–107. [PubMed]
3. Maniatis T., Tasic B. Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature. 2002;418:236–243. [PubMed]
4. Modrek B., Resch A., Grasso C., Lee C. Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res. 2001;29:2850–2859. [PMC free article] [PubMed]
5. Kan Z., States D., Gish W. Selecting for functional alternative splices in ESTs. Genome Res. 2002;12:1837–1845. [PMC free article] [PubMed]
6. Johnson J.M., Castle J., Garrett-Engele P., Kan Z., Loerch P.M., Armour C.D., Santos R., Schadt E.E., Stoughton R., Shoemaker D.D. Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science. 2003;302:2141–2144. [PubMed]
7. Caceres J.F., Kornblihtt A.R. Alternative splicing: multiple control mechanisms and involvement in human disease. Trends Genet. 2002;18:186–193. [PubMed]
8. Mangasarian A. Alternative RNA splicing and drug target identification. IDrugs. 2005;8:725–729. [PubMed]
9. Benson D.A., Karsch-Mizrachi I., Lipman D.J., Ostell J., Wheeler D.L. GenBank. Nucleic Acids Res. 2006;34:D16–D20. [PMC free article] [PubMed]
10. Holste D., Huo G., Tung V., Burge C.B. HOLLYWOOD: a comparative relational database of alternative splicing. Nucleic Acids Res. 2006;34:D56–D62. [PMC free article] [PubMed]
11. Stamm S., Riethoven J.J., Le Texier V., Gopalakrishnan C., Kumanduri V., Tang Y., Barbosa-Morais N.L., Thanaraj T.A. ASD: a bioinformatics resource on alternative splicing. Nucleic Acids Res. 2006;34:D46–D55. [PMC free article] [PubMed]
12. Lee B.T., Tan T.W., Ranganathan S. DEDB: a database of Drosophila melanogaster exons in splicing graph form. BMC Bioinformatics. 2004;5:189. [PMC free article] [PubMed]
13. Kim P., Kim N., Lee Y., Kim B., Shin Y., Lee S. ECgene: genome annotation for alternative splicing. Nucleic Acids Res. 2005;33:D75–D79. [PMC free article] [PubMed]
14. Kim N., Shin S., Lee S. ECgene: genome-based EST clustering and gene modeling for alternative splicing. Genome Res. 2005;15:566–576. [PMC free article] [PubMed]
15. Lee C., Atanelov L., Modrek B., Xing Y. ASAP: the alternative splicing annotation project. Nucleic Acids Res. 2003;31:101–105. [PMC free article] [PubMed]
16. Le K., Mitsouras K., Roy M., Wang Q., Xu Q., Nelson S.F., Lee C. Detecting tissue-specific regulation of alternative splicing as a qualitative change in microarray data. Nucleic Acids Res. 2004;32:e180. [PMC free article] [PubMed]
17. Xing Y., Resch A., Lee C. The multiassembly problem: reconstructing multiple transcript isoforms from EST fragment mixtures. Genome Res. 2004;14:426–441. [PMC free article] [PubMed]
18. Xing Y., Xu Q., Lee C. Widespread production of novel soluble protein isoforms by alternative splicing removal of transmembrane anchoring domains. FEBS Lett. 2003;555:572–578. [PubMed]
19. Xu Q., Lee C. Discovery of novel splice forms and functional analysis of cancer-specific alternative splicing in human expressed sequences. Nucleic Acids Res. 2003;31:5635–5643. [PMC free article] [PubMed]
20. Xu Q., Modrek B., Lee C. Genome-wide detection of tissue-specific alternative splicing in the human transcriptome. Nucleic Acids Res. 2002;30:3754–3766. [PMC free article] [PubMed]
21. Resch A., Xing Y., Alekseyenko A., Modrek B., Lee C. Evidence for a subpopulation of conserved alternative splicing events under selection pressure for protein reading frame preservation. Nucleic Acids Res. 2004;32:1261–1269. [PMC free article] [PubMed]
22. Cusack B.P., Wolfe K.H. Changes in alternative splicing of human and mouse genes are accompanied by faster evolution of constitutive exons. Mol. Biol. Evol. 2005;22:2198–2208. [PubMed]
23. Lian Y., Garner H.R. Evidence for the regulation of alternative splicing via complementary DNA sequence repeats. Bioinformatics. 2005;21:1358–1364. [PubMed]
24. Roy M., Xu Q., Lee C. Evidence that public database records for many cancer-associated genes reflect a splice form found in tumors and lack normal splice forms. Nucleic Acids Res. 2005;33:5026–5033. [PMC free article] [PubMed]
25. Xing Y., Lee C.J. Protein modularity of alternatively spliced exons is associated with tissue-specific regulation of alternative splicing. PLoS Genet. 2005;1:e34. [PMC free article] [PubMed]
26. Chen F.C., Wang S.S., Chen C.J., Li W.H., Chuang T.J. Alternatively and constitutively spliced exons are subject to different evolutionary forces. Mol. Biol. Evol. 2006;23:675–682. [PubMed]
27. Xing Y., Wang Q., Lee C. Evolutionary divergence of exon flanks: a dissection of mutability and selection. Genetics. 2006;173:1787–1791. [PMC free article] [PubMed]
28. Modrek B., Lee C.J. Alternative splicing in the human, mouse and rat genomes is associated with an increased frequency of exon creation and/or loss. Nature Genet. 2003;34:177–180. [PubMed]
29. Schuler G.D. Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J. Mol. Med. 1997;75:694–698. [PubMed]
30. Maglott D., Ostell J., Pruitt K.D., Tatusova T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 2005;33:D54–D58. [PMC free article] [PubMed]
31. Pruitt K.D., Tatusova T., Maglott D.R. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005;33:D501–D504. [PMC free article] [PubMed]
32. Birney E., Andrews D., Caccamo M., Chen Y., Clarke L., Coates G., Cox T., Cunningham F., Curwen V., Cutts T., et al. Ensembl 2006. Nucleic Acids Res. 2006;34:D556–D561. [PMC free article] [PubMed]
33. Blanchette M., Kent W.J., Riemer C., Elnitski L., Smit A.F., Roskin K.M., Baertsch R., Rosenbloom K., Clawson H., Green E.D., et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004;14:708–715. [PMC free article] [PubMed]
34. Hinrichs A.S., Karolchik D., Baertsch R., Barber G.P., Bejerano G., Clawson H., Diekhans M., Furey T.S., Harte R.A., Hsu F., et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 2006;34:D590–D598. [PMC free article] [PubMed]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press