Copyright © 2005 Oxford University Press

ECgene: genome annotation for alternative splicing

Division of Molecular Life Sciences, Ewha Womans University, Seoul 120-750, Korea and ¹School of Chemistry, Seoul National University, Seoul 151-747, Korea

*To whom correspondence should be addressed. Tel: +82 2 3277 2888; Fax: +82 2 3277 2384; Email: sanghyuk/at/ewha.ac.kr

The authors wish it to be known that, in their opinion, the first four authors should be regarded as joint First Authors.

The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use permissions, please contact journals.permissions/at/oupjournals.org.

© 2005, the authors

Received August 15, 2004; Revised October 9, 2004; Accepted October 20, 2004.

Abstract

ECgene provides annotation for gene structure, function and expression, taking alternative splicing events into consideration. The gene-modeling algorithm combines the genome-based expressed sequence tag (EST) clustering and graph-theoretic transcript assembly procedures. The website provides several viewers and applications that have many unique features useful for the analysis of the transcript structure and gene expression. The summary viewer shows the gene summary and the essence of other annotation programs. The genome browser and the transcript viewer are available for comparing the gene structure of splice variants.
Changes in the functional domains by alternative splicing can be seen at a glance in the transcript viewer. We also provide two unique ways of analyzing gene expression. The SAGE tags deduced from the assembled transcripts are used to delineate quantitative expression patterns from publicly available SAGE libraries. Furthermore, the cDNA libraries of EST sequences in each cluster are used to infer qualitative expression patterns. It should be noted that the ECgene website provides annotation for the whole transcriptome, not just the alternatively spliced genes. Currently, ECgene supports the human, mouse and rat genomes. The ECgene suite of tools and programs is available at http://genome.ewha.ac.kr/ECgene/.

INTRODUCTION

Alternative splicing (AS) is a major mechanism of increasing transcriptome diversity in eukaryotic genomes (1–3). Recent studies on AS estimated that 40–70% of human genes show evidence of encoding transcripts that are alternatively spliced (4–6). Numerous databases on AS, mostly concentrating on specific aspects of gene structure (7–14), have been published so far.

Databases on AS are generated either by data mining experimental databases such as GenBank, Swiss-Prot/TrEMBL and Medline, or by comparing sequence alignments. The former group includes ASDB (14), Xpro (12) and AEdb in ASD (7). Computational approaches are even more diverse. Many compared expressed sequence tag (EST) alignments with mRNA or known gene sequences (10,12,13,15,16). However, examining the genomic alignment of EST and mRNA sequences has many advantages once the genome map is available. Approaches of this kind include AltExtron (17), TAP (5), ASAP (8) and altGraphX (18).

Notably, the European Bioinformatics Institute (EBI) launched the Alternative Splicing Database (ASD) consortium to annotate AS events (7). Currently, three major databases are available in ASD: AltSplice and AltExtron, which come from computational pipelines, and AEdb, which is manually generated from the literature.
AceView at the NCBI is another website that provides ample information on alternatively spliced genes (D. Thierry-Mieg, J. Thierry-Mieg, M. Potdevin and M. Sienkiewicz, unpublished data; http://www.ncbi.nih.gov/IEB/Research/Acembly/).

Recently, we developed a gene prediction program, ECgene (Gene prediction by EST Clustering), that takes alternative splicing into consideration. The algorithm combines the genome-based EST clustering and graph-theoretic transcript assembly procedures in a coherent fashion. A brief summary of the algorithm was given in the description of ASmodeler, the web server version (19). Further details of the algorithm will be published elsewhere (N. Kim, S. Shin and S. Lee, manuscript submitted). In this paper, we describe the ECgene database and website with several application programs. Besides the gene prediction algorithm, our viewers and applications have many unique features useful for analyzing the transcript structure, gene function and expression pattern.

ECgene DATABASE

The ECgene algorithm has been applied to the human, mouse and rat genomes to produce transcriptomes that include alternative splicing events. The ECgene transcriptome is likely to be one of the most complete available, even though it contains many transcript models that may not correspond to real transcripts. We classify the transcript models into three groups according to reliability. ECgene Part A represents transcripts of almost RefSeq quality, each supported by a single clone that covers all exons of the transcript. Part B includes transcripts of slightly lower quality (but still highly probable). Their transcript structure requires concatenation of a minimum of two clones to recover all exons in the transcript; no single full-length clone is available for the transcript model. All other transcripts belong to Part C and are of low reliability. Their gene structure may be questionable since it requires concatenation of more than two clones.
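
The three reliability classes can be illustrated with a minimal sketch. It assumes that each supporting clone has already been reduced to the set of exon indices it covers; that representation, the function names and the brute-force search are illustrative assumptions, not the ECgene implementation, which may apply further criteria when concatenating clones.

```python
from itertools import combinations


def min_clones_to_cover(n_exons, clone_exons, max_k=2):
    """Smallest k <= max_k such that some k clones jointly cover all exons.

    n_exons      -- number of exons in the transcript model (indexed 0..n-1)
    clone_exons  -- list of sets; each set holds the exon indices one clone covers
                    (hypothetical representation chosen for this sketch)
    Returns None when no combination of up to max_k clones covers every exon.
    """
    target = set(range(n_exons))
    for k in range(1, max_k + 1):
        for combo in combinations(clone_exons, k):
            if set().union(*combo) >= target:
                return k
    return None


def reliability_part(n_exons, clone_exons):
    """Map the minimum clone count to Part A (1), Part B (2) or Part C (more)."""
    k = min_clones_to_cover(n_exons, clone_exons)
    return {1: "A", 2: "B"}.get(k, "C")


if __name__ == "__main__":
    print(reliability_part(3, [{0, 1, 2}]))        # one full-length clone
    print(reliability_part(3, [{0, 1}, {1, 2}]))   # two clones needed
    print(reliability_part(3, [{0}, {1}, {2}]))    # more than two clones needed
```

Only combinations of one or two clones need to be searched, because anything that requires more falls into Part C by definition.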
However, each individual alternative splicing event implied by the Part C transcripts should be real unless the genomic alignment of the mRNA/EST sequences is wrong. These models have a fair chance of proving to be real transcripts as more sequence data become available. Statistics on the numbers of genes and transcripts are available at the website.

WEB INTERFACE AND APPLICATIONS

The ECgene website contains several independent but closely related applications: the summary viewer, genome browser, alignment viewer, ECfunction and ECexpression. The summary viewer shows the overall picture of the gene. Each application page provides more detailed information and diverse options to explore a specific aspect of the gene.

Summary viewer

The summary viewer shows the gene summary and the essence of other annotation programs. It supports querying by gene symbol, accession number of mRNA/EST and the ECgene ID.

[Figure 1]
Links to various viewers are given next to the gene structure in Figure 1. Clicking on the variant in the picture or in the table opens the mRNA page. This page provides more detailed information on the selected transcript in a similar interface. Domain/motif information is added in the picture.

ECgene genome browser

The transcript structure and cluster members can be seen via the ECgene genome browser that adds ECgene models as custom tracks in the UCSC genome browser.

[Figure 2]
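
As an illustration of what a custom track for transcript models can look like, the sketch below writes exon structures in the 12-column BED format accepted by the UCSC genome browser. The function names, the track name and the example transcript are hypothetical, and the ECgene site may generate its tracks differently.

```python
def transcript_to_bed12(chrom, strand, name, exons, score=0, rgb="0,0,0"):
    """Format one transcript model as a BED12 line.

    exons -- list of (start, end) genomic intervals, 0-based and half-open,
             as the BED format expects.
    """
    exons = sorted(exons)
    chrom_start = exons[0][0]
    chrom_end = exons[-1][1]
    block_sizes = ",".join(str(end - start) for start, end in exons)
    block_starts = ",".join(str(start - chrom_start) for start, _ in exons)
    # thickStart/thickEnd are set to the transcript bounds because this
    # sketch carries no coding-region information.
    fields = [chrom, chrom_start, chrom_end, name, score, strand,
              chrom_start, chrom_end, rgb, len(exons), block_sizes, block_starts]
    return "\t".join(str(f) for f in fields)


def custom_track(track_name, transcripts):
    """Build custom-track text: a track line followed by one BED12 line per model."""
    lines = [f'track name="{track_name}" description="{track_name}" visibility=pack']
    lines.extend(transcript_to_bed12(**t) for t in transcripts)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-exon model used only to show the output shape.
    models = [dict(chrom="chr1", strand="+", name="tx1",
                   exons=[(100, 200), (300, 450)])]
    print(custom_track("example_models", models), end="")
```

The resulting text can be pasted into the browser's custom-track form; block starts are stored relative to the transcript start, which is why each exon start is offset by `chrom_start`.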
ECfunction: transcript/domain/motif viewer

ECfunction, available at http://genome.ewha.ac.kr/ECgene/ECfunction/, collects information related to gene function. It includes the InterPro domains, transmembrane helices, signal peptides, BLAST hits and GO annotations. ECfunction has a unique transcript viewer, as shown in Figure 3.
Because it is drawn in mRNA coordinates, the transcript viewer can show the functional domains and motifs directly on each transcript. This enables the user to recognize immediately any loss of functional domains due to alternative splicing.

ECexpression

ECexpression, available at http://genome.ewha.ac.kr/ECgene/ECexpression/, analyzes the gene expression pattern using ~260 SAGE and ~8600 cDNA libraries that are publicly available. Our SAGE analysis differs substantially from the NCBI's SAGEmap (22) in that SAGE tags are extracted from the assembled transcript models, not from EST sequences in the cluster. The clustering also differs from that of UniGene, even though both are genome-based at this point. We briefly explain the output pictures here.

[Figure 4]
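
Deducing a SAGE tag from an assembled transcript can be sketched as follows. The sketch assumes the conventional SAGE protocol, in which the tag is the 10 bases immediately downstream of the 3'-most NlaIII site (CATG); the text does not state which anchoring enzyme or tag length ECexpression uses, and the function names and example data are hypothetical.

```python
def sage_tag(transcript, anchor="CATG", tag_len=10):
    """Return the tag that follows the 3'-most anchor site, or None.

    transcript -- mRNA-sense sequence, 5' to 3'
    anchor     -- recognition site of the anchoring enzyme (assumed: NlaIII, CATG)
    tag_len    -- tag length in bases (assumed: 10, as in conventional SAGE)
    """
    seq = transcript.upper()
    site = seq.rfind(anchor)              # 3'-most occurrence of the site
    if site == -1:
        return None                       # transcript has no anchor site
    start = site + len(anchor)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None   # reject truncated tags


def tags_per_million(library_counts, tag):
    """Normalize a tag count by library size so libraries can be compared."""
    total = sum(library_counts.values())
    if total == 0:
        return 0.0
    return library_counts.get(tag, 0) * 1_000_000 / total


if __name__ == "__main__":
    tag = sage_tag("AAACATGTTTTCATGACGTACGTACGG")
    library = {"ACGTACGTAC": 5, "TTTTTTTTTT": 15}   # hypothetical tag counts
    print(tag, tags_per_million(library, tag))
```

Because the tag comes from a specific transcript model rather than from pooled ESTs, splice variants that differ at their 3' ends can yield different tags and therefore separate expression profiles.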
Gene expression from cDNA libraries is presented in a similar way, except that the graph represents the number of EST sequences in the cluster that come from the specified tissue. We manually classified ~8600 human cDNA libraries in terms of tissue (organ), pathology and developmental stage. Gene expression predicted from EST members is mostly qualitative, since many cDNA libraries are normalized or subtracted to find genes expressed at low levels. However, the coverage of cDNA libraries is much more extensive, as can be seen from the number of publicly available libraries.

FUTURE DIRECTIONS

ECgene is an ongoing project. It aims to be one of the reference sites for genome annotation. A number of new features and applications are currently under development. Gene profiler, ortholog finder and ChimerDB are among the applications to be completed in the near future. We plan to expand and optimize the system resources to shorten the response time. Other model organisms will be added too.

ACKNOWLEDGEMENTS

We are grateful to the UCSC genome center for making such a wonderful resource available to the public. This work was supported by the Ministry of Science and Technology of Korea through the bioinformatics research program of MOST NRDP (M1-0217-00-0027) and the Korea Science and Engineering Foundation through the Center for Cell Signaling Research at Ewha Womans University.

REFERENCES

1. Graveley,B.R. (2001) Alternative splicing: increasing diversity in the proteomic world. Trends Genet., 17, 100–107.
2. Maniatis,T. and Tasic,B. (2002) Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature, 418, 236–243.
3. Black,D.L. (2003) Mechanisms of alternative pre-messenger RNA splicing. Annu. Rev. Biochem., 72, 291–336.
4. Modrek,B., Resch,A., Grasso,C. and Lee,C. (2001) Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res., 29, 2850–2859.
5. Kan,Z., States,D. and Gish,W. (2002) Selecting for functional alternative splices in ESTs. Genome Res., 12, 1837–1845.
6. Johnson,J.M., Castle,J., Garrett-Engele,P., Kan,Z., Loerch,P.M., Armour,C.D., Santos,R., Schadt,E.E., Stoughton,R. and Shoemaker,D.D. (2003) Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science, 302, 2141–2144.
7. Thanaraj,T.A., Stamm,S., Clark,F., Riethoven,J.J., Le Texier,V. and Muilu,J. (2004) ASD: the Alternative Splicing Database. Nucleic Acids Res., 32, D64–D69.
8. Lee,C., Atanelov,L., Modrek,B. and Xing,Y. (2003) ASAP: the Alternative Splicing Annotation Project. Nucleic Acids Res., 31, 101–105.
9. Brett,D., Hanke,J., Lehmann,G., Haase,S., Delbruck,S., Krueger,S., Reich,J. and Bork,P. (2000) EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett., 474, 83–86.
10. Pospisil,H., Herrmann,A., Bortfeldt,R.H. and Reich,J.G. (2004) EASED: Extended Alternatively Spliced EST Database. Nucleic Acids Res., 32, D70–D74.
11. Huang,H.D., Horng,J.T., Lee,C.C. and Liu,B.J. (2003) ProSplicer: a database of putative alternative splicing information derived from protein, mRNA and expressed sequence tag sequence data. Genome Biol., 4, R29.
12. Gopalan,V., Tan,T.W., Lee,B.T. and Ranganathan,S. (2004) Xpro: database of eukaryotic protein-encoding genes. Nucleic Acids Res., 32, D59–D63.
13. Krause,A., Haas,S.A., Coward,E. and Vingron,M. (2002) SYSTERS, GeneNest, SpliceNest: exploring sequence space from genome to protein. Nucleic Acids Res., 30, 299–300.
14. Dralyuk,I., Brudno,M., Gelfand,M.S., Zorn,M. and Dubchak,I. (2000) ASDB: database of alternatively spliced genes. Nucleic Acids Res., 28, 296–297.
15. Mironov,A.A., Fickett,J.W. and Gelfand,M.S. (1999) Frequent alternative splicing of human genes. Genome Res., 9, 1288–1293.
16. Zavolan,M., van Nimwegen,E. and Gaasterland,T. (2002) Splice variation in mouse full-length cDNAs identified by mapping to the mouse genome. Genome Res., 12, 1377–1385.
17. Clark,F. and Thanaraj,T.A. (2002) Categorization and characterization of transcript-confirmed constitutively and alternatively spliced introns and exons from human. Hum. Mol. Genet., 11, 451–464.
18. Sugnet,C.W., Kent,W.J., Ares,M.,Jr and Haussler,D. (2004) Transcriptome and genome conservation of alternative splicing events in humans and mice. Pac. Symp. Biocomput., 66–77.
19. Kim,N., Shin,S. and Lee,S. (2004) ASmodeler: gene modeling of alternative splicing from genomic alignment of mRNA, EST and protein sequences. Nucleic Acids Res., 32, W181–W186.
20. Mulder,N.J., Apweiler,R., Attwood,T.K., Bairoch,A., Barrell,D., Bateman,A., Binns,D., Biswas,M., Bradley,P., Bork,P. et al. (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res., 31, 315–318.
21. Boeckmann,B., Bairoch,A., Apweiler,R., Blatter,M.C., Estreicher,A., Gasteiger,E., Martin,M.J., Michoud,K., O'Donovan,C., Phan,I. et al. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 31, 365–370.
22. Lash,A.E., Tolstoshev,C.M., Wagner,L., Schuler,G.D., Strausberg,R.L., Riggins,G.J. and Altschul,S.F. (2000) SAGEmap: a public gene expression resource. Genome Res., 10, 1051–1060.