Refined analysis of an existing gene annotation. (A, B and C) depict three stages in the annotation of gene structure for a region of A.thaliana chromosome five. In each panel, spliced alignments and inferred (established) gene structures are represented by arrows extending from the first exon to the last, pointing in the most probable direction of transcription. Exons are represented as boxes connected by introns shown as single lines. (A) In this display, which is available for all A.thaliana genome regions at http://www.plantgdb.org/AtGDB/, spliced alignments originating from native (Arabidopsis) ESTs are shown in red. Following the AtGDB convention, ESTs are identified by their GenBank gi number, 5′-ESTs are marked by a green dot, 3′-ESTs are marked by a blue arrow and clone pair ESTs are bounded by a green box. In this example, there are only two matching ESTs, representing the 5′- and 3′-sequences of RIKEN clone RAFL07-12-J11. The current gene annotations, as established by AGI (Arabidopsis Genome Initiative), are shown in blue, with start and stop codons labeled with green and red triangles, respectively. (B) This graphic, generated using the GeneSeqer@PlantGDB web service, summarizes the results of spliced alignment using the PlantGDB ‘All Plants’ EST and cDNA collections. Spliced alignments of ESTs and cDNAs, depicted in red, can be attributed to the following sources: cDNA: 1 Glycine max; ESTs: 6 Medicago truncatula, 5 Glycine max, 2 A.thaliana, 1 Lotus japonicus, 1 Sorghum bicolor, 1 Triticum aestivum. Alternative gene structures are shown in green (representing consistent predictions from multiple ESTs) and long open reading frames in the predicted gene structures are indicated in orange. Established gene annotation, as reported in GenBank, is shown in blue. (C) All gene structures shown are the results of GeneSeqer@PlantGDB spliced alignments using putative homologous proteins of non-Arabidopsis origin.
(D) Summary of the most probable gene structure prediction for this region. The blue, orange, green and red structures represent, respectively, the established gene annotation, the longest predicted open reading frame, the predicted gene structure and the consensus spliced alignment of transcribed sequences (derived from B). The purple structure represents the alignment with the most closely related protein, a Drosophila protein (GenBank gi: 20177035) with high similarity to vertebrate transportin-SR proteins.