| Apis mellifera (honey bee) gene prediction using Gnomon | Revised August 17, 2006 |
| Gnomon is a gene prediction program that can predict and annotate genes in any genome, and it is particularly valuable for previously unannotated genomes. This page describes the specific process and parameters used for the Apis mellifera Gnomon gene prediction and annotation. After reading it, you can return to the Apis mellifera genome overview page or visit the Map Viewer home page, where you can search the genome data of any organism represented in Map Viewer. |
|
|
| Training Set for Apis mellifera |
|
|
The Gnomon gene prediction program is described, both in general and in relation to the NCBI genome assembly and annotation pipeline, in other NCBI documentation. The following description covers the A. mellifera-specific Gnomon processing.
Because only a limited amount of bee-specific transcript information and an early version of the genome assembly were available, a new set of parameters and several processing trials were required to generate a training set for Gnomon. The A. mellifera genomic sequence from version 1.0 of the honey bee assembly (Amel_1.0) from Baylor College of Medicine was compared against all FlyBase Drosophila proteins using blastx, and the best non-overlapping chains were selected. There were about 9500 such chains, and they were treated as "seeds" for bee genes. The number of chains in this set gives an indication of how many genes one might expect to find in the honey bee genome. These hits and the Drosophila parameters were used for a Gnomon run on the repeat-masked genome (Amel_1.0). This initial run produced 8214 models, 6167 of which had protein and/or transcript support. The supported models were then used to train the bee-specific parameters. The same set of bee-specific parameters was used for subsequent honey bee genome assemblies. |
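The selection of "best non-overlapping chains" described above can be illustrated with a small sketch. The page does not say how Gnomon's pipeline actually makes this selection, so the code below swaps in a standard technique, weighted interval scheduling by dynamic programming, purely as an illustration. The `Chain` record, its fields, and the scores are hypothetical, and strand is ignored for simplicity.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Chain(NamedTuple):
    """Hypothetical record for one chained protein hit on the genome."""
    name: str     # illustrative identifier, not a real accession
    start: int    # first genomic base covered (inclusive)
    end: int      # last genomic base covered (inclusive)
    score: float  # illustrative alignment score


def best_nonoverlapping(chains: List[Chain]) -> List[Chain]:
    """Return a maximum-total-score subset of mutually non-overlapping chains.

    Standard weighted interval scheduling; an assumption about how such a
    selection could be done, not a description of the NCBI pipeline.
    """
    ordered = sorted(chains, key=lambda c: c.end)
    ends = [c.end for c in ordered]
    best = [0.0] * (len(ordered) + 1)  # best[i]: best score using first i chains
    take = [False] * len(ordered)      # take[i]: chain i is used in best[i + 1]
    prev = [0] * len(ordered)          # prev[i]: how many chains end before chain i starts

    for i, chain in enumerate(ordered):
        # Count the earlier chains whose end lies strictly before this start.
        prev[i] = bisect_right(ends, chain.start - 1, 0, i)
        score_with_chain = best[prev[i]] + chain.score
        if score_with_chain > best[i]:
            best[i + 1] = score_with_chain
            take[i] = True
        else:
            best[i + 1] = best[i]

    # Walk back through the table to recover the chosen chains.
    chosen = []
    i = len(ordered)
    while i > 0:
        if take[i - 1]:
            chosen.append(ordered[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    return chosen[::-1]


if __name__ == "__main__":
    example = [
        Chain("A", 1, 100, 50.0),
        Chain("B", 50, 150, 80.0),
        Chain("C", 120, 200, 40.0),
        Chain("D", 160, 300, 30.0),
    ]
    for chain in best_nonoverlapping(example):
        print(chain.name, chain.start, chain.end, chain.score)
```

In a real setting the chains on each strand would likely be handled separately, and the score would come from the alignment program rather than being assigned by hand.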
| Gnomon Gene Prediction for A. mellifera |
|
|
For gene prediction on version 2.0 of the honey bee genome assembly (Amel_2.0), we used the concept of a target protein set for the first time. The target set is a collection of high-quality proteins whose homologs we expect to find in the genome. For the bee genome, we opted to use a) human RefSeqs, b) Drosophila RefSeqs and c) all insect cDNAs containing annotated proteins from the GenBank database. These proteins were aligned to the genome before annotation.
The alignments of the target proteins and of the available bee transcripts were chained and combined. Overlapping, mutually consistent chained alignments with a full-length CDS were grouped together, forming genes with alternatively spliced isoforms. All other overlapping chained alignments were filtered to leave a set of best-scoring non-overlapping alignments. These alignments were fed into Gnomon, where they were extended and combined using the ab initio prediction technique. All resulting models were then aligned against our search protein set, which includes a much wider selection of eukaryotic proteins. All proteins found in this search were aligned back to the genome, and the prediction steps were repeated using all alignments (target set + search set + transcripts). All models that were built using some alignment information were selected for our Gene track. If a Gnomon model overlaps a honey bee RefSeq, the RefSeq takes precedence.
For the Amel_2.0 honey bee genome assembly, Gnomon produced 9400 genes and 10087 models. There were 403 genes with 1090 alternatively spliced variants. Gnomon produced a total of 6626 supported models. These numbers reflect what is shown in the Ab initio Models map in Map Viewer for the previous assembly, Amel_2.0.
For Amel_4.0, Gnomon produced 16206 genes and 16693 models. Once again, many of the gene predictions include alternatively spliced variants. Gnomon produced a total of 9049 supported genes and 10047 supported models. These numbers reflect what is shown in the Ab initio Models map in Map Viewer for the current assembly, Amel_4.0. |
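The gene and model counts reported above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 1090 "alternatively spliced variants" are the models belonging to the 403 multi-isoform genes, and that every other gene has exactly one model. That reading is an assumption, although the Amel_2.0 numbers are consistent with it. The variable and function names are illustrative only.

```python
def extra_models(genes: int, models: int) -> int:
    """Number of models beyond one per gene, i.e. additional isoforms."""
    return models - genes


# Counts quoted in the text for Amel_2.0.
AMEL2_GENES, AMEL2_MODELS = 9400, 10087
AMEL2_SPLICED_GENES, AMEL2_SPLICED_VARIANTS = 403, 1090

# Under the stated assumption, genes with a single model plus the variants
# of the multi-isoform genes should add up to the total model count.
single_model_genes = AMEL2_GENES - AMEL2_SPLICED_GENES
amel2_total = single_model_genes + AMEL2_SPLICED_VARIANTS
amel2_consistent = amel2_total == AMEL2_MODELS

# Counts quoted in the text for Amel_4.0; the number of multi-isoform genes
# is not given, so only the surplus of models over genes can be computed.
AMEL4_GENES, AMEL4_MODELS = 16206, 16693
amel4_extra = extra_models(AMEL4_GENES, AMEL4_MODELS)

if __name__ == "__main__":
    print("Amel_2.0 extra models:", extra_models(AMEL2_GENES, AMEL2_MODELS))
    print("Amel_2.0 counts consistent:", amel2_consistent)
    print("Amel_4.0 extra models:", amel4_extra)
```

For Amel_2.0 the surplus of models over genes (10087 - 9400 = 687) equals the surplus of variants over multi-isoform genes (1090 - 403 = 687), which is what the assumption predicts.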
|
|
| Predicted Models |
|
| All Gnomon predictions from the second iteration with a score above an established threshold were selected as models for annotation and were included in the Apis mellifera Map Viewer Model map. To display the Model track in Map Viewer, select "Ab initio" from the list of sequence-based maps in "Maps and Options". A summary of the features and objects making up the maps in the current honey bee genome build can be viewed on the statistics page. |
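The threshold-based selection mentioned above amounts to a simple filter. The sketch below is illustrative only: the page does not state the threshold value or how Gnomon scores are represented, so the `Model` record, its fields, and the example numbers are all hypothetical.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    """Hypothetical record for one predicted gene model."""
    model_id: str  # illustrative identifier
    score: float   # illustrative prediction score


def select_models(models: List[Model], threshold: float) -> List[Model]:
    """Keep only the models whose score is strictly above the threshold."""
    return [m for m in models if m.score > threshold]


if __name__ == "__main__":
    predictions = [
        Model("model_1", 12.5),
        Model("model_2", 3.0),
        Model("model_3", 7.0),
    ]
    # 7.0 is an arbitrary example value, not the threshold used by NCBI.
    for model in select_models(predictions, threshold=7.0):
        print(model.model_id, model.score)
```

"Above" is read here as a strict comparison, so a model scoring exactly at the threshold is excluded. Whether the real pipeline treats the boundary this way is not stated.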
| Questions or comments: Write to NCBI Service Desk |