Documentation

Contents:

Purpose

Graphical display

Alignment display

Background

 

Purpose

The Evidence Viewer is designed to display the biological evidence supporting a particular gene model. It displays all RefSeq models, GenBank mRNAs, annotated known or potential transcripts, and ESTs that align to the area of interest.


Graphical Display

The first line in the graphical display represents the region of the genomic contig that is covered by the given gene. The numbers just above the blue bar are the coordinates of the 5' and 3' ends of the gene on the contig. If the leftmost number is larger than the rightmost number, the genomic sequence is being shown in reverse orientation (because the RefSeqs and mRNAs are always shown in the forward orientation).
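The orientation rule above can be written as a one-line check. This is a minimal sketch only; the function name `display_orientation` is hypothetical and is not part of the Evidence Viewer.

```python
def display_orientation(left_coord: int, right_coord: int) -> str:
    """Report how the genomic sequence is drawn, given the two contig
    coordinates printed above the blue bar (leftmost first)."""
    # A larger leftmost coordinate means the contig is drawn reversed so
    # that the RefSeqs and mRNAs can stay in the forward orientation.
    return "reverse" if left_coord > right_coord else "forward"


print(display_orientation(48200, 31750))
print(display_orientation(31750, 48200))
```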

The next set of lines (red/black) represents the XM model mRNAs. XM sequences are produced by NCBI's Genome Annotation Project and represent the known or potential transcripts of a gene. For more on XMs, see the Genome Guide. The baseline shows the extent of each XM, with the taller blocks representing exons. Because of the way the sequences are mapped to the pixels of the display, exons can end up on top of each other. When this happens, the exon is colored black.
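To illustrate why exons can land on top of each other, the sketch below maps exon coordinates onto a fixed number of pixel columns and marks any column that receives more than one exon. The linear scaling and all names here (`exon_pixel_colors`, `width_px`) are assumptions made for illustration, not the viewer's actual drawing code.

```python
def exon_pixel_colors(exons, region_start, region_end, width_px,
                      base_color="red", overlap_color="black"):
    """Assign a color to each pixel column of one model line.

    exons: list of (start, end) inclusive coordinates in the region.
    Returns a list of width_px entries: None for baseline (no exon),
    base_color for one exon, overlap_color where exons collide.
    """
    span = region_end - region_start + 1
    hits = [0] * width_px
    for start, end in exons:
        # Integer scaling: nearby exons can fall into the same column.
        first = (start - region_start) * width_px // span
        last = (end - region_start) * width_px // span
        for px in range(first, last + 1):
            hits[px] += 1
    return [None if n == 0 else base_color if n == 1 else overlap_color
            for n in hits]


# Two short exons 10 bases apart collapse into one column at this scale.
colors = exon_pixel_colors([(1, 100), (150, 180), (190, 199)], 1, 1000, 10)
print(colors[:3])
```

At a wider display (larger `width_px`) the same exons would fall into separate columns and none would be marked as overlapping.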

The next lines (purple/orange) represent the RefSeq mRNA models (NM accessions) as well as other GenBank mRNAs that map to the region. RefSeqs will have accessions starting with NM_; the other sequences will have non-NM accession numbers. See the Reference sequence homepage for more information about RefSeqs. As with the XMs, the NM and GenBank mRNA exons may be mapped on top of each other; these double exons are colored orange.

Depending on the arguments used (do_est present or absent), the next line may show the EST density along the region of interest. To save space, ESTs are not shown as individual alignments; instead, they are displayed as a shaded line, with darker shades indicating more ESTs.
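A density line of this kind can be approximated by counting how many ESTs touch each bin along the region and then scaling the counts to a small number of shade levels. The following is a sketch under assumed binning and scaling rules; the function names and the choice of four shade levels are hypothetical.

```python
def est_density(est_ranges, region_start, region_end, n_bins):
    """Count the ESTs overlapping each of n_bins equal bins of the region."""
    span = region_end - region_start + 1
    counts = [0] * n_bins
    for start, end in est_ranges:
        # Clip each EST alignment to the displayed region.
        lo, hi = max(start, region_start), min(end, region_end)
        if lo > hi:
            continue  # EST lies entirely outside the region
        first = (lo - region_start) * n_bins // span
        last = (hi - region_start) * n_bins // span
        for b in range(first, last + 1):
            counts[b] += 1
    return counts


def shade_levels(counts, levels=4):
    """Scale counts to 0..levels, where a higher level means a darker shade."""
    peak = max(counts, default=0)
    if peak == 0:
        return [0] * len(counts)
    # Ceiling division so that any bin with at least one EST gets a shade.
    return [-(-c * levels // peak) for c in counts]


counts = est_density([(1, 25), (1, 50), (90, 120)], 1, 100, 4)
print(counts, shade_levels(counts))
```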

Following the NM and GenBank mRNA lines are three lines indicating the positions of any mismatches, insertions/deletions (indels), or NM/mRNA sequence gaps in any of the NM/mRNA alignments for a given exon. Mousing over a vertical bar displays the number of the exon that contains the indel, mismatch, or gap; click on the bar or on the links just below the graphic to go to that exon. The differences shown are between the NMs/mRNAs and the genomic sequence; because the XMs are derived from the genomic sequence, they match it exactly. The gaps are regions of the NM or mRNA sequences that were not aligned to the genomic sequence.
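As a rough illustration of how mismatches and indels differ, the sketch below scans one gapped pairwise alignment and records the column of each. It assumes both sequences are given as equal-length strings with `-` marking alignment gaps; this representation and the function name are assumptions, and unaligned NM/mRNA regions (the third line of the display) are not covered.

```python
def classify_differences(genomic_aln: str, mrna_aln: str):
    """Return (mismatch_columns, indel_start_columns) for one exon alignment."""
    if len(genomic_aln) != len(mrna_aln):
        raise ValueError("aligned sequences must have the same length")
    mismatches, indels = [], []
    in_indel = False
    for col, (g, m) in enumerate(zip(genomic_aln.upper(), mrna_aln.upper())):
        if g == "-" or m == "-":
            # A run of consecutive gap columns is reported as one indel.
            if not in_indel:
                indels.append(col)
            in_indel = True
        else:
            in_indel = False
            if g != m:
                mismatches.append(col)
    return mismatches, indels


# Column 2 is a G/C mismatch; columns 6-7 are an insertion in the mRNA.
print(classify_differences("ACGTAC--GT", "ACCTACGGGT"))
```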


Alignment Display

Below the graphical display are exon-by-exon alignments of all of the NM and mRNA sequences with the genomic sequence. If there are multiple NMs and mRNAs for an exon, the alignment is a pairwise multiple display. The exons are shown in biological order (5' to 3'), so they may be backwards with respect to the genomic coordinates. For each exon, the flanking genomic sequence is shown on each side to reveal the presence or absence of splice sites.

Nucleotide mismatches are shown in red: if an NM or mRNA nucleotide does not match the corresponding genomic nucleotide, it is colored red.

Any proteins annotated on the NMs and mRNAs are also shown. Protein mismatches are shown in light blue: if both the genomic sequence and the NM or mRNA have proteins annotated, and these proteins differ, the mismatched genomic amino acids are colored light blue. Annotated proteins, derived from the coding regions annotated on the contig (including XMs), are shown on the genomic sequence along with the number of annotations that contributed to each reading frame shown (in parentheses following 'frame'; e.g., 'frame(3)' means that three identical proteins are shown).
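Showing exons in biological order amounts to sorting them by genomic coordinate and reversing that order for a gene on the minus strand. The sketch below assumes exons are given as (start, end) contig coordinates and that the strand is passed as "+" or "-"; both conventions are illustrative assumptions.

```python
def exons_in_biological_order(exons, strand):
    """Order exons 5' to 3' with respect to the transcript.

    exons: list of (start, end) contig coordinates with start <= end.
    strand: "+" if the gene runs with the contig, "-" if against it.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On the minus strand the 5'-most exon has the highest coordinates.
    return sorted(exons, key=lambda exon: exon[0], reverse=(strand == "-"))


exons = [(100, 200), (500, 600), (300, 400)]
print(exons_in_biological_order(exons, "-"))
```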


Background

The input to the Evidence Viewer is a contig accession number and a gene symbol, locus id, or range. The program fetches the GenBank record for the contig and locates the requested gene or range on that contig. If the requested gene is not on the given contig, the viewer exits with an error message. If the gene is found, the program finds the 5' and 3' limits of the gene on the contig and then looks in a database to find all RefSeq and GenBank models that are contained within those limits. Once the records are found, the Evidence Viewer aligns them, one by one, to the contig.
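The containment step described above, selecting the models that lie within the 5' and 3' limits of the gene, might look like the sketch below. The `Model` record, the function name, and the accessions in the example are all made up for illustration; the actual database lookup and alignment steps are not shown.

```python
from typing import NamedTuple


class Model(NamedTuple):
    """Hypothetical record for one RefSeq or GenBank model on the contig."""
    accession: str
    start: int
    end: int


def models_within(models, gene_start, gene_end):
    """Keep only the models fully contained within the gene limits."""
    # Normalize the limits so the gene may be given in either orientation.
    lo, hi = sorted((gene_start, gene_end))
    return [m for m in models
            if lo <= min(m.start, m.end) and max(m.start, m.end) <= hi]


candidates = [
    Model("NM_TEST1", 1200, 4800),    # inside the gene limits
    Model("XM_TEST2", 900, 4800),     # starts before the 5' limit
    Model("MRNA_TEST3", 4500, 2000),  # inside, stored in reverse order
]
print([m.accession for m in models_within(candidates, 5000, 1000)])
```

Each model kept by this filter would then be aligned to the contig in turn.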