New Database and View Options for Nucleotide BLAST Services

The nucleotide-nucleotide BLAST pages linked to the BLAST homepage now offer the human and mouse transcript and genomic reference sequences as database options (Figure 1).

Figure 1. New database and display options for NCBI Web BLAST services. A. The human and mouse genomic and transcript databases can now be selected using the radio buttons on the BLAST form. The traditional BLAST databases are available through the pull-down list once the "Others (nr etc.)" radio button is selected. B. The improved display with the sorting capabilities is invoked through the "New View" checkbox that is now selected by default on the "Format" section of the nucleotide BLAST forms.

A new output display for all searches provides a more organized presentation of the search results and has powerful new sorting options (Figure 2).

Figure 2. New BLAST output display. The results of a search with the crab-eating macaque (Macaca fascicularis) CDC20 mRNA (AB168636) against the default human genomic plus transcript database.

RID: 1168282777-19032-214373696615.BLASTQ3

A. The descriptions section of the output is a table with separate sections for Transcripts and Genomic Sequences. The search finds the corresponding human RefSeq transcript. It also finds matches to chromosomes 9 and 1 in both the reference and alternate assemblies of the human genome. The local metrics, E value and Max Score, identify the best match as the retrocopy pseudogene on chromosome 9, because the intronless structure of the pseudogene gives a single high-scoring match. The Total Score metric makes it easy to find the functional eleven-exon gene on chromosome 1 even though the small individual exon matches give lower scores than the pseudogene. B. The alignment of the macaque mRNA to a contig on chromosome 1 from the reference human genome. Segments are sorted by "Query start position" to show the matches to the exons in order of position rather than E value. C. Alignments displayed in the human Map Viewer are available by following the linked genomic sequence identifiers.

In addition to the traditional sorting by Expect Value, results can now be sorted by Maximum Score, Total Score, Percent Query Coverage, and Maximum Percent Identity. The box below provides definitions for these metrics. Sorting options are also available for multiple hits within the alignments for each subject (database) sequence, with the additional option of sorting by query or subject position, which is especially useful for ordering potential exons on genomic matches (Figure 2 B). Matches to genomic sequences in the new genome and transcript databases can now be displayed in the mouse and human Map Viewers, as with the specialized genomic BLAST services (Figure 2 C).
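The within-subject sorting described above can be illustrated with a minimal Python sketch. The `Segment` record, its field names, and the numbers in the usage example are hypothetical and are not part of any NCBI tool or output format; the sketch only shows why ordering by query start position lists exon matches in transcript order, while ordering by E value does not.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One aligned segment between the query and a subject (hypothetical record)."""
    query_start: int
    query_end: int
    subject_start: int
    score: float
    evalue: float


def sort_segments(segments: List[Segment], key: str = "evalue") -> List[Segment]:
    """Return the segments of one subject sequence in the requested order."""
    if key == "evalue":
        # Smaller E values are more significant, so sort ascending.
        return sorted(segments, key=lambda s: s.evalue)
    if key == "score":
        # Higher scores are better, so sort descending.
        return sorted(segments, key=lambda s: s.score, reverse=True)
    if key == "query_start":
        return sorted(segments, key=lambda s: s.query_start)
    if key == "subject_start":
        return sorted(segments, key=lambda s: s.subject_start)
    raise ValueError(f"unknown sort key: {key}")


# Made-up exon matches of an mRNA query against a genomic contig.
exon_hits = [
    Segment(query_start=401, query_end=650, subject_start=9200, score=420.0, evalue=1e-115),
    Segment(query_start=1, query_end=180, subject_start=1500, score=300.0, evalue=1e-80),
    Segment(query_start=181, query_end=400, subject_start=5300, score=380.0, evalue=1e-103),
]

by_position = sort_segments(exon_hits, key="query_start")
by_evalue = sort_segments(exon_hits, key="evalue")
```

Here `by_position` lists the three matches in the order they occur along the query, whereas `by_evalue` puts the strongest match first regardless of position.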

These new database and display options provide rapid access to the most popular annotated genomes at the NCBI and expand the power of BLAST as a genome annotation tool.

Local Extreme Metrics

These measures treat each aligned segment independently. Where there are multiple matches to the same subject (database) sequence, only the metric for the best match is considered. The E(xpect) Value is the traditional BLAST statistic used to sort output by significance.

E(xpect) Value

the number of alignments expected by chance with a particular score or better. The expect value is the default sorting metric and normally gives the same sorting order as Max Score.

Max(imum) Score

the highest alignment score of a set of aligned segments from the same subject (database) sequence. The score is calculated from the sum of the match rewards and the mismatch, gap open and extend penalties independently for each segment. This normally gives the same sorting order as the E Value.

Max(imum) Identity

the highest percent identity for a set of aligned segments to the same subject sequence.
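The local metrics above can be sketched in Python as follows. This is an illustration under stated assumptions, not NCBI's implementation: the `SegmentCounts` record is hypothetical, the reward and penalty values are placeholder parameters, a gap is assumed to cost one opening penalty plus one extension penalty per gapped position, and percent identity is assumed to be taken over the full alignment length including gaps. The sketch works with raw scores only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentCounts:
    """Column counts for one aligned segment (hypothetical record)."""
    matches: int
    mismatches: int
    gap_opens: int     # number of separate gaps
    gap_residues: int  # total number of gapped positions


def raw_score(seg: SegmentCounts, reward: int = 1, penalty: int = -3,
              gap_open: int = 5, gap_extend: int = 2) -> int:
    """Sum match rewards and subtract mismatch and gap costs for one segment.

    The default parameter values are illustrative assumptions.
    """
    return (seg.matches * reward
            + seg.mismatches * penalty
            - seg.gap_opens * gap_open
            - seg.gap_residues * gap_extend)


def percent_identity(seg: SegmentCounts) -> float:
    """Identical columns as a percentage of all alignment columns."""
    length = seg.matches + seg.mismatches + seg.gap_residues
    return 100.0 * seg.matches / length


def max_score(segments: List[SegmentCounts]) -> int:
    """Best single-segment score among all segments for one subject."""
    return max(raw_score(s) for s in segments)


def max_identity(segments: List[SegmentCounts]) -> float:
    """Best single-segment percent identity for one subject."""
    return max(percent_identity(s) for s in segments)


# Two made-up segments aligned to the same subject sequence.
segments = [
    SegmentCounts(matches=100, mismatches=5, gap_opens=1, gap_residues=2),
    SegmentCounts(matches=50, mismatches=0, gap_opens=0, gap_residues=0),
]
```

With these made-up counts the longer segment supplies the Max Score while the shorter, perfect segment supplies the Max Identity, which shows that the two local metrics need not come from the same segment.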

Total Metrics 

These metrics are summed over or include all aligned segments for the same subject sequence. They are most useful for analyzing BLAST matches to genomic sequences.

Tot(al) Score

the sum of alignment scores of all segments from the same subject sequence. This sorting order may help promote the position of mRNA matches to genomic sequences where there are multiple exons. The Total Score is useful for distinguishing hits to functional multi-exon genes from those to the corresponding intronless retrotransposed pseudogenes (Figure 2).

Query Coverage

the percent of the query length that is included in the aligned segments. This is calculated over all segments as with the Tot Score.
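A short Python sketch of the two total metrics follows. The tuple layout, function names, and all numbers are hypothetical. The sketch also assumes that query positions covered by more than one segment are counted once when computing coverage, which is one reasonable reading of the definition above rather than a confirmed detail of the BLAST implementation.

```python
from typing import List, Tuple

# (query_start, query_end, score), with 1-based inclusive query coordinates.
Segment = Tuple[int, int, float]


def total_score(segments: List[Segment]) -> float:
    """Sum the scores of all segments for one subject sequence."""
    return sum(score for _, _, score in segments)


def query_coverage(segments: List[Segment], query_length: int) -> float:
    """Percent of query positions covered by at least one segment."""
    intervals = sorted((min(s, e), max(s, e)) for s, e, _ in segments)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping interval: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / query_length


# Made-up hits for a 1500-base mRNA query.
pseudogene = [(1, 1500, 1800.0)]  # one long intronless match
gene = [                          # five shorter exon matches
    (1, 300, 500.0), (301, 600, 480.0), (601, 900, 510.0),
    (901, 1200, 450.0), (1201, 1500, 490.0),
]

pseudogene_total = total_score(pseudogene)
gene_total = total_score(gene)
```

In this made-up case the pseudogene has the higher single-segment score (1800 versus 510), but the multi-exon gene has the higher total (2430 versus 1800), so sorting by Total Score moves the functional gene to the top while both subjects cover the whole query.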

