New BLAST Formatter for BLAST Web Services—Old Formatter Retired

The enhanced BLAST formatter, first announced as an option in the Summer/Fall 2004 issue of NCBI News, has now replaced the old formatter for the BLAST Web service. The new formatter makes BLAST output more informative and easier to interpret through three main improvements: a better graphical overview, reporting of annotated features in database sequences, and clearer display of low-complexity and other filtered regions in the query sequence.

The redesigned graphical overview is assembled from the cells of an HTML table, so most browsers can print it easily. The new overview connects hits from the same database sequence with a crisp, thin line, making features such as exon-intron structure or repeated domains more apparent. The overview in Figure 1 shows two aligned regions, corresponding to exons, between the chimpanzee beta-2-microglobulin mRNA and the human genomic query sequence.

Figure 1. The new BLAST graphical overview. Thin lines connect the two matches between the human beta-2-microglobulin genomic sequence (query) and exons of the chimpanzee mRNA sequence.
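The idea of tying together hits from the same database sequence can be illustrated with a small text-mode sketch. This is not NCBI's implementation (the real overview is built from HTML table cells); it assumes a hypothetical, simplified list of hits given as (subject ID, query start, query end) and draws one row per database sequence, with `=` for aligned segments and `-` for the thin connecting line. The names `overview_rows` and `chimp_mRNA` and the demo coordinates are made up for illustration.

```python
from collections import defaultdict


def overview_rows(query_len, hsps, width=60):
    """Draw one text row per database sequence, in first-seen order.

    hsps: list of (subject_id, q_start, q_end) tuples in 1-based,
    inclusive query coordinates with q_start <= q_end (a simplified,
    hypothetical hit record). '=' marks query columns covered by a hit,
    '-' is the thin connector between hits to the same subject, and
    ' ' is empty space.
    """
    by_subject = defaultdict(list)
    for subject_id, q_start, q_end in hsps:
        by_subject[subject_id].append((q_start, q_end))

    def col(pos):
        # Map a 1-based query position onto a 0-based column (integer math).
        return (pos - 1) * width // query_len

    rows = {}
    for subject_id, spans in by_subject.items():
        cells = [" "] * width
        # Connector line: from the leftmost hit start to the rightmost hit end.
        left = col(min(start for start, _ in spans))
        right = col(max(end for _, end in spans))
        for c in range(left, right + 1):
            cells[c] = "-"
        # Aligned segments are drawn on top of the connector.
        for start, end in spans:
            for c in range(col(start), col(end) + 1):
                cells[c] = "="
        rows[subject_id] = "".join(cells)
    return rows


if __name__ == "__main__":
    # Made-up coordinates for illustration; not the data behind Figure 1.
    demo = [("chimp_mRNA", 11, 30), ("chimp_mRNA", 61, 80)]
    for subject, row in overview_rows(100, demo, width=20).items():
        print(f"{subject:<12}|{row}|")
```

Drawing the connector first and the aligned segments over it keeps the logic simple: whatever lies between two hits on the same subject, such as an intron, shows up as a thin line rather than empty space.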

The new formatter also makes hits to large, heavily annotated database sequences easier to interpret by linking to the features that lie within or close to each match. These links, provided for database sequences longer than 200 kb, highlight associations between aligned regions and biological features such as genes and repeat regions. The BLAST output in Figure 2 shows the match to the albumin gene in a whole genome shotgun supercontig from the dog genome. Use the serum albumin link to display the relevant portion of NW_876257, the two-megabase supercontig.

Figure 2. Alignment from a translating (tblastn) search of the human albumin protein against the dog genome. The new formatter indicates that this hit lies within the annotated albumin gene on the supercontig NW_876257.
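Deciding which features to link amounts to an interval-overlap test between the hit and the annotated features of the database sequence. The sketch below is a hypothetical illustration, not NCBI code: the 200 kb cutoff comes from the text above, but the `Feature` record, the `features_for_hit` function, and the `flank` distance standing in for "close to the match" are assumptions, as are the demo coordinates.

```python
from dataclasses import dataclass

# From the article: feature links are given for database sequences > 200 kb.
LONG_SUBJECT_BP = 200_000


@dataclass
class Feature:
    name: str
    start: int  # 1-based, inclusive subject coordinates
    end: int


def features_for_hit(subject_len, hit_start, hit_end, features, flank=5_000):
    """Names of features that overlap a hit or lie within `flank` bp of it.

    `flank` is a hypothetical "close to the match" distance chosen for
    illustration; the article does not say what NCBI uses.
    """
    if subject_len <= LONG_SUBJECT_BP:
        return []  # shorter subjects get no feature links
    # Hits on the minus strand may be reported with start > end.
    lo, hi = min(hit_start, hit_end), max(hit_start, hit_end)
    return [
        f.name
        for f in features
        # Closed-interval overlap test, with the hit widened by `flank`.
        if f.start <= hi + flank and f.end >= lo - flank
    ]


if __name__ == "__main__":
    # Made-up coordinates; not the real annotation of NW_876257.
    feats = [
        Feature("serum albumin", 995_000, 1_020_000),
        Feature("repeat", 1_012_000, 1_013_000),
        Feature("distant gene", 1_500_000, 1_510_000),
    ]
    print(features_for_hit(2_000_000, 1_000_000, 1_010_000, feats))
```

A linear scan is enough for a sketch; for sequences carrying thousands of features, an interval tree or a sorted index would avoid checking every feature for every hit.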

Perhaps the biggest improvement provided by the new formatter lies in its handling of masked, low-complexity regions within the query sequence. Low-complexity regions are those with biased amino acid or nucleotide compositions; they are usually masked before a search so that the resulting alignments are more meaningful. In traditional BLAST output, masked amino acids and nucleotides are replaced with X's and n's, respectively. The new formatter can instead display masked residues in lower case, which preserves their identities, and in color for better highlighting. Matches within filtered regions are now counted when computing the percent identity of an alignment. The lower-case option and the mask color selection are available in the ‘Format’ section of the BLAST submission or formatting pages. Figure 3 compares the default replacement masking with lower-case masking, which retains the original query residues.

Figure 3. BLAST protein alignments containing low-complexity sequence. The upper alignment shows the default replacement masking. The lower alignment shows the lower-case masking option, which preserves the query sequence in the output.
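The difference between the two masking displays, and why lower-case masking lets matches in filtered regions count toward percent identity, can be shown with a few string operations. This is a minimal sketch under stated assumptions, not BLAST's code: the function names are hypothetical, mask intervals are assumed to be 0-based and half-open, and only the display and counting of already-aligned strings are modeled, not the search itself. For nucleotide queries one would pass `mask_char="n"`, following the description above.

```python
def mask_replace(seq, intervals, mask_char="X"):
    """Replacement masking: overwrite masked residues with mask_char."""
    chars = list(seq)
    for start, end in intervals:  # 0-based, half-open [start, end)
        for i in range(start, end):
            chars[i] = mask_char
    return "".join(chars)


def mask_lower(seq, intervals):
    """Lower-case masking: keep residue identities, shown in lower case."""
    chars = list(seq.upper())
    for start, end in intervals:
        for i in range(start, end):
            chars[i] = chars[i].lower()
    return "".join(chars)


def percent_identity(query_aln, subject_aln):
    """Percent of alignment columns holding identical residues.

    Comparison ignores case, so lower-case (masked) residues still count
    as matches; gap columns ('-') count toward the length but never match.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")
    same = sum(
        1
        for q, s in zip(query_aln.upper(), subject_aln.upper())
        if q == s and q != "-"
    )
    return 100.0 * same / len(query_aln)


if __name__ == "__main__":
    query = "MKAAAAAAAGLV"    # made-up protein with a low-complexity run
    subject = "MKAAAAAAAGLV"
    masked = [(2, 9)]         # hypothetical mask over the poly-A run
    for label, shown in (("replace", mask_replace(query, masked)),
                         ("lower", mask_lower(query, masked))):
        print(f"{label:<8}{shown}  {percent_identity(shown, subject):.1f}% identity")
```

With replacement masking the X's can never match the subject, so identical residues inside the masked run are lost from the count; lower-case masking keeps them, which mirrors the change in percent identity described above.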

For questions regarding the BLAST services, please contact the BLAST help desk.

