The new formatter also simplifies the interpretation of hits to large database sequences bearing many annotated features by providing links to the features that lie within or close to the match. These links, given for database sequences longer than 200 kb, highlight associations between aligned regions and biological features such as genes and repeat regions. The BLAST output in Figure 2 shows the match to the albumin gene in a whole genome shotgun supercontig from the dog genome. Use the link to serum albumin to display the relevant portion of NW_876257, the two-megabase supercontig.
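The idea of linking a hit to nearby annotated features can be illustrated with a simple interval check. The following is only a minimal sketch, not the formatter's actual implementation: the `Feature` type, the `features_near_hit` function, the 5,000-base flank used to decide what counts as "close", and all coordinates are hypothetical. Only the 200 kb length threshold comes from the text above.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """An annotated region on a database sequence (hypothetical type)."""
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


MIN_SUBJECT_LENGTH = 200_000  # from the text: links are given above 200 kb
FLANK = 5_000                 # assumption: how far from the hit is "close"


def features_near_hit(subject_length: int, hit_start: int, hit_end: int,
                      features: List[Feature],
                      flank: int = FLANK) -> List[Feature]:
    """Return features that overlap the hit or lie within `flank` bases of it."""
    if subject_length <= MIN_SUBJECT_LENGTH:
        return []  # short sequences get no feature links
    lo, hi = hit_start - flank, hit_end + flank
    # Two closed intervals overlap when each starts before the other ends.
    return [f for f in features if f.start <= hi and f.end >= lo]


# Hypothetical annotation on a two-megabase sequence.
annotated = [
    Feature("gene A", 100_000, 120_000),
    Feature("repeat B", 500_000, 500_300),
]
linked = features_near_hit(2_000_000, 110_000, 111_000, annotated)
print([f.name for f in linked])
```

In this sketch a hit inside "gene A" is linked to that gene only, and the same hit on a sequence of 200 kb or less yields no links at all.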
Perhaps the biggest improvement provided by the new formatter lies in its handling of masked, low-complexity regions within the query sequence. Low-complexity regions are those with biased amino acid or nucleotide compositions; they are usually masked prior to a search in order to provide more meaningful alignments. In traditional BLAST output, the one-letter codes for masked amino acids and nucleotides are replaced with X's and n's, respectively. The new formatter allows masked residues to be displayed in lower case, which preserves their identities, and in color for better highlighting. Matches within filtered regions are now taken into account when computing the percent identity for an alignment. The lower case option and the mask color selection are available in the ‘Format’ section of the BLAST submission or formatting pages. Figure 3 shows the new masking displays for the default replacement masking and for the lower case masking that retains the original query sequence residues.
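The difference between the two masking displays can be sketched in a few lines. This is an illustration under stated assumptions rather than the formatter's own code: the function names, the 0-based half-open mask intervals, and the example sequences are hypothetical, and the percent identity shown here is simply matching columns divided by alignment length, which may differ from the formula BLAST uses.

```python
from typing import List, Tuple


def mask_display(seq: str, masked: List[Tuple[int, int]],
                 mode: str = "replace", is_protein: bool = False) -> str:
    """Render `seq` with masked intervals (0-based, half-open) shown either
    as X/n replacement characters or as lower-case residues."""
    mask_char = "X" if is_protein else "n"
    out = list(seq.upper())
    for start, end in masked:
        for i in range(start, min(end, len(out))):
            if mode == "replace":
                out[i] = mask_char        # traditional display: identity lost
            else:
                out[i] = out[i].lower()   # lower-case display: identity kept
    return "".join(out)


def percent_identity(query_row: str, subject_row: str) -> float:
    """Percent of aligned columns with identical residues, ignoring case so
    that lower-case (masked) residues still count as matches."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have equal length")
    matches = sum(
        1 for q, s in zip(query_row, subject_row)
        if q != "-" and q.upper() == s.upper()
    )
    return 100.0 * matches / len(query_row)


# Hypothetical nucleotide query with a low-complexity run at positions 4-9.
query = "ACGT" + "AAAAAA" + "GCGT"
replaced = mask_display(query, [(4, 10)], mode="replace")
lowered = mask_display(query, [(4, 10)], mode="lower")
print(replaced)
print(lowered)
```

Because the lower-case display keeps the original residues, a comparison such as `percent_identity(lowered, subject_row)` can count matches inside the masked run, whereas the replacement display would turn every masked column into a mismatch.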