Searching Finished and Unfinished Microbial Genomes


The genomes of microbes contain the genes required for life under a wide variety of conditions, from extremes of temperature and pH to the sheltered environment experienced by intracellular parasites. Comparing the gene complements of organisms that live under different environmental constraints is important for elucidating the mechanisms of pathogenesis and for defining the genetic diversity of life on Earth.

GenBank now holds more than 80 complete bacterial and archaeal genomes, and sequencing is in progress for about as many more. These unfinished genomes have not been deposited in GenBank and are therefore not available for downloading or for conventional BLAST searching. Because of the importance of this set of genome sequences, however, NCBI offers the Microbial Genomes BLAST page, which supports similarity searches of both the finished sequence now in GenBank and the unfinished microbial genomic sequence provided by sequencing centers prior to publication.

As an example, consider NP_248745, a hypothetical protein from Pseudomonas aeruginosa that is conserved among three organisms: Mesorhizobium loti, Caulobacter crescentus, and Pseudomonas aeruginosa. The hierarchical taxonomic tree in Figure 1, displayed by clicking the link at the top of the Microbial Genomes BLAST page, places the first two organisms in the alpha proteobacterial lineage and the third in the gamma lineage. It is natural to ask whether a similar protein is also found in the beta proteobacteria. The search is easily performed by using NP_248745 as a tblastn query against all of the beta proteobacterial genomic nucleotide sequences; the beta proteobacteria are selected as the BLAST database by clicking the appropriate node of the tree.
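The search described above is run through the web page, which is the only route to the unfinished genomes. For the finished genomes, which can be downloaded from GenBank, a comparable search could in principle be run locally. The following is a minimal sketch in Python, assuming the standalone BLAST+ `tblastn` program is installed; the query file name `NP_248745.fasta` and the database name `beta_proteobacteria` are hypothetical placeholders, and the Expect threshold is an arbitrary illustrative choice, not a value taken from the article.

```python
from typing import List


def build_tblastn_command(query_fasta: str, database: str,
                          evalue: float = 1e-5) -> List[str]:
    """Assemble an argument list for the BLAST+ `tblastn` program.

    tblastn compares a protein query against a nucleotide database
    translated in all six reading frames.
    """
    return [
        "tblastn",
        "-query", query_fasta,    # protein query in FASTA format
        "-db", database,          # nucleotide BLAST database to search
        "-evalue", str(evalue),   # Expect value threshold for reported hits
        "-outfmt", "6",           # tabular output, one hit per line
    ]


# Hypothetical file and database names, for illustration only.
command = build_tblastn_command("NP_248745.fasta", "beta_proteobacteria")
print(" ".join(command))
```

The list can be passed to `subprocess.run` once a local database has been built. Such a search would cover only sequence that has been downloaded, so it would miss the unfinished genomes offered on the Microbial Genomes BLAST page.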



Figure 1: Genomes Tree display of the bacterial genomes offered for searching on the Microbial Genomes BLAST page. Clicking on a node shows the organisms included.

The one-line BLAST descriptions returned by this search, shown in Figure 2, include a number of good hits to sequences from the beta proteobacteria, as their low Expect values show. Hence, this conserved protein from the gamma and alpha proteobacteria has potential homologs in some of the beta proteobacteria. Note that 4 of the 5 hits are to incomplete genomes that are not in GenBank and are searchable at NCBI only through this interface. The Microbial Genomes BLAST page is linked from the main NCBI BLAST page.


Sequences producing significant alignments:                      (bits)  Value
gnl|DOE_134537|Contig394 Burkholderia fungorum unfinished f...   145     3e-35
gnl|TIGR_13373|24 Burkholderia mallei unfinished fragment o...   136     1e-32
gnl|Sanger_28450|bpsmalle_Contig784 Burkholderia pseudomall...   136     1e-32
gnl|DOE_119219|Contig600 Ralstonia metallidurans unfinished...   135     2e-32
ref|NC_003296.1| Ralstonia solanacearum, complete genome         121     5e-28

Figure 2: Highest-scoring BLAST hits resulting from a search using NP_248745 as a tblastn query against all of the beta proteobacterial genomic nucleotide sequences.
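The hits in Figure 2 can also be tallied programmatically. The sketch below transcribes the five description lines, scores, and Expect values from the figure and counts how many come from center-supplied contigs. It rests on an assumption: identifiers beginning with `gnl|` are treated as contigs from sequencing centers and those beginning with `ref|` as finished reference records. The names `Hit`, `parse_hit`, and `is_center_contig`, and the cutoff of 1e-10, are illustrative choices rather than part of any NCBI tool.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    seq_id: str        # identifier as shown in the BLAST description line
    description: str   # description text, truncated as in Figure 2
    bit_score: int
    e_value: float


# The five hits from Figure 2: (description line, bit score, Expect value).
FIGURE_2 = [
    ("gnl|DOE_134537|Contig394 Burkholderia fungorum unfinished f...", 145, "3e-35"),
    ("gnl|TIGR_13373|24 Burkholderia mallei unfinished fragment o...", 136, "1e-32"),
    ("gnl|Sanger_28450|bpsmalle_Contig784 Burkholderia pseudomall...", 136, "1e-32"),
    ("gnl|DOE_119219|Contig600 Ralstonia metallidurans unfinished...", 135, "2e-32"),
    ("ref|NC_003296.1| Ralstonia solanacearum, complete genome", 121, "5e-28"),
]


def parse_hit(line: str, score: int, e_value: str) -> Hit:
    """Split a description line into its identifier and free-text parts."""
    seq_id, _, description = line.partition(" ")
    return Hit(seq_id, description, score, float(e_value))


def is_center_contig(hit: Hit) -> bool:
    """Assumption: 'gnl|' marks a contig supplied by a sequencing center."""
    return hit.seq_id.startswith("gnl|")


hits: List[Hit] = [parse_hit(*row) for row in FIGURE_2]
unfinished = [h for h in hits if is_center_contig(h)]
# Illustrative cutoff only; the article does not state a threshold.
significant = [h for h in hits if h.e_value <= 1e-10]

print(f"{len(unfinished)} of {len(hits)} hits are to center-supplied contigs")
```

Classifying by identifier prefix rather than by the word "unfinished" is deliberate, because the description of the third hit is truncated before that word would appear.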




NCBI News | Winter 2001
