Searching Finished and Unfinished Microbial Genomes
The genomes of microbes contain gene sequences
required for life under a variety of conditions such as those of harsh
temperature and pH, as well as those privileged conditions experienced
by intracellular parasites. A comparison of the gene complements needed
by organisms living under different environmental constraints is important
in elucidating the mechanisms of pathogenesis and in defining the genetic
diversity of life on earth.
There are now over 80 complete bacterial
and archaeal genomes in GenBank and about an equal number of unfinished
genomes for which sequencing is in progress. The unfinished genomes have
not been deposited in GenBank and are therefore not available for downloading
or for conventional BLAST searching. However, because of the importance
of this set of genome sequences, NCBI offers the Microbial Genomes BLAST
page for similarity searches of both finished sequences now in GenBank
and unfinished microbial genomic sequences provided by sequencing centers
prior to publication.
As an example, consider the protein NP_248745, a hypothetical protein
from Pseudomonas aeruginosa that is conserved among three organisms:
Mesorhizobium loti, Caulobacter crescentus, and Pseudomonas
aeruginosa. The hierarchical taxonomic tree of Figure 1, generated by
clicking on the link at the top of the Microbial Genomes BLAST page, places
the first two organisms in the alpha proteobacterial lineage and the third
in the gamma lineage. It might be of interest to determine if a similar
protein is also found in the beta proteobacteria. This search is easily
performed by using NP_248745 as a tblastn query for a search against all
of the beta proteobacterial genomic nucleotide sequences. The beta proteobacteria
are selected as the BLAST database by clicking on the appropriate node
of this tree.
Figure 1: Genomes
Tree display of the bacterial genomes offered for searching on the Microbial
Genomes BLAST page. Clicking on a node shows the organisms included.
The one-line BLAST descriptions returned by this search, shown in Figure
2, include several strong hits to sequences from the beta proteobacteria,
each with a low Expect value (5e-28 or lower). Hence, this conserved protein
from the gamma and alpha proteobacteria has a potential homolog in some
of the beta proteobacteria. Note that 4 of the 5 hits are to incomplete
genomes that are not in GenBank and are searchable at NCBI only through
this interface.
The Microbial Genomes BLAST page is linked from the main NCBI BLAST page.
                                                                Score     E
Sequences producing significant alignments:                     (bits)  Value
gnl|DOE_134537|Contig394 Burkholderia fungorum unfinished f...    145   3e-35
gnl|TIGR_13373|24 Burkholderia mallei unfinished fragment o...    136   1e-32
gnl|Sanger_28450|bpsmalle_Contig784 Burkholderia pseudomall...    136   1e-32
gnl|DOE_119219|Contig600 Ralstonia metallidurans unfinished...    135   2e-32
ref|NC_003296.1| Ralstonia solanacearum, complete genome          121   5e-28
Figure 2: Highest-scoring
BLAST hits resulting from a search using NP_248745 as a tblastn query
against all of the beta proteobacterial genomic nucleotide sequences.
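
The hit list in Figure 2 can also be tallied programmatically. The following is a minimal Python sketch, not part of the Microbial Genomes BLAST service. It assumes the one-line descriptions have been copied into a string, with the score and Expect value as the last two whitespace-separated fields. The names FIGURE2_HITS and parse_hits are hypothetical. Treating a "gnl|" identifier as an unfinished genome is an assumption drawn only from the five hits shown here, not a general rule.

```python
# Hit lines transcribed from Figure 2; truncated descriptions are kept as shown.
FIGURE2_HITS = """\
gnl|DOE_134537|Contig394 Burkholderia fungorum unfinished f...    145   3e-35
gnl|TIGR_13373|24 Burkholderia mallei unfinished fragment o...    136   1e-32
gnl|Sanger_28450|bpsmalle_Contig784 Burkholderia pseudomall...    136   1e-32
gnl|DOE_119219|Contig600 Ralstonia metallidurans unfinished...    135   2e-32
ref|NC_003296.1| Ralstonia solanacearum, complete genome          121   5e-28
"""


def parse_hits(text):
    """Split each one-line description into identifier, score, and Expect value."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The last two whitespace-separated fields are the bit score and E value.
        description, bits, evalue = line.rsplit(None, 2)
        hits.append({
            "id": description.split()[0],
            "description": description,
            "bits": float(bits),
            "evalue": float(evalue),
            # Assumption: in this result set, "gnl|" marks sequence supplied
            # by a sequencing center before deposit in GenBank.
            "unfinished": description.startswith("gnl|"),
        })
    return hits


if __name__ == "__main__":
    hits = parse_hits(FIGURE2_HITS)
    unfinished = [h for h in hits if h["unfinished"]]
    print(f"{len(unfinished)} of {len(hits)} hits are to unfinished genomes")
    for h in hits:
        print(h["id"], h["bits"], h["evalue"])
```

Run as a script, this reports that 4 of the 5 hits are to unfinished genomes, matching the count given in the text.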