BLAST Your Way Into the Human Genome
BLAST searches can be run against the latest NCBI Human Genome assembly using
Human Genome BLAST, available from www.ncbi.nlm.nih.gov/BLAST/.
In this BLAST example, we will try to locate the human homologue of the
mouse Brca2 gene. Because amino acid sequence is more strongly conserved
across species than nucleotide sequence, we will perform a tblastn search
using the protein sequence for mouse Brca2 (NP_033895) as our query. The
search looks for regions of the human genome that, when translated,
match the protein sequence of mouse Brca2. Here we will use the default
database, genome, in order to search against the human genomic
sequence. Other database options include mrna and protein,
which allow searches of NCBI-predicted mRNA and protein sequences, respectively.
Searches of the human genome are filtered for low-complexity and repetitive
elements by default, and there is no need to change this setting for this
search.
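
To make the idea of matching a protein query against translated genomic sequence concrete, here is a minimal Python sketch. It is not how tblastn is implemented: tblastn scores inexact alignments with a heuristic search, whereas this toy only finds exact occurrences of a peptide in the six reading frames of a DNA string. The function names and the short example sequence are hypothetical and chosen for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, offset):
    """Translate dna starting at offset (0, 1, or 2); unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_hits(dna, peptide):
    """List (strand, frame offset, protein position) for exact peptide matches.

    Toy stand-in for a translated search: exact matching only, no scoring.
    """
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq, offset)
            pos = protein.find(peptide)
            while pos != -1:
                hits.append((strand, offset, pos))
                pos = protein.find(peptide, pos + 1)
    return hits


# Hypothetical 14-base "genome" that encodes Met-Lys-Val in the third forward frame.
print(six_frame_hits("CCATGAAAGTTTAA", "MKV"))  # [('+', 2, 0)]
```

A real translated search must also tolerate substitutions and gaps, which is why a scored alignment program is used rather than exact string matching.
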
The results returned by a human genome BLAST search are much like those of
a conventional BLAST search, with the exception that the hits are
only to NCBI-constructed contig sequences rather than to user-submitted
GenBank sequences. A button is provided that allows a BLAST user to view
hits on the human genome. When this button is pressed, a Human Genome
Map View is generated, showing the positions of the BLAST hits on the
Contig and Genes_seq maps, as shown in Figure 1. On the Genes_seq map,
one sees a thin line representing the BRCA2 gene, interspersed with thick
segments representing exons. To the right of this line is an array of
line segments representing BLAST hits, color coded by quality as indicated
on the scale at the top of the display. Note that these BLAST hits appear
to closely track the exons of BRCA2. Links on the BLAST hits lead
to a conventional pairwise alignment display.

Figure 1: Human Genome BLAST Hits: Genome View
The two most extensive matches, receiving the largest BLAST scores, are
to two large exons found towards the bottom of the display. However, the
percent identities for these two hits are 49% and 54%, respectively, two
of the lowest percentages shown. These two regions are also well populated
with SNPs, as seen in Figure 1 of "Tour the Human Genome" in
this issue. Although SNP sampling bias in these regions cannot be ruled
out, the variation within the human gene in these regions is consistent with
the greater degree of variation between mouse and human in the same regions.
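
The observation that the longest hits score highest despite low percent identity can be illustrated with a small calculation. The sketch below assumes a deliberately simple, hypothetical scoring scheme (+5 for an identical column, -1 otherwise); BLAST itself uses a substitution matrix and gap penalties, so the numbers are illustrative only. The sequences are made up and are not taken from the Brca2 alignments.

```python
def percent_identity(query, subject):
    """Percentage of aligned columns in which the two residues are identical."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(q == s for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


def toy_score(query, subject, match=5, mismatch=-1):
    """Sum of per-column scores under a hypothetical match/mismatch scheme."""
    return sum(match if q == s else mismatch for q, s in zip(query, subject))


# Hypothetical long alignment: 200 columns, every second column differs.
long_q, long_s = "AK" * 100, "AR" * 100
# Hypothetical short alignment: 10 columns, one mismatch.
short_q, short_s = "ACDEFGHIKL", "ACDEFGHIKV"

print(percent_identity(long_q, long_s), toy_score(long_q, long_s))      # 50.0 400
print(percent_identity(short_q, short_s), toy_score(short_q, short_s))  # 90.0 44
```

Because the score accumulates over the length of the alignment, a long match of moderate identity can outrank a short match of high identity.
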

The BLAST Lab feature is intended to provide detailed technical
information on some of the more specialized uses of the BLAST
family of programs. Topics are selected from the range of questions
received by the BLAST Help Group.