Problem Summary:
Find the 3-dimensional structure for a
protein of interest or for similar protein sequences. View the residues in the
active site.
| Sample User Question |
Is there a known (resolved) three-dimensional structure for the protein
encoded by the human MLH1 gene? If not, are there similar protein sequences that
have known structures?
Also, one of the known MLH1 mutations in colon cancer patients is of particular
interest to me (GLY67TRP). Is that mutation possibly in an active site of the
protein?
| Analysis/Comments |
Finding a resolved structure for a protein is the exception rather than the rule,
because the >2.7 million protein sequence records currently available far exceed
the number of individual structure records, currently >20,000 in Entrez's
Molecular Modeling Database (MMDB). (Tip on how to find statistics for Entrez
databases.) However, when a homologous structure exists, it can assist in the
analysis of protein function.
| Flow Chart |
RESOURCES USED: Entrez, MMDB, Cn3D
FEATURES HIGHLIGHTED: Protein neighbors for the MLH1 RefSeq record, then
Structure links for that complete set of neighbors; various Cn3D features
(see note about "Answers", below).
| Step By Step Guide |
The "Links" menu for NP_000240 in Entrez Proteins does not include "Structure",
indicating that this sequence record is not directly associated with a 3-D protein
structure record. Several options exist to find possible homologous structures:
(1) retrieve the approximately 600 related sequences for NP_000240 and then
display the "Structure Links" for the complete set; (2) use BLink to graphically
view the related sequences and then view only the subset that has 3-D structures;
and (3) use the BLAST system to compare the NP_000240 protein sequence against all
the protein sequences from PDB.
In this case, all three options retrieve the same set of six structures, although
retrieval can sometimes vary because of the differences in the three systems. For
example, BLAST might retrieve additional sequences, depending on the cutoff score
used. BLink, on the other hand, might retrieve fewer sequences because it uses a
non-redundant set of proteins, and it shows only the top 200 hits. We will use
the first option in this example.
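The check for direct Structure links described above is done through the Entrez
web interface, but a similar lookup could be scripted. The following is a minimal
sketch, assuming the NCBI E-utilities ELink service and the gi number 4557757
given for NP_000240 later in this guide. The helper name elink_url is
hypothetical, and the sketch only builds the request URL; it does not send it or
parse a response.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities base URL.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def elink_url(dbfrom, db, uid):
    """Build (but do not send) an ELink request URL.

    dbfrom: database the record lives in (e.g. "protein")
    db:     database to look for linked records in (e.g. "structure")
    uid:    identifier of the source record
    """
    query = urlencode({"dbfrom": dbfrom, "db": db, "id": uid})
    return f"{EUTILS_BASE}/elink.fcgi?{query}"


# Ask whether the MLH1 protein record has direct links to Structure records.
direct_links_url = elink_url("protein", "structure", "4557757")
print(direct_links_url)
```

An empty link set in the response would correspond to the missing "Structure"
entry in the "Links" menu.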
The first three structure links (1B62, 1BKN, 1B63) are from Escherichia
coli, and the last three (1H7S, 1H7U, 1EA6) are from human. The latter were
deposited by the Guarne lab and represent a free protein, a protein bound to AGP,
and a protein bound to ADP, respectively. For this example, we will look at 1H7U
to see what we might be able to discern from that structure about the sequence in
NP_000240.
If the Cn3D program is already installed on the computer, the "View Structure"
button will automatically open Cn3D. One window will display the
three-dimensional structure of 1H7U, and a second window will display the
corresponding protein sequences for protein chains A and B (referred to as
1H7U_A and 1H7U_B, respectively). From here, Cn3D offers a wide range of features
that enable us to label residues, zoom in or out, render the structure in
different styles, color the structure by various features, import and align a
protein sequence from Entrez Proteins, and more.
In this example, use the "Style" menu to render the structure as "tubes" and
change the coloring shortcut to "domains". The resulting pink and blue regions of
1H7U_A represent the HATPase and DNA mismatch repair domains, respectively. The
brown and green regions represent the same domains in 1H7U_B. These colors
correspond to the graphic summary of 1H7U in the Entrez Structure database. The
Cn3D sequence alignment window also now colors the residues in 1H7U_A and 1H7U_B
by domains.
Because we are interested in the relationship between the protein sequence in 1H7U
and that in NP_000240, we can now import NP_000240 (gi 4557757) and align it to
1H7U_A, the protein chain that BLAST and BLink identified as similar to
NP_000240. The steps to import and align NP_000240 are provided below.
Steps to import and align NP_000240 with 1H7U_A
While viewing 1H7U in Cn3D 4.1:
- In the Sequence/Alignment Viewer window, select the menu item
"Imports/Show Imports". This will cause the Import Viewer window to
appear.
- In the Import Viewer window, select the menu item "Edit/Import
Sequences".
- In the Select Chain dialog box, select 1H7U_A and click OK.
- In the Select Import Source dialog box, select "Network via GI/Accession"
and click OK.
- In the Input Identifier dialog box, enter the accession NP_000240 and
click OK. The new sequence will appear in the Import Viewer window.
- Select "Algorithms/BLAST single" and, using the crosshair, click anywhere on
the sequence for NP_000240 to align it to 1H7U_A using the BLAST algorithm.
- To make the alignment appear in the Sequence/Alignment Viewer window,
select the menu item "Alignments/Merge All" in the Import Viewer window.
- The alignment should now appear in the Sequence/Alignment Viewer window, and
the coloring scheme changes to show the aligned residues in red. Dismiss the
Import Viewer window, if desired.
- Reset the "Style/coloring shortcut" in the structure window to "domains", and
set the mouse mode in the Sequence/Alignment Viewer window to "select
rectangle".
Now, we can see the high degree of sequence similarity between NP_000240 and the
pink-colored residues of the HATPase domain in 1H7U_A.
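A pairwise alignment pairs each position in the imported sequence with a position
in the structure's chain, and the two numbers need not be the same because the
sequences can start at different residues and contain gaps. The sketch below
shows one way such a position lookup might work. It is not Cn3D's implementation,
the function name is hypothetical, and the two aligned strings are invented toy
data, not the real NP_000240 or 1H7U_A sequences.

```python
def map_query_to_subject(query_aln, subject_aln, query_start, subject_start, query_pos):
    """Return the subject position aligned to query_pos.

    query_aln / subject_aln: the two gapped rows of a pairwise alignment
    query_start / subject_start: 1-based position of the first aligned residue
    Returns None if query_pos is outside the alignment or faces a gap.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned rows must have the same length")
    q = query_start - 1    # last query position consumed so far
    s = subject_start - 1  # last subject position consumed so far
    for q_char, s_char in zip(query_aln, subject_aln):
        if q_char != "-":
            q += 1
        if s_char != "-":
            s += 1
        if q_char != "-" and q == query_pos:
            return s if s_char != "-" else None
    return None


# Toy alignment (invented): the query starts at residue 5, the subject at 12.
toy_query = "MKG-LT"
toy_subject = "MRGAL-"
print(map_query_to_subject(toy_query, toy_subject, 5, 12, 7))
```

In the toy data, query residue 7 lines up with subject residue 14, while query
residue 9 faces a gap and has no counterpart.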
Given this alignment, how might the observed Gly67Trp substitution in NP_000240
affect its structure, based on the view of the homologous structure? In the
alignment window, mouse over the NP_000240 residues until the grey footer bar of
the sequence shows "gi 4557757, loc 67". Click on the corresponding glycine
residue in 1H7U_A (loc 74) to highlight it. In the structure window, use the left
mouse button to spin the 3D structure until you can clearly see and identify the
highlighted residue. Is it possibly in the active site? For example, is it within
5 Angstroms of the AGP molecule? To find out, remove the highlighting from
residue #74 of 1H7U_A by clicking on any residue in NP_000240 in the sequence
alignment window. Going back to the structure window, double click on the
Mg-containing AGP to highlight it. Then use the menu bar option called
"Show/Hide|Select By Distance|Residues Only" to highlight all residues within 5
Angstroms (or other desired distance) of the AGP. Indeed, the glycine at position
#74 is within 5 Angstroms and is likely part of the active site for this
energy-producing domain. This hints at the possible problems a Gly -> Trp
mutation might cause at that position.
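The "Select By Distance" step can be pictured as a simple geometric filter. The
sketch below assumes that a residue counts as selected when any of its atoms lies
within the cutoff of any ligand atom; whether Cn3D measures distance exactly this
way is an assumption. The function name is hypothetical and the coordinates are
invented toy values, not taken from 1H7U.

```python
import math


def residues_within(ligand_atoms, residue_atoms, cutoff=5.0):
    """Return sorted residue numbers with at least one atom within cutoff.

    ligand_atoms:  list of (x, y, z) coordinates in Angstroms
    residue_atoms: dict mapping residue number -> list of (x, y, z)
    """
    selected = []
    for res_id, atoms in residue_atoms.items():
        if any(math.dist(atom, lig) <= cutoff
               for atom in atoms for lig in ligand_atoms):
            selected.append(res_id)
    return sorted(selected)


# Toy coordinates (invented for illustration only).
toy_ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
toy_residues = {
    74: [(4.0, 0.0, 0.0), (6.0, 0.0, 0.0)],   # one atom 2.5 A from the ligand
    120: [(10.0, 0.0, 0.0)],                  # 8.5 A away, outside the cutoff
    33: [(0.0, 3.0, 3.9)],                    # about 4.9 A away
}
nearby = residues_within(toy_ligand, toy_residues, cutoff=5.0)
print(nearby)
```

Changing the cutoff argument corresponds to entering a different distance in the
Cn3D dialog.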
| Additional Notes |
This exercise is also narrated as part of the Entrez tutorial:
- Geer RC, Sayers EW. 2003. Entrez: making use of its power.
Brief Bioinform., 4(2):179-84 (June). PMID: 12846398
The Entrez Tutorial page provides a brief summary of the article and a link to
the full text *.pdf file.
Please note that the search results (number of hits) noted
in the article reflect the data that were available as of March 2003. The number
of search hits will change as the databases grow, but the general search concepts
will continue to apply.