Structures: User Question and Answer
Problem Summary:

Find the 3-dimensional structure for a protein of interest or for similar protein sequences. View residues in active site.

  Sample User Question
Analysis/Comments
Flow Chart
Step By Step Guide
Additional Notes
 

Sample User Question

 
Is there a known (resolved) three-dimensional structure for the protein encoded by the human MLH1 gene? If not, are there similar protein sequences that have known structures?

Also, one of the known MLH1 mutations in colon cancer patients, Gly67Trp, is of particular interest to me. Is that mutation possibly in an active site of the protein?
 

Analysis/Comments

Finding a resolved structure for a protein is the exception rather than the rule, because the >2.7 million protein sequence records currently available far exceed the number of individual structure records, currently >20,000 in Entrez's Molecular Modeling Database (MMDB). However, the presence of a homologous structure can assist in the analysis of protein function.

Flow Chart

RESOURCE USED: Entrez, MMDB, Cn3D

FEATURES HIGHLIGHTED: Protein neighbors for the MLH1 RefSeq record, then Structure links for that complete set of neighbors, various Cn3D features (see note about "Answers", below).

Step By Step Guide

The "Links" menu for NP_000240 in Entrez Proteins does not include "Structure", indicating that this sequence record is not directly associated with a 3-D protein structure record. Several options exist to find possible homologous structures: (1) retrieve the approximately 600 related sequences for NP_000240 and then display the "Structure Links" for the complete set; (2) use BLink to graphically view the related sequences and then view only the subset that has 3-D structures; and (3) use the BLAST system to compare the NP_000240 protein sequence against all the protein sequences from PDB.

In this case, all three options retrieve the same set of six structures, although retrieval can sometimes vary because of the differences in the three systems. For example, BLAST might retrieve additional sequences, depending on the cutoff score used. BLink, on the other hand, might retrieve fewer sequences because it uses a non-redundant set of proteins, and it shows only the top 200 hits. We will use the first option in this example.
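For readers who prefer scripting to the Entrez web pages, the first option can be approximated with the NCBI E-utilities ELink service. The sketch below only builds the request URLs and does not contact the network. The helper name `elink_url` is hypothetical, and it is an assumption that ELink returns the same neighbor and structure sets as the "Links" menu described here.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ELink service.
EUTILS_ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(dbfrom, db, ids):
    """Build an ELink URL that links records in `dbfrom` to records in `db`.

    `elink_url` is a hypothetical helper for illustration; only the
    dbfrom, db, and id parameters of ELink are used.
    """
    params = {
        "dbfrom": dbfrom,
        "db": db,
        "id": ",".join(str(i) for i in ids),
    }
    return f"{EUTILS_ELINK}?{urlencode(params)}"


# Step 1: request protein records related to gi 4557757 (NP_000240).
neighbors_url = elink_url("protein", "protein", [4557757])

# Step 2: request structure links; in practice the id list would hold
# the protein ids returned by step 1 rather than the single query gi.
structures_url = elink_url("protein", "structure", [4557757])

print(neighbors_url)
print(structures_url)
```

Fetching and parsing the responses is left out on purpose, because the exact content returned will change as the databases grow.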

The first three structure links (1B62, 1BKN, 1B63) are from Escherichia coli, and the last three (1H7S, 1H7U, 1EA6) are from human. The latter were deposited by the Guarne lab and represent a free protein, a protein bound to AGP, and a protein bound to ADP, respectively. For this example, we will look at 1H7U to see what we might be able to discern from that structure about the sequence in NP_000240.

If the Cn3D program is already installed on the computer, the "View Structure" button will automatically open Cn3D. One window will display the three-dimensional structure of 1H7U, and a second window will display the corresponding protein sequences for protein chains A and B (referred to as 1H7U_A and 1H7U_B, respectively). From here, Cn3D offers a wide range of features that enable us to label residues, zoom in or out, render the structure in different styles, color the structure by various features, import and align a protein sequence from Entrez Proteins, and more.

In this example, use the "Style" menu to render the structure as "tubes" and change the coloring shortcut to "domains". The resulting pink and blue regions of 1H7U_A represent the HATPase and DNA mismatch repair domains, respectively. The brown and green regions represent the same domains in 1H7U_B. These colors correspond to the graphic summary of 1H7U in the Entrez Structure database. The Cn3D sequence alignment window also now colors the residues in 1H7U_A and 1H7U_B by domains.

Because we are interested in the relationship between the protein sequence in 1H7U and that in NP_000240, we can now import NP_000240 (gi 4557757) and align it to 1H7U_A. That is the protein chain identified by BLAST and BLink as being similar to NP_000240. The steps to import and align NP_000240 are provided below.

Steps to import and align NP_000240 with 1H7U_A

While viewing 1H7U in Cn3D 4.1:
  1. In the Sequence/Alignment Viewer window, select the menu item "Imports/Show Imports". This will cause the Import Viewer window to appear.

  2. In the Import Viewer window, select the menu item "Edit/Import Sequences".

  3. In the Select Chain dialog box, select 1H7U_A and click OK.

  4. In the Select Import Source dialog box, select "Network via GI/Accession" and click OK.

  5. In the Input Identifier dialog box, enter the accession NP_000240 and click OK. The new sequence will appear in the Import Viewer window.

  6. Select "Algorithms/BLAST single" and, using the crosshair, click anywhere on the sequence for NP_000240 to align it to 1H7U_A using the BLAST algorithm.

  7. To make the alignment appear in the Sequence/Alignment Viewer window, select the menu item "Alignments/Merge All" in the Import Viewer window.

  8. The alignment should now appear in the Sequence/Alignment Viewer window, and the coloring scheme changes to show the aligned residues in red. Dismiss the Import Viewer window, if desired.

  9. Reset the "Style/coloring shortcut" in the structure window to "domains", and set the mouse mode in the Sequence/Alignment Viewer window to "select rectangle".

Now we can see how closely NP_000240 aligns with the pink-colored residues of the HATPase domain in 1H7U_A.
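Because the two sequences are numbered independently, a residue in NP_000240 generally sits at a different position number in 1H7U_A. The following minimal sketch shows how a position can be carried across a gapped pairwise alignment. The function `map_position` and the short toy sequences are hypothetical illustrations, not Cn3D's implementation or the real MLH1 alignment.

```python
def map_position(aligned_query, aligned_subject, query_pos,
                 query_start=1, subject_start=1):
    """Return the subject residue number aligned to query residue `query_pos`.

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    Returns None if the query residue faces a gap or lies outside the
    aligned region.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")

    q = query_start - 1    # number of the last query residue seen
    s = subject_start - 1  # number of the last subject residue seen
    for q_char, s_char in zip(aligned_query, aligned_subject):
        if q_char != "-":
            q += 1
        if s_char != "-":
            s += 1
        if q_char != "-" and q == query_pos:
            return s if s_char != "-" else None
    return None


# Toy alignment (hypothetical sequences): the subject carries one extra
# residue before the glycine, so query residue 4 maps to subject residue 5.
query_row = "MKT-GAL"
subject_row = "MKTVG-L"
print(map_position(query_row, subject_row, 4))
```

The same bookkeeping is what the grey footer bar in the alignment window reports when you mouse over a column.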

Given this alignment, how might the observed Gly67Trp substitution in NP_000240 affect its structure, based on the view of the homologous structure? In the alignment window, mouse over the NP_000240 residues until the grey footer bar of the sequence shows "gi 4557757, loc 67". Click on the corresponding glycine residue in 1H7U_A (loc 74) to highlight it. In the structure window, use the left mouse button to rotate the 3D structure until you can clearly see and identify the highlighted residue.

Is it possibly in the active site? For example, is it within 5 Angstroms of the AGP molecule? To find out, first remove the highlighting from residue 74 of 1H7U_A by clicking on any residue in NP_000240 in the sequence alignment window. Then, in the structure window, double-click on the Mg-containing AGP to highlight it, and use the menu option "Show/Hide|Select By Distance|Residues Only" to highlight all residues within 5 Angstroms (or another desired distance) of the AGP. Indeed, the glycine at position 74 is within 5 Angstroms and is likely part of the active site of the HATPase domain. This hints at the problems a Gly-to-Trp substitution might cause at that position, since it replaces the smallest amino acid side chain with one of the bulkiest.
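The idea behind a select-by-distance operation can be sketched in a few lines: a residue is kept if any of its atoms lies within the cutoff of any ligand atom. This is an illustration under stated assumptions only. The function `residues_within`, the residue labels, and the coordinates are all hypothetical, and Cn3D's exact selection criterion may differ.

```python
import math


def residues_within(residue_atoms, ligand_atoms, cutoff=5.0):
    """Return sorted labels of residues with any atom within `cutoff`
    Angstroms of any ligand atom.

    residue_atoms: dict mapping a residue label to a list of (x, y, z).
    ligand_atoms:  list of (x, y, z) coordinates for the ligand.
    """
    near = []
    for label, atoms in residue_atoms.items():
        if any(math.dist(atom, lig) <= cutoff
               for atom in atoms for lig in ligand_atoms):
            near.append(label)
    return sorted(near)


# Hypothetical coordinates, not taken from 1H7U.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
residues = {
    "RES_A": [(3.0, 0.0, 0.0), (4.0, 1.0, 0.0)],  # 1.5 A from the ligand
    "RES_B": [(10.0, 0.0, 0.0)],                  # 8.5 A away
    "RES_C": [(0.0, 4.9, 0.0)],                   # just inside the cutoff
    "RES_D": [(0.0, 5.1, 0.0)],                   # just outside the cutoff
}
print(residues_within(residues, ligand))
```

With real data, the coordinates would come from the structure file, and the cutoff could be varied in the same way as the "other desired distance" mentioned above.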

Additional Notes

This exercise is also narrated as part of the Entrez tutorial:
  • Geer RC, Sayers EW. 2003. Entrez: making use of its power. Brief Bioinform., 4(2):179-84 (June). PMID: 12846398
    The Entrez Tutorial page provides a brief summary of the article and a link to the full text *.pdf file.
Please note that the search results (number of hits) noted in the article reflect the data that were available as of March 2003. The number of search hits will change as the databases grow, but the general search concepts will continue to apply.


Revised 11/06/2007