module of the MLA course on Introduction to Molecular Biology Information

Using the Back Doors:
Finding structures in the absence
of an exact query match

It was easy to retrieve a structure for the P53 tumor suppressor because the structure of that protein had already been resolved and submitted to a public database. However, because of the discrepancy noted earlier between the number of protein sequences and the number of structure records available, "front door" search strategies do not work for many proteins. That is the case for the "gene stories" of MLH1 and PER2 that we have been following through the course. In the following slides, we'll use the "back door" approaches summarized below:

If a protein sequence DOES NOT HAVE a resolved structure:
Back Door #1 Find similar protein sequences and see if any of them have a resolved 3-D structure. *

MLH1 (human colon cancer)

retrieve RefSeq Protein NP_000240, display Related Sequences and then Structure Links *
Back Door #2 Identify the conserved domain(s) in the protein of interest and view the 3-D structure of the domain(s).

PER2 (human period 2 protein)

retrieve RefSeq Protein NP_073728, display Conserved Domains, and follow links for the PAS domain to display its 3-D structure
More "back doors" exist -- these are just examples. They simply shed light on the possible structure of a protein of interest. The actual structure must, of course, be confirmed experimentally.

* Just for your reference, after the course -- There are actually a few different versions of Back Door #1, i.e., a few ways you can find similar protein sequences that have resolved 3-D structures.

  1. protein sequence -> related sequences -> structure links (shown above in back door #1)
  2. protein sequence -> related structures (this is actually a shortcut of the approach above and just bypasses the middle step; it also displays output in a somewhat different way, as described below.)
  3. protein sequence -> BLink -> 3D structures
The notes below describe the similarities/differences among these approaches, using NP_000240 as a sample query protein:
  1. protein sequence -> related sequences -> structure links (shown above in back door #1)

    • if you retrieve protein NP_000240 (RefSeq protein from the human MLH1 gene) and follow the Related Sequences link, you get >3000 protein sequence records. If you then select the Display: Structure Links option near the top of the page, you get 9 STRUCTURE RECORDS (some of those records can contain more than one PROTEIN CHAIN):
      1. 1NHJ
      2. 1NHI
      3. 1NHH
      4. 1H7S
      5. 1H7U
      6. 1EA6
      7. 1B63
      8. 1BKN
      9. 1B62

  2. protein sequence -> related structures

    • if you retrieve protein NP_000240 (RefSeq protein from the human MLH1 gene) and follow the Related Structures link, you get 13 hits
    • Those are 13 individual PROTEIN CHAINS that come from the 9 STRUCTURE RECORDS noted above. So the results are essentially the same, but displayed in a different way (at a different level of granularity -- protein chain vs. structure record, and with graphical overview showing the region of alignment between the original query protein and the similar protein sequences that exist in the structure records)
    • The 13 hits were also determined in the same way as in #1 above (i.e., the system first looks for similar protein sequences, then structure links), but using the related structures link just bypasses the similar sequences page in the protein database and jumps right to the Structure database.

  3. protein sequence -> BLink -> 3D structures

    • this approach is easy and quick (and therefore handy), although a user might miss some structure records this way
    • if you retrieve protein NP_000240 (RefSeq protein from the human MLH1 gene), then follow the BLink link and press the button for 3D structures, you get 7 STRUCTURE RECORDS (each listed as a structure ID followed by a chain letter):
      1. 1BKNB
      2. 1B62A
      3. 1NHIA
      4. 1NHJA
      5. 1H7SB
      6. 1EA6B
      7. 1H7UB
    • Those 7 hits were included in the 9 structure records retrieved by the first approach noted above. But the two records NOT retrieved by BLink were: 1NHH and 1B63.
    • Reason: The BLink system searches a query protein sequence against a non-redundant set of proteins that came from structure records (probably for speed). So if the same protein chain (i.e., same protein sequence) appears in two or more structure records, only one of those records will be represented in BLink results. Although this might be OK for some users, others might want to see more comprehensive results and might not realize there are more structures if they use only BLink.
    • Question: Why would a user want to see more comprehensive results -- i.e., to retrieve additional structures if they have the identical protein sequence? Answer: one of the structures might be a free protein, and another structure might be the same protein bound to a ligand. The shape of the protein might be different if it is free vs. if it is bound to ligand X or ligand Y. With BLink, a user would retrieve only one of them, not realizing there are additional structures of potential interest that could have a different 3-D shape. BLink is of course a very useful and handy tool, but it is important for users to be aware of potential caveats (e.g., what it does and does not retrieve). So for the purposes of this introductory course, we focus on the other versions of back door #1 because they result in more comprehensive retrieval.
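
The comparison between approaches 1 and 3 can be checked with a few lines of Python. This is only a sketch built on the result lists quoted above: it assumes each BLink hit is a 4-character structure ID followed by a chain letter, and the helper names (record_id, one_per_sequence) and the toy sequences in the second half are hypothetical, made up purely for illustration. It is not how BLink itself is implemented.

```python
# Structure records from approach 1 (Related Sequences -> Structure Links).
structure_links = ["1NHJ", "1NHI", "1NHH", "1H7S", "1H7U",
                   "1EA6", "1B63", "1BKN", "1B62"]

# Hits from approach 3 (BLink -> 3D structures).
# Assumption: each label is a 4-character structure ID plus a chain letter.
blink_hits = ["1BKNB", "1B62A", "1NHIA", "1NHJA", "1H7SB", "1EA6B", "1H7UB"]


def record_id(chain_hit):
    """Return the 4-character structure ID from an ID-plus-chain label."""
    return chain_hit[:4]


# Which structure records from approach 1 never show up in the BLink hits?
blink_records = {record_id(hit) for hit in blink_hits}
missed = [rec for rec in structure_links if rec not in blink_records]
print(missed)  # ['1NHH', '1B63']


def one_per_sequence(chains):
    """Toy model of a non-redundant set: keep the first chain per sequence."""
    representative = {}
    for label, sequence in chains:
        representative.setdefault(sequence, label)
    return list(representative.values())


# Hypothetical labels and placeholder sequences (not real data).
toy_chains = [("FREE_A", "MKTAY"), ("BOUND_A", "MKTAY"), ("OTHER_A", "MSLRV")]
print(one_per_sequence(toy_chains))  # ['FREE_A', 'OTHER_A']
```

In the toy example, the ligand-bound chain drops out because its sequence is identical to the free chain's, which is the kind of loss the note above warns about.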



Revised 11/06/2007