The three-dimensional (3D) structures of representatives of
nearly half of all protein families are now available in public databases.
Thus, no matter which protein one investigates, it is increasingly likely
that the 3D structure of a homolog will be known and may reveal unsuspected
structure-function relationships. The goal of Entrez's 3D-structure database
is to make this information accessible and usable by molecular biologists.
To this end, Entrez provides two major analysis tools: a search engine based
on sequence and structure 'neighboring' and an integrated visualization
system for sequence and structure alignments. From a protein's sequence
'neighbors' one may rapidly identify other members of a protein family,
including those whose 3D structures are known. By comparing aligned sequences
and/or structures in detail with the visualization system, one may
identify conserved features and perhaps infer functional properties. Here
we describe how these analysis tools may be used to investigate the
structure and function of newly discovered proteins, using the PTEN gene
product as an example.
Germline mutations in the PTEN gene have been shown to cause
Cowden Disease, and somatic mutations have been associated with a variety
of cancers (1). To illustrate the analysis tools of Entrez's 3D-structure
database (2,3), we describe how it may be used to answer the following
question: is the 3D structure known for the PTEN gene product, or for a
homologous protein, and does this information suggest mechanisms by which these mutations might cause disease? To use this article as a tutorial, readers should point their web browser to the Entrez site (http://www.ncbi.nlm.nih.gov/Entrez) and perform the analysis step by step as we describe it.
To retrieve the sequence of the PTEN gene product, we enter Entrez's
PubMed literature database and type the query 'PTEN and Cowden Disease'.
This identifies a number of articles describing mutations in the PTEN gene
and their associated phenotypes. Choosing 'Display Protein Links' leads
to the sequences reported in these articles, and in particular to the
Swiss-Prot (4) entry PTEN_HUMAN. The associated GenPept Report contains
annotations recording many of the mutations reported in the literature;
in particular, mutations at residues 123, 124 and 129 of the PTEN gene
product have been implicated in Cowden Disease.
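For readers who prefer to script these first steps, the sketch below builds request URLs for NCBI's E-utilities, a programmatic interface to Entrez that the Web workflow described here does not use; treat its use as an assumption. The sketch only constructs URLs (esearch for the PubMed query, elink for the PubMed-to-protein links) and makes no network calls. The helper names `esearch_url` and `elink_url` are hypothetical, and the PubMed IDs are placeholders, not search results.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities endpoint, used in place of the Web interface.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str) -> str:
    """Hypothetical helper: URL for a text query against one Entrez database."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term})


def elink_url(dbfrom: str, db: str, ids: list) -> str:
    """Hypothetical helper: URL that follows Entrez links from `dbfrom` records to `db`."""
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(ids)}
    return EUTILS + "elink.fcgi?" + urlencode(params)


# The literature query typed into PubMed.
search = esearch_url("pubmed", "PTEN and Cowden Disease")

# Counterpart of 'Display Protein Links' for the returned articles.
# "PMID_A" and "PMID_B" are hypothetical placeholders, not real PubMed IDs.
links = elink_url("pubmed", "protein", ["PMID_A", "PMID_B"])

print(search)
print(links)
```

In practice, the IDs passed to elink would be taken from the XML that the esearch request returns; `urlencode` takes care of escaping spaces and commas.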
The sequence PTEN_HUMAN has no Structure Link, since Entrez has collected this entry from Swiss-Prot, not from PDB (5). To find a structure we must search among sequences similar to PTEN, and to do so we choose its Protein Neighbors. This link retrieves all sequences with significant similarity to PTEN_HUMAN, that is, the results of the precomputed BLAST (6) searches that make up Entrez's sequence-neighbor database. To see whether the 3D structure is known for any of these homologous sequences, one may browse this list looking for Structure Links, or choose Display Structure Links for all of the sequences at once. Again the results are negative: none of the sequences detected as similar to PTEN in a single round of BLAST neighboring has a known 3D structure.
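The single-round check described above amounts to a lookup: take the query's precomputed neighbor list and keep those entries that carry a Structure Link. The sketch below models Entrez's neighbor and link tables as plain in-memory dictionaries; the function name and the toy data (apart from the names PTEN_HUMAN and Cdc14b2 from the text) are hypothetical, and the data are not real Entrez content.

```python
from typing import Dict, List


def structure_links_among_neighbors(
    query: str,
    neighbors: Dict[str, List[str]],
    structure_links: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Single-round search: which of `query`'s sequence neighbors (or `query`
    itself) link to a 3D structure? Returns {sequence: [structure IDs]}."""
    hits: Dict[str, List[str]] = {}
    for seq in [query] + neighbors.get(query, []):
        structs = structure_links.get(seq, [])
        if structs:
            hits[seq] = list(structs)
    return hits


# Hypothetical toy tables standing in for Entrez's precomputed BLAST neighbors
# and Structure Links; TENSIN_KINASE_X is an invented placeholder name.
neighbors = {"PTEN_HUMAN": ["Cdc14b2", "TENSIN_KINASE_X"]}
structure_links: Dict[str, List[str]] = {}  # no neighbor has a Structure Link

result = structure_links_among_neighbors("PTEN_HUMAN", neighbors, structure_links)
print(result)  # {} : the negative result of a single round
```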
At this point one might conclude that no 3D structure relevant to PTEN is known, and indeed one has learned that no close homolog of known structure exists. One may use Entrez to continue the search with greater sensitivity, however, by examining the 'neighbors of neighbors' of PTEN. One strategy is to browse the list of PTEN's sequence neighbors for a sequence that is annotated as having the same function as PTEN and that also has a larger number of sequence neighbors. Following this strategy, one skips over PTEN neighbors that are large multi-domain proteins, such as kinases containing tensin-like domains, but identifies Cdc14b2, a human protein 202 residues in length which, like PTEN, is annotated as a phosphatase. Cdc14b2 has nearly 300 sequence neighbors, and Display Structure Links yields a hit: 1VHR, the 3D structure of human VH1-related dual-specificity phosphatase (7). This search strategy is illustrated in Figure 1.
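The 'neighbors of neighbors' strategy can be sketched as a two-step graph search: choose intermediate neighbors that share the query's function annotation and have more neighbors than the query, then look for Structure Links among each intermediate and its own neighbors. This is an illustration under simplifying assumptions, not Entrez's implementation: "same function" is reduced to an exact match of annotation strings, "larger number of neighbors" is read as more neighbors than the query, and all data except the names PTEN_HUMAN, Cdc14b2 and 1VHR are hypothetical, drastically shortened stand-ins.

```python
from typing import Dict, List, Tuple


def neighbors_of_neighbors(
    query: str,
    neighbors: Dict[str, List[str]],
    function: Dict[str, str],
    structure_links: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Two-round search for a structure related to `query`.

    Intermediates are neighbors of `query` that share its function annotation
    and have more sequence neighbors than `query`; they are tried in order of
    decreasing neighbor count. Returns (intermediate, structure ID) pairs found
    among each intermediate and its own neighbors.
    """
    wanted = function.get(query)
    n_query = len(neighbors.get(query, []))
    intermediates = [
        seq
        for seq in neighbors.get(query, [])
        if wanted is not None
        and function.get(seq) == wanted
        and len(neighbors.get(seq, [])) > n_query
    ]
    intermediates.sort(key=lambda s: len(neighbors.get(s, [])), reverse=True)

    hits: List[Tuple[str, str]] = []
    for mid in intermediates:
        for seq in [mid] + neighbors.get(mid, []):
            for struct in structure_links.get(seq, []):
                if (mid, struct) not in hits:  # report each structure once per intermediate
                    hits.append((mid, struct))
    return hits


# Hypothetical toy data; upper-case placeholder names are invented.
neighbors = {
    "PTEN_HUMAN": ["TENSIN_KINASE_X", "Cdc14b2"],
    "TENSIN_KINASE_X": ["PTEN_HUMAN", "KINASE_Y", "KINASE_Z"],
    "Cdc14b2": ["PTEN_HUMAN", "PHOSPHATASE_P", "PHOSPHATASE_Q", "VHR_LIKE_SEQ"],
}
function = {
    "PTEN_HUMAN": "phosphatase",
    "TENSIN_KINASE_X": "kinase",
    "Cdc14b2": "phosphatase",
}
structure_links = {"VHR_LIKE_SEQ": ["1VHR"]}

hits = neighbors_of_neighbors("PTEN_HUMAN", neighbors, function, structure_links)
print(hits)  # [('Cdc14b2', '1VHR')]
```

The function filter is what skips the multi-domain kinase even though it has several neighbors of its own; ranking the remaining intermediates by neighbor count mirrors the preference for sequences with larger neighbor lists.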
5. Abola,E.E., Bernstein,F.C., Bryant,S.H., Koetzle,T.F. and Weng,J. (1987) In Allen,F.H., Bergerhoff,G. and Sievers,R. (eds), Crystallographic Databases-Information Content, Software Systems, Scientific Applications. Data Commission of the International Union of Crystallography, Bonn/Cambridge/Chester, pp. 107-132.
6. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) J. Mol. Biol., 215, 403-410.