Updated 11/26/03


"VAST Structure Neighbor" Help



What is VAST and a VAST page?

Protein structure neighbors in Entrez are determined by direct comparison of 3D protein structures with the Vector Alignment Search Tool (VAST) algorithm. Each of the more than 87,000 domains and complete protein chains in MMDB is compared to every other one. Entrez can list structure neighbors; however, VAST Structure Neighbors pages provide further information, including displays of structure superpositions and structure-based alignments.

VAST pages begin with a brief text description of the query domain, including PubMed links. The precomputed structure neighbors, ranked by a selected similarity measure, are displayed below in a graphic or table. Individual 3D superpositions can be selected by clicking check boxes and viewed in Cn3D. The corresponding sequence alignments can be displayed in HTML, text, and FASTA formats. The "Find" feature locates a particular structure neighbor when the user specifies its identifier.



What does the graphic show?

The graphic in a VAST Page first displays structure features of the chain from which the query domain was selected. It is similar to the figure shown in the corresponding MMDB Summary page. All neighbor representatives from the specified non-redundant subset are sorted by one of the VAST similarity measures and displayed below, one "row" per neighbor.

The image below illustrates the graphic in the VAST page for MMDB entry 1RUO chain A domain 1 with its neighbors 1DB7A, 1FT9A, and 1WAPC.

The red bars indicate the regions (residues) of the query domain that can be superimposed on residues from each neighbor. The gray bars and blank space are unaligned regions. These region colors are the same as those shown in Cn3D when a structure superposition is viewed. Hovering the mouse over an icon displays a description of what it represents.

On the sequence ruler next to the query domain, e.g., "1RUO A", the aligned region is the union of the aligned regions from all neighbors. This indicates the maximum fragment of the query that is similar to at least one other structure. The individual 3D domains in the chain are indicated by rectangles below the sequence ruler, distinguished by color and number. MMDB's 3D domains are defined on the basis of structural compactness. Red indicates the query domain. Links to the Conserved Domain Database are provided for convenience, to supply names and descriptions (where possible) of the 3D domains to which they correspond.
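The combined aligned region on the query ruler can be pictured as a merge of residue ranges. The following is a minimal sketch, not the VAST implementation: the function name, the neighbor labels, and the residue ranges are all hypothetical, and ranges are assumed to be inclusive (start, end) residue positions on the query.

```python
def merge_aligned_regions(regions):
    """Merge inclusive (start, end) residue ranges into a sorted union.

    Ranges that overlap or touch (end + 1 == next start) are joined.
    """
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous range instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical aligned ranges on the query, one list per neighbor.
neighbor_regions = {
    "neighbor_1": [(5, 60), (75, 120)],
    "neighbor_2": [(40, 90)],
    "neighbor_3": [(130, 150)],
}

all_regions = [r for regions in neighbor_regions.values() for r in regions]
ruler = merge_aligned_regions(all_regions)
print(ruler)
```

With these made-up inputs the first three ranges fuse into one stretch and the fourth stays separate, which is the kind of picture the ruler gives for the query.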

The check box at the leftmost side of a neighbor's "row" (not shown here) allows for selection of individual neighbors and their 3D superposition. Clicking the sequence identifier beside it opens the Entrez sequence page of the neighbor. The red aligned regions in a neighbor's sequence are displayed at the positions of their equivalent residues in the query sequence. Clicking on these displays an HTML view of the sequence alignment between the query and the neighbor. The VAST similarity measure used for sorting (here, the alignment length: e.g., 162 residues are aligned with 1DB7A) is listed at the rightmost side of the line. Clicking the name of the similarity measure (i.e., "Ali_Res" in our example) displays a table with all of the VAST statistics.



How may I view or save a structure superposition?

From the VAST page, individual structure neighbors can be selected by clicking the check boxes at the left margin. Choosing the button labeled "View 3D Structure" then displays the 3D superposition of the query protein with the selected neighbors in Cn3D. Up to 10 neighbors may be viewed in a superposition simultaneously if Cn3D without the cache mechanism is selected (this is the default). This selection also works for Cn3D version 3.0. Although the default is to submit all atoms for display in Cn3D, the "Backbone" option can be used to reduce the size of the files downloaded by Cn3D, saving transmission time and memory. With the release of Cn3D version 4.0, the Cn3D/Cache mechanism is used to store downloaded structure data locally. With this option, the number of neighbors for display is not limited, but the user must take care not to exceed the physical memory available on the computer. If available memory is exceeded, Cn3D will not operate properly.

Alternatively, instead of viewing the 3D superpositions, the data can be examined or saved to disk as a local file for browser-independent or later viewing. In addition, if the "Asn1" option is selected from the last menu in the "List" line instead of "Graphics" or "Table", a complete alignment file, including all of the neighbors in the subset, will be saved locally.



How may I display a sequence alignment created from a structure superposition?

If the "View Alignment" button is chosen, a multiple alignment view will be opened in HTML, text, or FASTA with Gap formats. The check boxes at each neighbor "row" allow one to add the "Selected" neighbors into the alignments. The "All on page" option will allow a display of multiple alignments made from all of the neighbors on the same page.

The HTML- and text-format alignment views indicate aligned vs. unaligned residues as uppercase and lowercase letters, respectively. In HTML views, columns with identical residues aligned across all selected sequences are colored red, whereas those with different aligned residues are colored blue. Those not covered by all sequences will be shown in gray.
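The coloring rule for HTML alignment views can be illustrated with a small sketch. This is one reading of the description above, not the page's actual rendering code: the function name and the example rows are hypothetical, a column is treated as "covered by all sequences" only when every row has an uppercase (aligned) residue there, and "-" is assumed to mark a gap.

```python
def classify_columns(rows):
    """Return one color name per alignment column.

    rows: equal-length strings; uppercase = aligned residue,
    lowercase = unaligned residue, '-' = gap (assumed convention).
    """
    colors = []
    for column in zip(*rows):
        if not all(ch.isupper() for ch in column):
            colors.append("gray")   # not covered by all sequences
        elif len(set(column)) == 1:
            colors.append("red")    # identical aligned residues
        else:
            colors.append("blue")   # aligned but different residues
    return colors


# Hypothetical three-sequence alignment.
rows = [
    "mkVLSAG-",
    "--VLTAGk",
    "a-VLSAGe",
]
colors = classify_columns(rows)
print(colors)
```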



How may I display different neighbors or search for possible neighbors?

The "List" button can be used to change the appearance of the graphic and table, by selecting from its options. The VAST similarity measures reported for each neighbor can be used to determine sort order. The lengths of the whole graphic and table are strongly influenced by the display subset, which determines the level of sequence redundancy chosen.

The total number of neighbors displayed in a page is limited. At most 60 neighbors from a non-redundant subset can be displayed simultaneously on one page. In addition, by clicking check boxes to select from previously listed neighbors, up to another 40 neighbors can be displayed on the same page. The maximum capacity of one page is therefore 100 neighbors. This feature, together with pagination, keeps interesting neighbors from different pages displayed together. The page can be selected from the third pull-down menu in the "List" line.

The last menu in the "List" line is for choosing the display format, either graphic or table. A graphic is helpful to understand the superpositions between a query domain and its neighbors. A table is good for viewing or saving the statistics from a VAST calculation.

The "Find" button can be used by specifying an MMDB, PDB, or 3D-Domain identifier in the text field. One may search for a possible neighbor that is not displayed in the current page. A "Find" with no input will display only those neighbors that were selected previously.



What VAST similarity measures are listed in the table?

All of the similarity measures for each structure neighbor detected by VAST can be listed in a table to facilitate the examination of VAST results. The table includes the following columns:
  • Check box: Allows for selection of individual neighbors.
  • PDB: The four-character PDB-Identifier of the structure neighbor. Click on the Identifier to switch to the MMDB Summary page of the respective neighbor.
  • C: The PDB chain name. A blank space indicates that the chain does not have an identifier (many protein structures have a single chain only). Note that non-alphanumeric characters such as dashes, hyphens, underscores, etc. may be used as chain names by PDB.
  • D: The MMDB 3D domain identifier. Domains are parsed based on geometrical criteria (the ratio of intradomain contacts to interdomain contacts) by an automatic method and can be visualized with Cn3D.
  • Aligned Length: The number of equivalent pairs of C-alpha atoms superimposed between the two structures, i.e. how many residues have been used to calculate the 3D superposition.
  • SCORE: The VAST structure-similarity score. This number is related to the number of secondary structure elements superimposed and the quality of that superposition. Higher VAST scores correlate with higher similarity.
  • P-VAL: The VAST p value is a measure of the significance of the comparison, expressed as a probability. For example, if the p value is 0.001, then the odds are 1000 to 1 against seeing a match of this quality by pure chance. The p value from VAST is adjusted for the effects of multiple comparisons using the assumption that there are 500 independent and unrelated types of domains in the MMDB database. The p value shown thus corresponds to the p value for the pairwise comparison of each domain pair, divided by 500.
  • RMSD: The root mean square superposition residual in Angstroms. This number is calculated after optimal superposition of two structures, as the square root of the mean square distances between equivalent C-alpha atoms. Note that the RMSD value scales with the extent of the structural alignments and that this size must be taken into consideration when using RMSD as a descriptor of overall structural similarity.
  • %Id: Percent identical residues in the aligned sequence region. This is a raw measure of sequence similarity in the parts of the proteins that have been superimposed.
  • LHM: Loop Hausdorff Metric. A Loop Similarity measure that shows how well two structures conform to each other in the loop regions, after structural superposition. The "loop regions" are the parts of the structures between aligned secondary structure elements (helices and strands). LHM is measured in Angstroms, with a smaller value indicative of greater similarity. The loop similarity may be undefined (indicated by 'NA') if there are too many residues with missing coordinates in the loops. Citation: Analysis of protein homology by assessing the (dis)similarity in protein loop regions
  • GSP: Gapped Score. A combination (algebraic) score that uses RMSD, aligned length, and the number of gapped regions in the alignment. A smaller gapped score correlates with greater similarity. Citation: Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures.
  • Description: A string parsed out of PDB's COMPOUND records that describes the nature of the structure neighbor.
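As an illustration of the RMSD column, the sketch below computes the square root of the mean squared distance between equivalent C-alpha atoms. It assumes the two coordinate lists are already optimally superimposed and paired one-to-one; the function name and the coordinates are invented for the example, and the superposition step itself is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in Angstroms between paired (x, y, z) C-alpha coordinates.

    Assumes the structures are already superimposed and that
    coords_a[i] is the equivalent atom of coords_b[i].
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates: every pair is offset by exactly 1 Angstrom in z.
query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
neighbor = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
value = rmsd(query, neighbor)
print(value)
```

Because the mean is taken over the aligned pairs only, the value should be read together with the aligned length, as noted in the RMSD entry above.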



What does it mean when it says VAST did not find any structure neighbors?

There are a few different reasons for this condition. One reason is simply that VAST does not consider this structure to be sufficiently similar to any other structure in the MMDB database. The VAST data use a statistical significance cutoff of P < 0.0001. This cutoff was set to be conservative intentionally, to reduce the number of false positives, but some hits that are biologically significant may be omitted because of this statistical threshold.

There are also some entries for which the VAST calculation was not done: proteins with fewer than 3 secondary structure elements (SSEs), and structures containing no protein chains (i.e., only DNA or RNA). The molecule type and SSE count can be checked by examining the structure with Cn3D.



How are non-redundant subsets of protein chains selected?

MMDB chains are clustered into groups according to their amino acid sequence similarity in pairwise comparisons. A representative chain is selected from each group to compile a non-redundant subset of MMDB, and only one representative of each group is shown in a neighbor list calculated by VAST. By default, the medium level of redundancy (a BLAST p value of 10e-40) is used to report structure neighbors. This keeps the table shorter while providing the most informative summary of structural relationships in MMDB.

All-against-all pairwise comparisons of MMDB domains are calculated with the BLAST algorithm, setting a fixed database size parameter of 500,000 residues. Sequences are then clustered into groups by single linkage, whereby a sequence is merged into a group if it shows a BLAST p value of C or less with any member of the group, where C is the cutoff for the chosen redundancy level. There are 5 levels of redundancy defined in the MMDB database:

  1. Low redundancy: representatives are chosen from each group where sequences show a BLAST p value of 10e-7 or less to each other
  2. Medium redundancy: representatives are chosen from each group where sequences show a BLAST p value of 10e-40 or less to each other
  3. High redundancy: representatives are chosen from each group where sequences show a BLAST p value of 10e-80 or less to each other
  4. Non-identical sequence level: representatives are chosen from each group where sequences are not identical to each other
  5. All sequences level: this is the most redundant level, which includes all MMDB sequences
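Single-linkage clustering at a p-value cutoff can be sketched with a union-find structure. This is an illustration only, not NCBI's code: the function name, sequence labels, and p values are hypothetical, and the cutoff written "10e-40" in the text is read here as 10^-40 (1e-40 in Python), which is an assumption.

```python
def single_linkage_clusters(ids, pair_pvalues, cutoff):
    """Group ids so that any pair with p value <= cutoff shares a cluster.

    pair_pvalues maps (id_a, id_b) to a BLAST-style p value (hypothetical data).
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), p in pair_pvalues.items():
        if p <= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two groups

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical sequences and pairwise p values.
ids = ["A", "B", "C", "D"]
pair_pvalues = {
    ("A", "B"): 1e-50,
    ("B", "C"): 1e-45,
    ("A", "C"): 1e-10,
    ("C", "D"): 1e-5,
}

medium = single_linkage_clusters(ids, pair_pvalues, cutoff=1e-40)
high = single_linkage_clusters(ids, pair_pvalues, cutoff=1e-80)
print(medium)
print(high)
```

In this made-up example, A and C end up in the same medium-level group even though their direct p value is above the cutoff, because each is linked to B. That chaining is the defining property of single linkage.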

Within each cluster of similar protein chains, cluster members are ranked according to the apparent quality and completeness of the structure data. The following criteria are used (ranked by decreasing priority):

  1. Low fraction of residues with unknown residue type
  2. Low fraction of residues with incomplete coordinates
  3. Low fraction of residues with missing coordinates
  4. Low fraction of residues with incomplete side-chain coordinates
  5. High resolution
  6. High number of chains (subunits) contained in the PDB entry
  7. High number of heterogens contained in the PDB entry
  8. High number of different types of heterogens.
  9. Chain length

For the display of structure neighbors calculated by VAST, the highest ranking chain (according to the criteria above) from each cluster found in the list of neighbors is reported. In most cases this implies that the parent structure is also similar to the other members of the sequence redundant cluster. To have them displayed, the user must select a higher level of redundancy.
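The prioritized criteria above map naturally onto a tuple sort key, where earlier tuple fields take precedence over later ones. The sketch below is hypothetical throughout: the record type, field names, and values are invented, and two readings are assumed, namely that "high resolution" means a smaller value in Angstroms and that longer chains are preferred for the last criterion.

```python
from dataclasses import dataclass


@dataclass
class ChainRecord:
    """Hypothetical per-chain quality data; field names are illustrative."""
    chain_id: str
    frac_unknown_type: float
    frac_incomplete_coords: float
    frac_missing_coords: float
    frac_incomplete_sidechains: float
    resolution: float          # Angstroms; smaller means higher resolution
    n_chains: int
    n_heterogens: int
    n_heterogen_types: int
    length: int


def quality_key(c):
    # Smaller tuples rank first, so "high is better" fields are negated.
    return (
        c.frac_unknown_type,
        c.frac_incomplete_coords,
        c.frac_missing_coords,
        c.frac_incomplete_sidechains,
        c.resolution,
        -c.n_chains,
        -c.n_heterogens,
        -c.n_heterogen_types,
        -c.length,             # assumption: longer chains preferred
    )


def pick_representative(cluster):
    """Return the highest-ranking chain of a cluster."""
    return min(cluster, key=quality_key)


cluster = [
    ChainRecord("chain_x", 0.0, 0.02, 0.01, 0.0, 2.5, 2, 1, 1, 210),
    ChainRecord("chain_y", 0.0, 0.00, 0.01, 0.0, 2.8, 1, 0, 0, 205),
    ChainRecord("chain_z", 0.0, 0.00, 0.01, 0.0, 1.9, 1, 3, 2, 208),
]
best = pick_representative(cluster)
print(best.chain_id)
```

Here chain_x loses on the second criterion despite its other values, and chain_z beats chain_y on resolution, which shows how a lower-priority criterion only matters when all higher-priority ones tie.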



