NCBI's Structure database. Updated 11/26/03
What is VAST and a VAST page?
Protein structure neighbors in Entrez are determined by direct comparison of 3D protein structures with the Vector Alignment Search Tool (VAST) algorithm. Each of the more than 87,000 domains and complete protein chains in MMDB is compared to every other one. Entrez can list structure neighbors; however, VAST Structure Neighbors pages provide further information, along with displays of structure superpositions and structure-based alignments.
VAST pages begin with a brief text description of the query domain, including PubMed links. The precomputed structure neighbors, ranked by a selected similarity measure, are displayed below it in a graphic or table. Individual 3D superpositions can be selected by clicking check boxes and viewed in Cn3D. The corresponding sequence alignments can be displayed in HTML, text, and FASTA formats. The "Find" feature locates a particular structure neighbor when the user specifies its identifier.
What does the graphic show?
The graphic in a VAST page first displays structure features of the chain from which the query domain was selected. It is similar to the figure shown in the corresponding MMDB Summary page. All neighbor representatives from the specified non-redundant subset are sorted by one of the VAST similarity measures and displayed below, one "row" per neighbor.
The image below illustrates the graphic in the VAST page for MMDB entry 1RUO chain A domain 1 with its neighbors 1DB7A, 1FT9A, and 1WAPC.
The red bars indicate the regions (residues) of the query domain that can be superimposed on residues from each neighbor. The gray bars and blank space are unaligned regions. Cn3D uses the same region colors when it displays a structure superposition. Hovering the mouse over an icon displays a description of what it represents.
On the sequence ruler next to the query domain, e.g., "1RUO A", the aligned region is the union of the aligned regions from all neighbors. It indicates the maximum fragment of the query that is similar to some other structure. The individual 3D domains in the chain are indicated by rectangles below the sequence ruler, with different colors and numbers. MMDB's 3D domains are defined on the basis of structural compactness. Red indicates the query domain. Links to the Conserved Domain Database are provided for convenience, to give names and descriptions (where possible) of the 3D domains to which they correspond.
The check box at the leftmost side of a neighbor's "row" (not shown here) allows selection of individual neighbors and their 3D superpositions. Clicking the sequence identifier beside it goes to the Entrez sequence page of the neighbor. The red aligned regions in a neighbor's sequence are displayed at the positions of their equivalent residues in the query sequence. Clicking on them displays an HTML view of the sequence alignment between the query and the neighbor. One of the VAST similarity measures, the one used for sorting, is listed at the rightmost side of the line (here, the alignment length: e.g., 162 residues are aligned with 1DB7A). Clicking the name of the similarity measure (i.e., "Ali_Res" in our example) displays a table with all of the VAST statistics.
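The aligned region drawn on the query's sequence ruler can be thought of as the union of the per-neighbor aligned regions. The following is a minimal sketch of that idea, not the VAST implementation; the function name, the inclusive (start, end) residue ranges, and the neighbor names are hypothetical and chosen only for illustration.

```python
def merge_regions(regions):
    """Merge overlapping or adjacent inclusive (start, end) residue ranges."""
    merged = []
    for start, end in sorted(regions):
        # Extend the last range if this one overlaps or directly follows it.
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical aligned regions on the query, one list per neighbor.
neighbor_regions = {
    "neighbor_1": [(5, 60), (80, 120)],
    "neighbor_2": [(50, 90)],
    "neighbor_3": [(130, 150)],
}

all_regions = [r for regions in neighbor_regions.values() for r in regions]
query_ruler = merge_regions(all_regions)
print(query_ruler)
```

With these made-up inputs, the three overlapping ranges collapse into one fragment and the isolated range stays separate.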
How may I view or save a structure superposition?
From the VAST page, individual structure neighbors can be selected by clicking the check boxes at the left margin. Choosing the button labeled "View 3D Structure" then displays the 3D superposition of the query protein with the selected neighbors in Cn3D. Up to 10 neighbors may be viewed in a superposition simultaneously if Cn3D without the cache mechanism is selected (this is the default). This selection also works for Cn3D version 3.0. Although the default is to submit all atoms for display in Cn3D, the "Backbone" option can be used to reduce the size of the files downloaded by Cn3D, saving time and memory during data transmission to the viewer.
With the release of Cn3D version 4.0, the Cn3D/Cache mechanism is used to store downloaded structure data locally. With this option, the number of neighbors for display is not limited. The user must take care not to exceed the physical memory available on the computer. If available memory is exceeded, Cn3D will not operate properly.
Alternatively, instead of viewing the 3D superpositions, the data can be examined or saved to disk as a local file, for browser-independent or later viewing. Also, if the "List" "Asn1" option is selected from the last menu instead of "List" "Graphics" or "List" "Table", a complete alignment file will be saved locally, including all of the neighbors in the subset.
How may I display a sequence alignment created from a structure superposition?
If the "View Alignment" button is chosen, a multiple alignment view will be opened in HTML, text, or FASTA with Gap formats. The check boxes at each neighbor "row" allow one to add the "Selected" neighbors into the alignments. The "All on page" option will allow a display of multiple alignments made from all of the neighbors on the same page.
The HTML- and text-format alignment views indicate aligned vs. unaligned residues as uppercase and lowercase letters, respectively. In HTML views, columns with identical residues aligned across all selected sequences are colored red, whereas those with different aligned residues are colored blue. Those not covered by all sequences will be shown in gray.
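The coloring rule for HTML views can be illustrated with a short sketch. It assumes the alignment is given as equal-length rows in which uppercase letters are aligned residues, lowercase letters are unaligned residues, and "-" is a gap; that input representation and the function name are assumptions made for this example, not a description of the server's code.

```python
def color_columns(rows):
    """Return one color name per alignment column.

    red  : every sequence has an aligned (uppercase) residue and all are identical
    blue : every sequence has an aligned residue, but they differ
    gray : at least one sequence is unaligned (lowercase) or gapped in the column
    """
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all alignment rows must have the same length")
    colors = []
    for column in zip(*rows):
        if all(ch.isupper() for ch in column):
            colors.append("red" if len(set(column)) == 1 else "blue")
        else:
            colors.append("gray")
    return colors


# Hypothetical three-sequence alignment.
alignment = [
    "MKVLa-",
    "MRVLsT",
    "MKVLGT",
]
print(color_columns(alignment))
```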
How may I display different neighbors or search for possible neighbors?
The "List" button can be used to change the appearance of the graphic and table, by selecting from its options. The VAST similarity measures reported for each neighbor can be used to determine sort order. The lengths of the whole graphic and table are strongly influenced by the display subset, which determines the level of sequence redundancy chosen.
The total number of neighbors displayed in a page is limited. At most 60 neighbors from a non-redundant subset can be displayed simultaneously on one page. In addition, by clicking check boxes to select from previously listed neighbors, at most another 40 neighbors can also be displayed in the same page. Therefore the maximum capacity of one page is 100 neighbors. This feature, together with the pagination, makes it possible to keep interesting neighbors from different pages displayed together.
The page can be selected from the third pull-down menu in the "List" line. The last menu in the "List" line is for choosing the display format, either graphic or table. A graphic helps in understanding the superpositions between a query domain and its neighbors. A table is suited to viewing or saving the statistics from a VAST calculation.
The "Find" button can be used by specifying an MMDB, PDB, or 3D-Domain identifier in the text field. One may search for a possible neighbor that is not displayed in the current page. A "Find" with no input will display only those neighbors that were selected previously.
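The page limits described above (60 listed neighbors plus up to 40 previously selected ones) can be sketched as follows. This is only an illustration of the arithmetic; how the server orders or de-duplicates carried-over neighbors is not stated here, so appending them after the page's own neighbors and skipping duplicates are assumptions, as are all names in the code.

```python
PAGE_SIZE = 60     # neighbors listed per page from the non-redundant subset
MAX_CARRIED = 40   # previously selected neighbors that can be shown as well


def compose_page(ranked_neighbors, page_number, selected):
    """Return the neighbors shown on a 1-based page, plus carried-over selections."""
    start = (page_number - 1) * PAGE_SIZE
    page = ranked_neighbors[start:start + PAGE_SIZE]
    on_page = set(page)
    # Assumption: selections already on this page are not listed twice.
    carried = [n for n in selected if n not in on_page][:MAX_CARRIED]
    return page + carried


# Hypothetical ranked neighbor list and earlier selections.
ranked = [f"N{i}" for i in range(150)]
selected = ["N0", "N65", "N149"]

page_two = compose_page(ranked, 2, selected)
print(len(page_two), page_two[-2:])
```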
What VAST similarity measures are listed in the table?
All of the similarity measures for each structure neighbor detected by VAST can be listed in a table to facilitate the examination of VAST results. The table includes the following columns:
What does it mean when it says VAST did not find any structure neighbors?
There are a few different reasons for this condition. One reason is simply that VAST does not consider this structure to be sufficiently similar to any other structure in the MMDB database. The VAST data use a statistical significance cutoff of P < 0.0001. This cutoff was intentionally set to be conservative, to reduce the number of false positives, but some hits that are biologically significant may be omitted because of this statistical threshold.
There are also some entries for which the VAST calculation was not done: proteins with fewer than 3 secondary structure elements (SSEs), and structures containing no protein chains (i.e., only DNA or RNA). The molecule type and SSE count can be checked by examining the structure with Cn3D.
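The conditions above can be summarized as a small decision function. This is a sketch for orientation only: the function name, its arguments, and the returned messages are hypothetical, and only the two thresholds (P < 0.0001 and fewer than 3 SSEs) come from the text.

```python
P_CUTOFF = 1e-4   # significance cutoff stated in the text
MIN_SSE = 3       # minimum number of secondary structure elements


def explain_neighbor_result(has_protein_chain, sse_count, candidate_p_values):
    """Return a short reason why a query has, or lacks, reported neighbors."""
    if not has_protein_chain:
        return "not computed: structure contains no protein chains"
    if sse_count < MIN_SSE:
        return "not computed: fewer than 3 secondary structure elements"
    significant = [p for p in candidate_p_values if p < P_CUTOFF]
    if not significant:
        return "computed: no candidate passes P < 0.0001"
    return f"computed: {len(significant)} neighbor(s) pass the cutoff"


# Hypothetical inputs.
print(explain_neighbor_result(True, 8, [0.02, 3e-6, 1e-4]))
print(explain_neighbor_result(True, 2, []))
print(explain_neighbor_result(False, 0, []))
```

Note that a candidate with P exactly equal to 0.0001 is excluded, since the cutoff is a strict inequality.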
How are non-redundant subsets of protein chains selected?
MMDB chains are clustered into groups according to their amino acid sequence similarity in pairwise comparisons. A representative chain is selected from each group to compile a non-redundant subset of MMDB, and only one representative of each group is shown in a neighbor list calculated by VAST. By default, a lower level of redundancy at 10e-40 is used to report structure neighbors. This keeps the table shorter while providing the most informative summary of structural relationships in MMDB.
All-against-all pairwise comparisons of MMDB domains are calculated with the BLAST algorithm, setting a fixed database size parameter of 500,000 residues. Sequences are then clustered into groups by single linkage, whereby a sequence is merged into a group if it shows a BLAST p-value of C or less with any member of the group. There are 5 levels of redundancy defined in the MMDB database:
Within each cluster of similar protein chains, cluster members are ranked according to the apparent quality and completeness of the structure data. The following criteria are used (ranked by decreasing priority):
For the display of structure neighbors calculated by VAST, the highest-ranking chain (according to the criteria above) from each cluster found in the list of neighbors is reported. In most cases this implies that the parent structure is also similar to the other members of the sequence-redundant cluster. To have them displayed, the user must select a higher level of redundancy.
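The single-linkage clustering and representative selection described in this section can be sketched as below. This is an illustrative sketch under stated assumptions, not the MMDB pipeline: the chain names, p-values, cutoff, and quality ranks are made up, and a numeric rank (1 = best) stands in for the ranking criteria.

```python
def single_linkage(chains, pairwise_p, cutoff):
    """Group chains so that any pair with p <= cutoff ends up in one cluster."""
    parent = {c: c for c in chains}

    def find(c):
        # Follow parent links to the cluster root, compressing the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), p in pairwise_p.items():
        if p <= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for c in chains:
        clusters.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in clusters.values())


def representatives(clusters, neighbor_list, quality_rank):
    """Pick the best-ranked chain among each cluster's members found as neighbors."""
    neighbors = set(neighbor_list)
    reps = []
    for members in clusters:
        found = [m for m in members if m in neighbors]
        if found:
            reps.append(min(found, key=lambda m: quality_rank[m]))
    return reps


# Hypothetical data; the cutoff is illustrative, not an MMDB setting.
chains = ["A", "B", "C", "D", "E"]
pairwise_p = {
    ("A", "B"): 1e-50,
    ("B", "C"): 1e-45,
    ("A", "C"): 1e-10,   # A and C still join the same cluster through B
    ("C", "D"): 0.5,
    ("D", "E"): 1e-5,
}
clusters = single_linkage(chains, pairwise_p, cutoff=1e-40)
quality_rank = {"A": 1, "B": 3, "C": 2, "D": 1, "E": 1}
print(clusters)
print(representatives(clusters, ["B", "C", "E"], quality_rank))
```

In this example chain A has the best rank in its cluster, but it is not in the neighbor list, so the best-ranked member that is a neighbor (C) is reported instead.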