Computational Biology Branch

Teresa M. Przytycka’s research group

Algorithmic and Graph Theoretical methods in

Computational and Systems Biology

 

 

 

Structure Comparison by Projection Methods

 

 

Group members:

Elena Zotenko

Teresa M. Przytycka

 

Collaborators

Dianne P. O’Leary

R. I. Dogan

W. J. Wilbur

 

 

References:

  • Zotenko E, O'Leary DP, Przytycka TM. Secondary Structure Spatial Conformation Footprint (SSEF): A Novel Method for Fast Protein Structure Comparison and Classification. BMC Struct Biol. 2006 Jun 8;6:12.
  • Zotenko E, Dogan RI, Wilbur WJ, O'Leary DP, Przytycka TM. Structural footprinting in protein structure comparison: the impact of structural fragments. BMC Struct Biol. 2007 Aug 9;7:53.

 

 

Background: Recently, a new class of methods for fast protein structure comparison has emerged. We call the methods in this class projection methods, as they rely on a mapping of protein structure into a high-dimensional vector space. Once the mapping is done, structure comparison is reduced to distance computation between the corresponding vectors. As structural similarity is approximated by distance between projections, the success of any projection method depends on how well its mapping function captures the salient features of protein structure. There is no agreement on what constitutes a good projection technique, and the three currently known projection methods take very different approaches to constructing the mapping, both in terms of what structural elements are included and how this information is integrated to produce a vector representation.
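
To illustrate how a projection method reduces structure comparison to distance computation, here is a minimal Python sketch. It assumes the projections already exist as plain lists of numbers; the Euclidean metric, the function names, and the three-dimensional toy vectors are illustrative assumptions, not details taken from the papers.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def screen(query, database, distance=euclidean):
    """Return database entry names ordered from closest to farthest projection."""
    return sorted(database, key=lambda name: distance(query, database[name]))

# Hypothetical toy projections; real projections are high-dimensional.
database = {
    "structure_A": [2.0, 0.0, 1.0],
    "structure_B": [0.0, 3.0, 0.0],
    "structure_C": [2.0, 0.5, 1.0],
}
query = [2.0, 0.0, 1.0]
ranking = screen(query, database)
print(ranking)
```

Because each comparison is a single vector distance, screening a query against a large database costs one pass over the stored projections, with no pairwise structural alignment.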

 

Results: In this paper we propose a novel projection method that uses secondary structure information to produce the mapping. First, a diverse set of spatial arrangements of triplets of secondary structure elements, a set of structural models, is automatically selected. Then, each protein structure is mapped into a high-dimensional vector of "counts", or footprint, where each count corresponds to the number of times a given structural model is observed in the structure, weighted by the precision with which the model is reproduced. We perform the first comprehensive evaluation of our method together with all other currently known projection methods, which not only allows us to compare our method to the methods in the same class but also creates a unique opportunity for establishing a connection between a projection technique and performance.
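
The weighted counting described above can be sketched as follows. This is only an illustration under stated assumptions: triplets and models are represented here as hypothetical numeric descriptors, each observed triplet is credited to its nearest model, and the precision weight is taken to be `1 - distance / tolerance`. None of these choices is specified on this page; the actual SSEF descriptors and weighting may differ.

```python
import math

def footprint(triplets, models, tolerance=1.0):
    """Map a structure (a list of triplet descriptors) to a vector of weighted counts.

    Assumptions for illustration: descriptors are numeric tuples, each triplet is
    credited to its nearest model, and the weight falls linearly from 1 (exact
    match) to 0 (at distance `tolerance`).
    """
    vector = [0.0] * len(models)
    for triplet in triplets:
        dists = [math.dist(triplet, model) for model in models]
        best = min(range(len(models)), key=dists.__getitem__)
        if dists[best] < tolerance:
            vector[best] += 1.0 - dists[best] / tolerance
    return vector

# Hypothetical two-model library and four observed triplets.
models = [(0.0, 0.0), (10.0, 0.0)]
triplets = [(0.0, 0.0), (0.0, 0.5), (10.0, 0.0), (5.0, 5.0)]
fp = footprint(triplets, models)
print(fp)
```

In this toy input, the first model is reproduced once exactly and once imprecisely, the second model once exactly, and the last triplet matches no model within the tolerance, so it contributes nothing.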

 

Conclusions: The results of our evaluation suggest that the type of structural information used by a projection method affects the ability of the method to detect structural similarity. In particular, our method, which bases the mapping on the spatial conformations of triplets of secondary structure elements, outperforms the other methods in most of the tests.

Comparison of SSEF with other projection methods:

 

DATA SETS

 

PERFORMANCE IN SCREENING

  • SCOP 1.65
    • all levels combined, the figure from the paper (plot)
  • SCOP 1.69
    • the fold level (plot)
    • the super-family level (plot)
    • the family level (plot)
  • CATH 2.6
    • the topology level (plot)
    • the super-family level (plot)

 

 

PERFORMANCE IN CLASSIFICATION

 

 

SSE FOOTPRINT

 

 

 

 

 

Structural footprinting in protein structure comparison: the impact of structural fragments.

Zotenko E, Dogan RI, Wilbur WJ, O'Leary DP, Przytycka TM.

BMC Struct Biol. 2007 Aug 9;7:53.

BACKGROUND: One approach for speeding up protein structure comparison is the projection approach, where a protein structure is mapped to a high-dimensional vector and structural similarity is approximated by distance between the corresponding vectors. Structural footprinting methods are projection methods that employ the same general technique to produce the mapping: first select a representative set of structural fragments as models and then map a protein structure to a vector in which each dimension corresponds to a particular model and "counts" the number of times the model appears in the structure. The main difference between any two structural footprinting methods is in the set of models they use; in fact, a large number of methods can be generated by varying the type of structural fragments used and the amount of detail in their representation. How do these choices affect the ability of the method to detect various types of structural similarity?

 

RESULTS: To answer this question, we benchmarked three structural footprinting methods that vary significantly in their selection of models against the CATH database. In the first set of experiments, we compared the methods' ability to detect structural similarity characteristic of evolutionarily related structures, i.e., structures within the same CATH superfamily. In the second set of experiments, we tested the methods' agreement with the boundaries imposed by classification groups at the Class, Architecture, and Fold levels of the CATH hierarchy.

 

CONCLUSION: In both experiments we found that the method which uses secondary structure information has the best performance on average, but no single method consistently performs best across all groups at a given classification level. We also found that combining the methods' outputs significantly improves performance. Moreover, our new techniques for measuring and visualizing the methods' agreement with the CATH hierarchy, including the thresholded affinity graph, are useful beyond this work. In particular, they can be used to expose a similar composition of different classification groups in terms of the structural fragments used by the method, and thus provide an alternative demonstration of the continuous nature of the protein structure universe.
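
The abstract does not say how the methods' outputs were combined. As one possible illustration only, the sketch below averages each candidate's rank across methods; rank averaging, the function names, and the toy distances are all assumptions made here and should not be read as the procedure used in the paper.

```python
def ranks(distances):
    """Map each candidate to its rank under one method (0 = smallest distance)."""
    ordered = sorted(distances, key=distances.get)
    return {name: position for position, name in enumerate(ordered)}

def combine(methods):
    """Average each candidate's rank over all methods; lower means more similar."""
    per_method = [ranks(d) for d in methods]
    names = per_method[0].keys()
    return {n: sum(r[n] for r in per_method) / len(per_method) for n in names}

# Hypothetical distances from one query to three candidates under three methods.
method_1 = {"A": 0.1, "B": 0.5, "C": 0.9}
method_2 = {"A": 0.4, "B": 0.2, "C": 0.8}
method_3 = {"A": 0.3, "B": 0.6, "C": 0.1}
combined = combine([method_1, method_2, method_3])
consensus = sorted(combined, key=combined.get)
print(consensus)
```

Working with ranks instead of raw distances sidesteps the fact that different footprinting methods produce distances on different scales.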