
Computational Biology Branch

 


 

                                                                         

Evolution of protein structures and protein-protein interaction networks.

My research interests focus on understanding how protein structure, function, and protein-protein interaction networks change in the course of evolution. Protein evolution occurs under strong functional and structural constraints, and understanding the relationships between protein sequence (genotype) and structure/function (phenotype) is crucial for inferring the causes of many diseases. We investigate how tolerant different protein structures are to sequence change and whether this structural plasticity depends on protein function or fold (1).

 

 

 

 

 

 

Proteins perform their functions in the cellular environment by interacting with other proteins and domains. As a result, structural changes that are not deleterious in a single protein or domain can have lethal effects when combined in a protein complex. As the rate at which structures are solved continues to increase, the analysis of interactions between domains within the same protein structure can provide important clues for understanding the interactions between proteins in a cell. We analyze the conservation of interaction patterns between protein domains and investigate how different protein structures can adapt to multiple interaction partners (2, 3, 4).

 

Evolution of protein loops and indels.

The intervening unaligned regions ("loops") between the superimposable helices and strands in proteins can exhibit a wide range of similarity and may offer clues to the structural evolution of folds. One might argue that closely related proteins differ less in their nonconserved loop regions than distantly related proteins do and that, at the same time, loop regions vary more between structurally similar but unrelated proteins than between homologs. Conventional sequence and structure similarity measures, which compare proteins in their cores, are often not sensitive enough to detect subtle (dis)similarities between proteins. We therefore developed loop-based metrics to improve protein classification and to gauge evolutionary relationships between proteins (5, 6, 7).
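To make the idea of a loop-based comparison concrete, here is a minimal sketch. It is not the metric from references 5-7; the function name `loop_length_distance` and the choice to compare only loop lengths are assumptions made for illustration. It takes, for two structurally aligned proteins, the lengths of the loops that connect the same pairs of core elements and returns a value between 0 (identical loop lengths) and 1.

```python
def loop_length_distance(loops_a, loops_b):
    """Mean normalized length difference over corresponding loops.

    Hypothetical illustration only: loops_a[i] and loops_b[i] are the lengths
    (in residues) of the loop connecting the same two aligned core elements
    in protein A and protein B.
    """
    if len(loops_a) != len(loops_b):
        raise ValueError("loop lists must describe the same core elements")
    terms = []
    for len_a, len_b in zip(loops_a, loops_b):
        longest = max(len_a, len_b)
        # Two empty loops are identical; otherwise scale by the longer loop.
        terms.append(0.0 if longest == 0 else abs(len_a - len_b) / longest)
    return sum(terms) / len(terms) if terms else 0.0
```

Under the expectation stated above, such a value would tend to be smaller for close homologs than for distant homologs or unrelated proteins that share a fold. A realistic metric would also need to account for loop conformation, not only length.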

 

 

 

 

 

 

 

 

 

 

 

Changes in protein domains result mostly from point mutations, insertions, and deletions. Although amino acid insertion and deletion (indel) events in proteins are less frequent than amino acid substitutions, they can have a major effect on protein evolution. The mechanisms of indel events are not well understood, and only a few statistical models describe these events in evolution. We studied whether insertion and deletion events in protein domains are balanced and whether there are trends toward increasing or decreasing indel or domain lengths.

 

We found that more than one third of all studied domains have a statistically significant tendency to increase or decrease in size in the course of evolution, as judged from the overall domain size distribution as well as from the size distribution of individual indels. (The test set of 362 manually curated domain alignments, together with their rooted phylogenetic trees, is available at ftp://ftp.ncbi.nih.gov/mmdb.tree.files.) Moreover, the fraction of domains and individual indels increasing in size is almost twofold larger than the fraction decreasing in size. We showed that the tolerance to insertion and deletion events depends on the domain's taxonomy span: eukaryotic domains are depleted in insertions compared to the overall test set, whereas ancient domain families show some bias towards insertions (8).
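One simple way to ask whether insertions and deletions are balanced within a single domain family is an exact two-sided binomial (sign) test on the signed length changes along the branches of its rooted tree. The sketch below illustrates that general idea only; it is not the statistical procedure used in reference 8, and the names `two_sided_binomial_p` and `size_trend`, the input format, and the 0.05 threshold are assumptions.

```python
from math import comb


def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k out of n."""
    if n == 0:
        return 1.0
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small tolerance so that ties in probability are counted reliably.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-12)))


def size_trend(length_changes, alpha=0.05):
    """Classify a family from signed length changes along tree branches.

    Assumed input: one number per branch, positive for a net insertion
    (child longer than parent) and negative for a net deletion.
    Zero-length changes are ignored.
    """
    gains = sum(1 for d in length_changes if d > 0)
    losses = sum(1 for d in length_changes if d < 0)
    p_value = two_sided_binomial_p(gains, gains + losses)
    if p_value >= alpha:
        return "balanced", p_value
    return ("increasing" if gains > losses else "decreasing"), p_value
```

For example, nine net insertions against one net deletion give a p-value of 22/1024 (about 0.021), so `size_trend` labels that family "increasing" at the assumed 0.05 level. A sign test ignores branch lengths and the non-independence of branches, which a fuller model would have to address.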

 

 

Prediction of protein function.

Protein classification can be exploited to transfer functional annotation from experimentally annotated proteins to their uncharacterized homologs. However, common descent does not necessarily imply functional similarity, and functional annotation transferred from one homologous protein to another can result in an incorrect assignment. To verify functional assignments, we examine common features conserved among families of homologs to identify functionally important sites that are specific to a family or subfamily (9).

 

The rapid increase in the amount of protein sequence data has created a need for automated identification of sites that determine functional specificity among related subfamilies of proteins. A significant fraction of subfamily-specific sites are only marginally conserved, which makes it extremely challenging to detect the amino acid changes that lead to functional diversification.

 

To address this problem, we developed a method named SPEER (specificity prediction using amino acids' properties, entropy and evolution rate) to distinguish specificity-determining sites from other sites. SPEER encodes the conservation patterns of amino acid types using their physico-chemical properties and the heterogeneity of evolutionary changes between and within subfamilies. To test the method, we compiled a test set of 13 protein families with known specificity-determining sites (the alignments, together with the subfamily determinants, can be obtained at ftp://ftp.ncbi.nih.gov/pub/chakraba/SPEER/). Extensive benchmarking against other specificity site prediction algorithms has shown that SPEER performs better in predicting several categories of subfamily-specific sites (10).
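The entropy ingredient of such a method can be illustrated with a toy score. The sketch below is not SPEER and omits its evolutionary-rate term entirely. It computes, for one alignment column, the mutual information between subfamily membership and a coarse physico-chemical class of the residue. The class grouping in `PROPERTY_CLASS` and the function names are assumptions chosen for illustration.

```python
from collections import Counter
from math import log2

# Hypothetical coarse grouping of amino acids by physico-chemical property.
PROPERTY_CLASS = {}
for members, label in [
    ("AVLIMC", "aliphatic"),
    ("FWY", "aromatic"),
    ("STNQ", "polar"),
    ("KRH", "positive"),
    ("DE", "negative"),
    ("GP", "special"),
]:
    for aa in members:
        PROPERTY_CLASS[aa] = label


def entropy(symbols):
    """Shannon entropy (bits) of a list of symbols; 0.0 for an empty list."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())


def specificity_score(subfamily_columns):
    """Mutual information between subfamily label and residue class.

    subfamily_columns: one string per subfamily, holding the residues that
    the subfamily's sequences have at a single alignment column. Gaps and
    unknown characters are skipped.
    """
    groups = [[PROPERTY_CLASS[aa] for aa in col if aa in PROPERTY_CLASS]
              for col in subfamily_columns]
    groups = [g for g in groups if g]
    pooled = [c for g in groups for c in g]
    if not pooled:
        return 0.0
    # Size-weighted average of the entropy inside each subfamily.
    within = sum(len(g) / len(pooled) * entropy(g) for g in groups)
    return entropy(pooled) - within
```

A column that is lysine in one subfamily and aspartate in another (`["KKKK", "DDDD"]`) scores 1 bit, while a column that is fully conserved or equally mixed in both subfamilies scores 0. The text above notes that many real specificity sites are only marginally conserved, which is why a single entropy-based term like this one would not be sufficient on its own.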

 

Algorithms of sequence alignment and fold recognition.

Pairwise sequence alignment methods may fail to detect distant evolutionary relationships in the twilight zone of sequence similarity, whereas methods based on the analysis of residue conservation patterns in multiple sequence alignments have proved very powerful in this respect. Moreover, algorithms for protein structure prediction and fold recognition may recognize even more remote evolutionary relationships that are not detectable by sequence comparison alone.

 

We develop sequence alignment algorithms that score sequence and structure conservation within protein families (PSSM-based protein threading; 11, 12), that find an optimal alignment between two sequence profiles (profile-profile alignment; 13), and that refine existing multiple sequence alignments while retaining the structural and functional information embedded in the protein family model (14). Realignment of each sequence can correct misalignments between a given sequence and the rest of the profile and at the same time preserve the family's overall block model. Large-scale benchmarking studies showed a noticeable improvement in the alignments after refinement.
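As a rough illustration of what aligning two sequence profiles involves, the sketch below runs a standard global dynamic-programming alignment (Needleman-Wunsch with a linear gap penalty) over profile columns instead of single residues. It is not the algorithm of reference 13. The column score (a shifted dot product of residue frequencies), the parameters `shift` and `gap`, and the function names are all assumptions.

```python
def column_score(p, q, shift=0.5):
    """Dot product of two residue-frequency dicts, shifted so that
    dissimilar columns score below zero (shift is an assumed parameter)."""
    return sum(freq * q.get(aa, 0.0) for aa, freq in p.items()) - shift


def align_profiles(a, b, gap=-0.5):
    """Globally align two profiles (lists of residue-frequency dicts).

    Returns the optimal score and the list of aligned column index pairs.
    """
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,   # column of a aligned to a gap
                score[i][j - 1] + gap,   # column of b aligned to a gap
            )
    # Trace back from the corner, preferring matches over gaps on ties.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + column_score(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

With the toy profiles `[{"K": 1.0}, {"D": 1.0}, {"G": 1.0}]` and `[{"K": 1.0}, {"G": 1.0}]`, the function pairs the two K columns and the two G columns and leaves the D column opposite a gap. Practical profile-profile methods typically use log-odds column scores, sequence weighting, and affine gap penalties, none of which are shown here.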