Bioinformatics. 2015 Jun 15;31(12):1929-37. doi: 10.1093/bioinformatics/btv103. Epub 2015 Feb 19.

Comparative study of the effectiveness and limitations of current methods for detecting sequence coevolution.

Author information

1
Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel.
2
Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel.

Abstract

MOTIVATION:

With rapid accumulation of sequence data on several species, extracting rational and systematic information from multiple sequence alignments (MSAs) is becoming increasingly important. Currently, there is a plethora of computational methods for investigating coupled evolutionary changes in pairs of positions along the amino acid sequence, and making inferences on structure and function. Yet, the significance of coevolution signals remains to be established. Also, a large number of false positives (FPs) arise from insufficient MSA size, phylogenetic background and indirect couplings.

RESULTS:

Here, a set of 16 pairs of non-interacting proteins is thoroughly examined to assess the effectiveness and limitations of different methods. The analysis shows that recent computationally expensive methods designed to remove biases from indirect couplings outperform others in detecting tertiary structural contacts as well as eliminating intermolecular FPs; whereas traditional methods such as mutual information benefit from refinements such as shuffling, while being highly efficient. Computations repeated with 2,330 pairs of protein families from the Negatome database corroborated these results. Finally, using a training dataset of 162 families of proteins, we propose a combined method that outperforms existing individual methods. Overall, the study provides simple guidelines towards the choice of suitable methods and strategies based on available MSA size and computing resources.
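As an illustration of the shuffling refinement mentioned above for mutual information (MI), the following is a minimal sketch in plain Python. It is not the authors' implementation and does not use the ProDy Evol API. The function names (`mutual_information`, `shuffled_mi`), the toy alignment, and the choice to subtract the mean MI over randomly shuffled columns are all assumptions made for illustration; the abstract does not specify the exact shuffling procedure.

```python
import math
import random
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in nats) between two equal-length MSA columns."""
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written in terms of counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def shuffled_mi(col_i, col_j, n_shuffles=100, seed=0):
    """Hypothetical refinement: observed MI minus the mean MI obtained after
    randomly permuting one column, which estimates the background expected
    from column composition and finite alignment size alone."""
    rng = random.Random(seed)
    observed = mutual_information(col_i, col_j)
    permuted = list(col_j)
    background = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(permuted)
        background += mutual_information(col_i, permuted)
    return observed - background / n_shuffles


# Toy alignment (illustrative only): rows are sequences, columns are positions.
toy_msa = ["AKLE", "AKLD", "GRLE", "GRID", "AKIE", "GRID"]
columns = list(zip(*toy_msa))
raw = mutual_information(columns[0], columns[1])
corrected = shuffled_mi(columns[0], columns[1])
print(f"raw MI = {raw:.3f}, shuffle-corrected MI = {corrected:.3f}")
```

Because MI is never negative, the shuffled background is also non-negative, so the corrected score is at most the raw score. With small alignments the background term can be substantial, which is one way insufficient MSA size produces false positives.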

AVAILABILITY AND IMPLEMENTATION:

Software is freely available through the Evol component of the ProDy API.

PMID:
25697822
PMCID:
PMC4481699
DOI:
10.1093/bioinformatics/btv103
[Indexed for MEDLINE]
Free PMC Article
