Methods. 2014 Jun 1;67(3):325-33. doi: 10.1016/j.ymeth.2014.02.016. Epub 2014 Feb 21.

Effective identification of essential proteins based on priori knowledge, network topology and gene expressions.

Author information

1. School of Information Science and Engineering, Central South University, Changsha 410083, China; State Key Laboratory of Medical Genetics, Central South University, Changsha 410078, China.
2. School of Information Science and Engineering, Central South University, Changsha 410083, China.
3. School of Information Science and Engineering, Central South University, Changsha 410083, China; Department of Computer Science, Georgia State University, Atlanta, GA 30302-4110, USA. Electronic address: pan@cs.gsu.edu.

Abstract

Identifying essential proteins is important for understanding the minimal requirements for cellular life and is necessary for practical applications such as drug design. With advances in high-throughput technologies, a large number of protein-protein interactions are now available, which makes it possible to assess protein essentiality at the network level. Considering that most species already have a number of known essential proteins, we proposed a new prior knowledge-based scheme to discover new essential proteins from protein interaction networks. Based on this scheme, two essential protein discovery algorithms, CPPK and CEPPK, were developed. CPPK predicts new essential proteins from network topology, and CEPPK detects new essential proteins by integrating network topology and gene expression data. The performance of CPPK and CEPPK was validated on the protein interaction network of Saccharomyces cerevisiae. The experimental results showed that prior knowledge of known essential proteins was effective in improving prediction precision. The prediction precision of CPPK and CEPPK clearly exceeded that of 10 previously proposed essential protein discovery methods: Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC), Bottle Neck (BN), Density of Maximum Neighborhood Component (DMNC), Local Average Connectivity-based method (LAC), and Network Centrality (NC). In particular, CPPK achieved a 40% improvement in precision over BC, CC, SC, EC, and BN, and CEPPK performed even better. CEPPK was also compared with four other methods (EPC, ORFL, PeC, and CoEWC) that are not node centralities, and CEPPK was shown to achieve the best results.
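The abstract does not give the scoring formulas for CPPK or CEPPK, so they are not reproduced here. As a rough illustration of the evaluation setup it describes, the sketch below ranks proteins by Degree Centrality (DC), one of the baseline methods named above, and measures precision as the fraction of the top-k ranked proteins that appear in a set of known essential proteins. The function names, the toy network, and the toy essential set are all hypothetical and chosen only for illustration; they are not data from the paper.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {protein: number of distinct interaction partners}."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


def top_k_precision(scores, known_essential, k):
    """Fraction of the k highest-scoring proteins that are known essential."""
    if k <= 0 or not scores:
        raise ValueError("k must be positive and scores must be non-empty")
    # Sort by descending score; break ties by name so the ranking is deterministic.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    top = ranked[:k]
    hits = sum(1 for protein in top if protein in known_essential)
    return hits / len(top)


# Hypothetical toy interaction network and essential set (not from the paper).
toy_edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P4", "P5"), ("P5", "P6"),
]
toy_essential = {"P1", "P3", "P6"}

scores = degree_centrality(toy_edges)
precision = top_k_precision(scores, toy_essential, k=3)
print(f"DC top-3 precision: {precision:.2f}")
```

Any other scoring method can be compared in the same way by swapping in a different `scores` dictionary while keeping `top_k_precision` and the reference set fixed.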

KEYWORDS: Essential protein; Gene expression; Protein interaction networks

PMID: 24565748
DOI: 10.1016/j.ymeth.2014.02.016
[Indexed for MEDLINE]
