Accurate and scalable techniques for the complex/pathway membership problem in protein networks

Adv Bioinformatics. 2009:2009:787128. doi: 10.1155/2009/787128. Epub 2010 Feb 23.

Abstract

A protein network captures physical interactions as well as functional associations between proteins. An important use of such networks is to discover unknown members of partially known complexes and pathways. A number of methods exist for such analyses, and they can be divided into two main categories based on their treatment of highly connected proteins. In this paper, we show that methods that are not affected by the degree (number of linkages) of a protein give more accurate predictions for certain complexes and pathways. We propose a network flow-based technique to compute the association probability of a pair of proteins. We extend the proposed technique using hierarchical clustering so that it scales well with the size of the proteome. We also show that top-k queries are not suitable for a large number of cases, and that threshold queries are more meaningful in these cases. The network flow technique with clustering is able to optimize meaningful threshold queries and answers them more efficiently than a similar method that uses Monte Carlo simulation.
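The contrast between top-k and threshold queries can be illustrated with a small sketch. It is not the method of the paper: it assumes that the association between two proteins is scored by the maximum flow between them (computed here with the standard Edmonds-Karp algorithm), with edge confidences used as capacities, and it omits the hierarchical clustering step and any conversion of flow into a probability. The toy network, the protein names, and the functions max_flow, association_scores, threshold_query, and top_k_query are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical toy network: (protein, protein, interaction confidence).
EXAMPLE_EDGES = [
    ("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.7),
    ("C", "D", 0.3), ("D", "E", 0.9),
]

def max_flow(edges, source, sink):
    """Edmonds-Karp maximum flow on an undirected, capacity-weighted graph."""
    if source == sink:
        raise ValueError("source and sink must differ")
    # An undirected edge contributes its capacity in both directions.
    residual = defaultdict(lambda: defaultdict(float))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += cap
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path remains
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck flow and update residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

def association_scores(edges, query):
    """Flow-based score (an assumption, not a probability) from query to every other protein."""
    proteins = {p for u, v, _ in edges for p in (u, v)}
    return {p: max_flow(edges, query, p) for p in proteins if p != query}

def threshold_query(scores, threshold):
    """All proteins whose score reaches the threshold; the result size varies."""
    return sorted(p for p, s in scores.items() if s >= threshold)

def top_k_query(scores, k):
    """The k best-scoring proteins, however weak the k-th one is."""
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]
```

With "A" as the known member, a threshold query at 1.0 returns only the two strongly associated proteins, while a top-3 query must also return a third protein that is reachable only through the weak C-D link. That is the kind of case in which a fixed k is less meaningful than a threshold.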