Nucleic Acids Res. Jan 2009; 37(Database issue): D651–D656.
Published online Nov 6, 2008. doi: 10.1093/nar/gkn870
PMCID: PMC2686497

PIPs: human protein–protein interaction prediction database

Abstract

The PIPs database (http://www.compbio.dundee.ac.uk/www-pips) is a resource for studying protein–protein interactions in human. It contains predictions of >37 000 high probability interactions, of which >34 000 are not reported in the interaction databases HPRD, BIND, DIP or OPHID. The interactions in PIPs were calculated by a Bayesian method that combines information from expression, orthology, domain co-occurrence, post-translational modifications and sub-cellular location. The predictions also take account of the topology of the predicted interaction network. The web interface to PIPs ranks predictions according to their likelihood of interaction, broken down by the contribution from each information source, and gives easy access to the evidence that supports each prediction. Where data exist in OPHID, HPRD, DIP or BIND for a protein pair, this is also reported in the output tables returned by a search. A network browser is included to allow convenient browsing of the interaction network for any protein in the database. The PIPs database provides a new resource on protein–protein interactions in human that is straightforward to browse and can also be exploited in its entirety for interaction network modelling.

INTRODUCTION

Protein–protein interactions (PPIs) regulate many fundamental cellular processes. As a consequence, a key step in understanding the function of a protein in its cellular context is to identify potential interacting partners. PPIs are typically identified on a small scale by pull-down experiments or similar techniques, but this approach is too slow and expensive to meet the goal of identifying all the PPIs necessary to provide a rich picture of the functional and dynamic properties of the cell (1). High-throughput methods, such as yeast two-hybrid, seek to overcome the time constraints of traditional protein-by-protein methods and have been applied to the study of PPIs in many organisms, including Saccharomyces cerevisiae (2,3), Caenorhabditis elegans (4), Drosophila melanogaster (5,6), Escherichia coli (7) and, more recently, human (8,9). Although high-throughput methods provide data for large numbers of potential interacting pairs, they often have much higher error rates than traditional approaches (10). Computational methods to predict PPIs complement experimental methods. They can efficiently integrate data from numerous sources in order to make predictions of the likelihood of interaction between two proteins (11).

There are several public repositories that store PPIs identified by experimental methods. Databases such as HPRD (12,13), DIP (14), IntAct (15), BioGRID (16) and MINT (17) all provide lists of experimentally determined interactions. Many of these resources contain only interactions that have been observed experimentally, and these data are not yet representative of a complete interactome.

It has been suggested that the human proteome includes around 300 000 PPIs (18) out of a potential >300 000 000 protein pairs. This estimate does not account for the numerous variations in interacting pairs due to post-translational modifications and alternative splicing. However, the number of human PPIs that have been experimentally determined is an order of magnitude lower, as shown in Table 1. The importance of prediction in filling this gap has been recognized by a number of groups and has led to the development of databases such as OPHID (19) and POINT (20), which predict PPIs, as well as STRING, a database of predicted protein–protein associations (direct and indirect PPIs) (21). All three services computationally predict likely PPIs (whether direct or indirect) based on orthology, annotations and/or experimental information and have substantially increased the size of the human interactome. However, neither OPHID nor POINT ranks the predictions in order of likelihood. Furthermore, the breakdown of the evidence for interaction is limited to a summary of correlation scores or a binary indication of co-occurrence. STRING provides an aesthetically pleasing, informative and user-friendly method of accessing its predictions and the primary data, but does not distinguish between direct physical interactions and indirect relationships, which include transcriptional relationships as well as co-pathway membership (21).
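The scale of the figures above can be checked with a short calculation. The sketch below is illustrative only: the proteome size of 25 000 is an assumption chosen so that the number of unordered pairs exceeds 300 000 000, and is not a figure taken from this article.

```python
# Back-of-envelope check of the pair counts quoted in the text.
# ASSUMPTION: N_PROTEINS is a hypothetical proteome size, not from the article.

def unordered_pairs(n: int) -> int:
    """Number of distinct unordered pairs that can be formed from n proteins."""
    return n * (n - 1) // 2

N_PROTEINS = 25_000        # hypothetical
ESTIMATED_PPIS = 300_000   # estimate quoted in the text (18)

potential_pairs = unordered_pairs(N_PROTEINS)
fraction_interacting = ESTIMATED_PPIS / potential_pairs

print(potential_pairs)                   # 312487500
print(f"{fraction_interacting:.2%}")     # roughly one pair in a thousand
```

Under this assumption only about 0.1% of all possible pairs are expected to interact, which is why a ranked list of predictions is more useful than an unranked one.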

Table 1.
Number of human PPIs that have been determined experimentally and made available via publicly accessible databases

In this article, PIPs, a new database of predicted PPIs for human, is described. The predictions stored in PIPs are derived by a Bayesian prediction method that combines information on the likelihood of interaction from a variety of sources (11). A novel feature of the method is its use of a ‘Transitive’ module that gathers evidence for interaction by examining the predicted interactors that a pair of proteins have in common. The unique combination of features examined allowed the generation of a set of predictions that are mostly orthogonal to other PPI databases (11). The database and its interface allow the user to see the full evidence trail for each predicted interaction. In this way, PIPs serves not only as a resource for large-scale modelling of protein interaction networks, but also as an exploratory tool for the cell/molecular biologist who wishes to understand more about the predicted interaction network for the protein they are studying.
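The idea of gathering evidence from common interactors can be illustrated with a minimal sketch. This is not the scoring used by the Transitive module, which is described in Scott and Barton (11); the function name, the network layout and the protein labels below are all hypothetical, and the code only lists the interactors that two proteins share.

```python
# Minimal sketch: find the predicted interactors shared by two proteins.
# ASSUMPTION: the network is a dict mapping each protein to a set of its
# predicted interactors; names and data are hypothetical.

def common_interactors(network, a, b):
    """Return the proteins predicted to interact with both a and b."""
    shared = network.get(a, set()) & network.get(b, set())
    return shared - {a, b}

network = {
    "P1": {"P3", "P4", "P5"},
    "P2": {"P3", "P4", "P6"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P2"},
    "P5": {"P1"},
    "P6": {"P2"},
}

shared = common_interactors(network, "P1", "P2")
print(sorted(shared))  # ['P3', 'P4']
```

A pair with many shared interactors is, intuitively, more likely to lie in the same region of the network than a pair with none.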

THE DATABASE

Overview

The PIPs database is a resource of PPIs in human predicted by a naïve Bayesian model as described in Scott and Barton (11). Briefly, the method (11) combines information from gene co-expression, orthology, co-occurrence of domains, post-translational modifications, co-localization of the proteins within the cell and analysis of the local topology of the predicted PPI network. The different evidence types are programmed as separate modules with each module giving a score of interaction. The individual module scores are combined to give a prediction for the overall likelihood of interaction given the available data.
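The combination step can be sketched as follows. In a naïve Bayesian model the evidence sources are treated as conditionally independent, so per-module likelihood ratios multiply. The sketch assumes that each module score is a likelihood ratio; the numerical values and the prior odds are hypothetical and are not taken from PIPs.

```python
# Minimal naive Bayes combination of per-module evidence.
# ASSUMPTIONS: each module score is a likelihood ratio; all numbers below
# (module ratios and prior odds) are hypothetical.
from math import prod

def combined_likelihood_ratio(module_ratios):
    """Multiply per-module likelihood ratios (conditional independence)."""
    return prod(module_ratios.values())

def posterior_odds(prior_odds, module_ratios):
    """Posterior odds of interaction = prior odds x combined likelihood ratio."""
    return prior_odds * combined_likelihood_ratio(module_ratios)

ratios = {"expression": 20.0, "orthology": 1.0, "combined": 5.0, "transitive": 40.0}
PRIOR_ODDS = 1 / 1000  # hypothetical: about one interacting pair per thousand

odds = posterior_odds(PRIOR_ODDS, ratios)
probability = odds / (1 + odds)
print(f"posterior odds {odds:.2f}, probability {probability:.2f}")
```

A module that is uninformative for a given pair contributes a ratio of 1 and leaves the product unchanged, which is how a prediction can still be made when some evidence types are missing.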

The full database of predicted interactions includes details about 69 965 human proteins imported from the IPI (22) together with interaction scores for 17 643 506 protein pairs, of which 37 606 are predicted to interact. For each protein pair, the overall score is stored along with a breakdown of the scores provided by each of the modules. Further information is stored that details the evidence that was used in calculating the final score. The evidence includes 5872 S. cerevisiae, 23 195 C. elegans and 27 629 D. melanogaster proteins that were analysed by InParanoid (23) to identify orthologous protein pairs, where each protein was known to be involved in an interaction. Details of the InterPro (24) motifs and domains, the sites of post-translational modifications, and each protein's sub-cellular localization are also stored, as well as the Pearson's correlation coefficients from analysis of expression data. In order to simplify exploration of the predicted interactions, links are stored to external data sources including RefSeq (25), UniProt (26) and Entrez (27). Comparisons to other publicly available databases of interactions are simplified by the inclusion of links to HPRD (12,13), DIP (14), BIND (28) and OPHID (19) for protein pairs that are represented in those databases.
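For readers unfamiliar with the stored expression evidence, the Pearson's correlation coefficient between two expression profiles can be computed as in the sketch below. The profiles are invented for illustration, and the way PIPs pre-processes expression data is not reproduced here.

```python
# Pearson's correlation coefficient between two expression profiles.
# ASSUMPTION: the profiles below are made-up values for illustration only.
from math import sqrt

def pearson(x, y):
    """Pearson's r for two equal-length sequences of measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)

gene_a = [1.0, 2.0, 3.0, 4.0]   # hypothetical expression across four samples
gene_b = [2.0, 4.1, 5.9, 8.0]   # rises with gene_a
gene_c = [4.0, 3.0, 2.0, 1.0]   # falls as gene_a rises

print(round(pearson(gene_a, gene_b), 3))
print(round(pearson(gene_a, gene_c), 3))
```

Values close to +1 indicate co-expressed genes, and values close to -1 indicate anti-correlated expression.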

The PIPs database was constructed on a Linux server running the MySQL database software and Apache/Tomcat for the web server. The front-end utilizes Java Server Pages (JSP) to provide a dynamic and easy to navigate web interface.

The PIPs web interface

The front page of the PIPs interface allows for simple searches with the IPI, UniProt or RefSeq identifier for a protein, or a text search with keywords. The output may be restricted by adjusting the minimum score threshold. The Advanced Search allows the query protein sequence to be compared with the protein sequences stored in the PIPs database by MagicMatch (29), which returns exact matches to the query sequence. If no match is found, a BLAST (30) search may optionally be run to find sequences that are similar to the query. A batch mode is available to allow larger numbers of protein IPI identifiers to be run against the database as a single set.

Figure 1 illustrates the result of searching with IPI00016572 (SNRPG, small nuclear ribonucleoprotein G) via the quick search from the front page and selecting to view the scores from each module. The Interaction Summary Page for SNRPG shows interacting pairs of proteins ranked in descending order by the final interaction score. The output includes the name of the protein and the scores obtained by each of the different modules. For example, the interaction between SNRPG and LSM8 seen in Figure 1 shows that the orthology and combined modules made a low contribution, while the expression and transitive modules provided the major contribution to the final score. In contrast, for the interaction between SNRPG and SNRPD3, the expression, orthology, combined and transitive modules are all predictive. The ‘Evidence’ column provides a link to view the evidence that was used by each of the modules in calculating the final interaction score, while the ‘Database’ column lets the user know if the pair of proteins has been reported as interacting in other databases [currently BIND (28), DIP (14), HPRD (12,13) and OPHID (19)].
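The filtering and ranking behaviour of the summary page can be mimicked in a few lines. The sketch below is not the PIPs implementation: the record type, the function name, the partner labels and the scores are all hypothetical, and only the two operations described in the text (a minimum score threshold and descending order by final score) are shown.

```python
# Sketch of threshold filtering followed by ranking on the final score.
# ASSUMPTION: Prediction, rank_interactors and all values are hypothetical.
from typing import NamedTuple

class Prediction(NamedTuple):
    partner: str   # predicted interactor
    score: float   # final interaction score

def rank_interactors(predictions, min_score=1.0):
    """Keep predictions scoring at least min_score, highest score first."""
    kept = [p for p in predictions if p.score >= min_score]
    return sorted(kept, key=lambda p: p.score, reverse=True)

predictions = [
    Prediction("PARTNER_A", 120.0),
    Prediction("PARTNER_B", 3000.0),
    Prediction("PARTNER_C", 0.4),
]

for p in rank_interactors(predictions):
    print(p.partner, p.score)
```

Raising `min_score` shortens the list to the most confident predictions without changing their relative order.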

Figure 1.
Interaction Summary for the protein IPI00016572 (SNRPG): this page shows the predicted interactors, ordered by the score in descending order from the most probable interactor. The name of the predicted interactor and a breakdown by predictive feature ...

Figure 2a–c show the Evidence of Interaction page for the interaction predicted between SNRPG and SNRPD3 that was identified in Figure 1. The page is organized into six sections which provide a break-down of the information on expression, orthology, domains, post-translational modifications, localization and topology (transitive) score.

Figure 2.
(a) Evidence of Interaction Summary page for the interaction between SNRPG and SNRPD3: Sections Gene Expression and Orthology provide details about the predictions based on expression and orthology for the interaction pair. (b) Sections Domains, Post-translational ...

For each protein analysed in the prediction, a Protein Summary page is available as a link from the main prediction result page. For example, Figure 3 shows the Protein Summary page for the SNRPG protein. The summary shows the number of predicted interactions above a given threshold (57 predicted interactors with a Score ≥1.0 of which four have a Score ≥2500). The table also provides links to external protein databases including RefSeq (25), HPRD (12,13), UniProt (26) and Entrez (27).

Figure 3.
Protein Summary for the protein SNRPG: information about the selected protein including a breakdown of the number of predicted interactions and the number of interactions within external databases. Links are also provided to obtain further details about ...

Figure 4 illustrates the display of interactions through a new Java applet that can be accessed from the Protein Summary page. Users are able to view the network of the predicted protein interactions out to a path length of two from the query protein. Within the applet the user is able to view the network with and without proteins that have only a single connection. The user can also grow the graph by selecting a protein and clicking on the ‘Grow Network …’ option. Once the network has been created it is possible to save the network as an image or save an adjacency list of the proteins so that they can be represented in an external application, such as Cytoscape (http://cytoscape.org/) or Graphviz (http://www.graphviz.org/).
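The three network operations described above (restricting the view to a path length of two, hiding proteins with a single connection and exporting an adjacency list) can be sketched as follows. This is not the applet's code: the function names, the example edges and the whitespace-separated output format are assumptions, and including edges between any two selected proteins is a design choice of this sketch.

```python
# Sketch of a depth-limited network view with optional pruning and export.
# ASSUMPTIONS: function names, edges and the output format are hypothetical.
from collections import deque

def neighbourhood(edges, query, max_depth=2):
    """Adjacency of the proteins within max_depth steps of the query."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    depth = {query: 0}
    queue = deque([query])
    while queue:  # breadth-first search, stopping at max_depth
        node = queue.popleft()
        if depth[node] == max_depth:
            continue
        for nb in adj.get(node, ()):
            if nb not in depth:
                depth[nb] = depth[node] + 1
                queue.append(nb)
    selected = set(depth)
    return {n: sorted(adj.get(n, set()) & selected) for n in selected}

def without_single_connections(adjacency, query):
    """Hide proteins with at most one connection (the query is always kept)."""
    leaves = {n for n, nbs in adjacency.items() if len(nbs) <= 1 and n != query}
    return {n: [m for m in nbs if m not in leaves]
            for n, nbs in adjacency.items() if n not in leaves}

def to_adjacency_list(adjacency):
    """One line per protein: the protein followed by its neighbours."""
    return "\n".join(" ".join([n, *nbs]) for n, nbs in sorted(adjacency.items()))

edges = [("Q", "A"), ("Q", "B"), ("A", "B"), ("A", "C"), ("C", "D"), ("B", "E")]
view = neighbourhood(edges, "Q")
print(to_adjacency_list(without_single_connections(view, "Q")))
```

In this example protein D lies three steps from the query and is never shown, while C and E are shown in the full view and hidden in the pruned one.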

Figure 4.
Network view of the predicted interactors of SNRPG: Java application to view the local topology of the predicted PPI network. Left: network image of the predicted primary and secondary interactors of the protein SNRPG (blue). Right: network image of the ...

SUMMARY

It has been estimated that only 10% of the human interactome has been identified (18). The PIPs database allows the user to browse and easily access many additional high probability predicted human interactions and to see the evidence that led to each prediction. It also provides a source of information to help improve the design of experiments to investigate further the function of proteins in the human proteome. All predictions are ranked, so that the most probable interactions can be investigated first and the user is not simply presented with a flat list of predicted interactions.

The database is freely available to search/explore at http://www.compbio.dundee.ac.uk/www-pips.

FUNDING

UK Biotechnology and Biological Sciences Research Council (BBSRC to M.D.M.); Canadian Institutes of Health Research (fellowship to M.S.S.). Funding for open access charge: Canadian Institutes of Health Research.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We would like to thank Dr Tom Walsh for assistance with computational issues and all members of the Barton Group for helpful discussions.

REFERENCES

1. Stelzl U, Wanker EE. The value of high quality protein-protein interaction networks for systems biology. Curr. Opin. Chem. Biol. 2006;10:551–558. [PubMed]
2. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl Acad. Sci. USA. 2001;98:4569–4574. [PMC free article] [PubMed]
3. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000;403:623–627. [PubMed]
4. Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain P-O, Han J-DJ, Chesneau A, Hao T, et al. A map of the interactome network of the metazoan C. elegans. Science. 2004;303:540–543. [PMC free article] [PubMed]
5. Formstecher E, Aresta S, Collura V, Hamburger A, Meil A, Trehin A, Reverdy C, Betin V, Maire S, Brun C, et al. Protein interaction mapping: a Drosophila case study. Genome Res. 2005;15:376–384. [PMC free article] [PubMed]
6. Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, et al. A protein interaction map of Drosophila melanogaster. Science. 2003;302:1727–1736. [PubMed]
7. Arifuzzaman M, Maeda M, Itoh A, Nishikata K, Takita C, Saito R, Ara T, Nakahigashi K, Huang H-C, Hirai A, et al. Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res. 2006;16:686–691. [PMC free article] [PubMed]
8. Rual J-F, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, et al. Towards a proteome-scale map of the human protein-protein interaction network. Nature. 2005;437:1173–1178. [PubMed]
9. Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, et al. A human protein-protein interaction network: a resource for annotating the proteome. Cell. 2005;122:957–968. [PubMed]
10. Reguly T, Breitkreutz A, Boucher L, Breitkreutz B-J, Hon G, Myers C, Parsons A, Friesen H, Oughtred R, Tong A, et al. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. J. Biol. 2006;5:11. [PMC free article] [PubMed]
11. Scott MS, Barton GJ. Probabilistic prediction and ranking of human protein-protein interactions. BMC Bioinformatics. 2007;8:239. [PMC free article] [PubMed]
12. Mishra GR, Suresh M, Kumaran K, Kannabiran N, Suresh S, Bala P, Shivakumar K, Anuradha N, Reddy R, Raghavan TM, et al. Human protein reference database–2006 update. Nucleic Acids Res. 2006;34:D411–D414. [PMC free article] [PubMed]
13. Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TKB, Gronborg M, et al. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res. 2003;13:2363–2371. [PMC free article] [PubMed]
14. Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D. The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 2004;32:D449–D451. [PMC free article] [PubMed]
15. Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A, Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley R, et al. IntAct–open source resource for molecular interaction data. Nucleic Acids Res. 2007;35:D561–D565. [PMC free article] [PubMed]
16. Breitkreutz BJ, Stark C, Reguly T, Boucher L, Breitkreutz A, Livstone M, Oughtred R, Lackner DH, Bahler J, Wood V, et al. The BioGRID interaction database: 2008 update. Nucleic Acids Res. 2008;36:D637–D640. [PMC free article] [PubMed]
17. Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G, Schneider MV, Castagnoli L, Cesareni G. MINT: the Molecular INTeraction database. Nucleic Acids Res. 2007;35:D572–D574. [PMC free article] [PubMed]
18. Hart GT, Ramani AK, Marcotte EM. How complete are current yeast and human protein-interaction networks? Genome Biol. 2006;7:120. [PMC free article] [PubMed]
19. Brown KR, Jurisica I. Online predicted human interaction database. Bioinformatics. 2005;21:2076–2082. [PubMed]
20. Huang T-W, Tien A-C, Huang W-S, Lee Y-CG, Peng C-L, Tseng H-H, Kao C-Y, Huang C-YF. POINT: a database for the prediction of protein-protein interactions based on the orthologous interactome. Bioinformatics. 2004;20:3273–3276. [PubMed]
21. von Mering C, Jensen LJ, Kuhn M, Chaffron S, Doerks T, Kruger B, Snel B, Bork P. STRING 7 - recent developments in the integration and prediction of protein interactions. Nucleic Acids Res. 2007;35:D358–D362. [PMC free article] [PubMed]
22. Kersey PJ, Duarte J, Williams A, Karavidopoulou Y, Birney E, Apweiler R. The International Protein Index: an integrated database for proteomics experiments. Proteomics. 2004;4:1985–1988. [PubMed]
23. Berglund AC, Sjolund E, Ostlund G, Sonnhammer ELL. InParanoid 6: eukaryotic ortholog clusters with inparalogs. Nucleic Acids Res. 2008;36:D263–D266. [PMC free article] [PubMed]
24. Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Buillard V, Cerutti L, Copley R, et al. New developments in the InterPro database. Nucleic Acids Res. 2007;35:D224–D228. [PMC free article] [PubMed]
25. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35:D61–D65. [PMC free article] [PubMed]
26. Bairoch A, Bougueleret L, Altairac S, Amendolia V, Auchincloss A, Puy GA, Axelsen K, Baratin D, Blatter MC, Boeckmann B, et al. The Universal Protein Resource (UniProt). Nucleic Acids Res. 2008;36:D190–D195.
27. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2008;36:D13–D21. [PMC free article] [PubMed]
28. Alfarano C, Andrade CE, Anthony K, Bahroos N, Bajec M, Bantoft K, Betel D, Bobechko B, Boutilier K, Burgess E, et al. The Biomolecular Interaction Network Database and related tools 2005 update. Nucleic Acids Res. 2005;33:D418–D424. [PMC free article] [PubMed]
29. Smith M, Kunin V, Goldovsky L, Enright AJ, Ouzounis CA. MagicMatch–cross-referencing sequence identifiers across databases. Bioinformatics. 2005;21:3429–3430. [PubMed]
30. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. [PMC free article] [PubMed]
31. Mewes HW, Dietmann S, Frishman D, Gregory R, Mannhaupt G, Mayer KFX, Munsterkotter M, Ruepp A, Spannagl M, Stuempflen V, et al. MIPS: analysis and annotation of genome information in 2007. Nucleic Acids Res. 2008;36:D196–D201. [PMC free article] [PubMed]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press