Proc Natl Acad Sci U S A. May 13, 2008; 105(19): 6959–6964.
Published online May 12, 2008. doi:  10.1073/pnas.0708078105
PMCID: PMC2383957
From the Cover
Applied Mathematics, Evolution

Estimating the size of the human interactome


After the completion of the human and other genome projects it emerged that the number of genes in organisms as diverse as fruit flies, nematodes, and humans does not reflect our perception of their relative complexity. Here, we provide reliable evidence that the size of protein interaction networks in different organisms appears to correlate much better with their apparent biological complexity. We develop a stable and powerful, yet simple, statistical procedure to estimate the size of the whole network from subnet data. This approach is then applied to a range of eukaryotic organisms for which extensive protein interaction data have been collected and we estimate the number of interactions in humans to be ≈650,000. We find that the human interaction network is one order of magnitude bigger than the Drosophila melanogaster interactome and ≈3 times bigger than in Caenorhabditis elegans.

Keywords: evolutionary systems biology, network inference, network sampling theory, network evolution

Perhaps one of the most surprising results of the genome-sequencing projects was that the number of genes is much lower than had been expected and is, in fact, surprisingly similar for very different organisms (1, 2). For example, the nematode Caenorhabditis elegans appears to have a similar number of genes to humans, whereas rice and maize appear to have even more genes than humans. It was then quickly suggested that the biological complexity of organisms is reflected not merely by the number of genes but by the number of physiologically relevant interactions (1, 3). In addition to alternative splice variants (4), posttranslational processes (5), and other (e.g., genetic) factors influencing gene expression (6, 7), the structure of the interactome is one of the crucial factors underlying the complexity of biological organisms. Here, we focus on the wealth of available protein interaction data and demonstrate that it is possible to arrive at a reliable statistical estimate for the size of these interaction networks. This approach is then used to assess the complexity of protein interaction networks in different organisms from present incomplete and noisy protein interaction datasets.

There are now fairly extensive protein interaction network (PIN) datasets in a number of species, including humans (8, 9). These have been generated by a variety of experimental techniques (as well as some in silico inferences). Although these techniques and the resulting data (i) are notoriously prone to false positives and negatives (10, 11) and (ii) result in highly idealized and averaged network structures (12), such interaction datasets are increasingly turning into useful tools for the analysis of the functional (e.g., ref. 13) and evolutionary properties (14) of biological systems. In particular, in Saccharomyces cerevisiae we are beginning to have a fairly complete description of the protein interaction network that is accessible with current experimental technologies; the recent high-quality literature-curated dataset of Reguly et al. (15) provides us with a dataset that should be almost completely free from false positives. For most other organisms, however, interaction data are still far from complete and it has recently been shown that subnetworks, in general, have qualitatively different properties from the true network (16–18). Although the importance of network-sampling properties has been realized only relatively recently, this aspect of most systems biology data is increasingly being recognized (11, 19) as important.

There are, however, some properties of the true network that can be inferred even from subnet data, and here we show that the total network size is one property for which this is the case. Present protein-interaction datasets enable us to estimate the size of the interactomes in different species by using graph theoretical invariants. This is particularly interesting for species where more than one experimental dataset is available. Below we first describe a robust and very general estimator of network size from partial network data that overcomes this problem. We then apply it to available PIN data in a range of eukaryotic organisms. In supporting information (SI) Text we demonstrate the power of this approach by using extensive simulation studies.

Estimating Interactome Size

Here, we develop an approach for estimating the size of a network from incomplete data. We will show below (and by using extensive simulations in SI Text) that, for a given species, different independent datasets (generated by different methods such as yeast-two-hybrid and TAP tagging) yield estimates for the interactome size that are in excellent agreement.

We are concerned with a true network, $\mathcal{N}$, which has $N_{\mathcal{N}}$ nodes and $M_{\mathcal{N}}$ edges. The sets of nodes and edges are given by $V_{\mathcal{N}}$ and $\mathcal{E}_{\mathcal{N}}$, respectively; these define the graph representation of the true network:

$$G_{\mathcal{N}} = (V_{\mathcal{N}}, \mathcal{E}_{\mathcal{N}}). \qquad (1)$$

We pick a subset of nodes $V_S \subseteq V_{\mathcal{N}}$ and study properties of the subgraph $G_S$ induced by the nodes in $V_S$,

$$G_S = (V_S, \mathcal{E}_S), \qquad (2)$$

where the set of edges observed in the subnet $S$ is a subset of the total set of edges, $\mathcal{E}_S \subseteq \mathcal{E}_{\mathcal{N}}$. Our aim is to predict the number of interactions in the true network $G_{\mathcal{N}}$ based on the available data in the subnet, $G_S$.

We assume that the network, $G_{\mathcal{N}}$, is generated according to some (unknown) model characterized by a parameter (vector) $\theta$, and that the observed network, $G_S$, is subsequently sampled from it. Then

$$P_{\theta,p}(G_S) = \sum_{G_{\mathcal{N}}} P_p(G_S \mid G_{\mathcal{N}})\, P_\theta(G_{\mathcal{N}}), \qquad (3)$$

where it is assumed that the sampling is independent of the network-generating model. The parameter $p$ refers to a general sampling process, and not only independent node sampling. Furthermore, we assume the order $N_{\mathcal{N}}$ of the network is known and allow nodes to be annotated with information not related to the wiring of the network (e.g., GO terms or protein family classes). Consequently, the sum is over networks, $G_{\mathcal{N}}$, with $N_{\mathcal{N}}$ (labeled) nodes only. For convenience, we take labeling information to be included in $N_{\mathcal{N}}$ and $N_S$ (the order of $G_S$).

If sampling only depends on the nodes in the network and not on their connections, then $P_p(G_S \mid G_{\mathcal{N}})$ splits into a product of two terms,

$$P_p(G_S \mid G_{\mathcal{N}}) = Q_p(N_S)\, q(G_S, G_{\mathcal{N}}), \qquad (4)$$

where $Q_p(N_S)$ is a term denoting the probability of sampling the nodes in the observed PIN and $q(G_S, G_{\mathcal{N}})$ denotes how many ways this can be done given the (labeled) nodes in $G_{\mathcal{N}}$; by assumption, labeling of nodes is the same in all possible $G_{\mathcal{N}}$. For example, if the nodes are unlabeled and have degree zero, then $q(G_S, G_{\mathcal{N}}) = \binom{N_{\mathcal{N}}}{N_S}$. If all nodes have degree one, a similar factor can be derived based on the number of degree one ($d_1$) and degree zero ($d_0$) nodes that are observed in the PIN: $q(G_S, G_{\mathcal{N}})$ is the number of ways one can choose $d_0$ and $d_1$ out of the $N_{\mathcal{N}}/2$ pairs of connected nodes in $G_{\mathcal{N}}$. If all nodes are labeled then $q(G_S, G_{\mathcal{N}}) = 1$, because one can only select the nodes in the PIN in one way.

It follows that $Q_p(N_S)$ is sufficient for inference on $p$ (the remaining part of the likelihood does not depend on $p$). In the case of independent node sampling, where each node is included with probability $p$, we have $Q_p(N_S) = p^{N_S}(1 - p)^{N_{\mathcal{N}} - N_S}$ and the maximum likelihood estimate of $p$ is

$$\hat{p} = \frac{N_S}{N_{\mathcal{N}}}, \qquad (5)$$

which is unbiased and consistent.

From the likelihood Eq. 4 it follows that

$$P_\theta(G^*_{\mathcal{N}} \mid G_S) = \frac{q(G_S, G^*_{\mathcal{N}})\, P_\theta(G^*_{\mathcal{N}})}{\sum_{G_{\mathcal{N}}} q(G_S, G_{\mathcal{N}})\, P_\theta(G_{\mathcal{N}})}, \qquad (6)$$

where $G^*_{\mathcal{N}}$ is a specific network (to distinguish it from the sum over all networks in the denominator). Note that this conditional probability does not depend on $p$ and that, in principle, we can only gain knowledge about the interactome if something is assumed about the network-generating model. Note also that this is a general restriction that is not related to independent node sampling alone.

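A reasonable estimate of the edge probability in $G_{\mathcal{N}}$ is the observed edge density of the subnet,

$$\frac{M_S}{\binom{N_S}{2}} = \frac{2 M_S}{N_S (N_S - 1)}, \qquad (7)$$

where $M_S$ is the number of edges in the PIN. It leads to the following estimate of the interactome size:

$$\hat{M}_{\mathcal{N}} = M_S\, \frac{N_{\mathcal{N}} (N_{\mathcal{N}} - 1)}{N_S (N_S - 1)} \approx \frac{M_S}{\hat{p}^2}, \qquad \hat{p} = \frac{N_S}{N_{\mathcal{N}}}. \qquad (8)$$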

The estimate is unbiased and consistent provided the network-generating mechanism ensures some form of uniformity, as is the case for random graphs (Figs. S1 and S2). Uniformity matters: if, for example, $G_{\mathcal{N}}$ has a star topology with one node of degree $N_{\mathcal{N}} - 1$ and the remaining nodes of degree 1, then $\hat{M}_{\mathcal{N}} = 0$ with probability $1 - p$ (the hub is not sampled) and $\hat{M}_{\mathcal{N}} = (N_{\mathcal{N}} - 1)/\hat{p}$ with probability $p$ (the hub is sampled); hence, $\hat{M}_{\mathcal{N}}$ is not consistent. We will demonstrate below that the assumption of independent sampling of nodes is not too restrictive and should apply to many experimental studies, in particular high-throughput ones.
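The size estimate amounts to scaling the edge density observed among the sampled proteins up to all pairs of proteins in the organism. A minimal sketch in Python follows; the function names and the numbers are hypothetical and serve only as illustration, and the second function is the large-network approximation with the squared sampling fraction.

```python
from math import comb


def estimate_interactome_size(m_s: int, n_s: int, n_total: int) -> float:
    """Scale the observed edge density of the subnet up to the full network.

    m_s     -- number of interactions observed in the subnet
    n_s     -- number of proteins in the subnet
    n_total -- number of proteins in the full network (assumed known)
    """
    if n_s < 2 or n_s > n_total:
        raise ValueError("need 2 <= n_s <= n_total")
    edge_density = m_s / comb(n_s, 2)        # observed edge probability
    return edge_density * comb(n_total, 2)   # expected edges among all pairs


def estimate_via_sampling_fraction(m_s: int, n_s: int, n_total: int) -> float:
    """Large-network approximation: observed edges divided by p_hat squared."""
    p_hat = n_s / n_total
    return m_s / p_hat ** 2


# Hypothetical numbers, chosen only for illustration.
exact = estimate_interactome_size(m_s=2_000, n_s=1_000, n_total=5_000)
approx = estimate_via_sampling_fraction(m_s=2_000, n_s=1_000, n_total=5_000)
print(round(exact), round(approx))
```

The two forms differ only by terms of relative order 1/N_S, so for datasets covering thousands of proteins they are practically interchangeable.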

So far we have assumed that the number of ORFs, $N_{\mathcal{N}}$, in an organism is known from genome surveys. Total genome size is, however, still not precisely known in most organisms. This uncertainty in $N_{\mathcal{N}}$ is easily incorporated. Assume that the value $N_{\mathcal{N}}$ is associated with an error or uncertainty $\varepsilon$ [i.e., if the genome contains $N_0$ protein-coding genes of which $N_{\mathcal{N}}$ are known, then $\varepsilon = (N_0 - N_{\mathcal{N}})/N_0$]. Then let $N_{\mathcal{N}} := N_0 (1 \pm \varepsilon)$ and for $\varepsilon \lesssim 0.1$ we have

equation image

Replacing $\hat{p}$ in Eq. 8 with $\tilde{p}$ yields the error-corrected estimate for the true network size

equation image

Thus, an uncertainty of ε in the number of nodes in the true network results in an uncertainty of 2ε for the number of edges in the true network.
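The factor of two follows from the quadratic dependence of the estimate on the number of nodes. A small numerical check, assuming the large-network form in which the estimate is proportional to the square of the node count (the function name is hypothetical):

```python
def relative_change_in_estimate(eps: float) -> float:
    """Relative change of the size estimate when the node count is rescaled.

    The estimate is proportional to the node count squared, so rescaling
    the node count by (1 + eps) changes the estimate by (1 + eps)**2 - 1,
    which equals 2 * eps to first order in eps.
    """
    return (1.0 + eps) ** 2 - 1.0


for eps in (0.01, 0.05, 0.10):
    exact = relative_change_in_estimate(eps)
    print(f"eps={eps:.2f}  exact={exact:.4f}  first-order={2 * eps:.4f}")
```

The first-order value 2ε stays close to the exact change over the range ε ≲ 0.1 considered in the text.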

To assess the variability of the estimator we can construct approximate bootstrap confidence intervals (CI) (20). The number of edges is given by

$$M_S = \frac{1}{2} \sum_{i=1}^{N_S} d_i \qquad (11)$$

in terms of the degree sequence. Now let $d = \{d_1, d_2, \ldots, d_{N_S}\}$ be the set of degrees of all of the nodes in the graph $G_S$ describing the subnet. Then we generate bootstrap replicates, $d^*$, by sampling $N_S$ degrees with replacement from $d$. For each bootstrap replicate, $d^*$, we obtain an estimate $M^*_S$ (which may be a noninteger because of the factor 1/2 in Eq. 11; this does not affect the estimator). Creating a sufficiently large number of bootstrap replicates, $d^*$, thus allows us to calculate the bootstrap CIs; these have very good coverage properties, as shown in Figs. S3 and S4.
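The resampling scheme can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses simple percentile intervals (the text only says "approximate bootstrap confidence intervals"), and the function name, the number of replicates, and the toy degree sequence are hypothetical.

```python
import random


def bootstrap_ci(degrees, n_total, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the estimated interactome size.

    degrees -- degree of every node in the observed subnet
    n_total -- assumed number of nodes in the full network
    """
    rng = random.Random(seed)
    n_s = len(degrees)
    # Scaling from subnet edges to full-network edges (about 1 / p_hat**2).
    scale = n_total * (n_total - 1) / (n_s * (n_s - 1))
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(degrees, k=n_s)   # n_s degrees, with replacement
        m_star = sum(resample) / 2               # may be a non-integer
        estimates.append(m_star * scale)
    estimates.sort()
    tail = (1.0 - level) / 2
    lo = estimates[round(tail * n_boot)]
    hi = estimates[round((1.0 - tail) * n_boot) - 1]
    point = sum(degrees) / 2 * scale
    return point, lo, hi


# Hypothetical degree sequence, for illustration only.
toy_degrees = [0] * 40 + [1] * 30 + [2] * 15 + [3] * 10 + [8] * 5
print(bootstrap_ci(toy_degrees, n_total=1_000))
```

Resampling degrees rather than edges keeps each replicate cheap to compute, because only the sum of the resampled degrees is needed.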

The derivation of Eq. 8 does not depend on any restrictive assumptions (see SI Text) but is a generic property of random graphs and their subnets. Crucially, Eq. 8 is valid irrespective of the degree sequence or other summary statistics of the networks††; confidence intervals (CI) and their coverage properties (20) may, however, depend on the degree sequence or network structure. Because there is no sufficient statistic for general networks (17) [i.e., a summary statistic that would include all information about the likelihood (21) of a network] it is also not possible to improve on these estimators by, for example, including the numbers of observed triangles or the clustering coefficient. The only limitation is the assumption of independent sampling. This is, however, also implicit in all previous attempts at estimating interactome sizes (22–24). Below we show how nonrandom sampling schemes can be described and how false-positive and false-negative rates of PIN data affect our estimate.

Other Node-Sampling Schemes

The above approach can be generalized for datasets that are ascertained in certain ways and can thus also deal with experimental bias.

Independent but Nonuniform Sampling.

We assume independent sampling of nodes. Let node $i$ have a probability $p_i$ for being included in the subnet. We allow $p_i \neq p_j$ and only assume that the $p_i$ values are drawn independently from the same probability distribution,

$$p_i \sim F(\alpha), \qquad (12)$$

where $\alpha$ is a parameter (potentially vector valued). The properties of $F$ are not of importance. It follows that $\hat{p}$ is unbiased

equation image

and also consistent, because

equation image

for large networks. Now consider an edge $e_{ij}$ ($i \neq j$); then the probability of observing this edge in the subnet is

equation image


equation image

Likewise $\hat{p}^2$ is consistent (25), hence also unbiased for large networks.
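The claim that the squared sampling fraction still estimates the edge-sampling probability when nodes have different inclusion probabilities can be checked numerically. The sketch below makes several illustrative assumptions that are not taken from the text: an Erdős–Rényi graph stands in for the true network, the inclusion probabilities are drawn from a uniform distribution on (0.2, 0.8), and all names and parameter values are hypothetical.

```python
import random


def simulate(n_nodes=1000, edge_prob=0.02, seed=1):
    """Return (true edge count, estimate from a non-uniformly sampled subnet)."""
    rng = random.Random(seed)
    # Erdos-Renyi random graph standing in for the true network.
    edges = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)
             if rng.random() < edge_prob]
    # Node-specific inclusion probabilities; Uniform(0.2, 0.8) is an
    # arbitrary stand-in for the unspecified distribution F.
    incl = [rng.uniform(0.2, 0.8) for _ in range(n_nodes)]
    sampled = {i for i in range(n_nodes) if rng.random() < incl[i]}
    # Edges of the induced subgraph: both end points must be sampled.
    m_s = sum(1 for i, j in edges if i in sampled and j in sampled)
    p_hat = len(sampled) / n_nodes
    return len(edges), m_s / p_hat ** 2


true_m, est_m = simulate()
print(true_m, round(est_m))
```

Because the inclusion probabilities are independent of the wiring, the fraction of retained edges concentrates around the square of the mean inclusion probability, which is what the observed sampling fraction estimates.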

Dependent Sampling.

Here, we assume as above that $p_i$ is drawn from some probability distribution, $p_i \sim F_i(\alpha)$, that might, however, depend on information related to node $i$, for example, the degree or functional classification of $i$; that is, $F_i(\alpha) = F(\alpha; D_i)$, where $D_i$ denotes this information. Although measures for expression abundance may be such a factor, this appears not to be the case for the datasets considered here (Fig. S5). Hence, we might take $D_i$ as an additional parameter in the function $F$.

In addition, we assume the network is uncorrelated with respect to this information, that is, $P(D_i, D_j) = P(D_i)P(D_j)$; and, given the probabilities $p_i$, we assume nodes are drawn independently of each other. This assumption is justified for all networks in which the degree–degree correlation of interacting nodes is determined by the degree distribution. This is approximately the case for the networks considered here‡‡. It follows that

equation image

that is, $\hat{p}$ is unbiased. Note that

equation image

which in turn leads to

equation image

and consequently consistency. Likewise, it follows that $E(p_i p_j) = \langle p \rangle^2$, and that the edge sampling probability is consistently estimated by $\hat{p}^2$.

Effects of Uncertain Data on Estimated Interactome Sizes

So far we have assumed that the interaction data are correct. This is not the case for protein interaction data (10–12, 26–28). Here, we show that it is possible to include noisy data and that the estimates given in Table 1 (see also Fig. 2) are not likely to change severely (e.g., by an order of magnitude) for realistic rates of false positives and false negatives. We note that the sampling theory developed in the previous sections needs modification to take false positives and false negatives into account; for example, the sum in Eq. 4 should be over all possible networks and not just those containing the observed PIN data.

Table 1.
Dataset properties and predicted interactome sizes
Fig. 2.
Estimated interactome sizes for humans and three other eukaryotic species for which high-throughput interaction data are available. The letters denote the approximate position of the point estimate, $\hat{M}_{\mathcal{N}}$, and the horizontal bars ...

Let the number of true interactions in a network with $N$ nodes be denoted by $M$; if the data collection process is not perfect, then (assuming independence) the number of reported interactions, $\tilde{M}$, will generally be different from $M$. Now let $M_{TP}$, $M_{FN}$, $M_{FP}$, and $M_{TN}$ denote the numbers of true-positive, false-negative, false-positive, and true-negative results, respectively. We trivially have

equation image


equation image

The rates for true positives and false negatives are defined by

equation image

equation image

Thus, for a given number of reported edges/interactions and estimates of the true-positive and false-negative rates, $\mu$ and $\hat{\rho}$, we obtain an estimate for the true number of interactions

equation image

Thus, for a fixed network (or subnet) the false-positive and false-negative rates affect the estimates of the true number of interactions in a simple linear manner (see Fig. S6).
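One plausible reading of such a correction is sketched below. The definitions of the two rates used here are assumptions made for illustration and may differ from the definitions in the text: the true-positive rate is taken as the fraction of reported interactions that are real, and the false-negative rate as the fraction of real interactions that go unreported. The function name and the numbers are hypothetical.

```python
def corrected_interaction_count(m_reported, tp_rate, fn_rate):
    """Adjust a reported interaction count for false positives and negatives.

    Assumed definitions (illustrative, not taken from the text):
      tp_rate = TP / (TP + FP)   fraction of reported edges that are real
      fn_rate = FN / (TP + FN)   fraction of real edges that are missed
    The number of real edges is then TP / (1 - fn_rate), with
    TP = tp_rate * m_reported.
    """
    if not 0.0 <= tp_rate <= 1.0 or not 0.0 <= fn_rate < 1.0:
        raise ValueError("need 0 <= tp_rate <= 1 and 0 <= fn_rate < 1")
    return tp_rate * m_reported / (1.0 - fn_rate)


# Hypothetical rates: the two error types can largely offset each other.
print(corrected_interaction_count(10_000, tp_rate=0.6, fn_rate=0.4))
```

Under these assumed definitions false positives shrink the count and false negatives inflate it, which is one way to see why the net effect on the size estimate can remain moderate.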

Results

We use Eq. 8 to estimate interactome sizes in humans and three other eukaryotic organisms: S. cerevisiae (29–32), C. elegans (33), and D. melanogaster (34–36). We begin, however, with an illustration of the power of this simple estimator by applying it to S. cerevisiae PIN data; here, we have treated the presently available PIN data as a proxy for a complete “interaction network” whose size we are trying to predict. In Fig. 1A we show the distributions of estimates obtained from 1,000 randomly chosen subnets covering 20%, 40%, 60%, and 80% of the available PIN data [taken from the Database of Interacting Proteins (DIP) (37)]. In Fig. 1B we show the coverage properties of the bootstrap 95% CIs for the same sampling fractions. Together with the simulation studies discussed in SI Text, the results in Fig. 1 suggest that the estimator $\hat{M}_{\mathcal{N}}$ provides an accurate and reliable way of estimating interactome sizes from present data. Interactome size estimates and their CIs for experimental PIN datasets are shown in Table 1 and Fig. 2 for the organisms considered here. The DIP datasets (always shown in green) are mainly based on high-throughput studies, supplemented by interactions collected from the literature; as such, they generally cannot be treated as independent from the other datasets. For humans, however, there is negligible overlap between the DIP databases and the two recent high-throughput surveys and we can treat the three estimators as approximately independent.

Fig. 1.
Performance of the estimator, Eq. 8, for the yeast network. Here, the DIP dataset was taken as a gold-standard “true” interaction network. (A) True network size (red bars) and histograms of predicted sizes for subnets that were created ...
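The subsampling check described above can be imitated on synthetic data. In the sketch below a small Erdős–Rényi graph stands in for the reference network (the DIP data are not used), 200 rather than 1,000 subnets are drawn per sampling fraction to keep the run short, and all names and parameter values are hypothetical.

```python
import random
from statistics import mean, stdev


def make_network(n_nodes, edge_prob, rng):
    """Erdos-Renyi graph used here as a stand-in for a reference network."""
    return [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)
            if rng.random() < edge_prob]


def subnet_estimate(edges, n_nodes, fraction, rng):
    """Sample a fixed fraction of nodes and rescale the induced edge count."""
    n_s = round(fraction * n_nodes)
    keep = set(rng.sample(range(n_nodes), n_s))
    m_s = sum(1 for i, j in edges if i in keep and j in keep)
    return m_s * n_nodes * (n_nodes - 1) / (n_s * (n_s - 1))


rng = random.Random(42)
N = 400
edges = make_network(N, 0.03, rng)
summary = {}
for fraction in (0.2, 0.4, 0.6, 0.8):
    runs = [subnet_estimate(edges, N, fraction, rng) for _ in range(200)]
    summary[fraction] = (mean(runs), stdev(runs))
    print(f"{fraction:.0%}: mean={summary[fraction][0]:.0f} "
          f"sd={summary[fraction][1]:.0f} true={len(edges)}")
```

The mean estimate should sit close to the true edge count at every sampling fraction, while the spread narrows as the fraction grows.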

Based on the results in Table 1 and Fig. 2, we would therefore expect—given present experimental methods and ignoring multiple splice variants—the human interactome to contain ≈650,000 protein interactions. Thus, it is approximately an order of magnitude larger than the estimated D. melanogaster interactome, and a factor of 3 more complex than the estimated C. elegans interactome; this contrasts with relative genome sizes of ≈1.8 and ≈1.2, respectively. The results for the S. cerevisiae PIN suggest that it will ultimately contain ≈25,000–35,000 interactions (see also Table 1); this agrees well with previous estimates (22, 23). It also agrees well with estimates obtained from the recent data generated by Reguly et al. (15): for the pure literature-curated set we obtained 37,000 interactions; for the complete network data we obtained an estimate of ≈35,000 interactions in the yeast PIN. These two datasets were, however, collected from the literature and the sampling process is thus much harder, perhaps even impossible, to model accurately.

By using Eq. 24 the impacts of false-positive and false-negative rates are easily assessed (see also ref. 38). We find that the linear effect of the error rates on the estimated number of true interactions results in a comparatively modest effect. The estimates of the true-positive rates in PIN datasets range from 35% (33) to 84% (34); there are fewer estimates for the false-negative rate, and those obtained for different S. cerevisiae datasets are on the order of 20–40% (10). It appears that, for realistic rates of true positives and false positives, the estimate of the human interactome size remains very similar to the simple estimate of ≈650,000 protein–protein interactions obtained in this article. Similar curves can be drawn for the other species, too, and in each case we obtain comparable values for most combinations of realistic error rates. Thus, we believe that error rates exert a comparatively moderate effect on the estimator (Eq. 8).

Overall, it therefore appears that estimates obtained from Eq. 8 should be accurate to within less than an order of magnitude even under the very worst circumstances. A much more realistic estimate, however, can be obtained from comparing the different and essentially independent estimates for S. cerevisiae. These findings suggest that an accuracy of approximately a factor of 2 is more realistic. Reassuringly, these results are confirmed when applying a recent multimodel inference procedure (39) that deals with incomplete network data.

Discussion

We have shown that it is possible to estimate the size of interactomes reliably from present partial interaction data. Our estimator is powerful and robust, relying on assumptions that appear to be met by typical systematic high-throughput studies. Unlike the previous approach of Hart et al. (24), who implicitly assume that interactions do not occur between surveyed proteins and those not yet surveyed, our estimate deals with missing data in a coherent and statistically meaningful manner; the route taken by Grigoriev (23) can be understood as a special case of the present approach when two or more datasets are available. Moreover, noise and different sampling/ascertainment strategies are straightforwardly included in the analysis (38, 40). We have illustrated the power of this approach by using simulated sampling processes in S. cerevisiae and have found that the estimator, Eq. 8, and the bootstrap confidence intervals have very good coverage properties. We have then applied this inferential framework to published datasets in four eukaryotic organisms. We found that the predicted interactome sizes differ quite considerably between these species. For example, the human interactome appears to be an order of magnitude larger than the D. melanogaster interactome. Unfortunately, for maize and rice, which have a number of genes comparable to or even larger than that of humans, only tiny PIN datasets are available and we cannot obtain useful estimates for their respective interactome sizes. If conventional assumptions about the different complexity of organisms are indeed correct, and if interactome size does reflect organismic complexity (13, 41), then we would expect these organisms to have smaller interactomes than humans. The increase of interactome size with number of proteins/ORFs should thus not be uniform or even monotonic.

We note that the estimate of ≈650,000 interactions means that the human PIN will still be relatively sparse: this corresponds to only ≈0.2% of all possible pairwise interactions being present; for most other species, however, the network is even sparser.
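The sparsity figure can be turned into a back-of-envelope check. The short calculation below takes the two rounded numbers quoted in the text and derives the network order they jointly imply; the resulting protein count is only what those rounded figures imply, not a value reported in the article.

```python
from math import comb, sqrt

m_est = 650_000        # estimated number of human interactions (from the text)
density = 0.002        # about 0.2% of all possible pairwise interactions

# Number of protein pairs implied by the two figures above.
pairs = m_est / density

# Solve N * (N - 1) / 2 = pairs for N (positive root of the quadratic).
n_implied = (1 + sqrt(1 + 8 * pairs)) / 2
print(f"implied number of proteins: {n_implied:,.0f}")

# Consistency check: density recomputed from the rounded N.
n_round = round(n_implied)
print(f"density at N={n_round}: {m_est / comb(n_round, 2):.4%}")
```

Even several hundred thousand interactions occupy a tiny fraction of the possible pairs, because the number of pairs grows quadratically with the number of proteins.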

There are a number of other factors that may contribute to an explanation of the increase in phenotypic bauplan complexity between species: the diversity of the transcriptome (42) and protein-domain architecture (43) have all been implicated in the literature. Here, we have demonstrated that interactome sizes are consistent with biological intuition about the complexity of eukaryotic organisms. We note that our estimator is very flexible and reflects the quality of present data: we predict the number of interactions that are detectable given present experimental technology. For example, we have not considered (physiologically probably very important) transient or condition-specific interactions. Should more sensitive and reliable experimental methodologies or better estimates of experimental error rates become available in the future, then Eq. 8 can, of course, be used to predict an updated number of protein–protein interactions for an organism. Our formalism is also readily extended to directed network data (such as gene-regulation networks).

As a final note, we want to stress that the estimates necessarily reflect experimental technology. Thus, the estimates in Table 1 refer only to the types of interactions that are detectable given present experimental methods and protocols. The estimator for the size of the true network, however, will remain universally correct for suitable datasets and for all types of networks. We will thus be able to use it in the future and apply it to other network datasets as well.


Acknowledgments

This work was supported by the Wellcome Trust (M.P.H.S., E.d.S., and T.T.), the Royal Society and the Carlsberg Foundation (M.P.H.S. and C.W.), and an EMBO Young Investigator fellowship (to M.P.H.S.). C.W. is supported by the Danish Research Council.


The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

See Commentary on page 6795.

This article contains supporting information online at www.pnas.org/cgi/content/full/0708078105/DCSupplemental.

††Eq. 8 is a general result for general (random) graphs; it is equally true for all ensembles of random graphs such as Erdős–Rényi and scale-free random graphs. In SI Text we further illustrate the simple quadratic relationship by using simulations.

‡‡The degree–degree distribution is not significantly different from the product degree distribution (by using the Kolmogorov–Smirnov test); that is, P(k, l) ≈ P(k)P(l) for the datasets considered here.

References

1. Lander E, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.
2. Venter J, et al. The sequence of the human genome. Science. 2001;291:1304–1351.
3. Copley R. The animal in the genome: Comparative genomics and evolution. Philos Trans R Soc London Ser B. 2008;363:1453–1461.
4. Tian B, Pan Z, Lee JY. Widespread mRNA polyadenylation events in introns indicate dynamic interplay between polyadenylation and splicing. Genome Res. 2007;17:156–165.
5. Henikoff S. Histone modifications: Combinatorial complexity or cumulative simplicity? Proc Natl Acad Sci USA. 2005;102:5308–5309.
6. Hegde RS, Bernstein HD. The surprising complexity of signal sequences. Trends Biochem Sci. 2006;31:563–571.
7. Stranger BE, et al. Population genomics of human gene expression. Nat Genet. 2007;39:1217–1224.
8. Stelzl U, et al. A human protein–protein interaction network: A resource for annotating the proteome. Cell. 2005;122:957–968.
9. Rual J, et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature. 2005;437:1173–1178.
10. Bader JS, Chaudhuri A, Rothberg JM, Chant J. Gaining confidence in high-throughput protein interaction networks. Nat Biotechnol. 2004;22:78–85.
11. Deeds EJ, Ashenberg O, Shakhnovich EI. A simple physical model for scaling in protein–protein interaction networks. Proc Natl Acad Sci USA. 2006;103:311–316.
12. de Silva E, Stumpf M. Complex networks and simple models in biology. J R Soc Interface. 2005;2:419–430.
13. Rives AW, Galitski T. Modular organization of cellular networks. Proc Natl Acad Sci USA. 2003;100:1128–1133.
14. Stumpf M, Kelly W, Thorne T, Wiuf C. Evolution at the system level: The natural history of protein interaction networks. Trends Ecol Evol. 2007;22:366–373.
15. Reguly T, et al. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. J Biol. 2006;5:11.
16. Stumpf M, Wiuf C, May R. Subnets of scale-free networks are not scale-free: Sampling properties of networks. Proc Natl Acad Sci USA. 2005;102:4221–4224.
17. Stumpf M, Wiuf C. Sampling properties of random graphs: The degree distribution. Phys Rev E. 2005;72:036118.
18. Wiuf C, Stumpf M. Binomial subsampling. Proc R Soc A. 2006;462:1181–1195.
19. Han J, Dupuy D, Bertin N, Cusick M, Vidal M. Effect of sampling on topology predictions of protein–protein interaction networks. Nat Biotechnol. 2005;23:839–844.
20. Efron B, Tibshirani R. An Introduction to the Bootstrap. New York: Chapman & Hall/CRC; 1998.
21. Cox D, Hinkley D. Theoretical Statistics. New York: Chapman & Hall/CRC; 1974.
22. Hazbun T, Fields S. Networking proteins in yeast. Proc Natl Acad Sci USA. 2001;98:4277–4278.
23. Grigoriev A. On the number of protein–protein interactions in the yeast proteome. Nucleic Acids Res. 2003;31:4157–4161.
24. Hart G, Ramani A, Marcotte E. How complete are current yeast and human protein-interaction networks? Genome Biol. 2006;7:120.
25. Silvey S. Statistical Inference. New York: Chapman & Hall; 1975.
26. von Mering C, et al. Comparative assessment of large-scale data sets of protein–protein interactions. Nature. 2002;417:399–403.
27. Lappe M, Holm L. Unraveling protein interaction networks with near-optimal efficiency. Nat Biotechnol. 2004;22:98–103.
28. Uetz P, Finley R. From protein networks to biological systems. FEBS Lett. 2005;579:1821–1827.
29. Uetz P, et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature. 2000;403:623–627.
30. Ito T, et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA. 2001;98:4569–4574.
31. Ho Y, et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002;415:180–183.
32. Gavin AC, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002;415:141–147.
33. Li S, et al. A map of the interactome network of the metazoan C. elegans. Science. 2004;303:540–543.
34. Giot L, et al. A protein interaction map of Drosophila melanogaster. Science. 2003;302:1727–1736.
35. Stanyon C, et al. A Drosophila protein-interaction map centered on cell-cycle regulators. Genome Biol. 2004;5:R96.
36. Formstecher E, et al. Protein interaction mapping: A Drosophila case study. Genome Res. 2005;15:376–384.
37. Duan X, Xenarios I, Eisenberg D. Describing biological protein interactions in terms of protein states and state transitions: The LiveDIP database. Mol Cell Proteomics. 2002;1:104–116.
38. de Silva E, et al. The effects of incomplete protein interaction data on structural and evolutionary inferences. BMC Biol. 2006;4:39.
39. Stumpf M, Thorne T. Multimodel inference of network properties from incomplete data. J Integr Bioinf. 2006;3:32.
40. Lin N, Zhao H. Are scale-free networks robust to measurement errors? BMC Bioinformatics. 2005;6:119.
41. Tucker C, Gera J, Uetz P. Towards an understanding of complex protein networks. Trends Cell Biol. 2001;11:102–106.
42. Carninci P, et al. The transcriptional landscape of the mammalian genome. Science. 2005;309:1559–1563.
43. Chothia C, Gough J, Vogel C, Teichmann S. Evolution of the protein repertoire. Science. 2003;300:1701–1703.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences