Copyright © 2008 by The National Academy of Sciences of the USA

Applied Mathematics, Evolution

From the Cover

Estimating the size of the human interactome

†Division of Molecular Biosciences, Imperial College London, Wolfson Building, London SW7 2AZ, United Kingdom; ‡Institute of Mathematical Sciences, Imperial College London, London SW7 2AZ, United Kingdom; ‖Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany; and ¶Bioinformatics Research Center, University of Aarhus, 8000 Aarhus C, Denmark

§To whom correspondence may be addressed. E-mail: m.stumpf/at/imperial.ac.uk or wiuf/at/birc.au.dk

Edited by Burton H. Singer, Princeton University, Princeton, NJ, and approved February 19, 2008

Author contributions: M.P.H.S., M.L., and C.W. designed research; M.P.H.S., T.T., E.d.S., M.L., and C.W. performed research; M.P.H.S., T.T., E.d.S., R.S., H.J.A., and C.W. analyzed data; and M.P.H.S., M.L., and C.W. wrote the paper.

Received August 27, 2007.

See commentary "A truer measure of our ignorance" on page 6795.

Abstract

After the completion of the human and other genome projects it emerged that the number of genes in organisms as diverse as fruit flies, nematodes, and humans does not reflect our perception of their relative complexity. Here, we provide reliable evidence that the size of protein interaction networks in different organisms appears to correlate much better with their apparent biological complexity. We develop a stable and powerful, yet simple, statistical procedure to estimate the size of the whole network from subnet data. This approach is then applied to a range of eukaryotic organisms for which extensive protein interaction data have been collected and we estimate the number of interactions in humans to be ≈650,000. We find that the human interaction network is one order of magnitude bigger than the Drosophila melanogaster interactome and ≈3 times bigger than in Caenorhabditis elegans.
Keywords: evolutionary systems biology, network inference, network sampling theory, network evolution

Perhaps one of the most surprising results of the genome-sequencing projects was that the number of genes is much lower than had been expected and is, in fact, surprisingly similar for very different organisms (1, 2). For example, the nematode Caenorhabditis elegans appears to have a similar number of genes to humans, whereas rice and maize appear to have even more genes than humans. It was then quickly suggested that the biological complexity of organisms is not reflected merely by the number of genes but by the number of physiologically relevant interactions (1, 3). In addition to alternative splice variants (4), posttranslational processes (5), and other (e.g., genetic) factors influencing gene expression (6, 7), the structure of the interactome is one of the crucial factors underlying the complexity of biological organisms. Here, we focus on the wealth of available protein interaction data and demonstrate that it is possible to arrive at a reliable statistical estimate for the size of these interaction networks. This approach is then used to assess the complexity of protein interaction networks in different organisms from present incomplete and noisy protein interaction datasets.

There are now fairly extensive protein interaction network (PIN) datasets in a number of species, including humans (8, 9). These have been generated by a variety of experimental techniques (as well as some in silico inferences). Although these techniques and the resulting data (i) are notoriously prone to false positives and negatives (10, 11) and (ii) result in highly idealized and averaged network structures (12), such interaction datasets are increasingly turning into useful tools for the analysis of the functional (e.g., ref. 13) and evolutionary properties (14) of biological systems.
In particular, in Saccharomyces cerevisiae we are beginning to have a fairly complete description of the protein interaction network that is accessible with current experimental technologies; the recent high-quality literature-curated dataset of Reguly et al. (15) provides us with a dataset that should be almost completely free from false positives. For most other organisms, however, interaction data are still far from complete and it has recently been shown that subnetworks, in general, have qualitatively different properties from the true network (16–18). Although the importance of network-sampling properties has been realized only relatively recently, this aspect of most systems biology data is increasingly being recognized as important (11, 19).

There are, however, some properties of the true network that can be inferred even from subnet data, and here we show that the total network size is one property for which this is the case. Present protein-interaction datasets enable us to estimate the size of the interactomes in different species by using graph theoretical invariants. This is particularly interesting for species where more than one experimental dataset is available.

Below we first describe a robust and very general estimator of network size from partial network data that overcomes this problem. We then apply it to available PIN data in a range of eukaryotic organisms. In supporting information (SI) Text we demonstrate the power of this approach by using extensive simulation studies.

Estimating Interactome Size

Here, we develop an approach for estimating the size of a network from incomplete data. We will show below (and by using extensive simulations in SI Text) that, for a given species, different independent datasets (generated by different methods such as yeast two-hybrid and TAP tagging) yield estimates for the interactome size that are in excellent agreement.

We are concerned with a true network, $\mathcal{N}$, which has $N_{\mathcal{N}}$ nodes and $M_{\mathcal{N}}$ edges.
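The sets of nodes and edges are given by $V_{\mathcal{N}}$ and $E_{\mathcal{N}}$, respectively; these define the graph representation of the true network:

$$G_{\mathcal{N}} = (V_{\mathcal{N}}, E_{\mathcal{N}}). \tag{1}$$

We sample a subset of nodes, $V_{\mathcal{S}} \subseteq V_{\mathcal{N}}$, and study properties of the subgraph $G_{\mathcal{S}}$ induced by the nodes in $V_{\mathcal{S}}$,

$$G_{\mathcal{S}} = (V_{\mathcal{S}}, E_{\mathcal{S}}), \tag{2}$$

where $E_{\mathcal{S}}$ is a subset of the total set of edges, $E_{\mathcal{S}} \subseteq E_{\mathcal{N}}$. Our aim is to predict the number of interactions in the true network $G_{\mathcal{N}}$ based on the available data in the subnet, $G_{\mathcal{S}}$.

We assume that the network, $G_{\mathcal{N}}$, is generated according to some (unknown) model characterized by a parameter (vector) $\theta$, and subsequently the observed network, $G_{\mathcal{S}}$, is sampled from it. Then

$$P_{p,\theta}(G_{\mathcal{S}}) = \sum_{G_{\mathcal{N}}} P_p(G_{\mathcal{S}} \mid G_{\mathcal{N}})\, P_\theta(G_{\mathcal{N}}). \tag{3}$$

We assume that the order $N_{\mathcal{N}}$ of the network is known and allow nodes to be annotated with information not related to the wiring of the network (e.g., GO terms or protein family classes). Consequently, the sum is over networks, $G_{\mathcal{N}}$, with $N_{\mathcal{N}}$ (labeled) nodes only. For convenience, we take labeling information to be included in $N_{\mathcal{N}}$ and $N_{\mathcal{S}}$ (the order of $G_{\mathcal{S}}$).

If sampling only depends on the nodes in the network and not on their connections, then $P_p(G_{\mathcal{S}} \mid G_{\mathcal{N}})$ splits into a product of two terms,

$$P_p(G_{\mathcal{S}} \mid G_{\mathcal{N}}) = Q_p(N_{\mathcal{S}})\, q(G_{\mathcal{S}}, G_{\mathcal{N}}), \tag{4}$$

where $Q_p(N_{\mathcal{S}})$ is a term denoting the probability of sampling the nodes in the observed PIN and $q(G_{\mathcal{S}}, G_{\mathcal{N}})$ denotes how many ways this can be done given the (labeled) nodes in $G_{\mathcal{S}}$; by assumption, labeling of nodes is the same in all possible $G_{\mathcal{N}}$s. For example, if the nodes are unlabeled and have degree zero, then $q(G_{\mathcal{S}}, G_{\mathcal{N}}) = \binom{N_{\mathcal{N}}}{N_{\mathcal{S}}}$. If all nodes have degree one, a similar factor can be derived based on the number of degree one ($d_1$) and degree zero ($d_0$) nodes that are observed in the PIN: $q(G_{\mathcal{S}}, G_{\mathcal{N}})$ is the number of ways one can choose $d_0$ and $d_1$ out of the $N_{\mathcal{N}}/2$ pairs of connected nodes in $G_{\mathcal{N}}$. If all nodes are labeled then $q(G_{\mathcal{S}}, G_{\mathcal{N}}) = 1$, because one can only select the nodes in the PIN in one way.

It follows that $Q_p(N_{\mathcal{S}})$ is sufficient for inference on $p$ (the remaining part of the likelihood does not depend on $p$). In the case of independent node sampling, each node with probability $p$, we have $Q_p(N_{\mathcal{S}}) = p^{N_{\mathcal{S}}}(1 - p)^{N_{\mathcal{N}} - N_{\mathcal{S}}}$ and the maximum likelihood estimate of $p$ is

$$\hat{p} = \frac{N_{\mathcal{S}}}{N_{\mathcal{N}}}. \tag{5}$$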
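From the likelihood Eq. 4 it follows that

$$P_\theta(G'_{\mathcal{N}} \mid G_{\mathcal{S}}) = \frac{q(G_{\mathcal{S}}, G'_{\mathcal{N}})\, P_\theta(G'_{\mathcal{N}})}{\sum_{G_{\mathcal{N}}} q(G_{\mathcal{S}}, G_{\mathcal{N}})\, P_\theta(G_{\mathcal{N}})}, \tag{6}$$

where $G'_{\mathcal{N}}$ is a specific network (to distinguish it from the sum over all networks in the denominator). Note that this conditional probability does not depend on $p$ and that, in principle, we can only gain knowledge about the interactome if something is assumed about the network-generating model. Note also that this is a general restriction that is not related to independent node sampling alone.

A reasonable estimate of the edge probability in $G_{\mathcal{N}}$ is

$$\hat{\pi} = \frac{M_{\mathcal{S}}}{\binom{N_{\mathcal{S}}}{2}} = \frac{2 M_{\mathcal{S}}}{N_{\mathcal{S}}(N_{\mathcal{S}} - 1)}, \tag{7}$$

where $M_{\mathcal{S}}$ is the number of edges in the PIN. It leads to the following estimate of the interactome size:

$$\hat{M}_{\mathcal{N}} = \hat{\pi} \binom{N_{\mathcal{N}}}{2} = M_{\mathcal{S}}\, \frac{N_{\mathcal{N}}(N_{\mathcal{N}} - 1)}{N_{\mathcal{S}}(N_{\mathcal{S}} - 1)} \approx \frac{M_{\mathcal{S}}}{\hat{p}^2}, \tag{8}$$

where $\hat{p} = N_{\mathcal{S}}/N_{\mathcal{N}}$ is the estimated node-sampling probability.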
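If $G_{\mathcal{N}}$ has a star topology with one node of degree $N_{\mathcal{N}} - 1$ and the remaining of degree 1, then $\hat{M}_{\mathcal{N}} = 0$ with probability $1 - p$ and $\hat{M}_{\mathcal{N}} = (N_{\mathcal{N}} - 1)/\hat{p}$ with probability $p$; hence, $\hat{M}_{\mathcal{N}}$ is not consistent. We will demonstrate below that the assumption of independent sampling of nodes is not too restrictive and should apply to many experimental studies, in particular high-throughput ones.

So far we have assumed that the number of ORFs, $N_{\mathcal{N}}$, in an organism is known from genome surveys. Total genome size is, however, still not precisely known in most organisms. Uncertainty in $N_{\mathcal{N}}$ is, however, easily incorporated. Assume that the value $N_{\mathcal{N}}$ is associated with an error or uncertainty $\varepsilon$ (i.e., if the genome contains $N_0$ protein-coding genes of which $N_{\mathcal{N}}$ are known, then $\varepsilon = (N_0 - N_{\mathcal{N}})/N_0$). Then let $N_{\mathcal{N}} := N_0(1 \pm \varepsilon)$ and for $\varepsilon \lesssim 0.1$ we have

$$\tilde{p} = \frac{N_{\mathcal{S}}}{N_0(1 \pm \varepsilon)} \approx \frac{N_{\mathcal{S}}}{N_0}(1 \mp \varepsilon). \tag{9}$$

Replacing $\hat{p}$ in Eq. 8 with $\tilde{p}$ yields the error-corrected estimate for the true network size

$$\tilde{M}_{\mathcal{N}} \approx \frac{M_{\mathcal{S}}}{\tilde{p}^2} \approx M_{\mathcal{S}} \left(\frac{N_0}{N_{\mathcal{S}}}\right)^2 (1 \pm 2\varepsilon). \tag{10}$$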
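As an illustration of the size estimator, the following Python sketch rescales the number of edges observed among the sampled nodes by the ratio of possible node pairs in the full network to possible node pairs in the subnet, which is the form assumed here for Eq. 8. It then checks the result on a simulated example with independent node sampling. The function names, the Erdős–Rényi test network, and all parameter values are assumptions chosen for illustration; they are not taken from the datasets analyzed in this article.

```python
import random
from itertools import combinations


def estimate_interactome_size(m_sub, n_sub, n_total):
    """Scale the observed edge count by the ratio of possible node pairs.

    Assumed form of Eq. 8: M_hat = M_S * N (N - 1) / (N_S (N_S - 1)).
    """
    if n_sub < 2:
        raise ValueError("need at least two sampled nodes")
    return m_sub * n_total * (n_total - 1) / (n_sub * (n_sub - 1))


def sample_subnet(edges, n_total, p, rng):
    """Keep each node independently with probability p.

    Returns the number of edges in the induced subgraph and the number
    of sampled nodes.
    """
    kept = {v for v in range(n_total) if rng.random() < p}
    m_sub = sum(1 for u, v in edges if u in kept and v in kept)
    return m_sub, len(kept)


# Hypothetical "true" network: an Erdos-Renyi graph on 400 nodes.
rng = random.Random(0)
n_total = 400
edges = [pair for pair in combinations(range(n_total), 2) if rng.random() < 0.02]

# Average the estimate over repeated node samplings with p = 0.3.
estimates = []
for _ in range(200):
    m_sub, n_sub = sample_subnet(edges, n_total, 0.3, rng)
    estimates.append(estimate_interactome_size(m_sub, n_sub, n_total))

mean_estimate = sum(estimates) / len(estimates)
print(f"true M = {len(edges)}, mean estimate = {mean_estimate:.0f}")
```

Individual estimates scatter around the true edge count, and their average should lie close to it. When every node is sampled, the function returns the observed edge count unchanged.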
To assess the variability of the estimator we can construct approximate bootstrap confidence intervals (CI) (20). The number of edges is given by

$$M_{\mathcal{S}} = \frac{1}{2} \sum_{i=1}^{N_{\mathcal{S}}} d_i. \tag{11}$$

Let $d = \{d_1, d_2, \ldots, d_{N_{\mathcal{S}}}\}$ be the set of degrees of all of the nodes in the graph $G_{\mathcal{S}}$ describing the subnet. Then we generate bootstrap replicates, $d^*$, by sampling the degrees of the nodes in the sample with replacement $N_{\mathcal{S}}$ times. For each bootstrap replicate, $d^*$, we obtain an estimate $M^*$ (which may be a noninteger because of the factor 1/2 in Eq. 11; this does not affect the estimator). Creating a sufficiently large number of bootstrap replicates, $d^*$, thus allows us to calculate the bootstrap CIs; these have very good coverage properties, as shown in Figs. S3 and S4.

The derivation of Eq. 8 does not depend on any restrictive assumptions (see SI Text) but is a generic property of random graphs and their subnets. Crucially, Eq. 8 is valid irrespective of the degree sequence or other summary statistics of the networks††; confidence intervals (CI) and their coverage properties (20) may, however, depend on the degree sequence or network structure. Because there is no sufficient statistic for general networks (17) [i.e., a summary statistic that would include all information about the likelihood (21) of a network] it is also not possible to improve on these estimators by, for example, including the numbers of observed triangles or the clustering coefficient. The only limitation is the assumption of independent sampling. This is, however, also implicit in all previous attempts at estimating interactome sizes (22–24). Below we show how nonrandom sampling schemes can be described and how false-positive and false-negative rates of PIN data affect our estimate.

Other Node-Sampling Schemes

The above approach can be generalized for datasets that are ascertained in certain ways and can thus also deal with experimental bias.

Independent but Nonuniform Sampling. We assume independent sampling of nodes. Let node $i$ have a probability $p_i$ for being included in the subnet. We allow $p_i \neq p_j$ and only assume that the $p_i$ values are drawn independently from the same probability distribution,

$$p_i \sim F(\alpha).$$
With $\bar{p} = E[p_i]$ denoting the mean sampling probability, an edge of the true network is retained with probability $E[p_i p_j] = \bar{p}^2$, so that

$$\hat{M}_{\mathcal{N}} = \frac{M_{\mathcal{S}}}{\bar{p}^2}$$

is unbiased. Moreover, $\hat{p}^2 = (N_{\mathcal{S}}/N_{\mathcal{N}})^2$ is consistent (25), hence also unbiased for large networks.

Dependent Sampling. Here, we assume as above that $p_i$ is drawn from some probability distribution, $p_i \sim F_i(\alpha)$, that might, however, depend on information related to node $i$, for example, the degree or functional classification of $i$; that is, $F_i(\alpha) = F(\alpha; D_i)$, where $D_i$ denotes this information. Although measures for expression abundance may be such a factor, this appears not to be the case for the datasets considered here (Fig. S5). Hence, we might take $D_i$ as an additional parameter in the function $F$. In addition, we assume the network is uncorrelated with respect to this information, that is, $P(D_i, D_j) = P(D_i)P(D_j)$; and, given the probabilities $p_i$, we assume nodes are drawn independently of each other. This assumption is justified for all networks in which the degree–degree correlation of interacting nodes is determined by the degree distribution. This is approximately the case for the networks considered here‡‡. It follows that

$$\hat{M}_{\mathcal{N}} = \frac{M_{\mathcal{S}}}{\bar{p}^2}$$

is unbiased. Note that an edge is again sampled with probability $\bar{p}^2$, and that this edge-sampling probability is consistently estimated by $\hat{p}^2$.

Effects of Uncertain Data on Estimated Interactome Sizes

So far we have assumed that the interaction data are correct. This is not the case for protein interaction data (10–12, 26–28). Here, we show that it is possible to include noisy data and that the estimates given in Table 1 (see also Fig. 2) [...]
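Let the number of true interactions in a network with $N$ nodes be denoted by $M$; if the data collection process is not perfect, then (assuming independence) the number of reported interactions will generally be different from $M$. Now let $M_{TP}$, $M_{FN}$, $M_{FP}$, and $M_{TN}$ denote the numbers of true-positive, false-negative, false-positive, and true-negative results, respectively. We trivially have

$$M = M_{TP} + M_{FN}, \qquad \binom{N}{2} - M = M_{FP} + M_{TN},$$

and the number of reported interactions is $M_{TP} + M_{FP}$. Given the false-positive and false-negative rates, we obtain an estimate for the true number of interactions (Eq. 24).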
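The degree bootstrap described earlier can be sketched in Python as follows. Node degrees observed in the subnet are resampled with replacement, each replicate is converted to an edge count by halving the degree sum (Eq. 11) and rescaled to the full network as in Eq. 8, and an interval is read off the sorted replicates. The percentile construction, the function names, and the toy degree sequence are assumptions made for this sketch; the text specifies only that approximate bootstrap CIs (20) are used.

```python
import random


def estimate_interactome_size(m_sub, n_sub, n_total):
    """Assumed form of Eq. 8: M_hat = M_S * N (N - 1) / (N_S (N_S - 1))."""
    return m_sub * n_total * (n_total - 1) / (n_sub * (n_sub - 1))


def bootstrap_ci(degrees, n_total, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the estimated network size.

    degrees: observed degree of every sampled node within the subnet.
    n_total: assumed number of nodes in the full network.
    """
    rng = random.Random(seed)
    n_sub = len(degrees)
    replicates = []
    for _ in range(n_boot):
        resampled = rng.choices(degrees, k=n_sub)  # N_S draws with replacement
        m_star = sum(resampled) / 2                # Eq. 11; may be a noninteger
        replicates.append(estimate_interactome_size(m_star, n_sub, n_total))
    replicates.sort()
    lower = replicates[int((1 - level) / 2 * n_boot)]
    upper = replicates[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Toy degree sequence for 100 sampled nodes (hypothetical, sums to 86).
degrees = [0] * 50 + [1] * 30 + [2] * 15 + [5] * 4 + [6]
n_total = 1000  # hypothetical size of the full network

point = estimate_interactome_size(sum(degrees) / 2, len(degrees), n_total)
lo, hi = bootstrap_ci(degrees, n_total)
print(f"estimate = {point:.0f}, 95% CI = ({lo:.0f}, {hi:.0f})")
```

Resampling degrees rather than edges keeps the procedure simple, because only the degree sequence of the subnet is needed. The width of the interval then reflects how heterogeneous the observed degrees are.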
Results

We use Eq. 8 to estimate interactome sizes in humans and three other eukaryotic organisms: S. cerevisiae (29–32), C. elegans (33), and D. melanogaster (34–36). But we begin with an illustration of the power of this simple estimator by applying it to S. cerevisiae PIN data; here, we have treated the presently available PIN data as a proxy for a complete "interaction network" whose size we are trying to predict. Fig. 1 shows that $\hat{M}_{\mathcal{N}}$ provides an accurate and reliable way of estimating interactome sizes from present data. Interactome size estimates and their CIs for experimental PIN datasets are shown in Table 1 and Fig. 2.

Based on the results in Table 1 and Fig. 2 [...]

By using Eq. 24 the impacts of false-positive and false-negative rates are easily assessed (see also ref. 38). We find that the error rates affect the estimated number of true interactions linearly, so that their overall effect is comparatively modest. The estimates of the true-positive rates in PIN datasets range from 35% (33) to 84% (34); there are fewer estimates for the false-negative rate, which is on the order of 20–40% (10) for different S. cerevisiae datasets. It appears that, for realistic rates of true positives and false positives, the estimate of the human interactome size remains very similar to the simple estimate obtained in this article of ≈650,000 protein–protein interactions. Similar curves can be drawn for the other species, too, and in each case we obtain comparable values for most combinations of realistic error rates. Thus, we believe that error rates exert a comparatively moderate effect on the estimator (Eq. 8).

Overall, it therefore appears that estimates obtained from Eq. 8 should be accurate to within less than an order of magnitude even under the very worst circumstances. A much more realistic estimate, however, can be obtained from comparing the different and essentially independent estimates for S. cerevisiae. These findings suggest that an accuracy of approximately a factor of 2 is more realistic. Reassuringly, these results are confirmed when applying a recent multimodel inference procedure (39) that deals with incomplete network data.

Discussion

We have shown that it is possible to estimate the size of interactomes reliably from present partial interaction data. Our estimator is powerful and robust, relying on assumptions that appear to be met by typical systematic high-throughput studies. Unlike the previous approach of Hart et al. (24), who implicitly assume that interactions do not occur between surveyed proteins and those not yet surveyed, our estimate deals with missing data in a coherent and statistically meaningful manner; the route taken by Grigoriev (23) can be understood as a special case of the present approach when two or more datasets are available. Moreover, noise and different sampling/ascertainment strategies are straightforwardly included in the analysis (38, 40).

We have illustrated the power of this approach by using simulated sampling processes in S. cerevisiae and have found that the estimator, Eq. 8, and the bootstrap confidence intervals have very good coverage properties. We have then applied this inferential framework to published datasets in four eukaryotic organisms. We found that the predicted interactome sizes differ quite considerably between these species. For example, the human interactome appears to be an order of magnitude larger than the D. melanogaster interactome. Unfortunately, for maize and rice, which have a comparable or even larger number of genes than humans, only tiny PIN datasets are available and we cannot obtain useful estimates for their respective interactome sizes. If conventional assumptions about the different complexity of organisms are indeed correct, and if interactome size does reflect organismic complexity (1–3, 41), then we would expect these organisms to have smaller interactomes than humans. The increase of interactome size with number of proteins/ORFs should thus not be uniform or even monotonic. We note that the estimate of ≈650,000 interactions means that the human PIN will still be relatively sparse: this corresponds to only ≈0.2% of all possible pairwise interactions being present; for most other species, however, the network is even sparser.
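The sparsity figure quoted above can be checked with a short calculation: the density of a network is its number of edges divided by the number of possible node pairs, N(N − 1)/2. The text does not state the number of proteins behind the ≈0.2% figure, so the value of 25,000 used below is an assumption chosen only for illustration, as is the function name.

```python
from math import comb


def network_density(n_edges, n_nodes):
    """Fraction of all possible node pairs that are connected."""
    return n_edges / comb(n_nodes, 2)


m_human = 650_000          # estimated number of interactions (from the text)
n_proteins_assumed = 25_000  # hypothetical number of proteins, not from the text

density = network_density(m_human, n_proteins_assumed)
print(f"density = {density:.2%}")
```

With this assumed node count the density comes out at roughly 0.2%, in line with the statement in the text. A larger assumed number of proteins would make the network sparser still.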
There are a number of other factors that may contribute to an explanation of the increase in phenotypic bauplan complexity between species: the diversity of the transcriptome (42) and protein-domain architecture (43) have both been implicated in the literature. Here, we have demonstrated that interactome sizes are consistent with biological intuition about the complexity of eukaryotic organisms.

We note that our estimator is very flexible and reflects the quality of present data: we predict the number of interactions that are detectable given present experimental technology. For example, we have not considered (physiologically probably very important) transient or condition-specific interactions. Should more sensitive and reliable experimental methodologies or better estimates of experimental error rates become available in the future, then Eq. 8 can, of course, be used to predict an updated number of protein–protein interactions for an organism. Our formalism is also readily extended to directed network data (such as gene-regulation networks).

As a final note, we want to stress that the estimates necessarily reflect experimental technology. Thus, the estimates in Table 1 refer only to the types of interactions that are detectable given present experimental methods and protocols. The estimator for the size of the true network, however, will remain universally correct for suitable datasets and for all types of networks. We will thus be able to use it in the future and apply it to other network datasets as well.
Acknowledgments. This work was supported by the Wellcome Trust (M.P.H.S., E.d.S., and T.T.), the Royal Society and the Carlsberg Foundation (M.P.H.S. and C.W.), and an EMBO Young Investigator fellowship (to M.P.H.S.). C.W. is supported by the Danish Research Council.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

See Commentary on page 6795.

This article contains supporting information online at www.pnas.org/cgi/content/full/0708078105/DCSupplemental.

††Eq. 8 is a general result for general (random) graphs; it is equally true for all ensembles of random graphs such as Erdős–Rényi and scale-free random graphs. In SI Text we further illustrate the simple quadratic relationship by using simulations.

‡‡The degree–degree distribution is not significantly different from the product degree distribution (by using the Kolmogorov–Smirnov test); that is, P(k, l) ≈ P(k)P(l) for the datasets considered here.

References

1. Lander E, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.
2. Venter J, et al. The sequence of the human genome. Science. 2001;291:1304–1351.
3. Copley R. The animal in the genome: Comparative genomics and evolution. Philos Trans R Soc London Ser B. 2008;363:1453–1461.
4. Tian B, Pan Z, Lee JY. Widespread mRNA polyadenylation events in introns indicate dynamic interplay between polyadenylation and splicing. Genome Res. 2007;17:156–165.
5. Henikoff S. Histone modifications: Combinatorial complexity or cumulative simplicity? Proc Natl Acad Sci USA. 2005;102:5308–5309.
6. Hegde RS, Bernstein HD. The surprising complexity of signal sequences. Trends Biochem Sci. 2006;31:563–571.
7. Stranger BE, et al. Population genomics of human gene expression. Nat Genet. 2007;39:1217–1224.
8. Stelzl U, et al. A human protein–protein interaction network: A resource for annotating the proteome. Cell. 2005;122:957–968.
9. Rual J, et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature. 2005;437:1173–1178.
10. Bader JS, Chaudhuri A, Rothberg JM, Chant J. Gaining confidence in high-throughput protein interaction networks. Nat Biotechnol. 2004;22:78–85.
11. Deeds EJ, Ashenberg O, Shakhnovich EI. A simple physical model for scaling in protein–protein interaction networks. Proc Natl Acad Sci USA. 2006;103:311–316.
12. de Silva E, Stumpf M. Complex networks and simple models in biology. J R Soc Interface. 2005;2:419–430.
13. Rives AW, Galitski T. Modular organization of cellular networks. Proc Natl Acad Sci USA. 2003;100:1128–1133.
14. Stumpf M, Kelly W, Thorne T, Wiuf C. Evolution at the system level: The natural history of protein interaction networks. Trends Ecol Evol. 2007;22:366–373.
15. Reguly T, et al. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. J Biol. 2006;5:11.
16. Stumpf M, Wiuf C, May R. Subnets of scale-free networks are not scale-free: Sampling properties of networks. Proc Natl Acad Sci USA. 2005;102:4221–4224.
17. Stumpf M, Wiuf C. Sampling properties of random graphs: The degree distribution. Phys Rev E. 2005;72:036118.
18. Wiuf C, Stumpf M. Binomial subsampling. Proc R Soc A. 2006;462:1181–1195.
19. Han J, Dupuy D, Bertin N, Cusick M, Vidal M. Effect of sampling on topology predictions of protein–protein interaction networks. Nat Biotechnol. 2005;23:839–844.
20. Efron B, Tibshirani R. An Introduction to the Bootstrap. New York: Chapman & Hall/CRC; 1998.
21. Cox D, Hinkley D. Theoretical Statistics. New York: Chapman & Hall/CRC; 1974.
22. Hazbun T, Fields S. Networking proteins in yeast. Proc Natl Acad Sci USA. 2001;98:4277–4278.
23. Grigoriev A. On the number of protein–protein interactions in the yeast proteome. Nucleic Acids Res. 2003;31:4157–4161.
24. Hart G, Ramani A, Marcotte E. How complete are current yeast and human protein-interaction networks? Genome Biol. 2006;7:120.
25. Silvey S. Statistical Inference. New York: Chapman & Hall; 1975.
26. von Mering C, et al. Comparative assessment of large-scale data sets of protein–protein interactions. Nature. 2002;417:399–403.
27. Lappe M, Holm L. Unraveling protein interaction networks with near-optimal efficiency. Nat Biotechnol. 2004;22:98–103.
28. Uetz P, Finley R. From protein networks to biological systems. FEBS Lett. 2005;579:1821–1827.
29. Uetz P, et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature. 2000;403:623–627.
30. Ito T, et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA. 2001;98:4569–4574.
31. Ho Y, et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002;415:180–183.
32. Gavin AC, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002;415:141–147.
33. Li S, et al. A map of the interactome network of the metazoan C. elegans. Science. 2004;303:540–543.
34. Giot L, et al. A protein interaction map of Drosophila melanogaster. Science. 2003;302:1727–1736.
35. Stanyon C, et al. A Drosophila protein-interaction map centered on cell-cycle regulators. Genome Biol. 2004;5:R96.
36. Formstecher E, et al. Protein interaction mapping: A Drosophila case study. Genome Res. 2005;15:376–384.
37. Duan X, Xenarios I, Eisenberg D. Describing biological protein interactions in terms of protein states and state transitions: The LiveDIP database. Mol Cell Proteomics. 2002;1:104–116.
38. de Silva E, et al. The effects of incomplete protein interaction data on structural and evolutionary inferences. BMC Biol. 2006;4:39.
39. Stumpf M, Thorne T. Multimodel inference of network properties from incomplete data. J Integr Bioinf. 2006;3:32.
40. Lin N, Zhao H. Are scale-free networks robust to measurement errors? BMC Bioinformatics. 2005;6:119.
41. Tucker C, Gera J, Uetz P. Towards an understanding of complex protein networks. Trends Cell Biol. 2001;11:102–106.
42. Carninci P, et al. The transcriptional landscape of the mammalian genome. Science. 2005;309:1559–1563.
43. Chothia C, Gough J, Vogel C, Teichmann S. Evolution of the protein repertoire. Science. 2003;300:1701–1703.