Copyright Kyu Kim, Marcotte. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Age-Dependent Evolution of the Yeast Protein Interaction Network Suggests a Limited Role of Gene Duplication and Divergence

Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, United States of America
Ruth Nussinov, Editor. National Cancer Institute, United States of America and Tel Aviv University, Israel
* E-mail: marcotte/at/icmb.utexas.edu
Conceived and designed the experiments: WKK EMM. Performed the experiments: WKK. Analyzed the data: WKK EMM. Contributed reagents/materials/analysis tools: WKK. Wrote the paper: WKK EMM.
Received September 2, 2008; Accepted October 17, 2008.

Abstract

Proteins interact in complex protein-protein interaction (PPI) networks whose topological properties (such as scale-free topology, hierarchical modularity, and dissortativity) have suggested models of network evolution. Currently preferred models invoke preferential attachment or gene duplication and divergence to produce networks whose topology matches that observed for real PPIs, thus supporting these as likely models for network evolution. Here, we show that the interaction density and homodimeric frequency are highly protein age-dependent in real PPI networks in a manner which does not agree with these canonical models. In light of these results, we propose an alternative stochastic model, which adds each protein sequentially to a growing network in a manner analogous to protein crystal growth (CG) in solution.
The key ideas are (1) interaction probability increases with availability of unoccupied interaction surface, thus following an anti-preferential attachment rule, (2) as a network grows, highly connected sub-networks emerge into protein modules or complexes, and (3) once a new protein is committed to a module, further connections tend to be localized within that module. The CG model produces PPI networks consistent in both topology and age distributions with real PPI networks and is well supported by the spatial arrangement of protein complexes of known 3-D structure, suggesting a plausible physical mechanism for network evolution. Author Summary Proteins function together forming stable protein complexes or transient interactions in various cellular processes, such as gene regulation and signaling. Here, we address the basic question of how these networks of interacting proteins evolve. This is an important problem, as the structures of such networks underlie important features of biological systems, such as functional modularity, error-tolerance, and stability. It is not yet known how these network architectures originate or what driving forces underlie the observed network structure. Several models have been proposed over the past decade—in particular, a “rich get richer” model (preferential attachment) and a model based upon gene duplication and divergence—often based only on network topologies. Here, we show that real yeast protein interaction networks show a unique age distribution among interacting proteins, which rules out these canonical models. In light of these results, we developed a simple, alternative model based on well-established physical principles, analogous to the process of growing protein crystals in solution. 
The model better explains many features of real PPI networks, including the network topologies, their characteristic age distributions, and the spatial distribution of subunits of differing ages within protein complexes, suggesting a plausible physical mechanism of network evolution. Introduction Life is highly organized at all levels of molecules, cells, tissues, and organisms, and such relationships among biological entities are often represented as networks, with vertices representing e.g. genes or proteins, and edges representing e.g. physical protein interactions, transcriptional regulation, or metabolic reactions. The topology of biological networks shows many interesting characteristics, such as scale-free topology (power-law or broad degree distribution) and hierarchical modularity (reviewed in [1]). These properties are believed to be the basis of functional modularity, error-tolerance, and stability [2]–[5] characteristic of many biological networks. One important question is thus how these important network architectures originate, and what driving forces underlie the observed networks. It has not been clear whether network architecture results from the mosaic sum of each gene or protein's inherent properties, such as stickiness or interactive promiscuity [6],[7], or from a stochastic mechanism underlying network evolution, in which the trajectory of network evolution is conditioned on the previous state of the network [8]. This problem has been of wide interest because it raises fundamental questions about design principles of molecular networks and the role of natural selection in the evolution of network structure [9]. Initially, Barabási and Albert proposed a preferential attachment rule as a general mechanism to generate scale-free networks [8]. In this model, a newly introduced node is more likely to be attached to highly connected nodes, resulting in a power-law degree distribution. 
In a network of protein-protein interactions (PPI), gene duplication and divergence (DD) is most popularly thought of as the origin of the scale-free topology of protein interaction networks [10]–[15]. In the DD model, the degree of a node increases mainly by having duplicate genes as its neighbors. Therefore, the preferential attachment rule is achieved implicitly, with highly connected nodes having more chance to have duplicate genes as their neighbors [1]. The DD model is also shown to generate hierarchically modular networks under certain conditions [16]. Although the DD model generates scale-free and modular networks, it has drawbacks that must be noted if it is to be considered a main mechanism for PPI network evolution. Primarily, only a small fraction of duplicate genes effectively contribute to the overall network topology. The key feature of the DD model originates from the fact that duplicate genes share a certain number of interaction partners. However, the interaction patterns of duplicate genes diverge rapidly [17], and the vast majority of gene duplicates are shown to share no interaction partners [18]–[20]. Some duplicates, in fact, may have diverged so extensively that they can no longer be detected by sequence homology. These distant duplicates would share even fewer interaction partners, and thus they are essentially indistinguishable from non-duplicate pairs in terms of interaction patterns. To better understand the evolution of PPI networks, we analyzed a non-topological property—the age of each protein as estimated based upon the taxonomic distribution of its constituent domains [21],[22]—and observe that yeast PPI networks show a unique interaction density pattern between different protein age groups. The density pattern of the yeast PPI network was compared with those generated by canonical network evolution models—preferential attachment (the Barabási-Albert model), duplication-divergence (DD), and anti-preferential attachment (AP). 
Each model generates a unique interaction density pattern between the age groups; thus, the validity of the models could be effectively discriminated. Using this test, we observe that none of the canonical models are consistent with real yeast PPI networks. The age-dependent interaction density pattern nonetheless suggests growth by a stochastic process. We therefore propose an alternative model called the crystal growth (CG) model, which is based upon known physical and chemical principles and shows good agreement with real PPI networks in both topological and age properties as well as the 3-D subunit configurations of protein complexes. Results Interaction Density Patterns between Protein Age Groups First, we introduce the basic attachment rules of protein-protein interactions. The interaction densities, Dm,n, between two protein age groups (m,n) show unique patterns depending upon the attachment rule. Three basic rules are considered—random attachment (RA), preferential attachment (PA) by Barabási and Albert [8],[23], and anti-preferential attachment (AP). Here, we consider three protein age groups (G1, G2, and G3, from oldest to youngest), and assume a fixed number of new connections (ΔE) are made between a newly introduced node and the existing nodes as a network grows. In the RA model, a new node is randomly connected to existing nodes with equal probabilities. Initially, at time t = 1, the first age group, G1, makes only intra-group connections. Then a new group, G2, is introduced and connected randomly either to G1 (inter-group) or within G2 (intra-group). In the RA model, the expected interaction density, D, is the same between D1,2 and D2,2. Similarly, G3 connects to G1, G2, and within G3, showing the pattern of D1,3 = D2,3 = D3,3. More generally, the RA model shows a pattern of Dm,n = Dm+1,n (m<n) (Figure 1A
As a measure of age-dependency of interaction density, ΔD is defined as the average value of Dm+1,n − Dm,n (m<n) (see Methods). A positive ΔD indicates that protein interactions are more likely between similar age groups. The sign of ΔD effectively discriminates each model: it is negative in the PA model, positive in the AP model, and near zero in the RA model.

Characterization of the Yeast PPI Network

We collected two independent sets of yeast PPIs, literature-curated (LC) and high-throughput (HTP), using the method of Batada et al. [23],[24] (Dataset S1 and Dataset S2) and inspected both the network topology and the age-dependency of interaction density. The numbers of nodes, N (proteins), and edges, E (interactions), in the LC and HTP networks are NLC = 3268, ELC = 12058 and NHTP = 2488, EHTP = 6766, respectively. The union (LC+HTP) of the two networks has 3780 nodes and 16505 edges. As the HTP and LC+HTP sets, like the original set of Batada et al. [23],[24], show highly similar characteristics (Figure S2), we mainly discuss the LC data set as the yeast PPI network (PPIyeast) here. The recently compiled set (Y2H-union) by Vidal and colleagues [25] from large-scale yeast two-hybrid experiments showed the same trend (Figure S2). The PPIyeast recapitulates known topological features such as a scale-free degree distribution, hierarchical modularity, and degree-dissortative mixing [8], [26]–[28], which were characterized by the various network property indices shown in the first column (PPI) in Figure 2.
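The ΔD measure defined above, the average of Dm+1,n − Dm,n over all m<n, can be sketched in a few lines. This is only an illustration: the function name `delta_d`, the nested-list layout of the density matrix, and the numbers in the usage example are assumptions, and the densities themselves are taken as already computed.

```python
def delta_d(density):
    """Average of D[m+1][n] - D[m][n] over all pairs m < n.

    density[m][n] is the interaction density between age groups m and n,
    with index 0 the oldest group (hypothetical layout, not from the paper).
    """
    groups = len(density)
    diffs = [density[m + 1][n] - density[m][n]
             for n in range(groups)
             for m in range(n)]  # m < n, so m + 1 <= n stays in range
    return sum(diffs) / len(diffs)


# Illustrative values only: a flat (RA-like) pattern and a pattern in which
# density rises toward similar-age groups.
flat = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
similar_age = [[1.0, 1.0, 1.0], [1.0, 2.0, 2.0], [1.0, 2.0, 3.0]]
print(delta_d(flat), delta_d(similar_age))
```

With three age groups the average runs over the three differences D2,2 − D1,2, D2,3 − D1,3, and D3,3 − D2,3, so a flat pattern gives zero and a pattern favoring similar ages gives a positive value.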
Surprisingly, the interaction density of PPIyeast is also highly age-dependent. Yeast proteins were assigned to one of the age groups ABE, AE/BE, E, and F depending on the taxonomic distribution of their constituent domains among archaea (A), bacteria (B), eukaryotes (E), and fungi (F) (see Methods, Figure S1). We measured the interaction density between the age groups and observed a positive ΔD, similar to that of the AP model (row IV in Figure 2).

Simulation of Canonical Network Growth Models

We next simulated PPI network evolution using the three canonical models, PA (preferential attachment), DD (duplication and divergence), and AP (anti-preferential attachment), and tested their compatibility with PPIyeast in terms of both topology and age-dependency. In all three models, the network starts from a small number (N0 = 4) of seed nodes, and new nodes are added one at a time until the total number of nodes reaches N = 3,000, which is comparable to the PPIyeast (LC) with 3,268 nodes and 12,058 edges. In the PA and AP models, a fixed number of edges (ΔE = 4) is added for each new node, which makes the final network size similar to that of the PPIyeast. The link probability (P) is proportional to the degree k in the PA model (P ~ k) and inversely proportional to it in the AP model (P ~ 1/k). For the DD model, we employ one of the simplest models, by Vázquez et al. [12]: one node (i) is chosen randomly and duplicated, the new node (i') is connected to all of the neighbors of i, and then the duplicates (i and i') are linked to each other with a small probability p. For each neighbor (j) of the duplicates, one of the two links (i,j and i',j) is chosen randomly and deleted with the divergence probability q.
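The growth rules just described can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the adjacency-dictionary layout, and the fully connected seed are assumptions, and the orphan removal in `dd_step` follows the sentence that comes next in the text.

```python
import random


def grow_attachment(n_nodes, rule="PA", delta_e=4, n_seed=4, seed=0):
    """Grow an undirected network with link probability P ~ k (PA) or P ~ 1/k (AP).

    Requires n_seed >= delta_e and n_seed >= 2 so that enough partners exist.
    """
    rng = random.Random(seed)
    # Assumption: the seed nodes start fully connected.
    adj = {i: {j for j in range(n_seed) if j != i} for i in range(n_seed)}
    for new in range(n_seed, n_nodes):
        existing = list(adj)
        if rule == "PA":
            weights = [len(adj[v]) for v in existing]
        else:
            weights = [1.0 / len(adj[v]) for v in existing]
        targets = set()
        while len(targets) < delta_e:  # draw delta_e distinct partners
            targets.add(rng.choices(existing, weights=weights, k=1)[0])
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
    return adj


def dd_step(adj, p, q, rng):
    """One duplication-divergence step in the style described for [12]."""
    i = rng.choice(list(adj))
    dup = max(adj) + 1
    neighbors = list(adj[i])
    adj[dup] = set(neighbors)          # i' inherits every neighbor of i
    for j in neighbors:
        adj[j].add(dup)
    if rng.random() < p:               # link the two duplicates
        adj[i].add(dup)
        adj[dup].add(i)
    for j in neighbors:
        if rng.random() < q:           # divergence: drop (i, j) or (i', j)
            loser = rng.choice([i, dup])
            adj[loser].discard(j)
            adj[j].discard(loser)
    for v in [v for v in adj if not adj[v]]:
        del adj[v]                     # remove orphan nodes
    return adj


pa_net = grow_attachment(50, rule="PA", seed=1)
ap_net = grow_attachment(50, rule="AP", seed=1)
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
dd_step(triangle, p=1.0, q=0.0, rng=random.Random(0))
```

Drawing partners until ΔE distinct nodes are found keeps the number of edges per new node fixed, which is one simple reading of "a fixed number of edges"; allowing repeated draws would be another.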
Because this model may generate orphan nodes that are not connected to any other nodes, orphan nodes were removed in each duplication step. Surprisingly, none of the three models satisfied all of the characteristics of PPIyeast (the 2nd, 3rd and 4th columns in Figure 2). While additional variants of each model might be considered [13],[20],[32], the critical characteristics of each model are largely captured by these canonical models; for example, the DD model has no mechanism to generate a positive ΔD. The inconsistency of these models with the age-dependent interaction density of real PPI networks clearly suggests that none of these canonical models is sufficient in itself to qualify as a valid model for the evolution of the yeast PPI network.

A Crystal Growth Model

To better address both topological and age properties of real networks, we developed an alternative model for PPI network evolution called the crystal growth (CG) model, in which we view the growth of a PPI network as analogous to incorporating new proteins into crystals grown in solution (Figure 3A
The procedure of the CG model is illustrated in Figure 3B = 4), and a new node makes a fixed number of connections (here, ΔE = 4) to existing nodes. For each new node added, network modules are redefined as locally dense regions in the network. As modules emerge as a result of network growth and are not pre-defined artificially, the number of modules (M) is not fixed but may increase or decrease in each step. With a small probability Pnew, a new node becomes a new module by itself and makes ΔE connections to other nodes in accordance with the AP rule. Otherwise, an existing module is selected randomly, and the new node is committed to the module by making connections exclusively within the selected module. The connection takes two steps, dubbed “anchoring and extension”. In the anchoring step, the new node connects to an anchor node in the module in accordance with the AP rule, and then, in the extension step, the new node further connects only to the neighbors of the anchor node in the module. Connections are created randomly to neighboring nodes until ΔE connections are made. The anchoring and extension steps are analogous to the node e in Figure 3A.

The CG model introduces two parameters: how the network modules are defined and how frequently a new module is created (Pnew). A network module is generally defined as a densely connected sub-network, and there are various ways to partition a network into modules. Most stringently, modules can be defined as complete subgraphs or cliques, and more loosely they can be defined as k-cores, triangularly connected components (TCC), and so on. We tested two different module definitions, one by Newman [33] and the other by TCC. We mainly discuss the results obtained with the Newman definition, but results using TCC were highly similar (Figure S3). Also, Pnew was assigned as 1/M because the chance of creating a new module generally decreases with the number of existing modules (M).
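The anchoring and extension procedure can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation: modules are tracked by an explicit membership label instead of being redefined by the Newman or TCC method at every step, the seed nodes are fully connected and form one module, and a node whose anchor has too few neighbors in the module simply makes fewer than ΔE connections. The names `ap_pick` and `grow_cg` are hypothetical.

```python
import random


def ap_pick(nodes, adj, rng):
    """Pick one node with probability inversely proportional to its degree (AP rule)."""
    weights = [1.0 / max(len(adj[v]), 1) for v in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def grow_cg(n_nodes, delta_e=4, n_seed=4, seed=0):
    """Grow a network by a simplified crystal growth (CG) procedure."""
    rng = random.Random(seed)
    adj = {i: {j for j in range(n_seed) if j != i} for i in range(n_seed)}
    module_of = {i: 0 for i in range(n_seed)}   # assumption: seeds form module 0
    members = {0: list(range(n_seed))}
    for new in range(n_seed, n_nodes):
        m_count = len(members)
        if rng.random() < 1.0 / m_count:        # Pnew = 1/M
            # New module: delta_e connections anywhere, following the AP rule.
            mod = m_count
            members[mod] = []
            pool = list(adj)
            targets = set()
            while len(targets) < min(delta_e, len(pool)):
                targets.add(ap_pick(pool, adj, rng))
        else:
            # Commit to a randomly chosen existing module.
            mod = rng.choice(list(members))
            anchor = ap_pick(members[mod], adj, rng)            # anchoring
            local = sorted(v for v in adj[anchor] if module_of[v] == mod)
            rng.shuffle(local)                                  # extension
            targets = {anchor, *local[:delta_e - 1]}
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
        module_of[new] = mod
        members[mod].append(new)
    return adj, module_of


cg_adj, cg_modules = grow_cg(300, seed=3)
```

Because module membership is fixed at insertion here, the number of modules can only grow, whereas in the model described above it may also decrease when modules are redefined.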
Setting a small, fixed value of Pnew also shows a similar result (data not shown). Networks generated by the CG model show a remarkable similarity to real PPI networks for all tested network properties. A typical result of the CG model is shown in the 5th column in Figure 2 = 1,000 and N = 5,000 (data not shown).

Comparison of the Network Properties between Network Growth Models and Yeast PPI Network

The canonical models were shown to deviate significantly from the PPIyeast, but the CG model shows good agreement, not only qualitatively but also quantitatively (Figure 4
DD and PA show an inverse age-dependency of PPIyeast and much less modularity in terms of clustering coefficient and triangle density although they show scale-free degree distributions (Figure 4B and 4C Age-Dependency of Homodimeric Frequency in CG Model In the CG model, homodimers would be more frequent in older groups because there are simply fewer proteins with which to make connections in earlier stages. The age distribution of homodimeric interactions was exactly in the order of ABE>AE/BE>E>Fu among the 166 homodimeric yeast proteins collected from UniProt [34] and the literature (Figure 5
Sub-Networks and Spatial Arrangement of Complex Subunits Within the sub-networks of known complexes from MIPS, protein subunits tend to be either more likely to be connected among similar age groups in agreement with the general tendency of positive ΔD in the full yeast PPI networks (Figures S4A and S4B) or consist mostly of the same age group, reflecting the creation of a new protein module at a certain evolutionary lineage e.g. actin-associated proteins (Figure S4E). Other complexes form densely connected sub-networks, where age-dependency was not evident, e.g. RNA polymerase I and III (Figures S4C and S4D). We further validated the CG model by inspecting the 3-D subunit arrangement of protein complexes according to age. Obviously, a protein subunit of a stable complex interacts mostly with the subunits of its participating complex. When a subunit is in contact with multiple other subunits in a protein complex, it is most likely that the partner subunits are spatially close, often interacting among themselves as well. For transient interactions, the member proteins can interact with fewer spatial constraints but the interactions are much denser within each biological module, e.g. as for a MAP kinase signaling pathway or transcription initiation complex. Therefore, a protein tends to interact in a highly “localized” manner within the biological modules it belongs to. None of the canonical models has such a module-oriented mechanism as the CG model. In the CG model, older subunits of protein complexes would tend to be more centrally located than younger ones because each protein is attached in the order of its age. Therefore, it is more likely that older subunits are aggregated centrally and younger subunits are scattered at the periphery in a protein complex. 
To examine this trend among known protein complexes, we collected protein complexes from the Protein Databank (PDB) which consisted of at least 3 protein chains, with at least 2 age groups represented; these are stringent criteria that strongly limit the number of available complexes. After removing inappropriate complexes, such as non-protein structures, viral proteins, antibodies and small peptides, a non-redundant set of 12 multi-protein complexes was collected that met these criteria (detailed descriptions are in Methods). In general, older subunits tend to be aggregated centrally (red tone), while younger ones are separated peripherally (green and blue) (Figure 6 = 0.019, based on random permutations of chain arrangements within the asymmetric unit of each complex.
It is notable that the total degree of PPIyeast is underestimated relative to the actual degree due to homomeric interactions and subunit stoichiometry. For example, the APRIL-TACI complex (Figure 6A = 3 (two homomeric, one heteromeric) and kB = 1 (one heteromeric). In contrast, only one interaction (A–B) would be counted for each subunit in PPIyeast.

Discussion

The validity of network evolution models has been measured mainly by the resulting network topology, such as a power-law degree distribution, hierarchical modularity, and dissortativity as observed in real PPI networks. Accordingly, the DD model has been thought of as the principal mechanism for PPI network evolution. Here, we dissect the history of PPI network evolution by inspecting several protein age-dependent patterns such as interaction density, homodimeric frequency, and the 3-D spatial arrangement of subunits within multiprotein complexes. The age-dependencies are shown to be very effective in discriminating the validity of different models, as summarized in Table 1. The tested aspects of age-dependency were independent of topologies as well as of each other, and are thus highly useful as orthogonal criteria for valid models. Importantly, the age-dependent interaction patterns provided insights into PPI evolution, providing evidence against the DD model as the dominant mode of PPI network evolution and instead supporting an alternative model, the CG model.
In the CG model, we view the PPI network as sparse and dynamic protein crystals per se. The CG model mimics the process of growing protein crystals in solution by sequentially adding each protein. Despite the huge differences in time scale and heterogeneous composition, PPI network evolution likely obeys constraints similar to those on growing protein crystals. In the CG model, a protein complex or a tightly linked module is analogous to an individual crystal, and the number and membership of modules are not pre-defined but rather emerge naturally in each growing step. Crystals grow around multiple nuclei just as protein networks consist of multiple modules or complexes. New modules are generated as the genome size increases and novel functions evolve in higher organisms, in a manner similar to how a new crystal forms occasionally through new nucleation events. The CG model exploits two key ideas. The first is that the chance of a new connection is proportional to the availability of free surface, which is a feature readily recognized by a new protein molecule; this results in an anti-preferential attachment (AP) rule. Although the same surface of a protein can be involved in multiple interactions with different partners through spatial and temporal differentiation, such a factor uniformly increases the capacity of interactions in any protein. Therefore, the connection probability is still positively correlated with the available surface area. These results agree with those of Kim et al. [39], which show that the evolutionary rate is anti-correlated with available surface area. There, multi-interface hubs were nearly four times more frequent than single-interface hubs, reflecting the dominant connection mode of the AP rule. The second key idea is that once an initial connection is made, the subsequent connections are localized to the neighbors of the initial partner within the same module.
This localized connection enforces high modularity, similar to that observed in real PPI networks. At the basis of the crystal growth model is the notion that new interactions form preferentially within existing physical complexes (enforcing modularity), and thus are limited by available protein surface area (the AP rule). Thus modularity and the AP rule both arise from simple physical constraints on which proteins are most accessible to each other. Recently, Levy and colleagues have shown that the successive steps of homo-oligomeric assembly mimic the evolutionary pathway [38]. The CG model expands this idea, with crystal growth reproducing the evolution of the entire PPI network. Given that the CG model follows an AP rule, how does it generate scale-freeness or “the rich get richer” connectivity? In the CG model, the network grows by anchoring and extension, where a node increases its degree either by becoming an anchor node (anchoring) or by being the neighbor of the anchor node (extension). Highly connected nodes therefore have greater chances to increase their degree within each module because they have more opportunities to have anchors as their neighbors. In this way, the CG model implicitly implements the preferential attachment (PA) rule within each module in a manner similar to the DD model, where the nodes increase their degree by having duplicating genes as their neighbors. Our results suggest that the CG model is a more plausible mechanism for PPI network evolution than the DD model. First, all the age-dependent aspects tested agree well with the CG model but disagree with the DD model. Second, the CG model is more comprehensive than the DD model in that the CG model can accommodate both gene duplication and horizontal gene transfer as the origins of new nodes (genes). Practically, the DD model may be applicable only to the ~20% of the yeast proteome having identifiable duplicates [40].
The CG model also embodies the rapid divergence of gene duplicates [17] by the AP rule, which avoids competition for the same interface on common partners and connects to new partners with less occupied surfaces. Finally, the CG model is more robust than the DD model. The DD model shows a highly variable degree distribution depending upon parameters and network sizes [14],[41]. In contrast, the CG model shows stable characteristics regardless of network size or different module definition methods. Taken together, these strongly suggest that the DD model is unlikely to be the principal, and strongly unlikely to be the sole, mechanism of PPI network evolution. The age-dependency of interaction density also sheds light on a more fundamental question regarding the mechanism of PPI network evolution. It has been hypothesized that inherent features of proteins, such as stickiness and hydrophobicity are dominant factors in shaping the global network structure [6]. However, the observed age-dependency is inconsistent with such a hypothesis and suggests that a stochastic process played a major role. For example, the yeast PPI network shows the patterns of both DABE,AE/BE>DABE,E and DAE/BE,Fu<DE,Fu (the row IV in Figure 2 Power-law distributions have been commonly observed in various types of networks, such as the Internet, social networks, and biological networks. However, the growth of a PPI network poses unique constraints compared to other types of networks. For example, in an airline or railroad network, each new connection is made by considering the context of global network topology (e.g., to minimize average path length), which seems intuitively unlikely to be the case in PPI networks. The CG model follows two simple constraints of available free surface and localized connection, which are physically plausible and depend only on local context but not global topology. 
With these minimal assumptions analogous to growing protein crystals, the CG model recapitulates remarkably well the age-dependencies as well as the network topologies of the yeast PPI networks. Methods Yeast Protein Interaction Data Two independent sets of yeast protein-protein interaction data were collected using a method essentially identical to that described by Batada et al. [23],[24], only differing in that the HTP set was collected from the original publications instead of from BioGrid [42]. We compiled the HTP set from Uetz et al. [43], Ito et al. [44], the merged set of Gavin et al. [45],[46], Ho et al. [47], and Krogan et al. [48], and then filtered out the interactions supported by only a single experiment. Repeated and reciprocal assays were considered as independent experiments even if they were performed in the same publication. The LC data set was collected from the latest release of BioGrid, excluding high-throughput data. Ribosomal proteins were removed from both LC and HTP data sets. All protein-RNA interactions and interactions supported only by co-localization or co-fractionation were removed. We further removed interactions supported only by Ptacek et al. [49], Grandi [50], Collins et al. [51], or Fields et al. [52]. Yeast Protein Age Groups Pfam domains were assigned for yeast proteins using BioMart (http://www.biomart.org). The taxonomic distributions of Pfam domains were obtained for archaea (A), bacteria (B), eukaryotes (E), and fungi (F) (http://www.sanger.ac.uk/Software/Pfam). According to these distributions, each Pfam domain was assigned to one of the age groups ABE, AE/BE, E, and F. The group ABE includes the oldest proteins common to all three kingdoms, while group F is the youngest, being specific to fungi. As yeast is a eukaryote, groups A, B, and AB do not occur. A protein's age group was assigned as the youngest age of its constituent Pfam domains—e.g., E for a protein with domains from ABE and E (Dataset S3, Figure S1). 
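The last rule above, that a protein takes the youngest age among its constituent Pfam domains, can be sketched as a small helper. The names `AGE_ORDER` and `protein_age` are hypothetical, and the input is assumed to be the list of age groups already assigned to the protein's domains.

```python
# Age groups ordered from oldest to youngest, as described in the text.
AGE_ORDER = ("ABE", "AE/BE", "E", "F")


def protein_age(domain_groups):
    """Return the youngest age group among a protein's Pfam domains.

    Returns None for a protein with no assigned domains (an assumption;
    the text does not say how such proteins are handled).
    """
    if not domain_groups:
        return None
    return max(domain_groups, key=AGE_ORDER.index)


# The example from the text: domains from ABE and E give a protein of age E.
print(protein_age(["ABE", "E"]))
```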
Interaction Density and ΔD Interaction density Dm,n measures the normalized interaction density between two age groups m, n (m<n). ΔD measures the interaction preference of a new node by the age differences. A positive value of ΔD indicates that a new node makes connections more frequently with close age groups than with distant ones. First, the normalized interaction density Dm,n between two age groups m,n (m<n) is calculated as
Measure of Modularity

The modularity of a network is measured by the modularity index Q by Newman [29] after its modules are defined using the method described in [33]:

$$Q = \sum_{s=1}^{M} \left[ \frac{l_s}{L} - \left( \frac{d_s}{2L} \right)^{2} \right]$$

where M
= the total number of modules, L = the number of total edges in the network, ls = the number of edges within the module s, and ds = the sum of the degrees of the module s. The modularity index Q measures the difference between the intra-module interaction density and the expected interaction density at random for a given partition, where Q≈0 for a random network and Q = 1 for a completely modular network [53].Protein 3-D Complexes Data The list of PDB entries and 3-D coordinates were obtained from PQS (Protein Quaternary Structure Server, ftp://ftp.ebi.ac.uk/pub/databases/msd/pqs). First, we took the PDB entries having three or more protein chains. The PDB entries annotated as crystal packing interfaces by PQS or from non X-ray crystallographic method were excluded. The protein chain clusters at 30% sequence identity cut-off were downloaded from PDB (Protein Data Bank, ftp://ftp.wwpdb.org). PDB entries consisting of the same set of NR30 clusters were grouped together regardless of the number of chains and one representative PDB entry was selected in each group as NR30 entries. For NR30 entries, the age group of each PDB chain was assigned using BLAST against NR90 set of archaea, bacteria and eukaryote sequences from UNIPROT (ftp://ftp.uniprot.org/pub/databases/uniprot) using >30% identity and >30 alignment length as criteria. We took only the PDB entries consisting of two or more protein age groups and further applied a number of filters manually, excluding the entries with DNAs, RNAs, viral proteins, small peptides (<30 amino acids) and immunoproteins such as antibodies and MHCs with antigens. Where available, ambiguous quaternary structures were removed by comparing the data from PQS, PDB biological units and 3D complex databases [54]. 
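Returning to the modularity index Q defined under Measure of Modularity, a minimal sketch of its computation is given here. It assumes the commonly used form Q = Σ_s [ l_s/L − (d_s/2L)² ], which matches the symbol definitions given above (L total edges, l_s edges within module s, d_s summed degree of module s); the function name and the edge-list input format are illustrative.

```python
def modularity_q(edges, module_of):
    """Modularity index Q for an undirected edge list and a node -> module map."""
    total_edges = len(edges)
    within = {}   # l_s: number of edges inside module s
    degree = {}   # d_s: sum of node degrees in module s
    for u, v in edges:
        mu, mv = module_of[u], module_of[v]
        degree[mu] = degree.get(mu, 0) + 1
        degree[mv] = degree.get(mv, 0) + 1
        if mu == mv:
            within[mu] = within.get(mu, 0) + 1
    return sum(within.get(s, 0) / total_edges
               - (degree[s] / (2 * total_edges)) ** 2
               for s in degree)


# Two disjoint triangles, first split into their natural modules, then lumped.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
lumped = {node: "a" for node in range(6)}
print(modularity_q(two_triangles, split), modularity_q(two_triangles, lumped))
```

For the two-triangle example each module has l_s = 3 and d_s = 6 with L = 6, so the split partition gives 2 × (0.5 − 0.25) = 0.5, while placing every node in one module gives 0.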
Dataset S3 The age group assignment of yeast genes (0.08 MB TDS)

Dataset S4 The list of homodimeric proteins and their age group assignment (0.01 MB TDS)

Figure S1 The protein ratio of different age groups in yeast PPI networks. LC: literature-curated, HTP: high-throughput, LC+HTP: the union of LC and HTP. (0.08 MB PDF)

Figure S2 The network properties of the HTP, LC+HTP, and Y2H-union datasets. The plots in each row, I-IV, indicate (I) the degree distribution P(k), (II) the clustering coefficient C(k), (III) the average degree of nearest neighbors <knn>(k), and (IV) the interaction density pattern (ΔD) between protein age groups. The HTP, LC+HTP, and Y2H-union sets show similar characteristics to the LC dataset. (0.29 MB PDF)

Figure S3 The network properties by the CG model, where the network modules were defined by TCC (triangularly connected components) instead of Newman's method. The network structure is still similar to the yeast PPI networks, showing scale-free, hierarchical modular, degree-dissortative characteristics and an interaction density pattern of ΔD>0. (A) The degree distribution P(k), (B) the clustering coefficient C(k), (C) the average degree of nearest neighbors <knn>(k), (D) the interaction density pattern between protein age groups. (0.09 MB PDF)

Figure S4 Age-dependent interaction patterns of several MIPS complexes in the LC+HTP set. In mRNA splicing (A) and replication (B) complexes, the subunits of the same age group are more likely to be connected. In RNA polymerase I and III (C and D), most subunits are densely connected to each other; therefore age-dependency is not evident. In the case of actin-associated proteins, most subunits are of the same age group (E), reflecting a relatively recently emerged module.
(0.52 MB PDF) Click here for additional data file.(504K, pdf) Table S1 The network characteristics of the yeast PPI data. (0.06 MB PDF) Click here for additional data file.(59K, pdf) Table S2 The network characteristics of the network growth models (0.13 MB PDF) Click here for additional data file.(128K, pdf) Footnotes The authors have declared that no competing interests exist. This work was supported by grants from the N.S.F. (IIS-0325116), N.I.H. (GM06779-01), Welch (F1515), and a Packard Fellowship (EMM). References 1. Barabasi AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004;5:101–113. [PubMed] 2. Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL. Hierarchical organization of modularity in metabolic networks. Science. 2002;297:1551–1555. [PubMed] 3. Albert R, Jeong H, Barabasi AL. Error and attack tolerance of complex networks. Nature. 2000;406:378–382. [PubMed] 4. Valente AX, Cusick ME. Yeast Protein Interactome topology provides framework for coordinated-functionality. Nucleic Acids Res. 2006;34:2812–2819. [PubMed] 5. Klemm K, Bornholdt S. Topology of biological networks and reliability of information processing. Proc Natl Acad Sci U S A. 2005;102:18414–18419. [PubMed] 6. Deeds EJ, Ashenberg O, Shakhnovich EI. A simple physical model for scaling in protein-protein interaction networks. Proc Natl Acad Sci U S A. 2006;103:311–316. [PubMed] 7. Rachlin J, Cohen DD, Cantor C, Kasif S. Biological context networks: a mosaic view of the interactome. Mol Syst Biol. 2006;2:66. [PubMed] 8. Barabasi AL, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512. [PubMed] 9. Wagner A. Does selection mold molecular networks? Sci STKE. 2003;2003:PE41. [PubMed] 10. Rzhetsky A, Gomez SM. Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome. Bioinformatics. 2001;17:988–996. [PubMed] 11. Pastor-Satorras R, Smith E, Sole RV. 
Evolving protein interaction networks through gene duplication. J Theor Biol. 2003;222:199–210. [PubMed] 12. Vázquez A, Flammini A, Maritan A, Vespignani A. Modeling of Protein Interaction Networks. ComPlexUs. 2003;1:38–44. 13. Middendorf M, Ziv E, Wiggins CH. Inferring network mechanisms: the Drosophila melanogaster protein interaction network. Proc Natl Acad Sci U S A. 2005;102:3192–3197. [PubMed] 14. Ispolatov I, Krapivsky PL, Yuryev A. Duplication-divergence model of protein interaction network. Phys Rev E Stat Nonlin Soft Matter Phys. 2005;71:061911. [PubMed] 15. Evlampiev K, Isambert H. Conservation and topology of protein interaction networks under duplication-divergence evolution. Proc Natl Acad Sci U S A. 2008;105:9863–9868. [PubMed] 16. Kashtan N, Alon U. Spontaneous evolution of modularity and network motifs. Proc Natl Acad Sci U S A. 2005;102:13773–13778. [PubMed] 17. Wagner A. The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Mol Biol Evol. 2001;18:1283–1292. [PubMed] 18. Makino T, Suzuki Y, Gojobori T. Differential evolutionary rates of duplicated genes in protein interaction network. Gene. 2006;385:57–63. [PubMed] 19. Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, et al. A human protein-protein interaction network: a resource for annotating the proteome. Cell. 2005;122:957–968. [PubMed] 20. Berg J, Lassig M, Wagner A. Structure and evolution of protein interaction networks: a statistical model for link dynamics and gene duplications. BMC Evol Biol. 2004;4:51. [PubMed] 21. Qin H, Lu HH, Wu WB, Li WH. Evolution of the yeast protein interaction network. Proc Natl Acad Sci U S A. 2003;100:12820–12824. [PubMed] 22. Kunin V, Pereira-Leal JB, Ouzounis CA. Functional evolution of the yeast protein interaction network. Mol Biol Evol. 2004;21:1171–1176. [PubMed] 23. Batada NN, Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, et al. 
Stratus not altocumulus: a new view of the yeast protein interaction network. PLoS Biol. 2006;4:e317. doi:10.1371/journal.pbio.0040317. [PubMed] 24. Batada NN, Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, et al. Still stratus not altocumulus: further evidence against the date/party hub distinction. PLoS Biol. 2007;5:e154. doi:10.1371/journal.pbio.0020154. [PubMed] 25. Yu H, Braun P, Yildirim MA, Lemmens I, Venkatesan K, et al. High-Quality Binary Protein Interaction Map of the Yeast Interactome Network. Science. 2008 August 21, 2008, 1158684. 26. Spirin V, Mirny LA. Protein complexes and functional modules in molecular networks. Proc Natl Acad Sci U S A. 2003;100:12123–12128. [PubMed] 27. Ravasz E, Barabasi AL. Hierarchical organization in complex networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2003;67:026112. [PubMed] 28. Camon E, Magrane M, Barrell D, Lee V, Dimmer E, et al. The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res. 2004;32:D262–D266. [PubMed] 29. Newman ME, Girvan M. Finding and evaluating community structure in networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2004;69:026113. [PubMed] 30. Maslov S, Sneppen K. Specificity and stability in topology of protein networks. Science. 2002;296:910–913. [PubMed] 31. Kellis M, Birren BW, Lander ES. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature. 2004;428:617–624. [PubMed] 32. He X, Zhang J. Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics. 2005;169:1157–1164. [PubMed] 33. Newman ME. Fast algorithm for detecting community structure in networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2004;69:066133. [PubMed] 34. The Universal Protein Resource (UniProt). Nucleic Acids Res. 2007;35:D193–D197. [PubMed] 35. Kim WK, Henschel A, Winter C, Schroeder M. 
The many faces of protein-protein interactions: A compendium of interface geometry. PLoS Comput Biol. 2006;2:e124. doi/10.1371/journal.pcbi.0030042. [PubMed] 36. Pereira-Leal JB, Levy ED, Kamp C, Teichmann SA. Evolution of protein complexes by duplication of homomeric interactions. Genome Biol. 2007;8:R51. [PubMed] 37. Ispolatov I, Yuryev A, Mazo I, Maslov S. Binding properties and evolution of homodimers in protein-protein interaction networks. Nucleic Acids Res. 2005;33:3629–3635. [PubMed] 38. Levy ED, Boeri Erba E, Robinson CV, Teichmann SA. Assembly reflects evolution of protein complexes. Nature. 2008;453:1262–1265. [PubMed] 39. Kim PM, Lu LJ, Xia Y, Gerstein MB. Relating three-dimensional structures to protein networks provides evolutionary insights. Science. 2006;314:1938–1941. [PubMed] 40. Byrne KP, Wolfe KH. The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Res. 2005;15:1456–1461. [PubMed] 41. Kim J, Krapivsky PL, Kahng B, Redner S. Infinite-order percolation and giant fluctuations in a protein interaction network. Phys Rev E Stat Nonlin Soft Matter Phys. 2002;66:055101. [PubMed] 42. Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006;34:D535–D539. [PubMed] 43. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000;403:623–627. [PubMed] 44. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci U S A. 2001;98:4569–4574. [PubMed] 45. Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002;415:141–147. [PubMed] 46. Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, et al. 
Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006;440:631–636. [PubMed] 47. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002;415:180–183. [PubMed] 48. Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006;440:637–643. [PubMed] 49. Ptacek J, Devgan G, Michaud G, Zhu H, Zhu X, et al. Global analysis of protein phosphorylation in yeast. Nature. 2005;438:679–684. [PubMed] 50. Grandi P, Rybin V, Bassler J, Petfalski E, Strauss D, et al. 90S pre-ribosomes include the 35S pre-rRNA, the U3 snoRNP, and 40S subunit processing factors but predominantly lack 60S synthesis factors. Mol Cell. 2002;10:105–115. [PubMed] 51. Collins SR, Kemmeren P, Zhao XC, Greenblatt JF, Spencer F, et al. Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol Cell Proteomics. 2007;6:439–450. [PubMed] 52. Miller JP, Lo RS, Ben-Hur A, Desmarais C, Stagljar I, et al. Large-scale identification of yeast integral membrane protein interactions. Proc Natl Acad Sci U S A. 2005;102:12123–12128. [PubMed] 53. Guimera R, Nunes Amaral LA. Functional cartography of complex metabolic networks. Nature. 2005;433:895–900. [PubMed] 54. Levy ED, Pereira-Leal JB, Chothia C, Teichmann SA. 3D complex: a structural classification of protein complexes. PLoS Comput Biol. 2006;2:e155. doi:10.1371/journal.pcbi.0020155. [PubMed] |