Proc Natl Acad Sci U S A. 2008 Jan 22; 105(3): 950–954.
Published online 2008 Jan 16. doi:  10.1073/pnas.0707293105
PMCID: PMC2242688

The evolutionary dynamics of the Saccharomyces cerevisiae protein interaction network after duplication


Gene duplication is an important mechanism in the evolution of protein interaction networks. Duplications are followed by the gain and loss of interactions, rewiring the network at some unknown rate. Because rewiring is likely to change the distribution of network motifs within the duplicated interaction set, it should be possible to study network rewiring by tracking the evolution of these motifs. We have developed a mathematical framework that, together with duplication data from comparative genomic and proteomic studies, allows us to infer the connectivity of the preduplication network and the changes in connectivity over time. We focused on the whole-genome duplication (WGD) event in Saccharomyces cerevisiae. The model allowed us to predict the frequency of intergene interaction before WGD and the postduplication probabilities of interaction gain and loss. We find that the predicted frequency of self-interactions in the preduplication network is significantly higher than that observed in today's network. This could suggest a structural difference between the modern and ancestral networks, preferential addition or retention of interactions between ohnologs, or selective pressure to preserve duplicates of self-interacting proteins.

Keywords: gene duplication, network motifs, self-interacting proteins, whole-genome duplication

Complex biological networks result from the evolutionary growth of simpler networks with fewer components. Gene duplication is thought to be a key mechanism by which networks evolve and new components are added (1–6, 43). These duplication events can act on a single gene, a chromosomal segment, or even a whole genome (1, 7–11). After duplication, the duplicate genes may assume one of several fates, including differentiation of sequence and function, or loss of one of the duplicates (12–17, 44). These outcomes are thought to be affected by genetic factors including redundancy, modularization, and expression dosage (9, 12, 15, 18–22, 45).

Little is known about the rules that govern the modification of gene interactions after a duplication event or the effects of gene interaction on the fate of duplicate genes. Here, we report a mathematical framework for inferring the preduplication connectivity properties of a network and for describing its postduplication dynamics. Our method decomposes a protein interaction network into a vector of network motifs and tracks the evolution of this vector over time. We apply our methodology to the protein interaction network of Saccharomyces cerevisiae (23–29), which has undergone a whole-genome duplication (WGD) event, resulting in hundreds of coordinately duplicated gene pairs (ohnologs) (8, 9, 11).

Results and Discussion

Network motifs are small subgraphs, or interaction patterns, that occur in networks more frequently than would be expected by chance (30). Motifs have been a valuable tool in identifying functional structure in many biological networks, including transcriptional, neural, and developmental networks (30, 31). We applied the concept of network motifs to WGD genes in S. cerevisiae and analyzed network motifs composed of pairs of ohnologs (namely, motifs of interactions within four proteins, Fig. 1A). There are six possible interactions between any four proteins, hence 2^6 = 64 possible motifs. This number is reduced to 19 different motif classes after accounting for the symmetry between the motif's ohnolog pairs and the symmetry of the genes within each ohnolog pair [supporting information (SI) Table 3].
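The reduction from 64 labeled motifs to 19 classes can be checked by brute force. The sketch below is an illustration rather than the authors' procedure: it assumes hypothetical protein labels (a1/a2 for one ohnolog pair, b1/b2 for the other) and treats two motifs as equivalent when one can be turned into the other by swapping the genes within a pair or by swapping the two pairs.

```python
from itertools import combinations, product

# Hypothetical labels: a1/a2 form one ohnolog pair, b1/b2 the other.
PROTEINS = ("a1", "a2", "b1", "b2")
EDGES = list(combinations(PROTEINS, 2))  # the six possible interactions


def relabelings():
    """All 8 relabelings that preserve the ohnolog-pair structure."""
    maps = []
    for swap_a, swap_b, swap_pairs in product((False, True), repeat=3):
        a = ("a2", "a1") if swap_a else ("a1", "a2")
        b = ("b2", "b1") if swap_b else ("b1", "b2")
        maps.append(dict(zip(PROTEINS, (b + a) if swap_pairs else (a + b))))
    return maps


MAPS = relabelings()


def canonical(graph):
    """Smallest relabeled image of a graph, used as its class identifier."""
    return min(
        tuple(sorted(tuple(sorted((m[u], m[v]))) for u, v in graph))
        for m in MAPS
    )


def motif_classes():
    """Group all 2^6 labeled graphs on four proteins into symmetry classes."""
    classes = {}
    for size in range(len(EDGES) + 1):
        for graph in combinations(EDGES, size):
            classes.setdefault(canonical(graph), []).append(graph)
    return classes


classes = motif_classes()
print(len(classes))                            # 19 motif classes
print(sum(len(g) for g in classes.values()))   # 64 labeled motifs
```

Counting orbits this way agrees with a Burnside-style count over the eight symmetries, (64 + 5 × 16 + 2 × 4) / 8 = 19.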

Fig. 1.
Whole-genome duplication (WGD) produces network motifs between ohnolog pairs. (A) The paths genes take through time after a WGD. In most cases only one of the duplicated genes is retained (light gray). Surviving gene duplicate pairs are present as ohnologs ...

The proteins we considered for our motif analysis are the 450 WGD ohnolog pairs, as listed in Kellis et al. (8). Interactions between these proteins are listed in the Database of Interacting Proteins (DIP) (23–29). From these data we determined the modern distribution (m_modern) of our 19 motif classes (Table 1). We observe a rich variability in motif prevalences. Even among motifs with the same number of interactions, frequencies vary across several orders of magnitude, indicating that motif frequencies reflect evolutionary processes rather than stochastic effects. We then asked how much of the motif distribution observed today could be explained by a neutral model accounting for the evolutionary dynamics of gene duplication after the WGD event.

Table 1.
Motif distribution in the modern protein interaction network

We developed a model describing protein connectivity within the subnetwork of surviving ohnologs (Fig. 1A) (5, 36). The model consists of two steps: duplication and divergence (Fig. 1B). The duplication step assumes that each protein is duplicated along with all its interactions. Because the two daughter proteins are initially identical to each other, the resulting interaction sets are identical. Accordingly, if a protein was self-interacting, each of its duplicates will be self-interacting, and an interaction will exist between the duplicates. This duplication process can generate only 6 different motifs of the possible 19 (Fig. 2A). We term these initial patterns “zero-order motifs,” and represent their distribution by a vector, m_0. The frequencies of these zero-order motifs are governed by P_si and P_i, defined as the probabilities of protein self-interaction and of interaction between two different proteins in the preduplication network, respectively (Fig. 2A).
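A minimal sketch of how m_0 could depend on P_si and P_i, assuming that the two ancestral proteins self-interact independently of each other and of their mutual interaction (an assumption made here for illustration; the exact parameterization is given in Fig. 2A). Each self-interacting ancestor contributes one interaction between its two duplicates, and an ancestral interaction between the two proteins contributes four interactions across the pairs. The function name and the value of P_i are hypothetical.

```python
def zero_order_distribution(p_si, p_i):
    """Probabilities of the six zero-order motifs for one ancestral protein pair.

    Keys are (number of self-interacting ancestors, ancestral cross-interaction
    present); values are probabilities under the independence assumption.
    """
    ways = (1, 2, 1)  # ways for 0, 1, or 2 of the two ancestors to self-interact
    dist = {}
    for n_self in (0, 1, 2):
        p_self = ways[n_self] * p_si ** n_self * (1 - p_si) ** (2 - n_self)
        for cross in (False, True):
            p_cross = p_i if cross else 1 - p_i
            dist[(n_self, cross)] = p_self * p_cross
    return dist


def interactions_after_duplication(n_self, cross):
    """Interactions among the four duplicates: 1 per self-interactor, 4 per cross-interaction."""
    return n_self + (4 if cross else 0)


# P_si = 0.25 is the best-fit value quoted in the text; P_i here is a
# placeholder chosen only for illustration.
m0 = zero_order_distribution(p_si=0.25, p_i=0.001)
for (n_self, cross), prob in m0.items():
    print(n_self, cross, interactions_after_duplication(n_self, cross), prob)
```

Under these assumptions the six zero-order motifs carry 0, 1, 2, 4, 5, or 6 interactions, and their probabilities sum to 1.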

Fig. 2.
Ohnolog motif frequencies provide a method for estimating ancestral connectivity and rewiring parameters. (A) Immediately after duplication, ohnolog motifs can be one of six zero-order motifs with probability vector m_0 (row vector shown as its transpose). ...

The second step in the model encompasses the evolutionary dynamics after duplication (1). Mutations leading to the addition or deletion of an interaction are assumed to occur with probabilities P_+ and P_−, respectively. We define these probabilities as describing the overall period from the WGD event until today, accounting for the possibility of multiple rounds of addition and deletion.** We assume that rewiring events are independent, so that the probability of adding or removing multiple interactions is described by the product of the individual probabilities. This rewiring dynamic is described mathematically by a transition matrix (T, Fig. 2B) whose elements are the probabilities that each of the six zero-order motifs evolves into each of the 19 motif classes, so that the initial six-element vector, m_0, maps to an expected 19-element vector, m_0 T. For example, the probability that a motif in the class with a single interaction becomes a motif of the class with no interactions is P_−(1 − P_+)^5: the probability of losing the one interaction multiplied by the probability of not gaining an interaction at any of the five open positions. The final outcome of duplication and divergence should yield the motif distribution observed today, m_modern. We obtain a system of 19 equations, one for each motif class, with four variables: P_i, P_si, P_+, and P_−:

\[ m_0(P_i, P_{si}) \, T(P_+, P_-) = m_{\mathrm{modern}} \qquad [1] \]

The transition matrix elements are functions of P_+ and P_−, and the initial condition zero-order motif vector m_0 is a function of the preduplication parameters P_i and P_si. Because these four parameters are overdetermined by the 19 equations of Eq. 1, the existence of a solution is not mathematically guaranteed. We therefore solved the equations for the best-fit values of P_i, P_si, P_+, and P_− (Methods and Table 2). Fig. 3A shows that the observed number of motifs is in good agreement with the predictions of the model given the best-fit parameters obtained. This indicates that our simplified model is able to capture much of the complexity of the preduplication network and its rewiring dynamics. Our model is less predictive for some of the motifs, in particular some low-frequency ones (see SI Text for further discussion on potential reasons for these outliers). As shown in Table 2, postduplication rewiring of the network involved a high probability of interaction loss, whereas the likelihood of gaining an interaction was small. This result is consistent with previous work (5, 38).
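The rewiring step can be illustrated with a short sketch of one row of a transition matrix under the stated independence assumption: each of the six sites independently keeps, loses, or gains an interaction, and the probabilities of all labeled outcomes in the same symmetry class are summed. The protein labels, function names, and the gain and loss values are hypothetical and are not the values of Table 2.

```python
from itertools import combinations, product
from math import isclose

PROTEINS = ("a1", "a2", "b1", "b2")      # hypothetical labels for two ohnolog pairs
EDGES = list(combinations(PROTEINS, 2))  # the six interaction sites


def relabelings():
    """Relabelings that swap genes within a pair and/or swap the two pairs."""
    maps = []
    for swap_a, swap_b, swap_pairs in product((False, True), repeat=3):
        a = ("a2", "a1") if swap_a else ("a1", "a2")
        b = ("b2", "b1") if swap_b else ("b1", "b2")
        maps.append(dict(zip(PROTEINS, (b + a) if swap_pairs else (a + b))))
    return maps


MAPS = relabelings()


def canonical(graph):
    """Class identifier: the smallest relabeled image of the graph."""
    return min(
        tuple(sorted(tuple(sorted((m[u], m[v]))) for u, v in graph))
        for m in MAPS
    )


def step_probability(start, end, p_gain, p_loss):
    """Probability that labeled graph `start` is rewired into labeled graph `end`."""
    prob = 1.0
    for edge in EDGES:
        if edge in start:
            prob *= (1 - p_loss) if edge in end else p_loss
        else:
            prob *= p_gain if edge in end else (1 - p_gain)
    return prob


def transition_row(start, p_gain, p_loss):
    """Probability of reaching each motif class from the labeled graph `start`."""
    row = {}
    for size in range(len(EDGES) + 1):
        for end in combinations(EDGES, size):
            key = canonical(end)
            row[key] = row.get(key, 0.0) + step_probability(start, end, p_gain, p_loss)
    return row


# Illustrative values only.
p_gain, p_loss = 0.01, 0.8
one_interaction = (("a1", "a2"),)  # zero-order motif with a single interaction
row = transition_row(one_interaction, p_gain, p_loss)

# The entry for the empty motif reproduces the example in the text.
assert isclose(row[()], p_loss * (1 - p_gain) ** 5)
assert isclose(sum(row.values()), 1.0)
```

Stacking one such row per zero-order motif would give a 6 × 19 matrix whose rows each sum to 1, which is the shape described for T.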

Table 2.
Best-fit values of preduplication network connectivity and postduplication dynamics inferred from the proteomic network motif distribution of Saccharomyces cerevisiae

Fig. 3.
The modern motif distribution closely resembles the expected distribution. (A) We solved our system of 19 equations in 4 unknowns to compute the best-fit network. The expected number of motifs given the best-fit parameters P_i, P_si, P_+, and P_− ...

We also observe an enrichment of interactions between the ohnologs themselves. Based on the modern frequency of protein interactions (0.13%), we would expect <1 ohnolog pair to interact. We observe 44 interactions of this type (binomial P ≪ 10^−10), nearly 10% of our ohnolog pairs (see, for example, refs. 18, 32, 33, and 37). In the context of our model, this phenomenon translates to a high probability of self-interaction in the preduplication network (P_si = 0.25). This frequency of self-interaction is nearly fivefold higher than the modern value (0.056,†† Fig. 3B).
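The enrichment figures quoted above can be reproduced with a short calculation. This sketch assumes that each of the 450 ohnolog pairs is an independent trial with interaction probability 0.13%; the text does not spell out the exact form of the test, so the function below is one plausible reading rather than the authors' code.

```python
from math import exp, lgamma, log


def binomial_upper_tail(n, k_min, p):
    """P(X >= k_min) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for k in range(k_min, n + 1):
        log_term = (
            lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p)
        )
        total += exp(log_term)
    return total


n_pairs = 450        # WGD ohnolog pairs considered in the text
p_interact = 0.0013  # modern frequency of protein interactions (0.13%)
observed = 44        # interacting ohnolog pairs reported in the text

expected = n_pairs * p_interact
p_value = binomial_upper_tail(n_pairs, observed, p_interact)

print(f"expected interacting pairs: {expected:.3f}")
print(f"fraction observed: {observed / n_pairs:.3f}")
print(f"upper-tail probability: {p_value:.3e}")
```

Under this reading, the expected count is below one pair and the tail probability is far smaller than 10^−10, in line with the statement in the text.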

A simple explanation for this phenomenon is that the ancestral network contained more self-interacting proteins than exist in the modern network and that the ohnolog interactions are descendants of the frequent ancestral self-interactions. This would suggest a structural difference between the ancient and modern proteome. Because a network's structure can reflect its functional capabilities, such a difference might imply unique functional capabilities of the ancestral proteome or potentially proteomic subfunctionalization between the pre- and postduplication organisms (36–38). Alternatively, these ohnologous interactions might be de novo. Because overall P_+ is small, this would suggest an evolutionary preference for adding or retaining ohnolog interactions (i.e., P_+,ohnolog > P_+,nonohnolog, or P_−,ohnolog < P_−,nonohnolog) (36).

Another intriguing explanation is that the high estimate for Psi results from selective retention of duplicates descended from ancestrally self-interacting proteins. Assuming that self-interactions were not more common in the ancestral network, our data may suggest that these pairs were under selective pressure to be maintained (46). Because they would be retained over long periods of time, they are more likely to have evolved a novel function (22, 38, 49). We suggest a simple dose-dependent model (described in SI Text) consistent with the idea that duplicated self-interacting proteins are selectively preserved (39). This could be an important contributor to the evolution of protein complexes (38, 45, 49).

Our model explains the current prevalence of the 19 ohnolog motifs and provides an estimate for pre- and postduplication parameters of the interaction network. The estimated frequency of self-interaction in the ancestral network is significantly higher than in today's network. This could indicate preferential retention of self-interacting protein duplicates, structural differences between the networks, or an inherent asymmetry between ohnologous and nonohnologous protein interaction dynamics. Our results are based on DIP and should be taken with caution because of possible bias and inherent noise associated with the high-throughput data that make up a significant portion of the DIP (23–29, 48). It will be interesting to see whether similar observations appear in other sources of interaction data for S. cerevisiae and other species (1, 21, 40, 41).



Methods

We used the protein interactions listed in the DIP database (23, 26–29). Data can be downloaded at http://dip.doe-mbi.ucla.edu/. The whole-genome duplicates are listed in the supplemental material of Kellis et al. (8).

Minimization Algorithm.

We solve Eq. 1 for the parameters that best fit the data by minimizing the error associated with the fit. The right-hand side, m_modern, is directly derived from the data (Table 1). The left-hand side, m_0(P_i, P_si) T(P_+, P_−), yields a vector m_expected that depends on the four parameters P_i, P_si, P_+, and P_−. For a motif i, the goodness of fit is given by the square of the difference between the observed abundance m_modern,i and the expected abundance m_expected,i, scaled by the expected number of motifs:

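\[ E = \sum_{i=1}^{19} \frac{\left( m_{\mathrm{modern},i} - m_{\mathrm{expected},i} \right)^2}{m_{\mathrm{expected},i}} \]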

We then minimize E using the simplex search method (42) implemented by the fminsearch function in Matlab, obtaining best-fit values of P_i, P_si, P_+, and P_− (see Table 2). The algorithm to estimate the error in the parameters is described in SI Text. We tested the model on simulated networks (SI Text and SI Table 4) before running on the actual yeast proteome.
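As a rough illustration of the error being minimized, the sketch below computes E for a pair of count vectors, assuming E is the sum over motifs of the squared difference scaled by the expected count. The function name and the toy numbers are hypothetical and are not taken from Table 1.

```python
def fit_error(observed, expected):
    """Sum over motifs of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Toy counts for three motif classes, for illustration only.
toy_observed = [10.0, 5.0, 1.0]
toy_expected = [8.0, 5.0, 2.0]
print(fit_error(toy_observed, toy_expected))  # (4/8) + 0 + (1/2) = 1.0
```

In Python, a comparable simplex search could be run by passing such a function to `scipy.optimize.minimize` with `method="Nelder-Mead"`, with the expected vector recomputed from the four parameters at each evaluation; that step is omitted here.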



We acknowledge N. Barkai, M. Brenner, A. DeLuna, E. Lieberman, I. Nachman, I. Wapinski, and K. Wolfe for their advice and helpful discussions and E. Lieberman and R. Milo for critical readings of the manuscript. This work was supported in part by National Institutes of Health Grants GM068763 (to M.B.E.) and R01GM081617 (to R.K.). A.P. was supported by a National Science Foundation Graduate Fellowship and a National Defense Science and Engineering Graduate Fellowship.


The authors declare no conflict of interest.

This article is a PNAS Direct Submission. L.K. is a guest editor invited by the Editorial Board.

This article contains supporting information online at www.pnas.org/cgi/content/full/0707293105/DC1.

**Explicitly, we allow one edge transition per site. This would not include cases where we have multiple transitions at a single site (e.g., [motif diagram] is equivalent in our method to [motif diagram]). In practice, multiple transitions are improbable, but we define our transitions to include these higher-order transitions for completeness.

††According to DIP, the dataset on which we base our analysis. In other datasets, this parameter ranges in value, with the largest being 0.138 [large literature-curated dataset (35)].


1. Aury JM, Jaillon O, Duret L, Noel B, Jubin C, Porcel BM, Segurens B, Daubin V, Anthouard V, Aiach N, et al. Nature. 2006;444:171–178.
2. Barabasi AL, Albert R. Science. 1999;286:509–512.
3. Dehal P, Boore JL. PLoS Biol. 2005;3:e314.
4. Ispolatov I, Krapivsky PL, Mazo I, Yuryev A. New J Phys. 2005;7:145.
5. Pastor-Satorras R, Smith E, Sole RV. J Theor Biol. 2003;222:199–210.
6. Hughes AL. Proc R Soc London Ser B. 1994;256:119–123.
7. Wolfe K. Curr Biol. 2004;14:R392–R394.
8. Kellis M, Birren BW, Lander ES. Nature. 2004;428:617–624.
9. Langkjaer RB, Cliften PF, Johnston M, Piskur J. Nature. 2003;421:848–852.
10. Ohno S. Evolution by Gene Duplication. London: Allen and Unwin; 1970.
11. Wolfe KH, Shields DC. Nature. 1997;387:708–713.
12. Conant GC, Wolfe KH. PLoS Biol. 2006;4:545–554.
13. Ihmels J, Collins SR, Schuldiner M, Krogan NJ, Weissman JS. Mol Syst Biol. 2007;3
14. Kafri R, Bar-Even A, Pilpel Y. Nat Genet. 2005;37:295–299.
15. Lynch M, Force A. Genetics. 2000;154
16. Tirosh I, Barkai N. Genome Biol. 2007;8:R50.
17. Wagner A. Mol Biol Evol. 2002;19:1760–1768.
18. Papp B, Pal C, Hurst LD. Nature. 2003;424:194–197.
19. Cliften PF, Fulton RS, Wilson RK, Johnston M. Genetics. 2006;172:863–872.
20. Mintseris J, Weng Z. Proc Natl Acad Sci USA. 2005;102:10930–10935.
21. Scannell DR, Byrne KP, Gordon JL, Wong S, Wolfe K. Nature. 2006;440:341–345.
22. Wapinski I, Pfeffer A, Friedman N, Regev A. Nature. 2007;449:54–61.
23. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P, Bennett K, Boutilier K, et al. Nature. 2002;415:180–183.
24. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. Proc Natl Acad Sci USA. 2001;98:4569–4574.
25. Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D. Nucleic Acids Res Database Issue. 2004;32:D449–D451.
26. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, et al. Nature. 2000;403:623–627.
27. Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D. Nucleic Acids Res. 2000;28:289–291.
28. Xenarios I, Fernandez E, Salwinski L, Duan XJ, Thompson MJ, Marcotte EM, Eisenberg D. Nucleic Acids Res. 2001;29:239–241.
29. Xenarios I, Salwinski L, Duan XJ, Higney P, Kim S, Eisenberg D. Nucleic Acids Res. 2002;30:303–305.
30. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. Science. 2002;298:824–827.
31. Shen-Orr S, Milo R, Mangan S, Alon U. Nat Genet. 2003;32:64–68.
32. DeLuna A, Avendaño A, Riego L, González A. J Biol Chem. 2001;276:43775–43783.
33. Gibson TJ, Spring J. TiG. 1999;14:46–49.
34. Guldner U, Munsterkotter M, Oesterheld M, Pagel P, Ruepp A, Mewes HW, Stumpflen V. Nucleic Acids Res. 2006;34:D436–D441.
35. Reguly T, Breitkreutz A, Boucher L, Breitkreutz B-J, Hon GC, Myers CL, Parsons A, Friesen H, Oughtred R, Tong A, et al. J Biol. 2006;5:11.
36. Wagner A. Proc R Soc London Ser B. 2003;270:457–466.
37. Ispolatov I, Yuryev A, Mazo I, Maslov S. Nucleic Acids Res. 2005;33:3629–3635.
38. Pereira-Leal JB, Levy ED, Kamp C, Teichmann SA. Genome Biol. 2007;8:R51.
39. Hughes T, Ekman D, Ardawatia H, Elofsson A, Liberles DA. Genome Biol. 2007;8:213.
40. Britten RJ. Proc Natl Acad Sci USA. 2006;103:19027–19032.
41. Jaillon O, Aury J-M, Brunet F, Petit J-L, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, et al. Nature. 2004;431:946–957.
42. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical Recipes in C. Cambridge, UK: Cambridge Univ Press; 1992.
43. Prince VE, Pickett FB. Nat Rev Genet. 2002;3:827–837.
44. Wagner A. Mol Biol Evol. 2001;18:1283–1292.
45. Pereira-Leal JB, Teichmann SA. Genome Res. 2005;15:552–559.
46. Marianayagam NJ, Sunde M, Matthews JM. Trends Biochem Sci. 2004;29:618–625.
47. Batada NN, Reguly T, Breitkreutz A, Boucher L, Breitkreutz B-J, Hurst LD, Tyers M. PLoS Biol. 2007;5:e154.
48. Yu H, Paccanaro A, Trifonov V, Gerstein M. Bioinformatics. 2006;22:823–829.
49. Musso G, Zhang Z, Emili A. Trends Genet. 2007;23:266–269.
