Copyright © 2008 by The National Academy of Sciences of the USA

Evolution

The evolutionary dynamics of the Saccharomyces cerevisiae protein interaction network after duplication

*School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138; †Broad Institute, Cambridge, MA 02142; ‡Division of Biology and Department of Applied Physics, California Institute of Technology, Pasadena, CA 91125; §Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139; and ¶Department of Systems Biology, Harvard Medical School, Boston, MA 02115

‖To whom correspondence should be addressed. E-mail: roy_kishony@hms.harvard.edu

Edited by Leonid Kruglyak, Princeton University, Princeton, NJ, and accepted by the Editorial Board November 20, 2007

Author contributions: A.P., M.B.E., and R.K. designed research; A.P. performed research; A.P., M.K., and R.K. analyzed data; and A.P., M.B.E., M.K., and R.K. wrote the paper.

Received August 2, 2007.

Abstract

Gene duplication is an important mechanism in the evolution of protein interaction networks. Duplications are followed by the gain and loss of interactions, rewiring the network at some unknown rate. Because rewiring is likely to change the distribution of network motifs within the duplicated interaction set, it should be possible to study network rewiring by tracking the evolution of these motifs. We have developed a mathematical framework that, together with duplication data from comparative genomic and proteomic studies, allows us to infer the connectivity of the preduplication network and the changes in connectivity over time. We focused on the whole-genome duplication (WGD) event in Saccharomyces cerevisiae. The model allowed us to predict the frequency of intergene interaction before WGD and the postduplication probabilities of interaction gain and loss.
We find that the predicted frequency of self-interactions in the preduplication network is significantly higher than that observed in today's network. This could suggest a structural difference between the modern and ancestral networks, preferential addition or retention of interactions between ohnologs, or selective pressure to preserve duplicates of self-interacting proteins.

Keywords: gene duplication, network motifs, self-interacting proteins, whole-genome duplication

Complex biological networks result from the evolutionary growth of simpler networks with fewer components. Gene duplication is thought to be a key mechanism by which networks evolve and new components are added (1–6, 43). These duplication events can act on a single gene, a chromosomal segment, or even a whole genome (1, 7–11). After duplication, the duplicate genes may assume one of several fates, including differentiation of sequence and function, or loss of one of the duplicates (12–17, 44). These outcomes are thought to be affected by genetic factors including redundancy, modularization, and expression dosage (9, 12, 15, 18–22, 45). Little is known about the rules that govern the modification of gene interactions after a duplication event or the effects of gene interaction on the fate of duplicate genes.

Here, we report a mathematical framework for inferring the preduplication connectivity properties of a network and for describing its postduplication dynamics. Our method decomposes a protein interaction network into a vector of network motifs and tracks the evolution of this vector over time. We apply our methodology to the protein interaction network of Saccharomyces cerevisiae (23–29), which has undergone a whole-genome duplication (WGD) event, resulting in hundreds of coordinately duplicated gene pairs (ohnologs) (8, 9, 11).

Results and Discussion

Network motifs are small subgraphs, or interaction patterns, that occur in networks more frequently than would be expected by chance (30).
Motifs have been a valuable tool in identifying functional structure in many biological networks, including transcriptional, neural, and developmental networks (30, 31). We applied the concept of network motifs to WGD genes in S. cerevisiae and analyzed network motifs composed of pairs of ohnologs (namely, motifs of interactions within four proteins; Fig. 1).
The proteins we considered for our motif analysis are the 450 WGD ohnolog pairs, as listed in Kellis et al. (8). Interactions between these proteins are listed in the Database of Interacting Proteins (DIP) (23–29). From these data we determined the modern distribution (m_modern) of our 19 motif classes (Table 1). We observe a rich variability in motif prevalences. Even for motifs with the same number of interactions, we observed that frequencies vary across several orders of magnitude, indicating that motif frequencies reflect evolutionary processes rather than stochastic effects. We then asked how much of the motif distribution observed today could be explained by a neutral model accounting for the evolutionary dynamics of gene duplication after the WGD event.
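As a check on the count of 19 motif classes, the following sketch enumerates every interaction pattern among four proteins that form two ohnolog pairs and groups patterns that differ only by relabeling. It rests on assumptions that the text does not state explicitly: a motif is taken to be defined by the six possible interactions between distinct proteins (self-interactions are ignored), and two patterns are placed in the same class when one can be turned into the other by swapping the two ohnologs within a pair or by swapping the two pairs. The labels A1, A2, B1, and B2 are hypothetical.

```python
from itertools import combinations, product

# Hypothetical labels: (A1, A2) and (B1, B2) are the two ohnolog pairs.
PROTEINS = ("A1", "A2", "B1", "B2")
# The six possible interactions between distinct proteins.
EDGES = [frozenset(e) for e in combinations(PROTEINS, 2)]


def symmetries():
    """Yield the eight relabelings that preserve the ohnolog-pair structure."""
    for swap_a, swap_b, swap_pairs in product((False, True), repeat=3):
        a = ("A2", "A1") if swap_a else ("A1", "A2")
        b = ("B2", "B1") if swap_b else ("B1", "B2")
        image = b + a if swap_pairs else a + b
        yield dict(zip(PROTEINS, image))


def canonical(pattern):
    """Return one fixed representative of a pattern's equivalence class."""
    variants = []
    for relabel in symmetries():
        mapped = frozenset(frozenset(relabel[p] for p in edge) for edge in pattern)
        variants.append(tuple(sorted(tuple(sorted(e)) for e in mapped)))
    return min(variants)


def motif_classes():
    """Group all 2**6 interaction patterns into equivalence classes."""
    classes = {}
    for bits in product((0, 1), repeat=len(EDGES)):
        pattern = frozenset(e for e, b in zip(EDGES, bits) if b)
        classes.setdefault(canonical(pattern), []).append(pattern)
    return classes


if __name__ == "__main__":
    print(len(motif_classes()))
```

Under these assumptions the 64 possible patterns collapse into 19 classes, which agrees with the number of motif classes used in the analysis.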
We developed a model describing protein connectivity within the subnetwork of surviving ohnologs (Fig. 1).
The second step in the model encompasses the evolutionary dynamics after duplication (1). Mutations leading to the addition or deletion of an interaction are assumed to occur with probabilities P_+ and P_−, respectively. We define these probabilities as describing the overall period from the WGD event until today, accounting for the possibility of multiple rounds of addition and deletion.** We assume that rewiring events are independent, so that the probability of adding or removing multiple interactions is described by the product of the individual probabilities. This rewiring dynamic is described mathematically by a transition matrix (T, Fig. 2). For example, the probability of a motif with a single interaction becoming a motif with no interactions is P_−(1 − P_+)^5: the probability of losing the one interaction multiplied by the probability of not gaining an interaction at any of the five open positions. The final outcome of duplication and divergence should yield the motif distribution observed today, m_modern. We obtain a system of 19 equations, one for each motif class, with four variables, P_i, P_si, P_+, and P_−:

$$m_0(P_i, P_{si}) \cdot T(P_+, P_-) = m_{\mathrm{modern}} \tag{1}$$
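The independence assumption can be illustrated with a short sketch that computes the probability of one interaction pattern turning into another as a product over the six interaction sites. This is an illustration at the level of individual patterns, not the authors' class-level matrix T. The function name, the 0/1 encoding of sites, and the numerical values of the gain and loss probabilities (P_+ and P_−) are assumptions chosen for the example, not the fitted values of Table 2.

```python
from itertools import product

N_SITES = 6  # possible interactions among four distinct proteins


def transition_probability(before, after, p_gain, p_loss):
    """Probability that edge pattern `before` becomes `after`.

    Each pattern is a tuple of 0/1 flags, one per interaction site.
    Sites are assumed to change independently of one another.
    """
    prob = 1.0
    for had_edge, has_edge in zip(before, after):
        if had_edge:
            prob *= (1.0 - p_loss) if has_edge else p_loss
        else:
            prob *= p_gain if has_edge else (1.0 - p_gain)
    return prob


if __name__ == "__main__":
    # Illustrative values only; these are not the fitted values of Table 2.
    p_gain, p_loss = 0.01, 0.5
    one_edge = (1, 0, 0, 0, 0, 0)
    no_edge = (0, 0, 0, 0, 0, 0)
    # Lose the single interaction and gain nothing at the five open sites.
    print(transition_probability(one_edge, no_edge, p_gain, p_loss))
    # The probabilities of all possible outcomes of one pattern sum to 1.
    print(sum(transition_probability(one_edge, after, p_gain, p_loss)
              for after in product((0, 1), repeat=N_SITES)))
```

For the single-interaction example the function reproduces the product P_−(1 − P_+)^5 quoted in the text. Building the 19-by-19 matrix would additionally require summing these pattern-level probabilities over the patterns in each motif class, which is not shown here.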
The transition matrix elements are functions of P_+ and P_−, and the initial condition, the zero-order motif vector m_0, is a function of the preduplication parameters P_i and P_si. Because these four parameters are overdetermined by the 19 equations of Eq. 1, the existence of a solution is not mathematically guaranteed. We solved the equations for the best-fit values of P_i, P_si, P_+, and P_− (Methods and Table 2; see also Fig. 3).
We also observe an enrichment of interactions between the ohnologs themselves. Based on the modern frequency of protein interactions (0.13%), we would expect <1 ohnolog pair to interact. We observe 44 interactions of this type (binomial P < 10^−10), nearly 10% of our ohnolog pairs (see, for example, refs. 18, 32, 33, and 37). In the context of our model, this phenomenon translates into a high probability of self-interaction in the preduplication network (P_si = 0.25). This frequency of self-interaction is nearly fivefold higher than the modern value (0.056;†† Fig. 3).

A simple explanation for this phenomenon is that the ancestral network contained more self-interacting proteins than exist in the modern network and that the ohnolog interactions are descendants of the frequent ancestral self-interactions. This would suggest a structural difference between the ancient and modern proteome. Because a network's structure can reflect its functional capabilities, such a difference might imply unique functional capabilities of the ancestral proteome or potentially proteomic subfunctionalization between the pre- and postduplication organisms (36–38).

Alternatively, these ohnologous interactions might be de novo. Because overall P_+ is small, this would suggest an evolutionary preference for adding or retaining ohnolog interactions (i.e., P_+,ohnolog > P_+,nonohnolog, or P_−,ohnolog < P_−,nonohnolog) (36).

Another intriguing explanation is that the high estimate for P_si results from selective retention of duplicates descended from ancestrally self-interacting proteins. Assuming that self-interactions were not more common in the ancestral network, our data may suggest that these pairs were under selective pressure to be maintained (46). Because they would be retained over long periods of time, they are more likely to have evolved a novel function (22, 38, 49).
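The enrichment of interacting ohnolog pairs reported above (fewer than one expected, 44 observed) can be reproduced approximately with a short calculation. The sketch assumes that each of the 450 ohnolog pairs interacts independently with a probability equal to the modern interaction frequency of 0.13%; the function name is hypothetical, and the exact test used by the authors may differ in detail.

```python
from math import exp, lgamma, log


def binomial_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p), with each term built in log space."""
    total = 0.0
    for k in range(k_min, n + 1):
        log_term = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + k * log(p) + (n - k) * log(1.0 - p))
        total += exp(log_term)
    return total


if __name__ == "__main__":
    n_pairs = 450          # ohnolog pairs considered in the text
    p_interact = 0.0013    # modern interaction frequency (0.13%)
    observed = 44          # interacting ohnolog pairs reported in the text
    print(n_pairs * p_interact)  # expected number of interacting pairs
    print(binomial_tail(n_pairs, p_interact, observed))
```

Under this independence assumption the expected number of interacting pairs is below one, and the tail probability of seeing 44 or more falls far below the 10^−10 bound quoted in the text.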
We suggest a simple dose-dependent model (described in SI Text) consistent with the idea that duplicated self-interacting proteins are selectively preserved (39). This could be an important contributor to the evolution of protein complexes (38, 45, 49).

Our model explains the current prevalence of the 19 ohnolog motifs and provides an estimate for pre- and postduplication parameters of the interaction network. The estimated frequency of self-interaction in the ancestral network is significantly higher than in today's network. This could indicate preferential retention of self-interacting protein duplicates, structural differences between the networks, or an inherent asymmetry between ohnologous and nonohnologous protein interaction dynamics. Our results are based on DIP and should be taken with caution because of possible bias and inherent noise associated with the high-throughput data that make up a significant portion of the DIP (23–29, 48). It will be interesting to see whether similar observations appear in other sources of interaction data for S. cerevisiae and other species (1, 21, 40, 41).

Methods

Databases. We used the protein interactions listed in the DIP database (23, 26–29). Data can be downloaded at http://dip.doe-mbi.ucla.edu/. The whole-genome duplicates are listed in the supplemental material of Kellis et al. (8).

Minimization Algorithm. We solve Eq. 1 for the parameters that best fit the data by minimizing the error associated with the fit. The right-hand side, m_modern, is directly derived from the data (Table 1). The left-hand side, m_0(P_i, P_si)·T(P_+, P_−), yields a vector m_expected that depends on the four parameters P_i, P_si, P_+, and P_−. For a motif i, the goodness of fit is given by the square of the difference between the observed abundance m_modern,i and the expected abundance m_expected,i, scaled by the expected number of motifs:

$$E = \sum_{i=1}^{19} \frac{\left(m_{\mathrm{modern},i} - m_{\mathrm{expected},i}\right)^2}{m_{\mathrm{expected},i}}$$
We then minimize E using the simplex search method (42) implemented by the fminsearch function in MATLAB, obtaining best-fit values of P_i, P_si, P_+, and P_− (see Table 2). The algorithm used to estimate the error in the parameters is described in SI Text. We tested the model on simulated networks (SI Text and SI Table 4) before running it on the actual yeast proteome.
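The error measure described in this section (squared difference between observed and expected motif abundance, divided by the expected abundance and accumulated over motifs) can be sketched as follows. The summation over motifs, the toy one-parameter model, the made-up counts, and the coarse parameter scan are all assumptions made for illustration; the scan merely stands in for the simplex search, and none of the numbers come from Table 1 or Table 2.

```python
def fit_error(observed, expected):
    """Sum over motifs of (observed - expected)**2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("vectors must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def toy_expected(scale):
    """Hypothetical one-parameter model standing in for m_0 · T."""
    template = (4.0, 2.0, 1.0)
    return [scale * t for t in template]


def coarse_scan(observed, candidates):
    """Return the candidate parameter value with the smallest fit error."""
    return min(candidates, key=lambda s: fit_error(observed, toy_expected(s)))


if __name__ == "__main__":
    observed = [40.0, 20.0, 10.0]  # made-up counts, not the data of Table 1
    candidates = [s / 10 for s in range(1, 201)]
    best = coarse_scan(observed, candidates)
    print(best, fit_error(observed, toy_expected(best)))
```

In Python, `scipy.optimize.minimize` with `method="Nelder-Mead"` provides a simplex search comparable in spirit to fminsearch; the sketch avoids it only to remain free of third-party dependencies.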
ACKNOWLEDGMENTS. We acknowledge N. Barkai, M. Brenner, A. DeLuna, E. Lieberman, I. Nachman, I. Wapinski, and K. Wolfe for their advice and helpful discussions and E. Lieberman and R. Milo for critical readings of the manuscript. This work was supported in part by National Institutes of Health Grants GM068763 (to M.B.E.) and R01GM081617 (to R.K.). A.P. was supported by a National Science Foundation Graduate Fellowship and a National Defense Science and Engineering Graduate Fellowship.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. L.K. is a guest editor invited by the Editorial Board.

This article contains supporting information online at www.pnas.org/cgi/content/full/0707293105/DC1.

**Explicitly, we allow one edge transition per site. This would not include cases where we have multiple transitions at a single site (e.g., is equivalent in our method to ). In practice, multiple transitions are improbable, but we define our transitions to include these higher-order transitions for completeness.

††According to DIP, the dataset on which we base our analysis. In other datasets, this parameter ranges in value, with the largest being 0.138 [large literature-curated dataset (35)].

References

1. Aury JM, Jaillon O, Duret L, Noel B, Jubin C, Porcel BM, Segurens B, Daubin V, Anthouard V, Aiach N, et al. Nature. 2006;444:171–178.
2. Barabasi AL, Albert R. Science. 1999;286:509–512.
3. Dehal P, Boore JL. PLoS Biol. 2005;3:e314.
4. Ispolatov I, Krapivsky PL, Mazo I, Yuryev A. New J Phys. 2005;7:145.
5. Pastor-Satorras R, Smith E, Sole RV. J Theor Biol. 2003;222:199–210.
6. Hughes AL. Proc R Soc London Ser B. 1994;256:119–123.
7. Wolfe K. Curr Biol. 2004;14:R392–R394.
8. Kellis M, Birren BW, Lander ES. Nature. 2004;428:617–624.
9. Langkjaer RB, Cliften PF, Johnston M, Piskur J. Nature. 2003;421:848–852.
10. Ohno S. Evolution by Gene Duplication. London: Allen and Unwin; 1970.
11. Wolfe KH, Shields DC. Nature. 1997;387:708–713.
12. Conant GC, Wolfe KH. PLoS Biol. 2006;4:545–554.
13. Ihmels J, Collins SR, Schuldiner M, Krogan NJ, Weissman JS. Mol Syst Biol. 2007;3.
14. Kafri R, Bar-Even A, Pilpel Y. Nat Genet. 2005;37:295–299.
15. Lynch M, Force A. Genetics. 2000;154.
16. Tirosh I, Barkai N. Genome Biol. 2007;8:R50.
17. Wagner A. Mol Biol Evol. 2002;19:1760–1768.
18. Papp B, Pal C, Hurst LD. Nature. 2003;424:194–197.
19. Cliften PF, Fulton RS, Wilson RK, Johnston M. Genetics. 2006;172:863–872.
20. Mintseris J, Weng Z. Proc Natl Acad Sci USA. 2005;102:10930–10935.
21. Scannell DR, Byrne KP, Gordon JL, Wong S, Wolfe K. Nature. 2006;440:341–345.
22. Wapinski I, Pfeffer A, Friedman N, Regev A. Nature. 2007;449:54–61.
23. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P, Bennett K, Boutilier K, et al. Nature. 2002;415:180–183.
24. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. Proc Natl Acad Sci USA. 2001;98:4569–4574.
25. Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D. Nucleic Acids Res Database Issue. 2004;32:D449–D451.
26. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, et al. Nature. 2000;403:623–627.
27. Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D. Nucleic Acids Res. 2000;28:289–291.
28. Xenarios I, Fernandez E, Salwinski L, Duan XJ, Thompson MJ, Marcotte EM, Eisenberg D. Nucleic Acids Res. 2001;29:239–241.
29. Xenarios I, Salwinski L, Duan XJ, Higney P, Kim S, Eisenberg D. Nucleic Acids Res. 2002;30:303–305.
30. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. Science. 2002;298:824–827.
31. Shen-Orr S, Milo R, Mangan S, Alon U. Nat Genet. 2003;32:64–68.
32. DeLuna A, Avendaño A, Riego L, González A. J Biol Chem. 2001;276:43775–43783.
33. Gibson TJ, Spring J. Trends Genet. 1998;14:46–49.
34. Guldner U, Munsterkotter M, Oesterheld M, Pagel P, Ruepp A, Mewes HW, Stumpflen V. Nucleic Acids Res. 2006;34:D436–D441.
35. Reguly T, Breitkreutz A, Boucher L, Breitkreutz B-J, Hon GC, Myers CL, Parsons A, Friesen H, Oughtred R, Tong A, et al. J Biol. 2006;5:11.
36. Wagner A. Proc R Soc London Ser B. 2003;270:457–466.
37. Ispolatov I, Yuryev A, Mazo I, Maslov S. Nucleic Acids Res. 2005;33:3629–3635.
38. Pereira-Leal JB, Levy ED, Kamp C, Teichmann SA. Genome Biol. 2007;8:R51.
39. Hughes T, Ekman D, Ardawatia H, Elofsson A, Liberles DA. Genome Biol. 2007;8:213.
40. Britten RJ. Proc Natl Acad Sci USA. 2006;103:19027–19032.
41. Jaillon O, Aury J-M, Brunet F, Petit J-L, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, et al. Nature. 2004;431:946–957.
42. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical Recipes in C. Cambridge, UK: Cambridge Univ Press; 1992.
43. Prince VE, Pickett FB. Nat Rev Genet. 2002;3:827–837.
44. Wagner A. Mol Biol Evol. 2001;18:1283–1292.
45. Pereira-Leal JB, Teichmann SA. Genome Res. 2005;15:552–559.
46. Marianayagam NJ, Sunde M, Matthews JM. Trends Biochem Sci. 2004;29:618–625.
47. Batada NN, Reguly T, Breitkreutz A, Boucher L, Breitkreutz B-J, Hurst LD, Tyers M. PLoS Biol. 2007;5:e154.
48. Yu H, Paccanaro A, Trifonov V, Gerstein M. Bioinformatics. 2006;22:823–829.
49. Musso G, Zhang Z, Emili A. Trends Genet. 2007;23:266–269.