Copyright © 2005, EMBO and Nature Publishing Group

Expression dynamics of a cellular metabolic network

1Department of Genetics, Harvard Medical School, Boston, MA, USA
2Department of Biomedical Informatics, Center for Computational Biology and Bioinformatics, Columbia University, New York, NY, USA
aDepartment of Genetics, New Research Building (NRB) Room 238, 77 Ave. Louis Pasteur, Harvard Medical School, Boston, MA 02115, USA. Tel.: +1 617 432 1278; Fax: +1 617 432 6511; E-mail: g1m1c1/at/arep.med.harvard.edu
bDepartment of Biomedical Informatics, Center for Computational Biology and Bioinformatics, Columbia University, 1150 St Nicholas Ave., New York, NY 10032, USA. Tel.: +1 212 851 5151; Fax: +1 212 851 5290; E-mail: vitkup/at/dbmi.columbia.edu

Received March 2, 2005; Accepted June 30, 2005.

Abstract

Toward the goal of understanding system properties of biological networks, we investigate the global and local regulation of gene expression in the Saccharomyces cerevisiae metabolic network. Our results demonstrate predominance of local gene regulation in metabolism. Metabolic genes display significant coexpression on distances smaller than the average network distance, a behavior supported by the distribution of transcription factor binding sites in the metabolic network and genome context associations. Positive gene coexpression decreases monotonically with distance in the network, while negative coexpression is strongest at intermediate network distances. We show that basic topological motifs of the metabolic network exhibit statistically significant differences in coexpression behavior.

Keywords: expression, genome context, metabolism, motifs, network

Introduction

Recent studies of general topological organization of metabolic networks have provided valuable insights into the functional properties of these systems (Jeong et al, 2000; Ravasz et al, 2002).
The topological characteristics, however, provide only a static description of the biological networks. Functions of cellular networks are highly regulated in a temporal fashion at a number of levels, including transcriptional regulation. The expression dynamics of protein–protein and protein–DNA interaction networks have been previously investigated and small, but statistically significant, expression correlations were found between network neighbors (Ge et al, 2001). Earlier studies have analyzed coexpression in individual metabolic pathways (Gerstein and Jansen, 2000; Hughes et al, 2000; Karp et al, 2002; Pavlidis et al, 2002; Ihmels et al, 2003), identified highly coexpressed modules (Hanisch et al, 2002; Ihmels et al, 2003; Patil and Nielsen, 2005) and characterized coexpression properties of metabolic junctions (Ihmels et al, 2003).

In the present study, we demonstrate several novel aspects of the expression dynamics in the yeast metabolic network. We also show that the observed patterns of mRNA coexpression are similar to the patterns of other context-based associations between genes: phylogenetic profiles and chromosomal distance. In our analysis we represent metabolism as a graphical model, with nodes of the graph corresponding to genes encoding metabolic enzymes, and edges to metabolic connections between corresponding enzymes (see Materials and methods). We investigate how positive and negative correlation of mRNA expression profiles depends on the metabolic network distance, and determine the maximum distance at which genes display statistically significant coexpression. At the level of individual biochemical reactions, we examine coexpression and functional association patterns of local topological motifs of the metabolic network (Shen-Orr et al, 2002). Our results show that different motifs exhibit distinct coexpression patterns, which may elucidate dynamic design principles of metabolic networks.
Results and discussion

Global properties of gene coexpression

Our analysis uses the Saccharomyces cerevisiae metabolic network recently reconstructed by Forster et al (2003). The yeast metabolic network has a substantially smaller error rate than currently available protein–protein physical interaction data (Mering et al, 2002; Sprinzak et al, 2003). Using the established metabolic connectivity between S. cerevisiae enzyme-encoding genes, we investigate how the degree of gene coexpression changes with the network distance. A similar question was posed recently for adjacent genes in the protein–protein physical interaction network (Ge et al, 2001; Jansen et al, 2002). An intuitive expectation is that genes close in the metabolic network would also be coexpressed. The dependency of mean expression distance (distance = 1 − correlation coefficient, see Materials and methods) on metabolic network distance is shown in Figure 1A.
Local coregulation can also be illustrated by examining distribution of transcription factor binding in the metabolic network. Because genes sharing DNA binding transcription factors are, generally, expected to be coregulated, the distribution of transcription factor binding cooccurrences in the metabolic network should confirm the pattern observed for coexpression. Indeed, we find (Figure 1D) […]

The expression distance dependency is shown separately for gene pairs with positive and negative correlation of expression profiles (Figure 1B and C) […]

Both coexpression and transcription factor binding site cooccurrence reflect the degree of functional association between metabolic genes. To generalize our results, we have examined other functional association evidence based on the genome context: physical clustering of genes on the chromosome (Overbeek et al, 1999) and gene cooccurrence in phylogenetic profiles (Pellegrini et al, 1999) (see Materials and methods). We find that associations based on both cooccurrence in phylogenetic profiles (Figure 1E) […]

Coexpression in the metabolic network depends not only on the network distance between genes, but also on the characteristics of connecting metabolites. Analyzing the dependency between the total number of enzyme pairs connected by a given metabolite (metabolite enzyme pair number) and mean expression of the connected pairs (see Supplementary Figure 1), we find that increasing enzyme pair number corresponds, on average, to weaker positive coexpression (Spearman rank r=0.21, P=4.7 × 10⁻⁵) and stronger negative coexpression (r=−0.14, P=2.7 × 10⁻²). The details of this analysis and of similar genome context association dependencies are given in Supplementary information. It should be noted that our analysis examines coexpression across a large number of conditions.
As has been recently demonstrated (Patil and Nielsen, 2005), individual environmental perturbations may affect sets of metabolic genes connected by some of the common cofactors. Nevertheless, our study shows that the overall coexpression is stronger for genes connected by metabolites used in a small number of reactions. In general, a decrease in the reaction number of the connecting metabolite increases the linearity of the pathways going through this node. Consequently, our results suggest that positive coexpression and strong functional associations dominate in linear parts of the network, while negative coexpression is stronger in highly branched pathways.

Gene coexpression in metabolic motifs

To understand patterns of local regulation in the metabolic network, we compare coexpression properties of elementary topological motifs formed by adjacent enzymes in the metabolite graph. Several studies have recently investigated local regulatory motifs in bacteria and yeast (Lee et al, 2002; Shen-Orr et al, 2002). These studies identified elementary topological motifs that are significantly more abundant in real biological networks than expected by chance. In contrast, we analyze coexpression of all possible two-gene motifs and a majority of three-gene substructures of the metabolite graph (see Materials and methods). Mean expression distances of these motifs are given in Supplementary Table 1. The same analysis was repeated on each individual expression data set, and the results are shown in Supplementary Tables 2–4. Below, we examine the behavior established by positive coexpression, which is predominant in magnitude and consistent between different expression data sets (considering both positive and negative coexpression results in the same motif ordering). The ordering of irreversible two-gene motifs established by positive coexpression is shown in Figure 2A.
The M2 and M3 motifs represent pairs of divergent and convergent metabolic reactions, respectively. The mean level of positive coexpression of genes within the M2 motif is significantly higher than within the M3 motif (Wilcoxon test P=3.4 × 10⁻³), implying that coregulation of divergent metabolic pathways is generally stronger than that of convergent pathways. This suggests that regulation of the metabolic network emphasizes reactions in which one metabolic precursor, such as a carbon source, is simultaneously used to synthesize a variety of compounds required for biomass growth.

In analyzing three-gene motifs, we compare coexpression among different types of gene pairs within each motif. The M6 and M7 motifs (Figure 2B) […]

In contrast to the M7 motif, the pattern of positive coexpression for the three-gene M6 motif differs from the behavior observed in two-gene motifs. Here, enzymes consuming the same metabolite exhibit, on average, stronger coexpression with each other than with the enzyme producing the metabolite. Although the observed pattern for the M6 motif coexpression differs from the conclusions of the divergent junction analysis reported by Ihmels et al (2003), no statistical significance has been established in either case (Wilcoxon test P=0.33 for positively coexpressed pairs, P=0.54 for positive and negative coexpression; no P-value was reported by Ihmels et al). Overall, coexpression patterns of both M6 and M7 motifs support the predominance of coregulation in divergent versus convergent branches demonstrated in the analysis of the two-gene motifs.

The coexpression behavior of local topological motifs is matched by the strength of genome context associations between genes. Genome context association of the metabolite graph motifs (clustering of genes on the chromosome, and gene cooccurrence in phylogenetic profiles) is given in Supplementary Tables 6 and 7.
Since genome context scores rely on ortholog identities, these analyses exclude motifs formed by homologous gene pairs (see Materials and methods). The ordering of irreversible two-gene motifs (M1–M5) established by the genome context associations is identical to the order established by positive coexpression, with the exception of an M1–M2 switch in the phylogenetic profile association ordering (Supplementary Figure 7). The significantly stronger association of genes in the divergent (M2) compared to the convergent (M3) motif is also supported by the difference in the mean chromosome clustering scores (Wilcoxon test P=4.6 × 10⁻¹⁹) and phylogenetic profile cooccurrence (Wilcoxon test P=3.8 × 10⁻³⁰). Similarly, the relative coexpression behavior of the three-gene M6 and M7 motifs is matched by the relationship of genome context associations.

Conclusion

The presented results reveal interesting and statistically significant patterns of coregulation in the metabolic network. We find that regulation of metabolic genes is local and extends, generally, to distances smaller than the mean network distance. Such regulation implies that genes close in the metabolic network are usually coexpressed, possibly to optimize local metabolic fluxes (Zaslaver et al, 2004). Positive coexpression is strongest among adjacent genes and decreases monotonically with network distance. In contrast, negative coexpression is most prominent at intermediate distances. Functional associations based on the genome context analysis exhibit the same local property observed in the case of positive coexpression. These results suggest that regulation of the metabolic network establishes a number of local, positively coexpressed regions that may exhibit some degree of negative coexpression between each other. Furthermore, we find that positive coexpression and functional associations are strongest in the linear parts of metabolism, while negative coexpression is more pronounced in highly branched regions.
Our analysis of the elementary topological motifs illustrates that coexpression in divergent branches is significantly stronger than that observed in convergent branches. This pattern suggests an emphasis on coregulation of biomass synthesis or degradation from common metabolic precursors. Good agreement between the mRNA coexpression and genome context associations suggests that the observed patterns of metabolic regulation are reflected in genome evolution and affect the location of genes on the chromosomes. In future studies, it will be important to confirm our findings using the metabolic networks of other organisms. It would also be interesting to perform a similar analysis of other cellular networks, for example signaling and protein–protein interaction networks.

Materials and methods

Metabolic dependency graph and separation between genes

Metabolism was represented in the form of a connectivity graph. The nodes of the graph correspond to metabolic genes, and edges correspond to connections established by metabolic reactions. Metabolic genes X and Y are considered connected if and only if there exists a metabolite that is present among the list of either reactants or products of reactions catalyzed by enzymes encoded by both X and Y. The metabolic connectivity graph is used to calculate network distance (or metabolic separation) between genes. We define a pair of directly connected metabolic genes as being separated by distance 1. In general, we define network distance between the genes X and Y as the length of the shortest path from X to Y on the metabolic connectivity graph. A hand-curated metabolic network model of S. cerevisiae (Forster et al, 2003) was used to construct a comprehensive metabolic connectivity graph. While any metabolite can be used to deduce gene connectivity, the relationships established by the common cofactors, such as ATP, are not likely to connect genes with similar metabolic functions.
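The connectivity rule and the shortest-path definition of network distance can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the input layout (one gene paired with the metabolites of a reaction its enzyme catalyzes) and the toy gene and metabolite labels are all assumptions made for illustration. The optional `excluded` argument shows how leaving out common cofactors such as ATP changes the resulting distances.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_gene_graph(reactions, excluded=frozenset()):
    """Connect two genes when their reactions share a non-excluded metabolite.

    `reactions` is a list of (gene, metabolites) pairs, where `metabolites`
    holds every reactant and product of one reaction catalyzed by the enzyme
    that the gene encodes. (Hypothetical input layout.)
    """
    genes_by_metabolite = defaultdict(set)
    graph = {}
    for gene, metabolites in reactions:
        graph.setdefault(gene, set())  # keep genes that end up isolated
        for metabolite in metabolites:
            if metabolite not in excluded:
                genes_by_metabolite[metabolite].add(gene)
    for genes in genes_by_metabolite.values():
        for x, y in combinations(genes, 2):
            graph[x].add(y)
            graph[y].add(x)
    return graph


def network_distances(graph, source):
    """Breadth-first search: shortest-path length from `source` to each reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Toy data; gene and metabolite names are made up for the example.
toy_reactions = [
    ("geneA", {"glc", "g6p", "ATP", "ADP"}),
    ("geneB", {"g6p", "f6p"}),
    ("geneC", {"f6p", "fbp", "ATP", "ADP"}),
    ("geneD", {"x", "y"}),
]
filtered = build_gene_graph(toy_reactions, excluded={"ATP", "ADP"})
unfiltered = build_gene_graph(toy_reactions)
```

In this toy example geneA and geneC are two steps apart once ATP and ADP are excluded, but become direct neighbors when the cofactors are kept, which is the kind of functionally uninformative link the text describes. Genes with no path between them (geneD here) simply do not appear in the distance dictionary.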
In compiling a global metabolic connectivity graph, we consider a subset of metabolites, which excludes the most highly connected metabolic species. An exclusion threshold was determined based on the connectivity of the resulting gene dependency graph (Supplementary Figure 1). The 14 most highly connected metabolites (ATP, ADP, AMP, CO2, CoA, glutamate, H, NAD, NADH, NADP, NADPH, NH3, orthophosphate, pyrophosphate) and their mitochondrial and external analogs were excluded. The general trends described in the paper are not sensitive to the precise choice of the metabolite set; however, the actual values change when more or fewer metabolites are considered. For detailed analysis, see Supplementary information. Genes encoding enzymes that are part of known complexes, according to the MIPS complex database (http://mips.gsf.de) and SGD (http://www.yeastgenome.org/), were masked as unassigned enzymes, so that their expression profiles would not be included in any of the analyses (36 enzyme-encoding genes in total).

Distances between gene expression profiles

We used three data sets as sources of gene expression information. The Rosetta ‘compendium’ data set (Hughes et al, 2000) measures expression profiles of over 6200 S. cerevisiae open reading frames (ORFs) across 287 deletion strains and 13 chemical conditions. In addition, the data set contains 63 negative control measurements comparing two independent cultures of the same strain. These were used to establish individual error models for each ORF, providing not only the raw intensity and ratio measurement values for each experimental data point, but also a P-value gauging the significance of change in expression level. The ratio data were used for all analyses. We also used a data set from Brown's group, containing 173 environmental perturbations (Gasch et al, 2000), and a data set from Young's group with 34 conditions describing seven environmental perturbation time courses (Causton et al, 2001).
Log10 intensity ratios of each data set were normalized to have a mean of 0 and a variance of 1. Separate time courses contained in these data sets were first normalized individually and then combined. Supplementary Table 8 shows the relative variability of metabolic enzyme-encoding genes in each data set. The expression distance measure between ORFs X and Y is then taken to be $1 - S(p_x, p_y)$, where $p_x$ and $p_y$ are expression profile vectors of X and Y, and $S$ corresponds to the Spearman rank correlation coefficient, calculated according to Press et al (2002). Combined expression profile vectors are formed by concatenation of log10 ratio values from the three data sets. Conditions missing an experimental value for at least one of the genes were omitted in calculating the expression distance of a given gene pair. Genes that were missing expression values for more than 25% of conditions were not considered in the analysis.

Transcription factor binding

Information on transcription factor binding to the metabolic gene promoter sites was taken from Lee et al (2002). A P-value threshold of 0.001 was used to select transcription factor binding occurrences.

Clustering of genes on the chromosome

To assess genome context association based on the physical clustering of genes on the chromosome, we relied on gene order statistics. The chromosome clustering association score between genes x and y was calculated as $S(x, y) = \prod_{g \in G} P(d_g(x, y))$, where $G$ is a set of genomes and $P(d_g(x, y))$ is the probability of observing gene order distance $d_g(x, y)$ between genes x and y in a genome g, calculated based on the chromosome sizes of the organism under the null hypothesis that genes are randomly ordered across the chromosomes. The scores were calculated using a set of 105 bacterial and three eukaryotic genomes.
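As an illustration of the expression distance defined in this section (one minus the Spearman rank correlation, computed only over conditions measured for both genes), here is a small standard-library Python sketch. It is not the authors' code: the function names, the use of `None` for missing values and the average-rank treatment of ties are assumptions of this example, and it does not reproduce the normalization step.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_a * var_b)


def has_enough_data(profile, max_missing=0.25):
    """False for genes missing values in more than 25% of conditions."""
    missing = sum(1 for value in profile if value is None)
    return missing / len(profile) <= max_missing


def expression_distance(px, py):
    """1 - Spearman correlation over conditions measured for both genes."""
    shared = [(u, v) for u, v in zip(px, py) if u is not None and v is not None]
    xs = [u for u, _ in shared]
    ys = [v for _, v in shared]
    return 1.0 - pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical log-ratio profiles over four conditions.
profile_x = [0.1, 0.4, 0.9, 1.3]
profile_y = [1.0, 2.0, 3.0, 4.0]
distance_same_order = expression_distance(profile_x, profile_y)
```

With this definition the distance ranges from 0 for perfectly concordant rank orderings to 2 for perfectly reversed ones, so values below 1 correspond to positive coexpression and values above 1 to negative coexpression.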
Orthology mapping was established using best bidirectional hits from KEGG SSDB (Itoh et al, 2004).

Cooccurrence in phylogenetic profiles

Phylogenetic profile cooccurrence association (Pellegrini et al, 1999) was assessed using hypergeometric probability, as described by Bowers et al (2004). The orthology data set was constructed based on best bidirectional BLASTP hits against the NCBI NR protein data set. The calculation was limited to organisms containing orthologs for at least 1% of S. cerevisiae genes.

Metabolic motifs

The elementary topological motifs of the metabolite graph were classified in terms of the expression properties of the genes involved. The elementary node structures were enumerated, and sets of genes that were connected in the appropriate topology were extracted for each structure. It is possible for one gene to be included in multiple occurrences of the same motif; in other words, we counted any substructure with the correct topology as an occurrence. Motif instances formed around the top 14 most connected metabolites, as well as their mitochondrial and external forms, were not included in the analysis. Mean expression distances of different types of gene pairs were compared using the Wilcoxon rank test (Wilcoxon, 1945). In generating homolog-filtered data, a pair of metabolic genes was excluded from the analysis if the BLAST score comparing the two nucleotide sequences was below an E-value of 10⁻³.

Acknowledgments

We thank Patrik D'haeseleer, Philippe Marc, John Aach, Yonathan Grad, Leonid Mirny, Andrey Rzhetsky, Paul Pavlidis and Andrea Califano for useful discussions and critical reading of the manuscript. GMC was supported by the US Department of Energy, the Defense Advanced Research Projects Agency, and the PhRMA Foundation.