Copyright: © 2007 de Bivort et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Empirical Multiscale Networks of Cellular Regulation

1 Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
2 Vascular Biology Program, Children's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
3 Department of Pathology, Children's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
4 Department of Surgery, Children's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
5 New England Complex Systems Institute, Cambridge, Massachusetts, United States of America

Yitzhak Pilpel, Editor
Weizmann Institute of Science, Israel

* To whom correspondence should be addressed. E-mail: bivort/at/fas.harvard.edu

Received December 18, 2006; Accepted September 7, 2007.

Abstract

Grouping genes by similarity of expression across multiple cellular conditions enables the identification of cellular modules. The known functions of genes enable the characterization of the aggregate biological functions of these modules. In this paper, we use a high-throughput approach to identify the effective mutual regulatory interactions between modules composed of mouse genes from the Alliance for Cell Signaling (AfCS) murine B-lymphocyte database which tracks the response of ~15,000 genes following chemokine perturbation. This analysis reveals principles of cellular organization that we discuss along four conceptual axes. (1) Regulatory implications: the derived collection of influences between any two modules quantifies intuitive as well as unexpected regulatory interactions.
(2) Behavior across scales: trends across global networks of varying resolution (composed of various numbers of modules) reveal principles of assembly of high-level behaviors from smaller components. (3) Temporal behavior: tracking the mutual module influences over different time intervals provides features of regulation dynamics such as duration, persistence, and periodicity. (4) Gene Ontology correspondence: the association of modules to known biological roles of individual genes describes the organization of functions within coexpressed modules of various sizes. We present key specific results in each of these four areas, as well as derive general principles of cellular organization. At the coarsest scale, the entire transcriptional network contains five divisions: two divisions devoted to ATP production/biosynthesis and DNA replication that activate all other divisions, an “extracellular interaction” division that represses all other divisions, and two divisions (proliferation/differentiation and membrane infrastructure) that activate and repress other divisions in specific ways consistent with cell cycle control.

Author Summary

In a eukaryotic organism such as the mouse, the complete transcriptional network contains ~15,000 genes and up to 225 million regulatory relationships between pairs of genes. Determining all of these relationships is currently intractable using traditional experimental techniques, and, thus, a comprehensive description of the entire mouse transcriptional network is elusive. Alternatively, one can apply the limited amount of experimental data to determine the entire transcriptional network at a less detailed, higher level. This is analogous to considering a map of the world resolved to the kilometer rather than to the millimeter. Here, we derive from mouse microarray data several high-scale transcriptional networks by determining the mutual effective regulatory influences of large modules of genes.
In particular, global transcriptional networks containing 12 to 72 modules are derived, and analysis of these multiscale networks reveals properties of the transcriptional network that are universal at all scales (e.g., maintenance of homeostasis) and properties that vary as a function of scale (e.g., the fractions of module pairs that exert mutual regulation). In addition, we describe how cellular functions associated with large modules (those containing many genes) are composed of more specific functions associated with smaller modules.

Introduction

The importance of modular organization in biology is widely appreciated [1–6] and is manifested in conserved gene modules across species [7–9]. High-throughput data has yielded progress in molecular-level descriptions of interactions of genes, proteins, and metabolites [10–14]; however, understanding an entire cell or its major components from genetic information is a major methodological challenge [15]. Here, we use genome-wide expression Alliance for Cell Signaling (AfCS) data to first empirically obtain modular functions and then empirically obtain the effective inhibitory and activating regulatory influences between these modules at many scales of resolution (see Figure 1 […])
The technique used to infer these regulatory influences relies on the correlation of the expression levels of transcriptional regulators at one time with the expression levels of their regulatory targets a fixed interval of time later. This correlative analysis aggregates direct and indirect causal influences, as well as co-occurring behaviors. Still, the transition matrix obtained can be used to predict [15] the transcriptional level changes of large cellular modules over fixed time intervals with surprising accuracy (r > 0.95). This approach generates many specific results, each of which is the strength and polarity (activating or inhibiting) of the effective regulatory influence of one functional module on another. The results are derived directly from experimental data and are statistically validated. This multiscale analysis yields a description of cell behavior in terms of traditional biological concepts (i.e., cellular or physiological systems such as “respiration” and “mitosis”), identifying the genes whose collective behavior they comprise. At all scales, new network models of regulatory interactions among modules encompassing the behavior of the entire cell are presented.

Previous studies have considered genome-wide multiscale groupings of genes according to their expression behavior [16–18]. This work extends the paradigm of multiscale gene grouping by determining for the first time, at multiple scales, the network of mutual regulation of groups of genes on groups of genes, an approach that is analogous to having geographic maps of varying resolution. Our analysis of attributes of these novel cellular-level regulatory networks reveals principles of organization, such as scale-dependent homeostatic feedback and target specificity, and asymmetric restrictions on the number of ingoing and outgoing regulatory influences.
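The prediction step mentioned above can be illustrated with a minimal sketch. It assumes the linear model stated in Materials and Methods (a transition matrix applied to the vector of module expression levels) and scores the prediction with a Pearson correlation. The function names and all numeric values are hypothetical and chosen only for illustration; they are not taken from the AfCS data.

```python
from math import sqrt

def predict_next_state(M, x_t):
    """Apply an n x n transition matrix M to an n-vector of module levels."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x_t)) for row in M]

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

# Hypothetical 3-module example (invented values, not real measurements).
# Row i of M holds the influences received by module i.
M = [[0.9, 0.2, -0.1],
     [0.0, 0.8, 0.3],
     [-0.2, 0.1, 1.0]]
x_t = [1.0, -0.5, 0.25]               # module levels at time t
observed_next = [0.80, -0.30, 0.02]   # module levels one interval later

predicted_next = predict_next_state(M, x_t)
r = pearson_r(predicted_next, observed_next)
```

In this toy case the predicted and observed states are nearly identical, so `r` is close to 1; with real data the correlation would be computed over all modules and perturbations.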
Knowledge gained that is unique to multiscale analysis includes how functions of smaller modules contribute to the aggregate function of larger modules, which is analogous to how physiological systems are composed of organs, and organs of tissues. We provide all of the networks and Gene Ontology (GO) data in the supporting information, as well as a discussion of the statistical methods used to identify them. Specific questions about interactions between cellular modules can be addressed with these databases, as well as general insights or quantitative models of cellular response to perturbations. In Text S1, we discuss in detail samples of (1) randomly chosen and (2) particularly intriguing regulatory relationships inferred from our analysis. These examples can be considered at length, as they provide a large number of specific insights into the complex biological functioning of the cell, manifesting the ability of our methodology to extract them, and the informational value of the large AfCS datasets. Given the high complexity of cellular function, it should not be expected that a simple summary of modular interactions would serve as a sufficient description of the large number of results obtained. In the Results section, we focus on a selection that demonstrates the variety of findings and the general principles that have been abstracted from them.

Results

Regulatory Implications

The mutual regulatory influences for networks comprising n = 12, 20, 42, and 72 cellular modules are shown in Figure 2 […]
We calculated the average of the influence outputs (Figure 2 […])

To convey the distribution of all interactions, from strongly activating to strongly inhibiting, we cluster-ordered [23] the rows and columns of the n × n transition matrices at each scale (Figure 2 […])

A goal of developing larger-scale models is to relate genetic function to conceptually accessible models of cellular function. Still, even at the largest scale given above, with 12 modules and 144 potential interactions, it is hard to develop a complete mental picture of the behavior of the cell. We therefore developed an even more accessible, larger-scale summary of cellular function, which can serve as a first guide to the understanding of cell behavior at all finer levels of organization. Inspecting the regulatory effects of the groups reveals that the cell transcription network at n = 12 can be partitioned into five functional divisions: (1) energy and component production, (2) proliferation and differentiation, (3) extracellular interaction, (4) membrane infrastructure, and (5) DNA replication.

Groups 0, 1, and 4 comprise the first division. They are all enriched for genes involved in ATP synthesis and the production of nucleic acids and proteins, and are appropriately all global activators of transcription over short time-scales (1.5 h). Each group has unique sub-behaviors, with group 0 involved in endocytosis and group 1 in apoptosis and protein folding. Group 4 is involved in cell-cycle regulation (group 4 is also a global activator over 1 h), providing a connection between energy production and proliferation.

Groups 2 and 3 belong to the division contributing to proliferation and differentiation. They are both enriched for genes involved in small-molecule metabolism, and unlike the previous division, they are not global activators of transcription. Instead, they activate their own division over all time-scales and inhibit the DNA replication division (below), particularly after 1 and 3 h.
This periodic repression of DNA replication by the proliferation division may provide for the timing of S-phase during the cell cycle. Group 2 shares some functions with the first division, containing genes involved in translation and transcription, but has no role in ATP synthesis. Like group 4, it is involved in regulating the cell cycle. Group 3 has specific sub-functions related to the immune response, cell adhesion, and, more generally, behaviors unique to particular cell types. Groups 5, 8, and 9 are unified by gene content related to interactions between the cell and its external environment (division 3). They are respectively enriched for genes involved in cell matrix/adhesion and endocytosis; oxygen transport and exocytosis; and oxygen transport, chemotaxis, and the immune response. Functionally, all of these genes are global repressors of transcription after 1.5 h, with regulatory influences that are in opposition to the energy and component production division. Group 5 is also a weak early (0.5–1 h) activator of global transcription. Groups 6 and 7 comprise the membrane infrastructure division, with both enriched for genes involved in the physical production and maintenance of the cell membrane, such as lipid metabolism, lipid catabolism, and cholesterol metabolism. These groups are generally self-activating, but do not exert strong global transcriptional influences on other groups over any time interval, consistent with their infrastructural role. Unlike other divisions, these groups are virtually indistinguishable in terms of gene function. Last, groups 10 and 11 form the DNA replication division. They are enriched for nucleotide synthesis, nucleosome assembly, and regulators of DNA methylation. They are both strong global activators of transcription 2 h after they are activated, and while group 11 is at other times a weak global activator of transcription, group 10 is essentially not globally regulatory over other intervals. 
Group 10 is also uniquely enriched for genes promoting and suppressing apoptosis, and, like groups 8 and 9, has a role in oxygen regulation.

Behavior across Scales

We analyzed trends in the networks to determine which cellular properties hold across scales of observation and which vary with scale (Figure 3 […])
At all scales the distribution of the number of inputs to each module is Gaussian (Figure 3 […])

Scaling of Target Specificity

At all scales, the number of modules with only activating outputs is nearly identical to the number of modules with only inhibiting outputs (Figure 3 […])

At the finest scale of n = 72, the targets of a particular module output tended to have similar expression profiles. This was manifested in the frequent adjacency of modules' targets in the SOM array (Figure 4 […])
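The input and output counts discussed in this section can be tallied from a signed influence matrix roughly as follows. This is a sketch under stated assumptions: entry `M[i][j]` is taken to be the influence of module j on module i (the convention implied by M × X_t = X_{t+k}), a zero entry stands for "no statistically reliable influence", and self-influences on the diagonal are counted like any other entry. The function name, the threshold parameter, and the matrix values are hypothetical.

```python
def degree_summary(M, threshold=0.0):
    """Count signed inputs and outputs for every module.

    Assumes M[i][j] is the influence of module j on module i, so the
    outputs of module k are column k and its inputs are row k.
    """
    n = len(M)
    summary = []
    for k in range(n):
        outputs = [M[i][k] for i in range(n) if abs(M[i][k]) > threshold]
        inputs = [M[k][j] for j in range(n) if abs(M[k][j]) > threshold]
        summary.append({
            "n_inputs": len(inputs),
            "n_outputs": len(outputs),
            # A module with no outputs is neither "only activating"
            # nor "only inhibiting".
            "only_activating": bool(outputs) and all(v > 0 for v in outputs),
            "only_inhibiting": bool(outputs) and all(v < 0 for v in outputs),
        })
    return summary

# Hypothetical 3-module signed network (invented values).
M = [[0.5, -0.4, 0.0],
     [0.3, 0.0, 0.0],
     [0.2, -0.1, 0.0]]

summary = degree_summary(M)
n_only_activating = sum(s["only_activating"] for s in summary)
n_only_inhibiting = sum(s["only_inhibiting"] for s in summary)
```

Here module 0 has three activating outputs, module 1 has two inhibiting outputs, and module 2 has none, so the toy network contains one purely activating and one purely inhibiting module.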
Temporal Behavior

In addition to the 1.5-h influences, we determined the transition matrices for all other time intervals (0.5 h [i.e., the transition between 0.5 h and 1 h], 1 h, 2 h, 3 h, and 3.5 h) in the AfCS data at each scale (available in Table S5). We calculated the average output influence of each module as a function of time (Figure S1); modules with similar regulation under all assayed conditions (such as putative housekeeping modules and constitutively repressed modules) have lower-magnitude outputs than modules that are regulated in a situation-dependent manner. Modules with weak outputs tend to occur around the periphery of the SOM array, while modules with the greatest variance are in the interior. “Peripheral modules” tend to have monotonically increasing or decreasing responses to all perturbations, whereas the interior modules' responses are more complex. Averaging the magnitudes of the mean outputs of each module over all the modules reveals which time intervals mediate the greatest changes in expression. At all scales the average influence magnitude varies periodically with time, with greater frequency at finer scales (i.e., for n = 12, the most potent influences occur over t = 1.5 h, and the least over t = 3 h, whereas for n = 42, those times are 1 h and 2 h, respectively; see Figure S1C and S1D). Moreover, the magnitude of influences decreases at finer scales. This supports the idea that smaller modules bring about smaller transcriptional changes over shorter times [6].

Scaling of Gene-Module Function

We next considered the mapping of ontological terms for gene function between scales. To analyze how cellular functions are composed of sub-functions, we identified the mappings by which larger scale modules are composed of finer-scale modules. These mappings are shown in Figure 5 […]
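One way to compute such a mapping between scales is to cross-tabulate the gene-to-module assignments at two values of n (the kind of assignment provided in Table S2). The sketch below assumes each scale is given as a dictionary from gene identifier to module label. The function names, gene identifiers, and module labels are hypothetical. It also flags fine-scale modules whose genes are split across several coarse modules, which is the non-hierarchical case noted in the Discussion.

```python
from collections import Counter, defaultdict

def scale_mapping(coarse_assignment, fine_assignment):
    """For each fine-scale module, count its genes per coarse-scale module."""
    mapping = defaultdict(Counter)
    for gene, fine_module in fine_assignment.items():
        mapping[fine_module][coarse_assignment[gene]] += 1
    return {fine: dict(counts) for fine, counts in mapping.items()}

def split_modules(mapping):
    """Fine-scale modules whose genes fall into more than one coarse module."""
    return sorted(fine for fine, counts in mapping.items() if len(counts) > 1)

# Hypothetical assignments of five genes at a coarse and a fine scale.
coarse = {"g1": 0, "g2": 0, "g3": 1, "g4": 1, "g5": 1}
fine = {"g1": "a", "g2": "b", "g3": "b", "g4": "c", "g5": "c"}

mapping = scale_mapping(coarse, fine)
non_nested = split_modules(mapping)
```

In this toy example fine module "b" draws one gene from each coarse module, so it is reported as non-nested, while "a" and "c" sit entirely inside a single coarse module.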
These mappings also show how ontological functions are distributed across modules at various levels as illustrated for four GO categories in Figure 5 […]

Finally, we examined to what extent the distribution of sub-functions across the multiscale SOM groupings can predict relatedness of GO function. Cluster analysis of a GO label's abundance similarity across the n = 12 and n = 20 SOM groupings (Figure 5 […])

Discussion

We have determined networks of regulatory influences exerted by large groups of similarly behaving genes on other such large groups across multiple scales of resolution. These effective regulatory influences between modules are composed of direct and indirect causal mechanisms as well as temporally correlated effects that are seen across all 33 perturbations. Given a perturbation of gene expression, all of these components contribute to a prediction of the transcriptional state at later times [15]. Since the effective regulatory interactions accurately predict the transcriptional state, they capture almost all of the biologically relevant causal regulation occurring within that time interval. We determined these effective regulatory interactions between gene modules at different scales of observations comprising between five and 72 components. The gene composition of these modules is not strictly hierarchical; i.e., two genes in the same fine-scale group may not belong to the same large-scale group. This is a natural consequence of imposing discrete classification categories onto systems that need not be hierarchically structured across scales. For example, if one were to classify visible colors into six categories, they might very well comprise red, orange, yellow, green, blue, and indigo. The hues “yellow-orange” and “yellow-green” might reasonably fall into the yellow category of this six-group partitioning.
However, if one divides the same colors into three higher-scale categories (red, green, and blue), those two same hues would fall into separate categories (red and green, respectively). Similar considerations apply to physiological and metabolic categories. Therefore, our a priori expectation should be that gene modules would be nonhierarchical across multiple scales, as is observed in results of the SOM partitioning.

From these groups and regulatory influences, we derived many results comprising a first multiscale analysis of global gene-regulatory influences. We repeatedly observed mechanisms consistent with the maintenance of homeostatic equilibrium across the modules, particularly at higher scales. For example, the apparent regulatory dissimilarity of modules with similar expression patterns (Figure 2 […])

A multiscale approach is conceptually essential given the organization of living systems into structures at many scales, and is critical given the staggering challenge of obtaining a complete description of pairwise gene interactions. Still, in view of the complexity of biological function, there is a large amount of information that arises from a multiscale analysis. In this sense, our analysis can be considered as foundational to the development of many other results. It is a high-throughput analysis methodology analogous to high-throughput experimental methods of genome sequencing or gene expression data collection; through our approach, a seemingly overwhelming amount of data is generated by high-throughput consideration of the large number of regulatory interactions of modules across multiple scales. Our analysis of these results has been correspondingly multiscale. First, we identified global principles, such as the many facets of homeostasis and universality of regulatory effects at larger scales.
Second, we found new patterns of multiscale organization, such as the dichotomous distributions of the number of regulatory inputs and outputs at various scales, the increased target specificity and speed of regulation at finer scales, and the aggregation of sub-module functions into collective larger-scale functions. Last, we provide detailed discussion of many specific regulatory relationships in Text S1. The diversity of analysis points the way to many new lines of investigation, in particular experimentally testable hypotheses at large scales of cellular organization.

Materials and Methods

To separate genes into modules [1–9] and determine their mutual regulation, we used the AfCS murine B-lymphocyte perturbation expression data [33] tracking the response of ~15,000 genes to 33 perturbations at four time points (0.5 h, 1 h, 2 h, and 4 h; see Figure 1 […])

The SOM process organizes the modules into a 2-D array according to the relatedness of their average changes in expression [34] such that modules that are adjacent in the SOM array have more similar expression responses across all conditions. Generally, genes that had monotonic responses to many perturbations (i.e., always being activated or repressed) tended to be placed in the corner positions of this array. These groupings were performed using the GEDI software [35]. Varying n allowed us to consider global sets of modules at various scales of description. Low n yields large-scale modules with many genes in each module; higher n yields small-scale modules with fewer genes. A representative profile for each module was used to represent the module's behavior, and was determined as the centroid of the expression profiles of all genes composing the module. From the n × 132 (33 perturbations × 4 time points) “module transcriptional profile” datasets, we obtained the effective regulatory interactions as an n × n transition matrix (M), where M × X_t = X_{t+k}, and X_i is the n × 1 transcriptional state at time i.
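As a rough illustration of the linear model above, and of the random sub-sampling of modules that the Methods go on to describe, the following sketch fits a transition matrix by ordinary least squares and then averages estimates obtained from randomly chosen subsets of modules. Several choices here are assumptions rather than the authors' implementation: the least-squares criterion, the Gauss-Jordan solver, the stopping rule `min_estimates`, simple averaging of the sub-matrix estimates, and all function names and numeric values. The original work used custom C++ code and Mathematica, and its statistical filtering (Protocol S1) is not reproduced.

```python
import random

def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: not enough independent conditions")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [[aug[i][n + j] / aug[i][i] for j in range(len(B[0]))] for i in range(n)]

def fit_transition_matrix(X, Y):
    """Least-squares M with M X ~= Y. Rows are modules, columns are conditions."""
    n = len(X)
    XXt = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    XYt = [[sum(a * b for a, b in zip(X[i], Y[j])) for j in range(n)] for i in range(n)]
    Mt = solve(XXt, XYt)  # normal equations: (X X^T) M^T = X Y^T
    return [[Mt[j][i] for j in range(n)] for i in range(n)]

def bootstrap_transition_matrix(X, Y, subset_size, min_estimates, seed=0):
    """Average sub-matrix fits over random module subsets (assumed scheme)."""
    rng = random.Random(seed)
    n = len(X)
    sums = [[0.0] * n for _ in range(n)]
    counts = [[0] * n for _ in range(n)]
    # Keep drawing subsets until every ordered pair has enough estimates.
    while min(min(row) for row in counts) < min_estimates:
        subset = rng.sample(range(n), subset_size)
        sub_M = fit_transition_matrix([X[i] for i in subset], [Y[i] for i in subset])
        for a, i in enumerate(subset):
            for b, j in enumerate(subset):
                sums[i][j] += sub_M[a][b]
                counts[i][j] += 1
    return [[sums[i][j] / counts[i][j] for j in range(n)] for i in range(n)]

# Hypothetical data: 3 modules observed under 4 conditions (invented values).
M_true = [[0.5, -1.0, 0.0], [0.25, 2.0, 0.0], [0.0, 0.0, 1.5]]
X = [[1.0, 0.0, 2.0, -1.0], [0.0, 1.0, 1.0, 0.5], [1.0, 1.0, 0.0, 2.0]]
Y = [[sum(M_true[i][k] * X[k][c] for k in range(3)) for c in range(4)] for i in range(3)]

M_full = fit_transition_matrix(X, Y)                      # all modules at once
M_all = bootstrap_transition_matrix(X, Y, 3, 3)           # subsets of size n
M_pairs = bootstrap_transition_matrix(X, Y, 2, 3)         # subsets of 2 modules
```

With noise-free toy data, the full fit and the size-n subsets recover `M_true`. The pairwise subsets generally do not, because influences from modules left out of a subset are absorbed into the remaining entries; this is one reason a reliability filter across many sub-matrices is needed.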
If the matrices were dense, the greatest mathematically solvable n would be the number of perturbations, 33. However, the matrices are sparse, and we used a bootstrapping technique to obtain transition matrices as large as 72 × 72. This was done by randomly choosing 12 out of n modules, solving for their mutual interactions, and repeating this process until each of the n^2 interactions was estimated many times in different 12 × 12 sub-matrices. We constructed our regulatory networks out of only those interactions that were statistically reliable across perturbation and transcription contexts, using a signal-to-noise analysis (Protocol S1). The bootstrapping was performed using custom-written C++ code, and the linear systems were solved using Mathematica (Wolfram Research, http://www.wolfram.com). Clustering trees were all generated using the Fitch-Margoliash method as implemented in the Phylip program [36].

Figure S1: Temporal Dependence of Regulation
(A) The average output of each gene group (shown in SOM array order) as a function of the time-step for the n = 20 scale. Time points range from 0.5 h to 3.5 h. The standard deviation across time steps is indicated by color (see legend). (B) Similar to (A) for the n = 72 scale. (C) The average magnitude of regulation across all gene groups versus time-step (blue). Gray curves show fit by sinusoidal waves. (D) Frequency and magnitude of transcriptional regulation oscillations versus scale. The decrease in magnitude and increase in frequency indicate that regulation is weaker and quicker (higher frequency) at finer scales.
(152 KB PDF)

Protocol S1: Supporting Statistical Analysis (419 KB DOC)

Table S1: Gene Ontology Labels Associated with AfCS Probe IDs and Gene Indices (2.5 MB XLS)

Table S2: Assignment of Genes into SOM Groupings of Varying Size (1.5 MB XLS)

Table S3: Statistical Estimates of Interactions between Gene Groups (898 KB XLS)

Table S4: Statistical Association of Gene Ontology Labels with Gene Groups (466 KB XLS)

Table S5: Interactions between Gene Groups over Varying Time-Steps (1.1 MB XLS)

Text S1: Specific Regulatory Insights Gained from Analysis of Networks of Cellular Regulation (80 KB DOC)

Acknowledgments

We thank L. Gustafson and I. Epstein for helpful comments.
Footnotes

Author contributions. BLdB, SH, and YB conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, and wrote the paper. We would like to acknowledge a National Science Foundation (NSF) Graduate Research Fellowship to BLdB.

Funding. The authors received no specific funding for this study.

Competing interests. The authors have declared that no competing interests exist.