Copyright © 2007, American Society of Plant Biologists

Network Inference, Analysis, and Modeling in Systems Biology

Departments of Physics and Biology, Pennsylvania State University, University Park, PA 16802
ralbert/at/phys.psu.edu

Cells use signaling and regulatory pathways connecting numerous constituents, such as DNA, RNA, proteins, and small molecules, to coordinate multiple functions, allowing them to adapt to changing environments. High-throughput experimental methods enable the measurement of expression levels for thousands of genes and the determination of thousands of protein–protein or protein–DNA interactions. It is increasingly recognized that theoretical methods, such as statistical inference, graph analysis, and dynamic modeling, are needed to make sense of this abundance of information. This perspective argues that theoretical methods and models are most useful if they lead to novel biological predictions and reviews biological predictions arising from three systems biology topics: graph inference (i.e., reconstructing the network of interactions among a set of biological entities), graph analysis (i.e., mining the information content of the network), and dynamic network modeling (i.e., connecting the interaction network to the dynamic behavior of the system). The methods and principles discussed in this perspective are generally applicable, and the examples were selected from plant biology wherever possible.

INTRODUCTION

To understand the function of a cell or of higher units of biological organization, often it is beneficial to conceptualize them as systems of interacting elements.
For such a systems-level description (which represents the main goal of systems biology), one needs to know (1) the identity of the components that constitute the biological system; (2) the dynamic behavior of these components (i.e., how their abundance or activity changes over time in various conditions); and (3) the interactions among these components (Kitano, 2002). Ultimately, this information can be combined into a model that is not only consistent with current knowledge but provides new insights and predictions, such as the behavior of the system in conditions that were previously unexplored. The origins of systems biology can be traced back to systems theory, a line of inquiry based on the assumptions that all phenomena can be viewed as a web of relationships among elements, and all systems can be handled by a common set of methods (von Bertalanffy, 1968; Weinberg, 1975; Bogdanov, 1980; Heinrich and Schuster, 1996; Francois, 1999; Voit, 2000). Early attempts at systems-level understanding of biology suffered from inadequate data on which to base the theories and models; however, the recent advent of high-throughput technologies brought an abundance of data on system elements and interactions, leading to a revival of systems biology. In some cases, the organization of the network of interactions underlying a biological system is straightforward (e.g., a linear chain of interactions), while in other cases a more formal representation, offered by mathematical graph theory (Bollobás, 1979), is required. The simplest possible graph representation reduces the system's elements to graph nodes (also called vertices) and reduces their pairwise relationships to edges (also called links) connecting pairs of nodes (Figure 1
This essay focuses on the biological predictions arising from three related topics of importance in systems biology: graph inference, graph analysis, and dynamic network modeling. Graph inference refers to the problem in which the information on the identity and the state of a system's elements is used to infer interactions or functional relationships among these elements and to construct the interaction graph underlying the system. Graph analysis means the use of graph theory to analyze a known (complete or incomplete) interaction graph and to extract new biological insights and predictions from the results. Dynamic network modeling aims to describe how known interactions among defined elements determine the time course of the state of the elements, and of the whole system, under different conditions. A dynamic model that correctly captures experimentally observed normal behavior allows researchers to track the changes in the system's behavior due to perturbations. These three lines of inquiry are often combined in the literature since they provide three facets of the same objective: to understand, predict, and if possible control (tune toward a desired feature) the dynamic behavior of biological interacting systems. The possible predictions obtained from these methods include prediction of new interactions (from graph inference and analysis), identification of key components and pathways (from graph analysis and dynamic network modeling), determination of key parameters (from dynamic modeling), and distillation of key features, such as interaction or functional motifs (from all three methods combined).

INFERENCE OF INTERACTION NETWORKS FROM EXPRESSION INFORMATION

The most prevalent use of graph inference is using gene/protein expression information to predict network structure (i.e., to predict which gene/protein influences which other genes/proteins through transcriptional, posttranscriptional, translational, or posttranslational regulation).
A predicted regulatory relationship among two genes can be verified by experimental testing of the interactions and regulatory relationships among the two genes/proteins. Genes with statistically similar (highly correlated) expression profiles in time or across several experimental conditions can be grouped using clustering algorithms (Wen et al., 1998; Tavazoie et al., 1999). Clustering tools such as the Arabidopsis coexpression tool, based on microarray data from the Nottingham Arabidopsis Stock Centre (Craigon et al., 2004), allow users to quantify gene coexpression across selected experiments or the complete data set (Jen et al., 2006). These methods give insight into groups of genes that respond in a similar manner to varying conditions and that might therefore be coregulated (Qian et al., 2001); however, that two nodes belong to the same group does not imply a causal relationship among them. The ability to extract meaning from clustering depends on the user's prior biological understanding of the objects that are organized. Most applications derive biological insight through “guilt by association;” that is, they predict the function of unknown gene products by their association with recognized clusters (Schuldiner et al., 2005; Bjorklund et al., 2006). Data analysis methods, such as principal component analysis and the partial least-squares method, aim to highlight the global patterns in the expression of a large number of genes/proteins by condensing the multivariate data into just two or three composite variables that capture the maximal covariation between all the individual patterns. The partial least-squares method is also able to test a proposed causal relationship by splitting variables into independent variables and dependent variables, simultaneously identifying the principal components of the dependent and independent block and relating them by a linear relationship (Janes and Yaffe, 2006). 
This method was used to link the level of 19 proteins involved in apoptotic signaling in human colon adenocarcinoma cells to four quantitative measures of apoptosis, leading to the prediction of cell death responses to molecular perturbations and of the roles of key signaling intermediaries (Janes et al., 2005). A study combining principal component analysis with a number of machine learning algorithms applied to a comprehensive Arabidopsis thaliana gene expression data set identified 50 previously unannotated genes that are potentially involved in plant response to abiotic stress (Lan et al., 2007). Preliminary experimental validation of the predicted function of one of these genes was presented by Lan et al. (2007). Bayesian methods aim to find a directed, acyclic (i.e., feedback loopless) graph describing the causal dependency relationships among components of a system and a set of local joint probability distributions that statistically convey these relationships (Friedman et al., 2000). The starting edges are established heuristically based on an initial assessment of the experimental data and are refined by an iterative search-and-score algorithm until the causal network and posterior probability distribution best describing the observed state of each node are found (Yu et al., 2004). Bayesian inference was recently used to infer the signaling network responsible for embryonic stem cell fate responses to external cues based on measurements of 28 signaling protein phosphorylation states across 16 different factorial combinations of stimuli. The inferred network predicted novel influences between ERK phosphorylation and differentiation as well as between RAF phosphorylation and differentiated cell proliferation (Woolf et al., 2005). Model-based methods of regulatory network inference from time-course expression data seek to relate the rate of change in the expression level of a given gene with the levels of other genes. 
Continuous methods postulate a system of differential equations (Chen et al., 1999), while discrete methods assume a logical (Boolean) relationship (Shmulevich et al., 2002). Experimental data on gene expression levels is substituted into the relational equations, and the ensuing system of equations is then solved for the regulatory relationships between two or more components (Figure 1
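The discrete (Boolean) flavor of model-based inference described above can be illustrated with a minimal sketch. It assumes a binarized time course and a deliberately small family of candidate rules (a single regulator, its negation, or the AND/OR of two regulators); the gene names, the data, and the helper names `candidate_rules` and `consistent_rules` are hypothetical and are not taken from the cited studies. The idea is simply to keep every candidate rule that reproduces all observed transitions of a target gene.

```python
from itertools import combinations

# Hypothetical binarized time course (1 = expressed, 0 = not expressed).
# Each row is one time point, in the order given by GENES.
GENES = ("A", "B", "C")
TIME_COURSE = [
    (0, 0, 0),
    (1, 0, 0),
    (1, 1, 0),
    (0, 1, 1),
    (1, 1, 0),
    (0, 0, 1),
    (0, 0, 0),
]


def candidate_rules(genes):
    """Yield (label, function) pairs; each function maps a state dict to 0 or 1."""
    for g in genes:
        yield g, (lambda s, g=g: s[g])
        yield f"not {g}", (lambda s, g=g: 1 - s[g])
    for g, h in combinations(genes, 2):
        yield f"{g} and {h}", (lambda s, g=g, h=h: s[g] & s[h])
        yield f"{g} or {h}", (lambda s, g=g, h=h: s[g] | s[h])


def consistent_rules(target, genes, time_course):
    """Return labels of rules for which rule(state at t) equals target at t+1."""
    states = [dict(zip(genes, row)) for row in time_course]
    transitions = list(zip(states[:-1], states[1:]))
    return [
        label
        for label, rule in candidate_rules(genes)
        if all(rule(now) == nxt[target] for now, nxt in transitions)
    ]


if __name__ == "__main__":
    print(consistent_rules("C", GENES, TIME_COURSE))
```

With real data, several rules often remain consistent, and noise usually means that no rule fits every transition, so a practical method would score rules rather than filter them exactly.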
Metabolic pathway reconstruction from known reaction stoichiometric information is usually performed by constraint-based deterministic methods, such as flux balance analysis (Reed and Palsson, 2003) or S-systems, power-law approximations of enzyme-catalyzed reactions (Irvine and Savageau, 1990). For example, a constraint-based optimization method allowed identification of changes in an Escherichia coli genome-scale metabolic model that were needed to minimize the discrepancy between model predictions of optimal flux distributions and experimentally measured flux data (Herrgard et al., 2006). Several types of experimental results are best interpreted as indirect causal evidence that indicates the involvement of a protein or molecule in a certain process or pathway. A differential response to a stimulus in wild-type organisms versus an organism where the respective protein's expression or activity is disrupted is an example of such indirect causal evidence connecting the stimulus, protein, and response. These observations can be represented by two intersecting paths (successions of adjacent edges; see below) in the underlying interaction network: one connecting stimulus to response and the other connecting the protein to response. Graph-based inference algorithms integrate indirect causal relationships and direct interactions to find the most parsimonious network consistent with all available experimental observations (Li et al., 2006; Albert et al., 2007b). This method was used to reconstruct the signal transduction network corresponding to stomatal closure in plants in response to the stress hormone abscisic acid (ABA; Li et al., 2006) and is implemented in the software NET-SYNTHESIS (Albert et al., 2007a).

NETWORK ANALYSIS

Depending on the types of interaction or regulatory relationships incorporated as edges of the biological interaction graph, several distinct network types have been defined.
In protein interaction graphs, the nodes are proteins, and two proteins are connected by a nondirected edge if there is strong evidence of their association. The full representation of transcriptional regulatory maps associates two separate node classes with transcription factors and mRNAs, respectively, and has two types of directed edge, which correspond to transcriptional regulation (which can be positive or negative) and translation (Lee et al., 2002). Metabolic networks have been represented in various degrees of detail, two of the simplest being the substrate graph, whose nodes are reactants and whose edges mean co-occurrence in the same chemical reaction, and the reaction graph, whose nodes are reactions and whose edges mean sharing at least one metabolite (Wagner and Fell, 2001). Signal transduction networks involve both protein interactions and biochemical reactions, and their edges are mostly directed, indicating the direction of signal propagation. Finally, composite networks superimpose protein–protein and protein–DNA interactions (Yeger-Lotem et al., 2004), protein–protein interactions, genetic interaction, transcriptional regulation, sequence homology, and expression correlation (Zhang et al., 2005) or metabolic reactions and transcriptional regulation of metabolic genes (Herrgard et al., 2006). The development of high-throughput interaction assays (e.g., yeast two-hybrid, split ubiquitin, and chromatin immunoprecipitation assays) and of curated databases has led to the generation of large-scale interaction networks for a considerable number of organisms. In plant biology, the first large-scale Arabidopsis interactome (protein interaction network) was recently predicted from the knowledge of interacting Arabidopsis protein orthologs in Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens (Geisler-Lee et al., 2007). 
As illustrated in this section, graph analysis of the currently available (sub)cellular networks reveals a significant degree of consensus among their organizational features, as well as a few notable differences. Note, however, that there is a considerable level of variation in the number of networks available for each interaction type, in the coverage of these networks, and in the confidence of the interactions included in the network; thus, the predictions arising from network analysis may need updating as more information becomes available. The organizational features of interaction graphs can be quantified by network measures whose information content ranges from local (e.g., properties of single nodes or edges) to network-wide (e.g., whether all nodes are connected). These two seemingly disparate scales are intimately linked in networks, as global connectivity is realized by a succession of adjacent edges. Thus, as we will see later, sometimes a surprisingly small number of linked events can lead to wide consequences. The most often-used network measures describe the connectivity (reachability) among nodes, the importance (centrality) of individual nodes, and the homogeneity or heterogeneity of the network in terms of a given node property (Figure 1 A path (sequence of adjacent edges) (Bollobás, 1979) signifies a transformation route from a nutrient to an end product in a metabolic network or a chain of ligand-induced reactions in a signal transduction network. The distance (path length) between any two nodes in a network is defined to be the number of edges in the shortest path connecting those nodes. If the edges of a network are weighted (e.g., with rate constants), then the distance between two nodes will be the sum of edge weights along the path for which this sum is a minimum (Dijkstra, 1959). 
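The weighted distance defined above can be computed with Dijkstra's algorithm. The following is a minimal sketch using only the Python standard library; the toy signaling graph, its node names, and its edge weights are hypothetical, and the method assumes that all weights are nonnegative.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra's algorithm: minimum summed edge weight from source to each reachable node.

    graph maps each node to a dict {neighbor: nonnegative edge weight}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical directed, weighted signaling graph (weights are illustrative).
TOY_GRAPH = {
    "ligand": {"receptor": 1.0},
    "receptor": {"kinase": 2.0, "phosphatase": 1.0},
    "phosphatase": {"kinase": 0.5},
    "kinase": {"response": 1.0},
    "response": {},
}

if __name__ == "__main__":
    print(shortest_distances(TOY_GRAPH, "ligand"))
```

Setting every weight to 1 recovers the unweighted distance, that is, the number of edges on the shortest path. In this toy graph the direct receptor-to-kinase edge is not on the minimum-weight path, which shows that the fewest-edge path and the minimum-weight path need not coincide.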
The average path length of several large cellular networks, including metabolic networks (Jeong et al., 2000; Wagner and Fell, 2001), transcriptional networks (Lee et al., 2002), protein interaction networks (Giot et al., 2003; Yook et al., 2004), and signal transduction networks (Ma'ayan et al., 2005) is less than four. This result predicts that these networks are capable of rapid response to inputs or perturbations. Cellular networks also tend to exhibit path redundancy, that is, the availability of multiple paths between a pair of nodes (Papin and Palsson, 2004; Li et al., 2006). This network feature reflects cellular networks' capacity to employ multiple channels between the same input and output and predicts that these networks will be able to efficiently compensate for perturbations in the preferred pathway. In many networks, only a fraction of the nodes in the network will be accessible (connected) to any given node. The subset of nodes connected by paths in both forward and reverse directions forms the so-called strongly connected cluster. One can also define the in-cluster (nodes that can reach the strongly connected cluster but that cannot be reached from it) and out-cluster (the converse). Nodes of each of these subsets tend to have a shared task; for example, in signal transduction networks, the nodes of the in-cluster tend to be involved in ligand-receptor binding; the nodes of the strongly connected cluster form a central signaling subnetwork; and the nodes of the out-cluster are responsible for the transcription of target genes and for phenotypic changes (Ma'ayan et al., 2005). All protein interaction networks mapped so far, including the predicted Arabidopsis interactome, have a strongly connected cluster connecting the vast majority of the proteins (Giot et al., 2003; Yook et al., 2004; Geisler-Lee et al., 2007).
This finding predicts a capacity for pleiotropy, since perturbations of a single gene or protein can propagate through the network and can have seemingly unrelated or broad effects. By contrast, the currently available maps of transcriptional networks do not have significant strongly connected components, suggesting a unidirectional regulation mode with relatively little transcriptional crosstalk (Balázsi et al., 2005). The currently available metabolic and signal transduction networks are more connected, with 50 to 60% of the nodes forming the largest strongly connected component (Ma and Zeng, 2003; Ma'ayan et al., 2005). This intriguing range of interconnectivity from relatively unidirectional transcriptional regulatory maps to strongly connected protein interaction maps is affected by several factors. First, the fact that protein interactions are represented by nondirected edges is due to the constraints of current experimental assays; as new information on the source and target of protein interactions leads to assigning directions to some of the edges, the size of the strongly connected component may decrease. Second, some transcriptional regulatory networks are less well mapped than protein interaction networks, and new additions to these networks may increase their connectivity. Third, as transcription factors are often regulated posttranslationally, an integrated transcriptional/(post)translational regulatory network would be a more appropriate representation and may have more connectivity and feedback than a map focused on transcriptional regulation alone. It will be interesting to follow whether new experimental evidence and novel network representations decrease the range of connectivity among molecular interaction networks. 
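The decomposition of a directed network into a strongly connected cluster, an in-cluster, and an out-cluster, as discussed above, can be sketched with two breadth-first searches per node. The edge list and node names below are hypothetical, the function name `bow_tie` is an illustrative choice, and the quadratic-time search is meant for small examples only; a non-empty edge list is assumed.

```python
from collections import deque


def reachable(adjacency, start):
    """Return the set of nodes reachable from start (including start itself)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(edges):
    """Return (largest strongly connected cluster, in-cluster, out-cluster)."""
    nodes = {n for edge in edges for n in edge}
    forward, backward = {}, {}
    for src, dst in edges:
        forward.setdefault(src, set()).add(dst)
        backward.setdefault(dst, set()).add(src)
    largest = set()
    for node in sorted(nodes):
        if node in largest:
            continue
        # A node's strongly connected cluster: nodes it reaches that also reach it.
        cluster = reachable(forward, node) & reachable(backward, node)
        if len(cluster) > len(largest):
            largest = cluster
    seed = next(iter(largest))
    out_cluster = reachable(forward, seed) - largest
    in_cluster = reachable(backward, seed) - largest
    return largest, in_cluster, out_cluster


# Hypothetical signal transduction graph: receptor side, kinase feedback loop, targets.
TOY_EDGES = [
    ("ligand", "receptor"), ("receptor", "k1"),
    ("k1", "k2"), ("k2", "k3"), ("k3", "k1"),
    ("k2", "tf"), ("tf", "gene"),
]

if __name__ == "__main__":
    core, upstream, downstream = bow_tie(TOY_EDGES)
    print(sorted(core), sorted(upstream), sorted(downstream))
```

Treating each nondirected edge as a pair of opposite directed edges makes every connected component strongly connected, which is consistent with the remark above that assigning directions to protein interactions may shrink the strongly connected component.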
In addition to the clusters characterizing the global (whole network level) connectivity of cellular networks, one also can identify recurring interaction motifs, which are small subgraphs (i.e., subsets of the full graph) that have well-defined topologies. Interaction motifs, such as autoregulation (usually a negative feedback; Figure 1 The number, directionality, and strength of connections associated with a given node can be synthesized into measures of that node's centrality (importance). The simplest such measure is the node degree, or the number of edges adjacent to that node. If the directionality of interaction is important, a node's total degree can be broken into an in-degree and out-degree, quantifying the number of incoming and outgoing edges adjacent to the node (Figure 1 While the node degree or betweenness centrality of a specific node is a local topological measure, this local information can be synthesized into a global description of the network by reporting the degree distribution P(k), which gives the fraction of nodes in the network having degree k. A significant number of cellular interaction networks, including protein interaction networks (Jeong et al., 2001; Giot et al., 2003; Yook et al., 2004; Geisler-Lee et al., 2007), metabolic networks (Jeong et al., 2000; Wagner and Fell, 2001; Arita, 2004; Tanaka, 2005), signal transduction networks (Ma'ayan et al., 2005), and transcriptional regulatory networks (Guelzim et al., 2002; Lee et al., 2002), exhibit a high heterogeneity (diversity) for node centralities that precludes the existence of a typical node that could be used to characterize the rest of the nodes in the network. Networks with this high heterogeneity are often referred to as scale free (reviewed in Albert and Barabási, 2002; Barabási and Oltvai, 2004). The degree distribution of scale-free networks is generally close to a power law $P(k) = A k^{-\gamma}$, where A is a normalization constant and the degree exponent γ is between 2 and 3.
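As a concrete illustration of the degree measures and the degree distribution P(k) introduced above, the following minimal sketch computes in-degree, out-degree, and P(k) from a directed edge list. The small hub-and-spoke graph and the function names are hypothetical; a real analysis would use a network large enough for the tail of P(k) to be estimated meaningfully.

```python
from collections import Counter


def degrees(edges):
    """Return {node: (in_degree, out_degree)} for a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    nodes = set(in_deg) | set(out_deg)
    return {n: (in_deg[n], out_deg[n]) for n in nodes}


def degree_distribution(edges):
    """Return {k: fraction of nodes whose total degree (in + out) equals k}."""
    totals = Counter(i + o for i, o in degrees(edges).values())
    n_nodes = sum(totals.values())
    return {k: count / n_nodes for k, count in sorted(totals.items())}


# Hypothetical network with one high-degree node ("hub").
TOY_EDGES = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]

if __name__ == "__main__":
    print(degrees(TOY_EDGES)["hub"])
    print(degree_distribution(TOY_EDGES))
```

For a large network, plotting P(k) against k on logarithmic axes gives an approximately straight line with slope −γ when the distribution is close to a power law.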
Exceptions from the heterogeneity associated with power-law distributions are also notable: the in-degree distribution of transcriptional networks and the degree distribution of enzymes have a small range, reflecting that combinatorial regulation by several transcription factors is less frequent than regulation of several targets by the same transcription factor and that enzymes catalyzing several different reactions are rare. In scale-free networks, small-degree nodes are most common; however, the highest-degree nodes have degrees that are orders of magnitude higher than the average degree. Such highest-degree (or in general highest-centrality) nodes are commonly referred to as hubs. This heterogeneous structure leads to the prediction that in scale-free networks random node disruptions do not cause a major loss of connectivity, whereas the loss of the hubs causes the breakdown of the network into isolated clusters (Albert and Barabási, 2002). This point has been experimentally corroborated in S. cerevisiae, where the severity of a gene knockout has been shown to correlate with the number of interactions in which the gene's products participate (Jeong et al., 2001; Said et al., 2004). High degree is a practical but nevertheless insufficient predictor of functional importance, as there are several examples of low-degree nodes that are critical for certain outcomes (Holme et al., 2003; Almaas et al., 2005; Mahadevan and Palsson, 2005; Li et al., 2006). Ultimately, a high-precision prediction of functionally important nodes will need to take into account the biological identity of the nodes and the synergistic and dynamic aspects of the interactions and will therefore require significantly more input information than what is currently available for most interaction networks.
Given the state of the knowledge on these networks, a suitable combination of node degree with betweenness centrality, and possibly other centrality measures, will offer the optimal trade-off between predictive power and practicality. The graph measures described above, alone or combined with additional information regarding the network nodes (such as the functional annotation of the corresponding genes/proteins), provide testable biological predictions on several scales, from single interactions to functional modules. The functions of unannotated proteins can be inferred on the basis of the annotation of their interacting partners, as it was done for S. cerevisiae and Arabidopsis proteins using interaction, coexpression, and localization data (Vazquez et al., 2003; Lee et al., 2004; Geisler-Lee et al., 2007). New protein interactions can be predicted using machine learning algorithms based on the presence of abundant interaction motifs within the network (Albert and Albert, 2004). New protein functions and interactions can be inferred through global alignment between protein interaction networks in different species (Kelley et al., 2004). Conversely, protein interaction networks of two species can be used to augment sequence-based homology searches as a basis for orthology prediction; in a recent analysis of D. melanogaster and S. cerevisiae, in 61 out of 121 cases with ambiguous homology assignment, the network supported a different orthologous protein pair than that favored by sequence comparisons (Bandyopadhyay et al., 2006). The connected subgraphs of a probabilistic S. cerevisiae gene–gene linkage network have been used to identify highly connected gene clusters (modules). The demonstrably coherent functional annotation of genes within each cluster allowed the annotation of unknown proteins that are part of the cluster (Lee et al., 2004). 
Finally, construction of an integrated transcriptional and metabolic network allowed global predictions of growth phenotypes and qualitative gene expression changes in E. coli (Covert et al., 2004) and yeast (Herrgard et al., 2006).

DYNAMIC MODELING

The nodes of cellular interaction networks represent populations of proteins or other molecules. The abundances of these populations can range from a few copies of an mRNA, protein, or metabolite to hundreds or thousands of molecules per cell, and they vary in time and in response to external or internal stimuli. To capture these changes, the interaction network needs to be augmented by quantitative variables indicating the state (i.e., expression, concentration, or activity) of each node and by a set of equations indicating how the state of each node changes in response to changes in the state of its regulators. In other words, the interaction network needs to be developed into a dynamic network model. Dynamic network models have as input the interaction network, the transfer functions describing how the state of each node depends on the state of its regulators, and the initial state of each node in the system. Transfer functions, such as mass action kinetics for chemical reactions or Hill functions for regulatory relationships, include several kinetic parameters whose values need to be known or estimated. If the model refers to spatio-temporal phenomena, such as those based on cell-to-cell communication, the node states and transfer functions will depend on spatial coordinates (Mjolsness et al., 1991; Palsson and Othmer, 2000). Given the interaction network, transfer functions, and initial states, the model will output the time evolution of the state of the system. The most basic qualitative feature of a dynamic system is the number and type of different behaviors, often called attractors, that are found in the infinite time limit.
All initial conditions that evolve to a given attractor constitute its basin of attraction. The attractors of gene regulatory networks are thought to correspond to distinct cellular states (Kauffman, 1993) or cycles (e.g., circadian rhythms; Goldbeter, 2002), while the attractors of a signal transduction network correspond to steady state (time-independent) or sustained oscillatory response(s) to the presence of a given signal (Tyson et al., 2003). A validated dynamic model that correctly captures experimentally observed normal behavior allows researchers to track the changes in the system's behavior due to perturbations, to discover possible covariation between coupled variables, and to identify conditions in which the dynamics of variables are qualitatively similar. It is easier to use a model to search for perturbations that have a significant or beneficial effect on system behavior than it is to perform comparable experiments on the living system; for example, models can predict multiple small perturbations that produce large effects when combined. While the benefits of using verified models are obvious, the information and data requirements necessary to construct a verifiable dynamic model are daunting for all but the smallest systems. Additionally, modelers need to balance a set of features that are nonexclusive but nevertheless cannot be maximized simultaneously. Ideally, a good model should have a low level of uncertainty in the interactions, equations, and parameters used; it should be relatively easy to run or construct; it should provide a high level of understanding or insight; it should be simple and elegant; its predictions should be highly accurate; it should be general (be applicable to a large number of systems); and it should be robust (insensitive to small changes in parameters or assumptions) (Haefner, 2005). Dynamic modeling frameworks are usually classified along two axes: continuous versus discrete and deterministic versus stochastic. 
The first classification refers to the level of detail in the representation of the node state, while the second indicates whether the transfer functions incorporate any uncertainty or variability. Since variability and noise are pervasive in biological systems, a continuous stochastic model has the highest potential to accurately describe the system; however, it also has the highest requirement for input information. A continuous deterministic model, the most frequently used middle ground, represents the limit of the corresponding continuous stochastic model as the number of molecules becomes large or the noise decreases to zero. Only continuous deterministic models readily allow theoretical methods such as bifurcation analysis (Goldbeter, 2002; Tyson et al., 2003), that is, the analysis of where the system's dynamics changes as a function of various parameters. The conclusions of these analyses can then contribute to the selection of the best-suited high-level stochastic models. Discrete deterministic models exhibit a high level of abstraction in that they classify node states into just a few categories of expression or activity. On the plus side, this means that they require relatively little detailed input and can be constructed in cases where the large number of unknowns makes continuous models impractical or even impossible. On the minus side, the predictions of these models are more coarse grained and less quantitative than the predictions of continuous models. Continuous deterministic models characterize node states by concentrations and describe the rate of production or decay of all components by differential equations based on mass action–like kinetics (Figure 1 Continuous deterministic models of simple regulatory or signaling networks can also be coupled with descriptions of cell growth and mechanics to explain spatio-temporal pattern formation in cell colonies or tissues. 
For example, a recent model of plant organ positioning driven by auxin patterning predicts that the underlying mechanism is a feedback loop between relative auxin concentrations in adjacent cells and auxin efflux direction. It is proposed that this feedback is realized through the putative auxin efflux mediator PIN1 whose cycling between internal and membrane compartments is auxin regulated in such a way that a higher auxin concentration in a neighboring cell leads to an increased PIN1 localization at the membrane toward that cell, resulting in a higher auxin transport into that cell (Jönsson et al., 2006). The stochasticity (nondeterminism) of biological processes is usually taken into account by appending stochastic (noise) terms to differential equations. Discrete events (such as the initiation of transcription) and low abundances for certain molecules can be incorporated by characterizing the node states by the copy number of each molecule and describing the time evolution of the probabilities of each of a system's possible states (Rao et al., 2002; Andrews and Arkin, 2006). A recent model of the ethylene signaling pathway and its gene response in Arabidopsis combines chemical kinetics for signaling proteins with a probabilistic description of the target genes' states (Diaz and Alvarez-Buylla, 2006). This model reproduces the experimentally observed differential responses to different ethylene concentrations and predicts that the pathway filters rapid stochastic fluctuations in ethylene availability. Discrete deterministic models usually characterize network nodes by two binary states corresponding to, for example, an expressed or not expressed gene, an open or closed ion channel, or above-threshold or below-threshold concentration of a molecule. The change in state of each regulated node is generally described by a logical function using the Boolean operators “and,” “or,” and “not” (Figure 1
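A discrete deterministic (Boolean) model of the kind described above can be sketched in a few lines. The three-node network below is hypothetical: S is a constant input signal, X is activated by S and inhibited by Y, and Y is activated by X, giving a negative feedback loop. Synchronous updating (all nodes updated at once) is an assumption of this sketch, and other update schemes can give different attractors. The code enumerates all initial states and records which attractor each one reaches, that is, the basins of attraction.

```python
from itertools import product

NODES = ("S", "X", "Y")

# Hypothetical logical rules: each maps the current state to the node's next state.
RULES = {
    "S": lambda s: s["S"],                 # the input signal is held constant
    "X": lambda s: s["S"] and not s["Y"],  # activated by S, inhibited by Y
    "Y": lambda s: s["X"],                 # activated by X
}


def step(state):
    """Synchronous update: every node is updated from the same current state."""
    current = dict(zip(NODES, state))
    return tuple(bool(RULES[n](current)) for n in NODES)


def find_attractors():
    """Map each attractor (a frozenset of states) to the initial states reaching it."""
    basins = {}
    for initial in product((False, True), repeat=len(NODES)):
        trajectory = []
        state = initial
        while state not in trajectory:
            trajectory.append(state)
            state = step(state)
        # The first repeated state marks the start of the attractor.
        attractor = frozenset(trajectory[trajectory.index(state):])
        basins.setdefault(attractor, []).append(initial)
    return basins


if __name__ == "__main__":
    for attractor, basin in find_attractors().items():
        print(len(attractor), "state attractor; basin size", len(basin))
```

In this toy network the absence of the signal leads to a steady state with both X and Y off, whereas the presence of the signal leads to a sustained oscillation, which matches the two attractor types mentioned earlier for signal transduction networks.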
Hybrid dynamic models meld a Boolean description of combinatorial regulation with continuous synthesis and decay by describing each node with both a continuous variable (akin to a concentration) and a Boolean variable (akin to activity) (Glass and Kauffman, 1973; Chaves et al., 2006). For example, a hybrid model of the transcriptional regulation of the Endo16 sea urchin gene revealed that its spatial control during embryonic development is mediated by a cis-regulatory switch (Yuh et al., 2001), and a hybrid model of D. melanogaster embryonic segmentation predicts that transient dysregulation of posttranslational modifications can have effects as severe as gene knockouts (Chaves et al., 2006).

While the details of different dynamic models can be significantly different, and the predictions offered by them are specific to the systems they refer to, there is a considerable level of common insight arising from these models. For example, there is increasing evidence that molecular networks are constructed from simpler modules with generic input-output properties not unlike those of electric circuits (Alon, 2006). Some of these modules exhibit perfect adaptation to a signal (i.e., they exhibit a transient response to changes in signal strength, but their steady state response is independent of the signal strength), switch abruptly and irreversibly from low to high response at a critical (bifurcation) value of the signal, or show sustained oscillations in the response variable (Tyson et al., 2003). Frequently, these dynamic behaviors do not depend on the details of the transfer functions or on the kinetic parameters and are determined by the underlying network; for example, positive feedback loops (either mutual activation or antagonism) may create a discontinuous switch, negative feedback often leads to homeostasis, and sustained oscillations require a negative feedback loop with a time delay.
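The statement that positive feedback may create a switch can be illustrated with a small continuous deterministic sketch. The equation form (a single component that activates its own production through a Hill-type term), the parameter values, and the function names below are assumptions chosen for illustration and are not drawn from the cited models.

```python
def rate(x, signal, beta=1.0, K=0.5, n=4, decay=1.0):
    """dx/dt for a self-activating component (assumed Hill-type positive feedback).

    signal: constant basal production rate set by the input
    beta:   maximal rate of feedback-driven production
    K, n:   half-saturation constant and steepness of the feedback
    decay:  first-order decay rate constant
    """
    return signal + beta * x**n / (K**n + x**n) - decay * x


def steady_state(x0, signal, dt=0.01, t_end=50.0):
    """Integrate dx/dt with the forward Euler method and return the final value."""
    x = x0
    for _ in range(int(t_end / dt)):
        x += dt * rate(x, signal)
    return x


# Same signal strength, two different starting concentrations.
low = steady_state(x0=0.1, signal=0.05)
high = steady_state(x0=0.6, signal=0.05)
print(round(low, 3), round(high, 3))
```

With these assumed parameters the same signal supports two stable steady states, and the initial concentration decides which one is reached. This bistability is the kind of behavior that underlies an abrupt switch; removing the feedback term (setting `beta` to zero) leaves a single steady state.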
The pursuit of general insight from integrating the lessons learned from specific models is an emergent and rapidly developing topic in systems biology.

CONCLUSIONS

Systems biology develops through an ongoing dialog and feedback among experimental, computational, and theoretical approaches. High-throughput experiments reveal, or allow the inference of, the edges of global interaction networks. Graph-theoretical analysis of these networks enables insight into the organization of cellular regulation, feeds back to network inference (Albert and Albert, 2004; Gupta et al., 2006; Horvath et al., 2006; Christensen et al., 2007), and allows specific biological predictions. Dynamic modeling of systems with specified inputs and outputs allows the identification of key regulatory components or parameters. Experimental testing of model predictions enables the validation or refinement of the model, which in turn paves the way to more predictions and ultimately the generation of new biological knowledge.

Network analysis and dynamic network modeling represent complementary approaches most appropriate for different network scales. Network analysis can be readily performed on networks with tens of thousands of nodes and edges; however, it cannot explicitly incorporate the temporal and quantitative aspects of the processes corresponding to the edges of the network. Detailed deterministic or stochastic models allow for high-fidelity dynamic analysis of small networks but increase dramatically in complexity even for small increments in the number of nodes and edges and thus can hardly be used meaningfully on large-scale networks. A potential middle ground is emerging through the development of qualitative modeling techniques that map the propagation of context-dependent signals through a network (Ma'ayan et al., 2005; Prill et al., 2005; Li et al., 2006).
This perspective essay has shown a small sample of network-based modeling in systems biology; the interested reader is referred to excellent review articles and books written from different perspectives (Goldbeter, 2002; Tyson et al., 2003; Barabási and Oltvai, 2004; Ma'ayan et al., 2004; Haefner, 2005; Alon, 2006; Palsson, 2006).

To date, much of systems biology research has focused on single-celled organisms, which de facto precludes assessment of endogenous cell–cell signaling. As the scope of inquiry expands from cells to organs and organisms as systems, plants provide unique opportunities to study organism-level responses to environmental challenges. Indeed, while animals tend to rely on behavioral adjustments to evade environmental stress, plants are more likely to emphasize stress resistance and recovery mechanisms. Moreover, the modular structure of plants causes relatively weak coupling between different parts of the same plant as well as significant differences in these parts' microenvironments. Given also that plant signal transduction mechanisms are at least as developed as those of animals and use many conserved components (e.g., heterotrimeric G-proteins and cytosolic Ca2+), plants can eminently serve as model systems and will undoubtedly gain in importance as the field of systems biology matures.

Most of the literature on systems biology shares the view that in order for the research community to develop the sophisticated interplay of theory, computation, and experiment that will be needed to understand and manipulate cellular regulatory systems, we will first need to learn to communicate effectively. I hope the examples shown in this perspective will facilitate new and fruitful dialogs.

Acknowledgments

Research on plant systems biology in the author's laboratory is supported by National Science Foundation Grants MCB-0618402 and CCF-0643529 as well as USDA Grant NRI 2006-02158.

References