Copyright © 2006, The American Society for Cell Biology

Molecular Interaction Maps of Bioregulatory Networks: A General Rubric for Systems Biology

Laboratory of Molecular Pharmacology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892

Gerard Evan, Monitoring Editor

Address correspondence to: Kurt W. Kohn (kohnk/at/dc37a.nci.nih.gov).

Received September 1, 2005; Accepted October 21, 2005.

Abstract

A standard for bioregulatory network diagrams is urgently needed in the same way that circuit diagrams are needed in electronics. Several graphical notations have been proposed, but none has become standard. We have prepared many detailed bioregulatory network diagrams using the molecular interaction map (MIM) notation, and we now feel confident that it is suitable as a standard. Here, we describe the MIM notation formally and discuss its merits relative to alternative proposals. We show by simple examples how to denote all of the molecular interactions commonly found in bioregulatory networks. There are two forms of MIM diagrams. “Heuristic” MIMs present the repertoire of interactions possible for molecules that are colocalized in time and place. “Explicit” MIMs define particular models (derived from heuristic MIMs) for computer simulation. We show also how pathways or processes can be highlighted on a canonical heuristic MIM. Drawing a MIM diagram, adhering to the rules of notation, imposes a logical discipline that sharpens one's understanding of the structure and function of a network.

INTRODUCTION

In recent years, we have been inundated with ever more detailed and comprehensive information on the molecular interactions that govern cell behavior, such as cell division, differentiation, and death.
It will be a major task in coming years to organize this information in an accurate, complete, and comprehendible manner (Kohn, 1999; Pirson et al., 2000; Strogatz, 2001). To this end, there is an urgent need for a generally accepted graphical notation for diagrams of bioregulatory networks that could be used in a manner akin to electronic circuit diagrams. As noted by Ideker et al. (2001a), diagrams can be “a tremendous aid in thinking clearly about a model, in predicting possible experimental outcomes, and in conveying the model to others.” Kurata et al. (2003) point out that “without consistent and unambiguous rules for representation, not only is information lost but misinformation could be disseminated.” The diagrams commonly used to show bioregulatory schemes, however, are often incomplete and ambiguous (Pirson et al., 2000). Bioregulatory networks are much more difficult to diagram than classical metabolic pathways, because of the large role played by multimolecular complexes, protein modifications, and multidomain proteins (Kohn, 1999, 2001). To address this problem, a graphical notation for molecular interaction maps (MIMs) was devised (Kohn, 1999; Aladjem et al., 2004) and was used to create maps of several bioregulatory networks (Kohn, 1999, 2001; Kohn and Bohr, 2002; Kohn et al., 2003, 2004; Aladjem et al., 2004; Pommier et al., 2004; Kohn and Pommier, 2005; http://discover.nci.nih.gov/mim/). Here, we describe the notation in full detail with examples, and we review the published literature relating to this and other proposed notations.

A unique aspect of the MIM notation is that it can show all of the known interactions and allow the unknown contingencies (effects of one interaction on another) to be left unspecified until those details become available. In this sense, MIM diagrams are “heuristic.” A heuristic MIM therefore may not provide all the information required for computer simulation.
Particular models for computer simulation can, however, be extracted from heuristic MIMs and formulated in “explicit” diagrams using a subset of the MIM symbols (Kohn, 2001). Heuristic MIMs are “canonical” in that they are not restricted to a particular cell type or cell state, and they do not indicate a particular sequence of events. Rather, they show the interactions that can occur if the relevant molecules are present in the same place at the same time. A diagram specific to a particular cell type or cell state can be derived from a canonical map by deleting the molecules that are not expressed as well as the interactions that do not occur because of lack of colocalization. A particular pathway or sequence of events can be depicted on a canonical map by numbering and/or highlighting the relevant interactions, as we describe and illustrate here.

Even when a network is depicted in a clear diagram, understanding how it functions may require computer simulation of plausible models (Ideker et al., 2001b). Paraphrasing E. O. Wilson (quoted by Strogatz, 2001), “the greatest challenge today in cell biology is the accurate and complete description of complex systems. The next task is to assemble mathematical models that capture the key system properties.” The MIM notation can be used both to describe what is known about a system and to define explicit models for computer simulation (Kohn, 1998, 2001; Kohn et al., 2004; http://discover.nci.nih.gov/mim/).

General Principles and Rules of the MIM Notation
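The derivation of a cell-type-specific diagram from a canonical map, as described in the Introduction, can be illustrated with a minimal Python sketch. It is not part of the MIM tools; the `Interaction` record, the `derive_specific_map` function, the species names (A, B, C, K), and the compartment labels are all hypothetical, chosen only to show the two deletion rules (molecules not expressed, and interactions lost through lack of colocalization).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One numbered interaction on a canonical map (hypothetical record)."""
    number: int          # annotation number of the interaction
    kind: str            # e.g. "binding" or "phosphorylation"
    participants: tuple  # names of the molecular species involved
    compartment: str     # where the participants must be colocalized


def derive_specific_map(canonical, expressed, location):
    """Return the interactions that survive in one cell type or state.

    canonical: list of Interaction (the canonical heuristic map)
    expressed: set of species names expressed in this cell type
    location:  dict mapping species name -> set of compartments it occupies
    """
    kept = []
    for ix in canonical:
        # Rule 1: delete interactions involving molecules that are not expressed.
        if not all(p in expressed for p in ix.participants):
            continue
        # Rule 2: delete interactions whose participants are not colocalized.
        if not all(ix.compartment in location.get(p, set())
                   for p in ix.participants):
            continue
        kept.append(ix)
    return kept


# Hypothetical canonical map with three interactions.
canonical = [
    Interaction(1, "binding", ("A", "B"), "nucleus"),
    Interaction(2, "phosphorylation", ("K", "A"), "cytoplasm"),
    Interaction(3, "binding", ("A", "C"), "nucleus"),
]
expressed = {"A", "B", "K"}  # C is not expressed in this cell type
location = {"A": {"nucleus"}, "B": {"nucleus"}, "K": {"cytoplasm"}}

specific = derive_specific_map(canonical, expressed, location)
print([ix.number for ix in specific])  # → [1]
```

Interaction 2 is dropped because A is not present in the cytoplasm in this hypothetical cell state, and interaction 3 is dropped because C is not expressed. The canonical list itself is left unchanged, so the same map can be filtered again for another cell type.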
THE MIM NOTATION

Molecular Species

MIM diagrams have two kinds of molecular species: “elementary” and “complex” (Figure 2). Elementary protein species are associated with a cartouche (a rectangle with rounded corners) and are named. The name may be inside the cartouche, as in Figure 2a. DNA elements, such as promoters, are represented by a box. The name of the element or promoter can be inside the box, as in Figure 2c.

Complex species are indicated by filled circles (“nodes”) placed on an interaction line. A node represents the molecular species that is produced as a consequence of the interaction. For example, binding interactions produce multimers (such as nodes x and y in Figure 2d). To indicate a homodimer, we use the isolated node convention (Figure 2f).

Noncovalent Binding

Noncovalent (reversible) binding between molecular species is denoted by a line with barbed arrowheads at both ends (Figure 2d) or http://discover.nci.nih.gov.

Mutual exclusion due to competition for the same site is shown using a branched binding line (Figure 3c). Regulatory proteins often are composed of domains that can function independently. The interaction details of individual domains can be shown as illustrated in Figure 3d. Binding between domains within the same molecule is represented as shown in Figure 3e.

Contingencies of binding: Figure 1; Figure 4; Figure 4, d–f.

Covalent Modifications and Their Contingencies

Covalent modification (phosphorylation, acetylation, myristoylation, ubiquitination, and so on) is represented by a line with a barbed arrowhead at one end pointing to the modification site (Figure 1b).

Figure 5, c–f.

Occasionally, the same site can be modified in different ways. For example, a given lysine in a protein might be either acetylated or ubiquitinated, as has been reported for a lysine in p53 (see Kohn and Pommier, 2005). This situation can be represented using the branched line convention (Figure 5j). Covalent binding between proteins or between sites within the same protein sometimes requires a symmetrical symbol, for which purpose we have recently adopted the double-line symbol shown in Figure 1b′.
Kinase Phosphorylation Cascade: Contingency Notation and Compact Notations

Two ways to represent the effects of protein modification are illustrated with an example of a protein kinase phosphorylation cascade (Figure 6).

Compound Contingencies

When a contingency is controlled by multiple nodes, a complicated diagram can become excessively cluttered. As an alternative to the full representation of those situations (Figure 7

Transcription Control

Figure 8
Node x in Figure 8a Figure 8b

Translocation

Translocation from one compartment of the cell to another is like a stoichiometric reaction in that molecules disappear from one place and an equal number of the same molecules appear at another place. We therefore represent translocation with the same symbol that is used for stoichiometric reactions: a filled triangle arrowhead. The example in Figure 9

Control by Protein Cleavage Induced by Specific Proteases

The function of one domain of a protein is sometimes regulated (stimulated or inhibited) by another domain in the same protein. Control is sometimes implemented by a specific protease that cuts the two domains apart, thereby abrogating the influence of one domain upon the other. Classic examples are found in the control of apoptosis by caspases. Figure 10

The alternative in Figure 10b

Interactions at the Plasma Membrane: Signaling via G Proteins

Figure 11 ), or as an electronic link to hypertext (http://discover.nci.nih.gov/mim/). This example shows how the MIM notation organizes into a single diagram a process that previously required multiple panels in cartoon-like diagrams.

Whereas process diagrams (sometimes presented in cartoon-like panels) usually show a particular order of events, this is not the case for MIM diagrams. For example, process diagrams generally show Gα(GTP) binding to adenylyl cyclase before the GTPase step, whereas this need not be the case, for example if there is high GAP activity. Moreover, the exchange between GDP and GTP can go in either direction, the predominance of one direction over the other depending on the GTP/GDP concentration ratio. MIM diagrams do not specify order of events and therefore cover a greater range of circumstances in a canonical format.

Intramolecular Control: Calmodulin Kinase

A classic example of intramolecular control is calmodulin (CaM)-dependent protein kinase (CaMK) (Alberts et al., 1994, or a later edition of this textbook). This system is diagrammed in MIM notation in Figure 12.
The end-effect is the phosphorylation of various substrates by the kinase domain of CaMK (interaction-1). This action is inhibited (interaction-2) by the intramolecular binding between the catalytic domain and the regulatory domain of CaMK (interaction-3). This intramolecular bond can be opened by the competitive binding of calmodulin (CaM) to the regulatory domain (interaction-4). Binding of CaM to the regulatory domain requires (interaction-5) that CaM be bound by calcium (interaction-6). The steps so far describe how calcium activates the kinase. CaMK can autophosphorylate in trans (one molecule of CaMK phosphorylating another molecule of CaMK) (interaction-7). This phosphorylation prevents the binding between the kinase and the regulatory domain (interaction-8). Phosphorylated CaMK therefore retains its activity even when it dissociates from CaM. Eventually, CaMK is inactivated by dephosphorylation (interaction-9), which restores the ability of the regulatory domain to bind and block the kinase domain intramolecularly.

Another, more complex case of intramolecular control is that of the nonreceptor tyrosine kinase, Src. A molecular interaction map of that system has been published (Kohn, 2001), and an animated version of the process can be viewed at http://discover.nci.nih.gov.

The activation of CaMK, Src, and G proteins behaves similarly, in that all three exhibit amplified and prolonged action.

Intramolecular Covalent Binding: Reactions of SH Groups in Response to Reactive Oxygen

An interesting pathway involving intramolecular disulfide bond formation has recently been described for the response of budding yeast to oxidative stress (Temple et al., 2005). We now describe the molecular interaction map of this system, as shown in Figure 13. The transcription factor Yap1p in the budding yeast Saccharomyces cerevisiae is normally kept at low levels by rapid export from the nucleus (interaction-1).
This export would be inhibited (interaction-2) by formation of an intramolecular disulfide bond in Yap1p (interaction-3, note new symbol for covalent bonds). This disulfide bond blocks the nuclear export signal in Yap1p. Intracellular reducing conditions, however, usually prevent the production of disulfide bonds. Oxidative stress can generate the Yap1p disulfide by the following mechanism. Reactive oxygen species add a hydroxyl to the Cys36 SH group of the peroxidase Gpx3p (interaction-4, covalent bond between OH and S), generating a sulfene. The activated Gpx3p reacts with Yap1p, producing the disulfide and concurrently converting the sulfene back to the sulfhydryl form of Gpx3p (interaction-5). (To show that these two conversions are stoichiometrically linked as parts of the same reaction, we have introduced a small circle at the branch point; for the moment, this is an ad hoc symbol, not yet formally adopted.) The disulfide form of Yap1p accumulates in the nucleus and retains its ability to stimulate transcription. One of its gene products is thioredoxin, which cleaves the Yap1p disulfide (interaction-6), thereby forming a negative feedback loop. This example illustrates how the MIM notation may evolve to accommodate new requirements.

Pathways within a Canonical Map: from Ataxia Telangiectasia Mutated (ATM) to p53

The MIM notation provides compact diagrams within which various reaction pathways and processes can be traced. As mentioned, heuristic MIM diagrams are canonical in the sense that they do not specify a particular process or sequence of events. A heuristic map may contain the ingredients for multiple processes or event sequences (pathways), which may function simultaneously or may be specific to particular conditions or cell types. Particular pathways, however, can be highlighted on a canonical map (http://discover.nci.nih.gov/mim/).

Figure 14
p53 levels in cells are normally kept very low, by rapid degradation induced by Mdm2. In response to DNA damage, p53 increases in amount and activity, and functions to transcribe genes that arrest the cell cycle or that initiate programmed cell death (apoptosis). Certain types of DNA damage lead to increased levels of the ATM gene product. The four panels in Figure 14 ; http://discover.nci.nih.gov/mim/).

EXPLICIT DIAGRAMS FOR COMPUTER SIMULATION

To see why heuristic maps may not contain all the information needed for computer simulation, consider the following example, derived from Kohn et al. (2004). Suppose that molecules A and B bind to each other and that the resulting dimer binds to a promoter, thereby activating a gene. Suppose further that A can be phosphorylated, and that this phosphorylation causes A to be degraded. This heuristic description leaves open some questions for which a computer simulation model needs answers: Does phosphorylated A bind B? If it does, can the complex bind the promoter? If it can, does it activate the gene? Furthermore, if phosphorylated A binds B (either alone or promoter-bound), does this affect (stimulate or inhibit) A's degradation? For this simple heuristic map, there are 12 possible explicit models. Simulation studies require judgment about which explicit models are most plausible.

We have approached computer simulation studies from the point of view of “microworld models” (Kholodenko and Westerhoff, 1995), which are based solely on molecular interactions, avoiding arbitrary functions for stimulation or inhibition contingencies. Our explicit diagrams use the MIM notation, but without stimulation or inhibition symbols. These diagrams can be translated directly into an input file for computer simulation (Kohn, 1998, 2001; Kohn et al., 2004).

In this implementation, inhibition can be expressed simply by omitting the reactions that do not occur. Alternatively, inhibition may be represented by a mechanism, such as competitive binding by another molecular species or production of an inactive complex. Likewise, stimulation must be represented by a specific mechanism. Enzymatic reactions are represented in terms of the component reactions: enzyme–substrate association; enzyme–substrate dissociation; conversion of enzyme–substrate complex to products (Figure 15a).
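As an illustration of how an enzymatic step decomposes into component reactions, the following Python sketch integrates the three mass-action reactions named above (association, dissociation, conversion to products) with a simple forward Euler scheme. It is a sketch under stated assumptions, not the authors' simulation input format: the function name `simulate_enzyme`, the rate constants, the initial concentrations, the step size, and the assumption that conversion releases free enzyme together with one product molecule are all hypothetical.

```python
def simulate_enzyme(k_on, k_off, k_cat, e0, s0, dt=1e-3, steps=20000):
    """Integrate E + S <-> ES -> E + P by forward Euler (illustrative only).

    k_on, k_off, k_cat: hypothetical rate constants for the three
    component reactions; e0, s0: initial enzyme and substrate amounts.
    """
    e, s, es, p = e0, s0, 0.0, 0.0
    for _ in range(steps):
        assoc = k_on * e * s   # enzyme-substrate association
        dissoc = k_off * es    # enzyme-substrate dissociation
        conv = k_cat * es      # conversion of the complex to product
        # Each species changes only through the reactions it takes part in.
        e += dt * (-assoc + dissoc + conv)
        s += dt * (-assoc + dissoc)
        es += dt * (assoc - dissoc - conv)
        p += dt * conv
    return {"E": e, "S": s, "ES": es, "P": p}


# Hypothetical parameter values, chosen only to make the example run.
result = simulate_enzyme(k_on=1.0, k_off=0.5, k_cat=0.2, e0=1.0, s0=5.0)
print(result)
```

No stimulation or inhibition function appears anywhere in the sketch; the behavior follows entirely from the listed molecular reactions. Inhibition by a competitor could be added in the same style as one more binding reaction that sequesters the enzyme. Total enzyme (E + ES) and total substrate (S + ES + P) are conserved by construction, which gives a quick sanity check on any such model.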
ELECTRONIC MIMs

Large heuristic MIMs need an easy way to find any desired molecule on the map. This is accomplished in printed versions by a coordinate grid and an alphabetical list of molecules, analogous to the way towns are found on a roadmap (Kohn, 1999; Aladjem et al., 2004). Ancillary information is provided through numbering of the interactions; each number refers to an annotation that contains cogent information and references.

In electronic MIMs, the annotations are automatically brought up by clicking on an interaction number (http://discover.nci.nih.gov/mim/). Clicking on a molecular species name activates links to related databases. Electronic MIMs provide links to ancillary information and to other databases.

We are often asked what tools are available for generating MIMs. At this time it is not possible to generate satisfactory MIMs automatically. Moreover, we think there are significant advantages to preparing these maps manually (aided only by a symbols toolkit). The process of manual production encourages critical thinking about the structure and function of the network. New questions and possibilities emerge as one decides exactly how to arrange a map to make it easiest to comprehend and how best to group the interactions in a functionally integrated manner. In general, we think it unwise to assign too much responsibility to the computer, because today's software may insulate users from the objects they wish to understand.

BIOREGULATORY NETWORK DIAGRAMS: PROPOSALS AND CRITIQUE

Our MIM notation has been widely discussed (Pirson et al., 2000; Strogatz, 2001; Uetz et al., 2001; Kitano, 2003; Kurata et al., 2003). We now consider the critique and the alternative proposals. The two main limitations of MIMs are the absence of a fully automated way to produce them and the fact that some effort is required to learn the notation.
These limitations are, for the most part, shared by all of the alternative notations that have been proposed.

Computer-generated diagram methods have been developed, such as BIOCARTA's connection diagrams (http://www.biocarta.com). However, the resulting diagrams lack important molecular details, such as protein phosphorylations. The graphical language described by Cook et al. (2001) may be more refined from an engineering standpoint, whereas the MIM language may be more intuitive from a biologist's perspective.

Kitano (2003) proposed a variant of the MIM notation in which interaction and modification sites of a protein are marked on the border of the protein's symbol instead of at the end of a line extending from the border. We retain the modification symbol at the far end of an external line, however, because a given site may be modified in different ways (for example, by acetylation or ubiquitination at the same lysine residue, as in Figure 5j).

Kitano and colleagues have also developed CellDesigner, a form of computer-aided design (CAD) for generating biomolecular network diagrams (Funahashi et al., 2003). It may be possible to develop an analogous facility for MIMs. The manual production of MIMs, however, is the best way to display networks in a functionally revealing manner, and it imposes a discipline of logic that often gives new insight and highlights gaps in our knowledge.

Kurata et al. (2003) used a slight modification of the MIM notation to develop a software suite called CADLIVE to design and simulate signal transduction models. They described notation for two types of models: their “semantic models” correspond to our heuristic maps; their “mechanistic models” correspond to our explicit diagrams and associated computer implementation. Following our approach, Kurata et al. (2003) start with the principle that “each molecular species should ideally occur only once in a diagram, and all interactions involving those species should emanate from a single symbolic object” and that an extensible representation of multimolecular assemblies is a fundamental requirement. They also note, as we did, that “the potential number of modification and/or multimerization combinations is tremendous, and the representation of all possible combinations of multimers and modifications in a single diagram is not practical.” Their symbol list is very similar to ours (http://www.bse.kyutech.ac.jp/~kurata/NARwww/cadlive.html). Although they provide a computer implementation, its merits remain to be determined.

Protein interaction network diagramming methods based on large-scale data sets are receiving considerable attention (Kelley et al., 2003; Gagneur et al., 2004; Vazquez et al., 2004). However, such diagrams do not contain comprehensive information about protein modifications and their consequences. Koike et al. (2003) described a protein kinase database that includes protein interaction data, but does not include details at the level of modification sites.

Although different notations may in time find their optimal areas of application, we think that the MIM notation would be the most immediately useful for biologists. To gather the information for a molecular interaction map, it is necessary to scan a large number of journal articles. Computer-assisted search programs have been developed (Tanabe et al., 1999; Corney et al., 2004), including MedMiner (http://discover.nci.nih.gov) from our own laboratory. However, the best up-to-date product requires direct culling of information from papers selected and scanned by knowledgeable persons, who can extract evidence for direct interaction between proteins and identify the domains and modification states that are involved.

MIMs have been faulted for not indicating dependence on cell type.
The number of different cell types and cell states of interest, however, is very large. Heuristic MIMs are designed to show the molecular interactions that can occur if the interacting molecules are in the same place at the same time. We are developing tools to allow the user to delete molecules and pathways that may be absent in particular cases due to lack of expression of particular genes or protein species. In this way, maps specific to a particular cell type or state can be generated from a canonical map that includes all of the possible interactions.

Another criticism is that MIMs do not specify the order of events. Kurata et al. (2003), for example, state that “Kohn's diagram accurately describes the detailed relationships among components, but it does not provide the stepwise view of specific biological processes.” Similarly, Kitano (2003) states that “MIM is a good basis for a standard to represent interactions between molecular species, however, it does not explicitly show temporal sequences of biological events.”

However, MIMs intentionally avoid assumptions about order of events, because networks may operate in various ways involving different event sequences. Nevertheless, particular event sequences can be highlighted on a canonical map, as illustrated in Figure 14.

CONCLUSIONS AND PERSPECTIVES

There is an urgent, widely recognized need for standard notation capable of describing bioregulatory networks the way circuit diagrams describe electronic networks. Although several notations have been proposed, the molecular interaction map (MIM) notation is arguably the best suited to the purpose. It is the only extensively tested notation that can fully describe the known molecular details, such as the intricacies of protein modifications and complex formation, while allowing the unknown contingencies (of which there usually are many) to remain unspecified. Explicit, fully specified models for computer simulation can be extracted from the incompletely specified “heuristic” maps. Heuristic MIMs can encompass many explicit models and provide a foundation for testing these models. They are well suited to the biologist's perspective. We have found that, once it is mastered, the notation becomes invaluable as a diagrammatic shorthand that imposes a logical discipline and reveals the biologically relevant aspects of a network. MIMs often show the richness of interconnections that presumably confers extraordinary fluidity and robustness to bioregulatory networks.

In addition to their heuristic character, another attribute of MIMs is that they are canonical, in the sense that a single diagram can encompass schema for a variety of cell types and cell states. The maps describe the interactions that can occur when the relevant molecules exist at the same time in the same place. Diagrams for specific cell types and cell states are derived from canonical maps by deleting the molecules that are not expressed, as well as the interactions that do not occur due to lack of colocalization in time or place. We are developing on-line tools that will allow users to carry out these deletions. A toolbox is also being provided to assist in manual map production (http://discover.nci.nih.gov/mim/). The MIM notation may prove useful in other fields of study, such as ecologic systems, and could become a general rubric for systems biology.

Acknowledgments

We thank Drs. Silvio Parodi, Stephania Pasa, Sohyoung Kim, and Hiroaki Kitano for many valuable comments, suggestions, and discussion during the development of the MIM notation. We are grateful to David Kane, Margot Sunshine, and Hong Cao, on contract to J.W.'s group from SRA, International, for help in implementing electronic forms of MIMs (i.e., eMIMs). This research was supported by the Intramural Research Program of the Center for Cancer Research, National Cancer Institute, National Institutes of Health.

Notes

This article was published online ahead of print in MBC in Press (http://www.molbiolcell.org/cgi/doi/10.1091/mbc.E05-09-0824) on November 2, 2005.