Copyright Klamt et al. This is an open-access article distributed under the
terms of the Creative Commons Attribution License, which permits unrestricted use,
distribution, and reproduction in any medium, provided the original author and
source are credited.

Hypergraphs and Cellular Networks

1 Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
2 Institute for Mathematical Optimization, Faculty of Mathematics, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
3 Institute for Bioinformatics and Systems Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
4 Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany

Jörg Stelling, Editor, ETH Zürich, Switzerland

* E-mail: klamt/at/mpi-magdeburg.mpg.de

Background

The understanding of biological networks is a fundamental issue in computational
biology. When analyzing topological properties of networks, one often tends to
substitute the term “network” for “graph”,
or uses both terms interchangeably. From a mathematical perspective, this is often
not fully correct, because many functional relationships in biological networks are
more complicated than what can be represented in graphs.

In general, graphs are combinatorial models for representing relationships (edges)
between certain objects (nodes). In biology, the nodes typically describe proteins,
metabolites, genes, or other biological entities, whereas the edges represent
functional relationships or interactions between the nodes such as “binds
to”, “catalyzes”, or “is converted
to”. A key property of graphs is that every edge connects two nodes. Many
biological processes, however, are characterized by more than two participating
partners and are thus not bilateral. A metabolic reaction such as
A+B→C+D (involving four species), or a protein complex
consisting of more than two proteins, are typical examples. Hence, such multilateral
relationships are not compatible with graph edges. As illustrated below,
transformation to a graph representation is usually possible but may imply a loss of
information that can lead to wrong interpretations afterward.

Hypergraphs offer a framework that helps to overcome such conceptual limitations. As
the name indicates, hypergraphs generalize graphs by allowing edges to connect more
than two nodes, which may facilitate a more precise representation of biological
knowledge. Surprisingly, although hypergraphs occur ubiquitously when dealing with
cellular networks, their notion is known to a much lesser extent than that of
graphs, and sometimes they are used without explicit mention.

This contribution by no means questions the importance and wide applicability of
graph theory for modeling biological processes. A multitude of studies show that
meaningful biological properties can be extracted from graph models (for a review
see [1]). Instead, this contribution aims to increase the
community's awareness of hypergraphs as a modeling framework for network
analysis in cell biology. We will give an introduction to the notion of hypergraphs,
thereby highlighting their differences from graphs and discussing examples of using
hypergraph theory in biological network analysis. For this Perspective, we propose
using hypergraph statistics of biological networks, where graph analysis is
predominantly used but where a hypergraph interpretation may produce novel results,
e.g., in the context of a protein complex hypergraph.

Like graphs, hypergraphs may be classified by distinguishing between undirected and
directed hypergraphs, and, accordingly, we divide the introduction to hypergraphs
given below into two major parts.

Undirected Hypergraphs

An undirected hypergraph H = (V,E)
consists of a set V of vertices or nodes and a set
E of hyperedges. Each hyperedge
e ∈ E may contain arbitrarily many
vertices, the order being irrelevant, and is thus defined as a subset of
V. For this reason, undirected hypergraphs can also be interpreted
as set systems with a ground set V and a family
E of subsets of V. If no hyperedge is a subset
of another hyperedge, H is also called a Sperner hypergraph, or
clutter. Undirected graphs are special cases of hypergraphs in which every
hyperedge contains two nodes (i.e., has a cardinality of two).
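
The set-system view above translates directly into code. The following is a minimal sketch, assuming a hypothetical five-protein example (the vertex names and the three hyperedges are illustrative and are not taken from Figure 1A); the helper names `is_sperner` and `is_graph` are likewise made up for this illustration.

```python
from itertools import combinations

# Hypothetical example: vertices are proteins, hyperedges are complexes.
V = {"A", "B", "C", "D", "E"}
E = [frozenset("ABCD"), frozenset("AE"), frozenset("CE")]

def is_sperner(hyperedges):
    """True if no hyperedge is a proper subset of another (a clutter)."""
    return not any(e < f or f < e for e, f in combinations(hyperedges, 2))

def is_graph(hyperedges):
    """True if every hyperedge has cardinality two, i.e. H is an ordinary graph."""
    return all(len(e) == 2 for e in hyperedges)

print(is_sperner(E))  # no complex is contained in another one
print(is_graph(E))    # the four-protein hyperedge rules this out
```

Storing each hyperedge as a `frozenset` reflects that the order of vertices within a hyperedge is irrelevant and lets subset tests use the built-in `<` operator.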
Protein–protein interaction (PPI) networks provide a nice example
illustrating the differences that may arise in modeling biological facts with graphs
and hypergraphs. Various technologies for measuring protein interactions have been
developed, but we concentrate here on data obtained, e.g., by tandem affinity
purification (TAP, [2],[3]) delivering protein complexes (with possibly more
than two partners) instead of direct binary interactions. A small-scale example
mimicking experimental data derived by TAP is shown in Figure 1A.

Another application of undirected hypergraphs is minimal hitting
sets (MHSs), also known as generalized vertex covers or hypergraph
transversals [6],[7]. For example, in a given hypergraph model of a PPI
network, an interesting problem related to experimental design [5] is to determine minimal
(irreducible) subsets of bait proteins that would cover or “hit”
all complexes in a minimal way; i.e., no proper subset of an MHS would hit all
complexes. In Figure 1A

Hypergraphs are also closely related to the concept of independence
systems. An independence system I = (V,U)
is a collection U of subsets of a ground set V in
which for each set u ∈ U all subsets of
u are part of the collection. Any Sperner hypergraph H = (V,E)
can be extended to an independence system I = (V,U)
in which V is still the set of vertices and U
contains all hyperedges of E plus all subsets of these hyperedges.
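
The extension just described can be sketched in a few lines. This is only an illustration under stated assumptions: the three hyperedges are a hypothetical example rather than the complexes of Figure 1A, and the function names `independence_system` and `maximal_sets` are invented here.

```python
from itertools import combinations

def independence_system(hyperedges):
    """Collect every subset of every hyperedge (the empty set included)."""
    U = set()
    for e in hyperedges:
        items = sorted(e)
        for k in range(len(items) + 1):
            U.update(frozenset(c) for c in combinations(items, k))
    return U

def maximal_sets(family):
    """Members of the family that are not properly contained in another member."""
    return {s for s in family if not any(s < t for t in family)}

# Hypothetical Sperner hypergraph (assumed for illustration only).
E = {frozenset("ABCD"), frozenset("AE"), frozenset("CE")}
U = independence_system(E)

print(len(U))                  # number of independent sets
print(maximal_sets(U) == E)    # the inclusion-maximal sets are the hyperedges
```

Note that the size of U grows exponentially with the cardinality of the largest hyperedge, so explicit enumeration like this is only sensible for small examples.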
The hyperedges of the original hypergraph are then the maximal independent sets
(also called bases) of the independence system I.

For example, the family of sets of the independence system induced by the
protein complex hypergraph in Figure
1A = 1,
B = 2, C = 3,
D = 4, E = 5. A
greedy strategy (operating on the vertices) would first select protein E because it
has the highest weight. This reduces the search space to complexes
C2 and C3. For the next
protein we choose C because its molecular weight is larger than that of A. The
algorithm finishes at that point as it has found a maximal independent set (complex
C3) whose weight is 8, which is evidently not
the optimum (note that this is not due to the larger size of complex
C1; choosing A = 8,
B = 1, C = 1,
D = 9, E = 8, the
greedy algorithm would deliver the four-protein complex
C1, although the true optimum is then the two-protein
complex C2). The reason that the greedy algorithm fails
in this simple example is that the independence system spanned by the complex
hypergraph is not a matroid.

Given how frequently greedy-type algorithms on hypergraphs are applied as heuristics
in practice, it appears important to study the deviation of the hypergraph under
consideration from being a matroid [13]. A recent study on
algorithms for measuring phylogenetic diversity underlines this point [14].

Directed Hypergraphs

The definition of directed hypergraphs is similar to that of undirected
hypergraphs, D = (V,A),
but each hyperedge a ∈ A (here
also called hyperarc) is assigned a direction, implying that one has to
define where it starts and where it ends. Directed hypergraphs allow us to connect
several start nodes (the tail T) with several end nodes (the head
H). A hyperarc is thus defined as a = (T,H) with T and H being subsets of the vertices
V. Again, directed graphs are special cases of directed hypergraphs
where both T and H contain exactly one node, limiting their
scope to 1:1 relationships. In contrast, directed hypergraphs can
represent arbitrary n:m relationships.

Typical examples are (bio)chemical reactions, which are often bi-molecular, such as
the example A+B→C+D. The tail T of this hyperarc consists of the reactants A and B, whereas the
head H contains the products C and D. However, for an exact
description of stoichiometric reactions we need to include the stoichiometric
coefficients (which can be different from unity) in the hypergraph model. For this
purpose, one adds to each hyperarc two functions cT: T→N and cH:
H→N, assigning the stoichiometric
coefficients for the nodes in T and H, respectively. Each hyperarc
a then reads a = (T, cT, H, cH). This
completes the description of a stoichiometric network, which is in practice often
conveniently described by a stoichiometric matrix (Figure 1C).

Directed hypergraphs can be drawn as shown in the example in Figure 1C.

Another application of directed hypergraphs in computational biology is the
representation of logical relationships in signaling and regulatory networks.
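
As a rough illustration of how hyperarcs can carry logical information, the sketch below assumes one possible convention: all tail nodes of a hyperarc must be active for it to fire (AND), and a node is activated if at least one incoming hyperarc fires (OR). Both this semantics and the small network are assumptions made for the example; the network is hypothetical and is not the one shown in Figure 1D, and inhibitory influences are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hyperarc:
    tail: frozenset  # start nodes T (inputs that must act together)
    head: frozenset  # end nodes H (targets of the hyperarc)

# Hypothetical logical network: (A AND B) -> D, and C -> D.
arcs = [
    Hyperarc(frozenset({"A", "B"}), frozenset({"D"})),
    Hyperarc(frozenset({"C"}), frozenset({"D"})),
]

def is_active(node, active_inputs, hyperarcs):
    """Assumed semantics: a hyperarc fires if all of its tail nodes are active;
    a node is active if at least one hyperarc pointing at it fires."""
    return any(node in a.head and a.tail <= active_inputs for a in hyperarcs)

print(is_active("D", {"A"}, arcs))        # A alone is not sufficient here
print(is_active("D", {"A", "B"}, arcs))   # A and B together activate D
print(is_active("D", {"C"}, arcs))        # C alone activates D
```

An ordinary directed graph with edges A→D, B→D, and C→D could not distinguish these three cases, because each edge only records a 1:1 influence.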
Interaction graphs (signed directed graphs) are commonly used topological models for
causal relationships and signal flows in cellular networks. For example, in Figure 1D 1 relationships, we cannot decide which combinations of input
signals of D will eventually activate D itself. With additional information, a
refined hypergraph representation might be constructed as in the right part of Figure 1D.

Algorithmic Considerations

The concept of hypergraphs provides such a rich modeling framework that algorithms
necessarily will be problem-specific, and will differ in complexity from similar
algorithms for graphs. Clearly, since graphs are special cases of hypergraphs,
algorithms for hypergraphs are at least as hard as their specialized counterparts
in the graph case. Generally, when discussing algorithms in graphs and hypergraphs,
one has to distinguish between two types of problems. The first type encompasses
algorithms determining a particular (e.g., optimal) solution. One
example, as noted above, is shortest-path algorithms for graphs, which are of low
complexity (and thus applicable in large-scale networks) and which can also be used
to find the connected components or to determine spanning trees in a hypergraph.
This is due to the fact that the graph representation as in Figure 1C

The second type of problem comprises enumeration problems such as computing
all paths and cycles in a graph or all minimal hitting sets in a hypergraph. These
problems typically require enormous computational effort and are often limited to
networks of moderate size. For example, the hardness of computing the minimal
hitting sets (transversal of a hypergraph) is an open question in complexity theory
[11].
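
To make the enumeration problem concrete, here is a minimal sketch of a sequential scheme in the spirit of Berge's method [6]: hyperedges are processed one at a time, partial hitting sets that miss the new hyperedge are extended by each of its vertices, and non-minimal sets are discarded after every step. The example hypergraph is hypothetical, the function names are made up, and the running time can be exponential in the worst case.

```python
def minimize(family):
    """Keep only the inclusion-minimal sets of a family."""
    return {s for s in family if not any(t < s for t in family)}

def minimal_hitting_sets(hyperedges):
    """Enumerate all minimal hitting sets, one hyperedge at a time."""
    transversals = {frozenset()}
    for e in hyperedges:
        extended = set()
        for t in transversals:
            if t & e:
                # t already hits the new hyperedge and is kept unchanged
                extended.add(t)
            else:
                # otherwise extend t by each vertex of the new hyperedge
                extended.update(t | {v} for v in e)
        transversals = minimize(extended)
    return transversals

# Hypothetical protein complex hypergraph (assumed for illustration only).
E = [frozenset("ABCD"), frozenset("AE"), frozenset("CE")]
for mhs in sorted(minimal_hitting_sets(E), key=sorted):
    print(sorted(mhs))
```

In the experimental-design reading used earlier, each printed set would be an irreducible choice of bait proteins that touches every complex of the example.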
The theoretically fastest currently known algorithm is quasi-polynomial [24], used
successfully, e.g., in [12], whereas variants of Berge's method
[6] are
often faster in practice [10]. In general, it turns out that the particular
topology of cellular networks renders enumeration problems often feasible where one
would expect infeasibility in random networks with comparable size (see, e.g., [10],[25]).

Network Statistics in Hypergraphs

With the increasing availability of large-scale molecular interaction graphs such as
PPI or gene regulatory networks, more and more researchers have begun asking not
only for single specific elements of a graph but instead for its statistical
properties or significant building blocks. Examples are the neural network of
C. elegans, which satisfies the small-world property, implying
shorter mean shortest paths and higher clustering coefficients than one would expect
in random networks [26], and the PPI network of yeast, which may be
modeled using a scale-free topology and whose node connectivity is correlated with
essentiality of the corresponding protein [27]. A key novelty of these
approaches is that properties of the graphs are now interpreted as statistical
distributions, which can be correlated with other variables and tested for
significance within an appropriate class of random graphs [28],[29]. In the following, we
will first briefly outline some existing extensions of graph statistics to
hypergraph statistics and corresponding random models and afterward indicate
applications in computational biology. We will focus on undirected hypergraphs,
although extensions to directed ones are possible.

The degree d(ν) of a vertex
ν ∈ V of an undirected hypergraph H = (V,E)
is the number of hyperedges that contain ν. Similarly, the degree
d′(e) of a hyperedge e ∈ E is the number of vertices of that hyperedge. If H is a
graph, then
d′(e) = 2.
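
Both degree notions are simple counts and can be computed as follows. This is a minimal sketch on a hypothetical hypergraph; the function names are invented for the example, and vertices that occur in no hyperedge (degree zero) would have to be added separately because only the hyperedge list is inspected.

```python
from collections import Counter

def vertex_degrees(hyperedges):
    """d(v): number of hyperedges that contain vertex v."""
    return Counter(v for e in hyperedges for v in e)

def hyperedge_degrees(hyperedges):
    """d'(e): number of vertices in hyperedge e."""
    return {e: len(e) for e in hyperedges}

# Hypothetical hypergraph (assumed for illustration only).
E = [frozenset("ABCD"), frozenset("AE"), frozenset("CE")]

d = vertex_degrees(E)
d_prime = hyperedge_degrees(E)

# Degree distributions: how many vertices (hyperedges) have a given degree.
vertex_degree_distribution = Counter(d.values())
hyperedge_degree_distribution = Counter(d_prime.values())

print(dict(d))
print(dict(vertex_degree_distribution))
print(dict(hyperedge_degree_distribution))
```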
In the more general hypergraph setting, however, we can consider distributions both
of vertex and hyperedge degrees. We can ask for mean degrees or more general
properties of the distributions. In social network analysis, this has already been
done: For instance, an actor–movie hypergraph obeys power-law
distributions in both degrees whereas an author–publication hypergraph
shows a power law only in the number of co-authored papers, but not in the author
degree [30], which is simply due to the fact that the
number of authors on a paper is relatively limited.

The natural next step in defining hypergraph statistics is to correlate vertex and
hyperedge connectivity, a major ingredient for determining, e.g., the small-world
property known from the graph case [26]. Here, the commonly used graph clustering
coefficient may be extended. For this, let N(ν) denote the neighborhood of a vertex ν, which is defined as the set
of hyperedges that contain ν. Then the (hypergraph) clustering coefficient
cc defined for a pair of vertices (u,ν)
is given by cc(u,ν) = |N(u) ∩ N(ν)| / |N(u) ∪ N(ν)|, which quantifies overlap between neighborhoods. By analogy, it
can be defined for hyperedges as well, and, by averaging over all vertices, a
univariate clustering coefficient may be defined. In the
author–publication hypergraph, clustering coefficients of both vertices
and hyperedges are higher than expected by chance [30]. Another proposal for
clustering coefficients in hypergraphs can be found in [31]. In addition to such
local measures, we may also ask for global or semi-global properties. A common
question in the graph case is to identify clusters, often denoted as communities,
within the graph. Various methods have been proposed in this context, with
normalized cut [32] and graph modularity [33] being two of the most
popular ones, resulting in applications such as the search for modular structures,
ideally protein complexes, in PPI networks [34]. The former method has
already been extended to hypergraphs [35].

In order to test for significance of certain structures, e.g., network motifs [36] or
scaling structures [26],[27], good null models are important. Such null models
describe random occurrences of structures. One typically wants to keep some
statistics of the network fixed while at the same time randomly sampling from its
representational class. This results in the notion of random graphs with certain
additional properties such as Erdős-Rényi [37] or
Barabási-Albert [38]. Extensions of random models, in particular to
hypergraphs, would focus on generative models, which increasingly find applications
at least in the graph case [26],[39]. In the context of hypergraphs, first models have
already been proposed [40].

What could be potential biological applications of hypergraph statistics? Given the
fact that in gene regulatory networks statistical properties are decisive [27], it
stands to reason that if one wants to combine two types of regulations or
interactions, e.g., gene and microRNA regulation, the resulting hypergraph ought to
be analyzed from a hypergraph statistics point of view. Another example is the
human–disease network [41], consisting of disease genes and related
diseases. Often, analysis and visualization are done on the projected versions,
either onto diseases or genes. However, node statistics or motif detection [36] may be
performed in the hypergraph itself. The latter is already implemented, e.g., in
FANMOD [42], a motif-finding tool ready to deal with n-partite
networks. Finally, we want to mention a hypergraph analysis of a mammalian protein
complex hypergraph acquired from the CORUM database [43]. The hypergraph shows
scale-free behavior in both vertex degree and hyperedge degree distribution [44]. As
illustrated schematically in Figure 2

Conclusions

To summarize, hypergraphs generalize graphs by allowing for multilateral
relationships between the nodes, which often results in a more precise description
of biological processes. Hypergraphs thus provide an important approach for
representing biological networks, whose potential has not been fully exploited yet.
We therefore expect that applications of hypergraph theory [6],[22] in computational biology
will increase in the near future.

Acknowledgments

FT thanks Florian Blöchl and SK is grateful to Regina Samaga and Axel von
Kamp for helpful comments during the preparation of the manuscript.

Footnotes

The authors have declared that no competing interests exist.

This work was supported by the German Federal Ministry of Education and Research
(HepatoSys and FORSYS-Centre MaCS (Magdeburg Centre for Systems Biology)), the
Ministry of Education and Research of Saxony-Anhalt (Research Center
“Dynamic Systems”), and the Helmholtz Alliance on Systems
Biology (project “CoReNe”). The funders had no role in study
design, data collection and analysis, decision to publish, or preparation of the
manuscript.

References

1. Aittokallio T, Schwikowski B. Graph-based methods for analysing networks in cell biology. Brief Bioinform. 2006;7:243–255.
2. Gavin AC, Bösche M, Krause R, Grandi P, Marzioch M, et al. Functional organization of the yeast proteome by systematic analysis of the protein complexes. Nature. 2002;415:141–147.
3. Gagneur J, Krause R, Bouwmeester T, Casari G. Modular decomposition of protein-protein interaction networks. Genome Biol. 2004;5:R57.
4. Wuchty S, Almaas E. Peeling the yeast proteome network. Proteomics. 2005;5:444–449.
5. Ramadan E, Tarafdar A, Pothen A. A hypergraph model for the yeast protein complex network. In: Proceedings of the Sixth IEEE Workshop on High Performance Computational Biology; April 26, 2004; Santa Fe, New Mexico, United States. 2004.
6. Berge C. Hypergraphs: Combinatorics on finite sets. Amsterdam: Elsevier–North Holland; 1989.
7. Fijany A, Vatan F, Barrett A, Mackey R. New approaches for solving the diagnosis problem. 2002. IPN Progress Report 42–149.
8. Klamt S. Generalized concept of minimal cut sets in biochemical networks. Biosystems. 2006;83:233–247.
9. Klamt S, Saez-Rodriguez J, Lindquist J, Simeoni L, Gilles ED. A methodology for the structural and functional analysis of signaling and regulatory networks. BMC Bioinformatics. 2006;7:56.
10. Haus UU, Klamt S, Stephen T. Computing knock-out strategies in metabolic networks. J Comput Biol. 2008;15:259–268.
11. Eiter T, Makino K, Gottlob G. Computational aspects of monotone dualization: A brief survey. Discrete Appl Math. 2008;156:2035–2049.
12. Haus U-U, Niermann K, Truemper K, Weismantel R. Logic Integer Programming Models for Signaling Networks. J Comput Biol. In press. 2009.
13. Schrijver A. Combinatorial optimization. Polyhedra and efficiency. Berlin: Springer; 2003.
14. Moulton V, Semple C, Steel M. Optimizing phylogenetic diversity under constraints. J Theor Biol. 2007;246:186–194.
15. Reed JL, Famili I, Thiele I, Palsson BO. Towards multidimensional genome annotation. Nat Rev Genet. 2006;7:130–141.
16. Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL. The large-scale organization of metabolic networks. Nature. 2000;407:651–654.
17. De Figueiredo LF, Schuster S, Kaleta C, Fell DA. Can sugars be produced from fatty acids? A test case for pathway analysis tools. Bioinformatics. 2008;24:2615–2621.
18. Schuster S, Dandekar T, Fell DA. Detection of elementary flux modes in biochemical networks: A promising tool for pathway analysis and metabolic engineering. Trends Biotechnol. 1999;17:53–60.
19. Acuna V, Chierichetti F, Lacroix V, Marchetti-Spaccamela A, Sagot MF, et al. Modes and cuts in metabolic networks: Complexity and algorithms. Biosystems. 2009;95:51–60.
20. Christensen TC, Oliveira AP, Nielsen J. Reconstruction and logical modeling of glucose repression signaling pathways in Saccharomyces cerevisiae. BMC Syst Biol. 2009;3:7.
21. Saez-Rodriguez J, Simeoni L, Lindquist JA, Hemenway R, Bommhardt U, et al. A logical model provides insights into T cell receptor signaling. PLoS Comput Biol. 2007;3:e163. doi:10.1371/journal.pcbi.0030163.
22. Gallo G, Longo G, Pallottino S, Nguyen S. Directed hypergraphs and applications. Discrete Appl Math. 1993;42:177–201.
23. Lovász L. Coverings and coloring of hypergraphs. Proceedings of the Fourth Southeastern Conference on Combinatorics, Graph Theory, and Computing; Florida Atlantic University, Boca Raton, Florida, United States. 1973. pp. 3–12.
24. Fredman ML, Khachiyan L. On the complexity of dualization of monotone disjunctive normal forms. J Algorithms. 1996;21:618–628.
25. Klamt S, Stelling J. Combinatorial complexity of pathway analysis in metabolic networks. Mol Biol Rep. 2002;29:233–236.
26. Watts DJ, Strogatz SH. Collective dynamics of small-world networks. Nature. 1998;393:440–442.
27. Jeong H, Mason S, Barabási A, Oltvai Z. Lethality and centrality in protein networks. Nature. 2001;411:41–42.
28. Albert R, Barabasi A-L. Statistical mechanics of complex networks. Rev Mod Phys. 2002;74:47–97.
29. Newman MEJ. The structure and function of complex networks. SIAM Rev. 2003;45:167–256.
30. Latapy M, Magnien C, Vecchio ND. Basic notions for the analysis of large two-mode networks. Soc Networks. 2008;30:31–48.
31. Estrada E, Rodriguez-Velazquez JA. Subgraph centrality and clustering in complex hyper-networks. Physica A. 2006;364:581–594.
32. Shi J, Malik J. Normalized cuts and image segmentation. IEEE T Pattern Anal. 2000;22:888–905.
33. Newman M. Modularity and community structure in networks. Proc Natl Acad Sci U S A. 2006;103:8577–8582.
34. Wang Z, Zhang J. In search of the biological significance of modular structures in protein networks. PLoS Comput Biol. 2007;3:e107. doi:10.1371/journal.pcbi.0030107.
35. Zhou D, Huang J, Schoelkopf B. Learning with hypergraphs: Clustering, classification, and embedding. Proc NIPS 19. Cambridge (Massachusetts): MIT Press; 2007.
36. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, et al. Network motifs: Simple building blocks of complex networks. Science. 2002;298:824–827.
37. Erdős P, Rényi A. On random graphs. Publicationes Mathematicae. 1959;6:290–297.
38. Barabási A, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512.
39. Kim WK, Marcotte EM. Age-dependent evolution of the yeast protein interaction network suggests a limited role of gene duplication and divergence. PLoS Comput Biol. 2008;4:e1000232. doi:10.1371/journal.pcbi.1000232.
40. Guillaume J, Latapy M. Bipartite structure of all complex networks. Inform Process Lett. 2004;90:215–221.
41. Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, et al. The human disease network. Proc Natl Acad Sci U S A. 2007;104:8685–8690.
42. Wernicke S, Rasche F. FANMOD: A tool for fast network motif detection. Bioinformatics. 2006;22:1152–1153.
43. Ruepp A, Brauner B, Dunger-Kaltenbach I, Frishman G, Montrone C, et al. CORUM: The comprehensive resource of mammalian protein complexes. Nucleic Acids Res. 2008;36:D646–D650.
44. Wong P, Althammer S, Hildebrand A, Kirschner A, Pagel P, et al. An evolutionary and structural characterization of mammalian protein complex organization. BMC Genomics. 2008;9:629.