Copyright Royer et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Unraveling Protein Networks with Power Graph Analysis

Biotechnology Center, Technische Universität Dresden, Germany

Johannes Berg, Editor, University of Cologne, Germany

* E-mail: ms/at/biotec.tu-dresden.de

Conceived and designed the experiments: LR MR BA MS. Performed the experiments: LR MR. Analyzed the data: LR MR MS. Contributed reagents/materials/analysis tools: BA. Wrote the paper: LR MR MS.

Received October 19, 2007; Accepted May 29, 2008.

Abstract

Networks play a crucial role in computational biology, yet their analysis and representation are still an open problem. Power Graph Analysis is a lossless transformation of biological networks into a compact, less redundant representation, exploiting the abundance of cliques and bicliques as elementary topological motifs. We demonstrate the advantages of Power Graph Analysis with five examples. Investigating protein-protein interaction networks, we show how the catalytic subunits of the casein kinase II complex are distinguishable from the regulatory subunits, how interaction profiles and sequence phylogeny of SH3 domains correlate, and how false-positive interactions can be spotted among high-throughput interactions. Additionally, we demonstrate the generality of Power Graph Analysis by applying it to two other types of networks. We show how power graphs induce a clustering of both transcription factors and target genes in bipartite transcription networks, and how the erosion of a phosphatase domain in type 22 non-receptor tyrosine phosphatases is detected. We apply Power Graph Analysis to high-throughput protein interaction networks and show that up to 85% (56% on average) of the information is redundant.
Experimental networks are more compressible than rewired networks with the same degree distribution, indicating that experimental networks are rich in cliques and bicliques. Power Graphs are a novel representation of networks, which reduces network complexity by explicitly representing recurring network motifs. Power Graphs compress up to 85% of the edges in protein interaction networks and are applicable to all types of networks, such as protein interaction, regulatory, or homology networks.

Author Summary

Networks play a crucial role in biology and are often used to represent experimental results. Yet their analysis and representation are still an open problem. Recent experimental and computational progress yields networks of increasing size and complexity. For example, small- and large-scale interaction networks, regulatory networks, genetic networks, protein-ligand interaction networks, and homology networks are regularly analyzed and published. A common way to access the information in a network is through direct visualization, but this often fails because it yields “fur balls” from which little insight can be gained. Clustering techniques, on the other hand, avoid the problems caused by the large number of nodes and the even larger number of edges by coarse-graining the networks and thus abstracting details. But these also fail, since much of the biology lies in the details. This work presents a novel methodology for analyzing and representing networks. Power Graphs are a lossless representation of networks, which reduces network complexity by explicitly representing recurring network motifs. Moreover, power graphs can be clearly visualized: they compress up to 90% of the edges in biological networks and are applicable to all types of networks, such as protein interaction, regulatory, or homology networks.
Introduction In recent years, novel high-throughput methods, such as yeast two-hybrid assays [1] and affinity purification techniques [2],[3], have been used to characterize protein interactions at a large scale and have produced a wealth of data in the form of networks of interacting proteins. Comprehensive protein interaction networks have been assembled for several species: S. cerevisiae [4]–[6], C. elegans [7], D. melanogaster [8],[9], H. pylori [10], H. sapiens [11],[12], and P. falciparum [13]. Networks are also obtained with other high-throughput data collection methods, either experimentally or in silico, such as ChIP-on-chip [14] experiments, whole interactome scanning experiments (WISE) [15], sequence homology networks [16] and others. The challenge remains to obtain biological insights through the analysis of these networks. In the case of protein interaction networks, their topology has been explored through the clustering of proteins into groups that share the same biological function, are similarly localized in the cell, or are part of a complex. To this end, several algorithms have been developed, such as socio-affinity clustering [4], the Restricted Neighborhood Search Clustering (RNSC) algorithm [17], the MCODE algorithm [18], statistical sub-complexes [19], modular decomposition [20] or the MULIC clustering algorithm [21]. How does the underlying biology manifest itself in protein interaction networks? Fig. 1
The abundance of stars, cliques, and bicliques suggests that modeling protein interaction networks as a collection of binary interactions is an obstacle to a detailed analysis of the wealth of information contained in high-throughput networks. These networks have many edges that redundantly diffuse the information instead of highlighting it. In this study we introduce a new network representation and analysis paradigm that not only groups proteins into biologically relevant modules but also conveys the subtle connection patterns within and between groups of proteins in full detail, without loss of information, and with fewer symbols. Results and Discussion Power Graph Analysis Power Graphs are novel representations of graphs that rely on two abstractions: power nodes and power edges. Power nodes are sets of nodes brought together, and power edges connect two power nodes, signifying that all nodes contained in the first power node are connected to all nodes contained in the second power node. These language primitives allow for the succinct representation of stars, bicliques, and cliques (Fig. 1). Power Graph Analysis is the computation and analysis of power graphs. We propose an algorithm that computes power graphs. Node clustering, module detection, network motif composition, network visualization, and network models can be recast in terms of Power Graph Analysis. In the following, we demonstrate how power graphs facilitate the task of uncovering the underlying biology. Understanding Interactions within Molecular Complexes with Power Graphs Some recent large-scale experiments [4] specifically aim at identifying complexes instead of binary interactions. Complexes are difficult to interpret from the point of view of binary interactions: if two proteins p1 and p2 participate in a complex C but are not in direct physical contact, do they interact?
This point is crucial for the interpretation of results from pull-down assays, in which whole complexes are identified rather than binary interactions [2],[3]. In a pull-down assay, a purified and tagged protein, the bait, is used to capture other proteins: the preys. These observed complexes are modelled either as cliques in the matrix model or as stars in the spoke model [40]. In the spoke model, the bait is at the centre of the star and the preys are linked to it. In the matrix model, all proteins are linked together, signifying that they belong to the same observed complex. The problem with this perspective is that the spoke model underestimates, and the matrix model overestimates, the number of true physical interactions between the members of a complex. In both models, binary interactions fail to convey an otherwise simple connection pattern succinctly. Let n be the number of proteins in the complex. The matrix model represents the complex with a quadratic number of interacting pairs: n(n−1)/2. The spoke model requires only n−1 interacting pairs to represent the same complex. Fig. 1 Example 1—Casein kinase II complex A recent survey of the yeast proteome investigated the modularity of the yeast cell machinery [4]. Fig. 2
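To make the size difference between the two models concrete, the short Python sketch below counts how many binary interactions each model needs for a complex of n proteins, next to the single power edge a power graph would use once the complex members are grouped into power nodes (a power node linked to itself for the clique, or the bait linked to a power node of preys for the star). The function names are illustrative only.

```python
def matrix_model_edges(n: int) -> int:
    """Matrix model: all n complex members are pairwise linked (a clique)."""
    return n * (n - 1) // 2


def spoke_model_edges(n: int) -> int:
    """Spoke model: the bait is linked to each of the n - 1 preys (a star)."""
    return max(n - 1, 0)


def power_edges_for_complex(n: int) -> int:
    """Either pattern collapses into a single power edge once its members form power nodes."""
    return 1 if n >= 2 else 0


if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        print(f"n={n:>2}  matrix={matrix_model_edges(n):>3}  "
              f"spoke={spoke_model_edges(n):>2}  power edges={power_edges_for_complex(n)}")
```

The gap grows quadratically for the matrix model and linearly for the spoke model, while the power graph cost stays constant.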
Other complexes are visible in the power graph representation. For example, the proteins POB3 and SPT16 are grouped together in one power node. They form the heterodimeric FACT complex SPT16/POB3, a complex involved in transcription elongation on chromatin templates. It is known that the casein kinase II complex activates the FACT complex [41]. Finally, a group of two power nodes linked by a power edge, all of them interacting with the protein PAF1, form the PAF1 complex, a complex that associates with RNA polymerase II [42]. Overall, the power graph representation gives an insightful picture of the underlying biology. It should be stressed that these representations are obtained from the network topology alone, without adding biological background knowledge. Power Graphs thus provide useful hints about the existence of complexes, their internal organization, and their relationships. Importantly, the power graph representation is lossless: all and only the interactions of the original network are represented faithfully, which is not the case for most clustering methods. Example 2—Untangling the nucleosome Similarly to the survey of the yeast proteome by Gavin et al. [4], Krogan et al. [6] have investigated protein interactions using tandem affinity purification (TAP). Fig. 3A Interacting with histones is the ORC complex (Origin Recognition Complex), which is responsible for marking origin regions prior to DNA replication. Fig. 3B Surprisingly, histones HTA2, HTB2 and HHF1 are segregated from their twin subtypes HTA1, HTB1 and HHF2, as subunits ORC2 and ORC6 interact with HTA2, HTB2 and HHF1 but not with HTA1, HTB1, and HHF2. This contradicts the identity or near identity of these pairs of histones. The power graph shows the separation between these two types of histones. Why do these nearly identical proteins have different interaction partners?
In the case of H2A histones, each subtype has been shown to be sufficient for cell viability, and no clear functional differences were reported apart from slower growth of strains homozygous for hta1− [43]. Despite the near identity of these proteins, their interaction profiles differ, which suggests that the interactions with ORC2 and ORC6 are false positives or false negatives: either all or none of the histones interact with ORC2 and ORC6. Yet this hypothesis does not explain why the co-regulated HTA2 and HTB2 are both seen interacting with ORC2 and ORC6, whereas the differently co-regulated HTA1 and HTB1 are not [44]. Moran et al. [45] show that the promoter region of HTA1 and HTB1 is regulated by the amount of effective H2A+H2B expression. This mechanism is essential for ensuring a sufficient and balanced amount of histones during the S phase, when DNA replication takes place. An excess of H2A+H2B induces a 10-fold decrease in RNA production for HTA1 and HTB1. Thus, a possible explanation for not observing interactions between ORC2/ORC6 and HTA1/HTB1 is that under some circumstances, which might be triggered by the TAP methodology (the fusion of the TAP tag to the C-terminus), the production of the HTA1 subtype is depressed. Moran et al. argue that the same regulatory feedback takes place for HTB1 as well as for all variants of HHT and HHF [45]. Power Graph Analysis helps to analyze high-throughput data by automatically highlighting the important information: in this case, the separation of histone proteins into two differentially co-regulated groups, the P-loop domain-containing subunits of the ORC complex, and the FACT complex. Interaction Profiles of Motif Binding Domains Example 3—Power Graph Analysis of a domain-peptide binding network In reference [15], Landgraf et al. have used a combination of phage display and SPOT synthesis to discover peptides in the yeast proteome that have the potential to bind to eight SH3 domains. Fig. 4A
Domain-interaction profiles correlate with sequence similarity We investigated how the interaction profiles of these eight SH3-carrying proteins relate to the domain sequences. Fig. 4B The pair of SH3-carrying proteins YHR016C/YFR024 that are grouped in one power node in Fig. 4A Power Graph Analysis Reveals Hidden Structures in Protein Interaction Networks As we have seen in specific examples, power graph analysis can help disentangle complex protein interaction networks. A quantitative analysis requires the definition of measures. Here we introduce the edge reduction measure:
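A natural reading of the edge reduction measure, and the one assumed in the sketch below, is the fraction of edges the power graph saves, (|E| − |E′|)/|E|, where |E′| counts the power edges plus any ordinary edges left as they are. This is an assumption rather than a quotation of the authors' definition. Applied to the homology network of Example 5 (4849 edges represented by 209 power edges), it gives about 0.957, in line with the roughly 95% reduction reported there.

```python
def edge_reduction(n_edges: int, n_power_graph_edges: int) -> float:
    """Fraction of edges saved by a power graph: (|E| - |E'|) / |E|.

    Assumption: n_power_graph_edges (|E'|) counts all edges of the power graph,
    i.e. power edges plus any ordinary edges that were left unchanged.
    """
    if n_edges <= 0:
        raise ValueError("the original graph must have at least one edge")
    return (n_edges - n_power_graph_edges) / n_edges


# Homology network of Example 5: 4849 edges represented by 209 power edges.
print(f"{edge_reduction(4849, 209):.3f}")  # 0.957
```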
From a visual complexity standpoint, trading edges for a hierarchy of sets of nodes is advantageous, since the edges of large cliques and bicliques necessarily cross in two dimensions, whereas the circles delineating power nodes, by definition, do not. Table 1 shows the results for 13 protein interaction networks [4], [6], [9], [12], [13], [46]–[53]. The conversion rate is correlated with both the average degree and the edge reduction and thus adds little extra information. To evaluate how significant these edge reduction values are, we randomly rewired these networks and then recomputed the corresponding power graphs, which provides a convenient null model (see Methods for random rewiring). Fig. 5
The edge reduction and conversion rate depend on the abundance of stars, cliques and bicliques in the network, as these motifs require just one power edge to represent arbitrarily many edges. In particular, from the examples previously discussed (casein kinase II complex, nucleosome), we would expect cliques and bicliques to be responsible. To ascertain that their abundance is indeed the explanation for the higher edge reductions, we examine the count of power edges of different sizes. Fig. 6
Although we observe an abundance of cliques and bicliques, it remains possible that it is caused solely by experimental or methodological artifacts. However, we know of at least one case for which this cannot be the explanation: the Structural Interaction Network (SIN) by Kim et al. is a set of interactions carefully curated using structural information, and all of its interactions are direct physical interactions explained by known structural evidence of binding [48]. This network exhibits a z-score of 54, Fig. 7
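The null-model comparison behind these z-scores can be sketched as follows, assuming the usual definition z = (x − μ)/σ, where x is the edge reduction of the real network and μ and σ are the mean and standard deviation over rewired copies. The rewiring is the degree-preserving edge swap described in Methods ((u,v),(w,t) → (u,t),(w,v), here 16 swap attempts per edge). Computing the edge reduction of each rewired copy needs a power graph algorithm, which is not shown, so the null values are simply passed in. Function names are hypothetical.

```python
import random
import statistics


def rewire(edges, swaps_per_edge=16, seed=None):
    """Degree-preserving rewiring of an undirected simple graph.

    Picks two edges (u, v) and (w, t) and replaces them with (u, t) and (w, v),
    skipping swaps that would create a self-loop or an already existing edge.
    swaps_per_edge * |E| swaps are attempted.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    if len(edge_list) < 2:
        return edge_list
    present = {frozenset(e) for e in edge_list}
    for _ in range(swaps_per_edge * len(edge_list)):
        i, j = rng.sample(range(len(edge_list)), 2)
        u, v = edge_list[i]
        w, t = edge_list[j]
        if rng.random() < 0.5:  # random orientation so both swap variants occur
            w, t = t, w
        new_a, new_b = frozenset((u, t)), frozenset((w, v))
        if len(new_a) < 2 or len(new_b) < 2 or new_a in present or new_b in present:
            continue  # would create a self-loop or a duplicate edge
        present -= {frozenset((u, v)), frozenset((w, t))}
        present |= {new_a, new_b}
        edge_list[i], edge_list[j] = (u, t), (w, v)
    return edge_list


def z_score(observed, null_values):
    """Standard z-score of an observed statistic against null-model samples."""
    return (observed - statistics.mean(null_values)) / statistics.stdev(null_values)
```

Every node keeps its degree after each accepted swap, so the rewired copies share the degree distribution of the real network while losing its clique and biclique structure.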
These results corroborate studies that examined network motifs as functional units in biological networks [55]. Network motifs have been shown to admit generalizations composed of bicliques and stars [56]. These patterns of interaction, characterized by high connectivity, have been shown to be evolutionarily conserved in the yeast protein interaction network [57]. Questioning the scale-free hypothesis It has recently been argued that distributions other than the power law better fit the observed degree distributions of protein interaction networks [26],[58]. It has also been shown that the scale-free property is not necessarily an intrinsic property of the networks but could be an artifact caused by selection regularities in the sampling procedures [59],[60]. Other models for protein interaction networks, such as geometric random networks [61], have been shown to fit better when the motif composition of protein interaction networks is considered. Our results show that the degree distribution does not completely characterize the idiosyncrasies of protein interaction networks: the abundance of stars, cliques and bicliques is an important signature. Domain and Gene Ontology Term Enrichment of Power Nodes To further support the idea that power nodes are not artifacts of the network topology but in fact have a biological interpretation, we analyzed the enrichment of power nodes in InterPro domains [62],[63] and Gene Ontology (GO) terms [54]. The previous histone example already contains a power node of three proteins, ORC1, ORC4, and ORC5, that share a P-loop domain. Our null hypothesis is that annotations are randomly distributed, following a hypergeometric distribution. To take missing domain annotations into account, we consider only power nodes in which more than two thirds of the proteins are annotated with at least one term or domain.
Moreover, we apply the Bonferroni correction because we test multiple hypotheses. Table 2 shows that sufficiently annotated power nodes are significantly enriched in domains, with most p-values below 0.001. Similarly, Table 3 shows the distribution of p-values for the enrichment in GO terms. The p-values for GO terms are not as low as for domains, which suggests that domains better explain the occurrence of the cliques and bicliques identified by power graph analysis. Interestingly, the z-scores found previously and the levels of enrichment appear to be correlated. For example, the Gavin, Krogan and Kim networks, which have the highest z-scores, also have the highest overall enrichments in domains and GO terms. The Kim et al. network (SIN) has the best overall enrichment for both domains and GO terms, which is in line with the fact that this network is known to be of high quality. Conversely, the power graphs for the LaCount and Lim networks have low z-scores, and their power nodes are poorly enriched in InterPro domains or GO terms. These results further confirm the relevance of power graph analysis for analyzing protein interaction networks, in particular for the relationship between protein domains and protein interactions.
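A minimal sketch of this enrichment test using only the Python standard library: the hypergeometric upper-tail p-value (G known proteins, n of which carry annotation X, and a power node of C proteins containing k annotated ones), the Bonferroni correction over m tested annotations, and the two-thirds annotation filter. Capping the corrected p-value at 1 is our addition, and the function names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, C, n, G):
    """P(at least k of the C proteins in a power node carry annotation X).

    G proteins are known, n of them carry X, and the C proteins of the power
    node are assumed to be drawn at random without replacement.
    """
    upper = min(C, n)
    tail = sum(comb(n, i) * comb(G - n, C - i) for i in range(k, upper + 1))
    return tail / comb(G, C)


def bonferroni(p, m):
    """Bonferroni-corrected p-value for m tests, capped at 1."""
    return min(1.0, p * m)


def sufficiently_annotated(power_node, annotations):
    """True if more than two thirds of the proteins have at least one annotation."""
    annotated = sum(1 for protein in power_node if annotations.get(protein))
    return 3 * annotated > 2 * len(power_node)
```

Integer arithmetic in `sufficiently_annotated` avoids a floating-point comparison against 2/3.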
Beyond Protein Interactions Other biological networks benefit from Power Graph Analysis, too. Examples are protein homology networks [16], in which nodes are proteins and edges represent BLAST E-values below a given threshold. These networks are geometric networks defined on the space of sequences, with the BLAST E-value as a distance. Geometric networks are known to be rich in cliques and bicliques [61]. Raw gene regulatory networks also benefit from the Power Graph representation, in particular since gene duplication events tend to create biclique motifs [55],[64]. Fig. 8
Example 4—Bipartite Regulatory Networks Beyer et al. presented an integrative approach for assigning transcription factors to target genes in S. cerevisiae using data from ChIP-chip experiments, known binding motifs, clusters of co-expression, and other evidence [65]. The result is a probabilistic model with high prediction accuracy and, from it, a bipartite network between transcription factors and target genes. The authors identified, among others, YAP1, YAP7 and MSN2 as part of a transcription factor module related to the stress response of S. cerevisiae. To investigate whether a similar module could be identified with Power Graph Analysis, we computed the power graph of the whole network and searched the region of the power graph containing YAP1, YAP7 and MSN2. As shown in Fig. 9
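Because this network is bipartite, one very simple way to expose biclique structure is to group target genes that share exactly the same set of transcription factors: each group, together with its regulators, is a biclique, and the groups cover every edge exactly once. The sketch below does only this. It is closer to modular decomposition (identical neighbourhoods) than to Power Graph Analysis, which also finds bicliques among genes whose regulator sets merely overlap and nests them hierarchically. The factor and gene names are placeholders, not data from the Beyer et al. network.

```python
from collections import defaultdict


def bicliques_by_shared_regulators(tf_target_edges):
    """Group target genes that have exactly the same set of regulators.

    Returns {frozenset(regulators): frozenset(targets)}; every entry is a
    biclique, and together the entries cover each edge exactly once.
    """
    regulators_of = defaultdict(set)
    for tf, target in tf_target_edges:
        regulators_of[target].add(tf)
    groups = defaultdict(set)
    for target, tfs in regulators_of.items():
        groups[frozenset(tfs)].add(target)
    return {tfs: frozenset(targets) for tfs, targets in groups.items()}


# Placeholder names, not data from the Beyer et al. network.
edges = [("TF_A", "g1"), ("TF_B", "g1"), ("TF_A", "g2"), ("TF_B", "g2"), ("TF_C", "g3")]
for tfs, targets in bicliques_by_shared_regulators(edges).items():
    print(sorted(tfs), "->", sorted(targets))
```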
The transcription factors MSN2, MSN4, and SKN7 are known to regulate the expression of genes in response to stresses such as heat and osmotic shock, oxidative stress, low pH, glucose starvation, sorbic acid and high ethanol concentrations [66]. YAP1, YAP2 and YAP7 are similar bZIP proteins of the YAP family, characterised by unusual amino acid substitutions in their bZIP domains [67]. YAP1 and YAP2 are known to be involved in the transcriptional response to drugs, oxidative stress and metal detoxification [66]. YAP7, however, is a poorly characterised transcription factor that is most similar, within the YAP family, to YAP6, whose overexpression increases sodium and lithium tolerance [68]. The strong overlap of the gene targets of YAP1, YAP2, and YAP7 and the common metal detoxification function of YAP1/YAP2 and YAP6 suggest that YAP7 also plays a role in metal detoxification. Power Graph Analysis is useful for its ability to decompose a bipartite network into a union of bicliques. This decomposition leads naturally to a hierarchy of clusters of transcription factors linked to a hierarchy of clusters of target genes. Example 5—Human Protein Tyrosine Phosphatase Homology Network The protein tyrosine phosphatase (PTP) family [69] has a central role in signal transduction by controlling the phosphorylation state of tyrosine residues. Tyrosine-specific protein phosphatases (EC:3.1.3.48) catalyse the removal of a phosphate group attached to a tyrosine residue. The power graph of the protein tyrosine phosphatase homology network is shown in Fig. 10A
The choice of E-value threshold has an impact on the representation. We observe that the power graph reveals the most details for a threshold of 10⁻⁴⁶. In this case, the lossless reduction in complexity achieved by the power graph representation reaches 95% edge reduction: from 4849 edges to 209, with 95 power nodes. The clustering of proteins in the power graph corresponds to the known classification of PTPs: 82% of leaf power nodes (power nodes that contain no other power nodes) have all of their proteins belonging to exactly the same sub-family. While these results could also have been obtained through hierarchical clustering of the sequences, Power Graph Analysis reveals additional details. The cross-links between different regions of the hierarchy constitute a new insight compared with traditional clustering methods. For example, a group of four type G receptor PTPs is linked by a power edge to two type 22 non-receptor PTPs. Fig. 10B = 1.014) to a non-receptor phosphatase domain listed in ProDom, a database of automatically generated clusters of homologous sequence fragments [70]. To verify that this region is responsible for the high similarity (E-value < 10⁻⁴⁶) between the type G receptor PTPs and the type 22 non-receptor PTPs, we compared the sequences of type G PTPs to a group of proteins to which they are not connected in the power graph: type 20 PTPs (Fig. 10B). The detection of similarity cross-links in the hierarchy is the contribution of Power Graph Analysis to the analysis of homology networks. These cross-links constitute a weak signal in networks and are difficult to detect. In this case, the evidence for this domain erosion is carried by only eight similarity links between four and two proteins, whereas the original network has 4849 edges. In the power graph representation, it is one power edge among only 209.
Robustness Analysis Protein networks, and in particular protein interaction networks from high-throughput measurements, are known to suffer from many false positives and false negatives. To investigate the robustness of power graph analysis, we compare a network's power graph to the power graphs of noisy versions of the network, with increasing levels of noise modelled by the addition, removal, or rewiring of edges. Fig. 11
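A minimal sketch of such a noise model, assuming that a noise level f means adding, removing, or rewiring a fraction f of the |E| edges. Here "rewire" simply removes f·|E| edges and adds as many random new ones, which does not preserve node degrees; the exact procedure behind Fig. 11 may differ. Comparing the power graphs of the perturbed copies would additionally require a power graph algorithm and a similarity measure between power graphs, neither of which is shown. The function name `perturb` is hypothetical.

```python
import random


def perturb(edges, nodes, fraction, mode, seed=None):
    """Return a noisy copy of an undirected, simple edge list.

    mode='remove': delete round(fraction * |E|) random edges.
    mode='add':    insert that many random new edges between existing nodes.
    mode='rewire': delete that many edges, then insert as many random new ones.
    """
    if mode not in {"add", "remove", "rewire"}:
        raise ValueError(f"unknown mode: {mode!r}")
    rng = random.Random(seed)
    current = {frozenset(e) for e in edges}
    k = round(fraction * len(current))
    if mode in {"remove", "rewire"}:
        # Sort before sampling so that a fixed seed gives reproducible results.
        for e in rng.sample(sorted(current, key=sorted), k):
            current.remove(e)
    if mode in {"add", "rewire"}:
        node_list = list(nodes)
        max_edges = len(node_list) * (len(node_list) - 1) // 2
        if len(current) + k > max_edges:
            raise ValueError("not enough free node pairs to add edges")
        added = 0
        while added < k:
            e = frozenset(rng.sample(node_list, 2))
            if e not in current:
                current.add(e)
                added += 1
    return sorted(tuple(sorted(e)) for e in current)
```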
Summary and Conclusion Power Graph Analysis lies at the intersection of clustering, network motif analysis, information compression, and visualisation. In the results above, we showed that Power Graph Analysis reveals known underlying biology when applied to protein interaction, regulatory, and homology networks. It also leads to new insights and new hypotheses. In particular, we presented evidence that the similarity of interaction profiles of peptide-binding SH3 domains correlates with the sequence similarity of these domains. We also discussed how the difference between the interaction profiles of otherwise near-identical histone subtypes, visible in the power graph representation, suggests that the TAP methodology interfered with the histone regulatory mechanisms and led to low expression levels of histone subtypes HTA1 and HTB1. Examining other types of networks, we showed that Power Graph Analysis of the transcription factor to target gene network predicted by Beyer et al. [65] led to the hypothesis that YAP7 is involved in metal detoxification. Finally, Power Graph Analysis of a human phosphatase homology network reveals similarity cross-links in the hierarchy that can be used to spot domain erosion in type 22 non-receptor protein phosphatases. The main reason behind the usefulness of Power Graph Analysis is the observation that experimental protein interaction networks, bipartite regulatory networks, protein homology networks, and other biological networks have an abundance of cliques and bicliques. Moreover, for small-scale interaction networks and some high-quality networks, such as SIN [48], the cliques and bicliques are not solely attributable to noise. The significant enrichment of power nodes in protein domains and Gene Ontology terms further confirms that the cliques and bicliques detected by Power Graph Analysis are relevant in these networks.
In the case of bipartite regulatory networks, the bipartite nature of the network is ideal for Power Graph Analysis. Cliques and bicliques in biological networks have been noticed in the past [25]–[27],[71]. Here we argue that this abundance constitutes an important aspect of biological networks in general. Power Graph Analysis differs from clustering techniques (socio-affinity clustering [4], RNSC algorithm [17], MCODE algorithm [18], statistical sub-complexes [19]) in that it is specifically designed to identify these cliques and bicliques. Clustering algorithms on graphs often rely on the identification of highly connected regions, abstracting away the patterns of connection between groups of nodes. This approach works well for detecting complexes and other regions of higher connectivity, but it fails, for example, for bipartite regulatory networks. In transcriptional regulatory networks, meaningful clusters of transcription factors are not connected to each other but only to target genes. In protein interaction networks, too, interesting clusters of proteins can be defined by their neighbouring proteins rather than by their internal connectivity. For homology networks, we saw that the group of type G receptor PTPs was found because of its similarity to type 22 non-receptor PTPs and not because of a higher level of connectivity. With Power Graph Analysis, it is possible to decompose and represent biological networks as combinations of two simple elements: cliques and bicliques. New analysis methodologies and algorithms can be developed to leverage the information compression made possible by Power Graphs. These would operate directly on Power Graphs instead of traditional node-and-edge graphs.
Indeed, one important finding is that the information contained in diverse biological networks, such as protein interaction networks, regulatory networks, and homology networks is highly compressible–even up to 95% for some homology networks. We argue that avoiding this excess of redundant information is possible and desirable. The advantages and uses of Power Graph Analysis are:
Other graph formalisms have been proposed, such as hypergraphs, in which hyper-edges are n-tuples of nodes [72],[73], or compound graphs and metagraphs, in which nodes are collapsed into metanodes [74]. Despite similarities such as the collapsing of nodes into metanodes, Power Graphs are different. First, Power Graphs are about decomposing networks using cliques and bicliques. Second, this decomposition is done without loss of information, which is usually not the case for compound graphs or metagraphs. As we showed, Power Graph Analysis is a novel network analysis paradigm that provides a basis for new methodologies. One immediate example is visualisation. Several tools exist to visualise biological networks, such as Cytoscape [75], Pajek [76], Osprey [77], Navigator [78], VisANT [74], ProViz [79], MOVE [80] and GraphViz [81]. However, the amount of information being visualised (the number of edges and edge crossings) often makes it difficult to explore the networks visually and mine the desired information. By removing redundant information from the networks, Power Graphs lead to clearer and more insightful visualisations. Tools such as VisANT [74] support the grouping of nodes into clusters, which would make the integration of Power Graph Analysis possible. Power graph based visualisation is already available as a plugin for Cytoscape using the described algorithm. Software for computing Power Graphs is available at: http://www.biotec.tu-dresden.de/schroeder/group/powergraphs. Methods Formal Definition of Power Graphs Given a graph G = (V,E), where V is the set of nodes and E ⊆ V×V is the set of edges, a power graph G′ = (V′,E′) is a graph defined on the power set of nodes, V′ ⊆ P(V), whose elements (power nodes) are connected to each other by power edges: E′ ⊆ V′×V′. Hence, power graphs are defined on the power set of nodes and on the power set of edges.
The semantics of Power Graphs are as follows: if two power nodes are connected by a power edge in G′, this means that in G all nodes of the first power node are connected to all nodes of the second power node. Similarly, if a power node is connected to itself by a power edge in G′, this signifies that all nodes in the power node are connected to each other by edges in G. The following two conditions are required for simplifying the representations:
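These semantics can be checked mechanically: expanding every power edge back into ordinary edges must reproduce E exactly, which is what makes the representation lossless. The sketch below assumes power edges are given as pairs of node sets (A, B), with A == B denoting a power node linked to itself (a clique). It does not check the simplifying conditions on power nodes and power edges, and the representation and function names are illustrative, not the authors' software.

```python
from itertools import combinations


def expand(power_edges):
    """Turn power edges (A, B) back into the ordinary undirected edges they stand for.

    A == B: a clique on A (a power node linked to itself).
    A != B: a biclique, every node of A linked to every node of B.
    """
    edges = set()
    for a, b in power_edges:
        a, b = frozenset(a), frozenset(b)
        if a == b:
            edges |= {frozenset(pair) for pair in combinations(a, 2)}
        else:
            edges |= {frozenset((u, v)) for u in a for v in b if u != v}
    return edges


def is_lossless(original_edges, power_edges):
    """True if the power edges represent all and only the original edges."""
    return expand(power_edges) == {frozenset(e) for e in original_edges}


# A star (bait linked to three preys) and a three-member clique, each as one power edge.
power_edges = [({"bait"}, {"p1", "p2", "p3"}), ({"x", "y", "z"}, {"x", "y", "z"})]
original = [("bait", "p1"), ("bait", "p2"), ("bait", "p3"), ("x", "y"), ("x", "z"), ("y", "z")]
print(len(expand(power_edges)), is_lossless(original, power_edges))  # 6 True
```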
Power Graph Algorithm We have developed an algorithm for computing near-minimal power graph representations from graphs. The first phase of the algorithm collects candidate power nodes and the second phase uses these to search and add power edges abstracting a maximum number of edges from G, which are successively added to the power graph G′. First phase: Identifying potential power nodes with hierarchical clustering based on neighbourhood similarity A set of nodes is a candidate power node if its nodes have neighbours in common. We use a hierarchical clustering algorithm [82] based on neighbourhood similarity to identify such sets. The similarity of two neighbourhoods is the Jaccard index of these two sets [83] (other neighbourhood similarity measures are conceivable). It is always between zero and one: it is zero if the sets U and V have no common neighbours, and one if both have identical neighbourhoods. Neighbourhood similarity clustering is an intuitive way to identify candidate power nodes. Fig. 12
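The first phase can be illustrated with the sketch below. The text only specifies hierarchical clustering [82] on neighbourhood similarity, so we assume plain agglomerative clustering with average linkage over the pairwise Jaccard indices of node neighbourhoods, and record every cluster formed by a merge as a candidate power node. The pairwise search makes it suitable for small examples only, and the function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |a ∩ b| / |a ∪ b| of two neighbourhood sets (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_power_nodes(adj):
    """Agglomerative clustering on neighbourhood similarity (average linkage).

    adj maps each node to the set of its neighbours. Every cluster formed by a
    merge is recorded as a candidate power node.
    """
    def similarity(c1, c2):
        scores = [jaccard(adj[u], adj[v]) for u in c1 for v in c2]
        return sum(scores) / len(scores)

    clusters = [frozenset([u]) for u in adj]
    candidates = []
    while len(clusters) > 1:
        # Merge the most similar pair of clusters (first one found on ties).
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: similarity(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        candidates.append(merged)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return candidates


# Biclique {a, b} x {x, y, z}: a and b share all neighbours, as do x, y and z.
adj = {"a": {"x", "y", "z"}, "b": {"x", "y", "z"},
       "x": {"a", "b"}, "y": {"a", "b"}, "z": {"a", "b"}}
for c in candidate_power_nodes(adj):
    print(sorted(c))
```

Both sides of the biclique, {a, b} and {x, y, z}, appear among the candidates, ready for the power edge search of phase two.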
To detect stars and other highly asymmetric bicliques in phase two, we add two further sets to the candidate power nodes for each node u, in addition to the hierarchy of node sets obtained by the clustering: its neighbourhood set N(u), and the set of common neighbours of the nodes in N(u), which contains at least u. Second phase: Greedy power edge search The minimal power graph problem can be seen as an optimization problem in which the power graph achieving the highest edge reduction is sought. The greedy power edge search follows the heuristic of making the locally optimal decision at each step in the hope of finding the global optimum, or at least a close approximation [84]. Among the candidate power nodes found in phase one, each pair that corresponds to a power edge is a candidate power edge. The candidates abstracting the most edges are added successively to the power graph. Related algorithms The power graph algorithm shares similarities with existing algorithms, such as modular decomposition [20],[85] and spectral clustering [86]. Modular decomposition identifies modules as sets of nodes having exactly the same neighbours and builds a tree representation of modules. Algorithms for modular decomposition can be used to compute Power Graphs, yet they do not achieve as much edge reduction, since only modules with strictly identical neighbourhoods are found. For example in Fig. 12 Scalability of Power Graph Analysis We have conducted experiments to understand the behaviour of the edge reduction for two important classes of networks: synthetic random networks generated according to the Erdős-Rényi model [88] (ER model) and synthetic scale-free networks generated according to the preferential-attachment model of Barabási and Albert (BA model) [24]. Fig. 13
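A simplified sketch of the greedy second phase, assuming the candidate power nodes are already given (for example from phase one). At each step it picks the candidate pair whose implied clique or biclique edges are all present in G and not yet abstracted, preferring the pair that abstracts the most edges. Pairs of distinct but overlapping candidate sets are skipped, and power edges that would stand for a single edge are not created. These restrictions are our simplifications and not necessarily those of the authors' algorithm; the function names are hypothetical.

```python
from itertools import combinations_with_replacement


def implied_edges(a, b):
    """Edges that a power edge between power nodes a and b stands for (a == b: clique)."""
    if a == b:
        return {frozenset((u, v)) for u in a for v in a if u != v}
    return {frozenset((u, v)) for u in a for v in b if u != v}


def greedy_power_edges(edges, candidates, min_size=2):
    """Greedily pick power edges that abstract the most not-yet-covered edges.

    Returns (power_edges, leftover_edges); leftover edges stay ordinary edges.
    """
    remaining = {frozenset(e) for e in edges}
    cands = list(dict.fromkeys(frozenset(c) for c in candidates))  # dedupe, keep order
    pairs = [(a, b) for a, b in combinations_with_replacement(cands, 2)
             if a == b or not (a & b)]  # skip distinct but overlapping sets
    power_edges = []
    while True:
        best, best_edges = None, set()
        for a, b in pairs:
            implied = implied_edges(a, b)
            # Only pairs whose edges all exist and are still unabstracted qualify.
            if len(implied) > len(best_edges) and implied <= remaining:
                best, best_edges = (a, b), implied
        if best is None or len(best_edges) < min_size:
            break
        power_edges.append(best)
        remaining -= best_edges
    return power_edges, remaining


# Biclique {a, b} x {x, y, z} plus one extra edge x-q.
edges = [("a", "x"), ("a", "y"), ("a", "z"), ("b", "x"), ("b", "y"), ("b", "z"), ("x", "q")]
candidates = [{"a", "b"}, {"x", "y", "z"}, {"x", "y"}]
power_edges, leftover = greedy_power_edges(edges, candidates)
print(len(power_edges), len(leftover))  # 1 1
```

In this toy case, seven edges are represented by one power edge plus one ordinary edge.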
Random Network Rewiring
Network rewiring randomly chooses two edges (u,v) and (w,t) and rewires them to (u,t) and (w,v), provided that neither of the two new edges is already present in the network. The rewiring step is repeated a number of times proportional to the number of edges (we used a factor of 16). Rewiring preserves the degree distribution but removes correlations between nodes, and thus allows the construction of a null model for a given network [89].
Hypergeometric Test
We evaluate the enrichment of a cluster's proteins with a domain using p-values based on the hypergeometric distribution [17]. The p-value for a cluster of size C containing k ≤ C proteins with domain X is:
$$p = \sum_{i=k}^{\min(C,\,n)} \frac{\binom{n}{i}\binom{G-n}{C-i}}{\binom{G}{C}}$$
This is the probability that the cluster contains k or more proteins with domain (or GO term) X if its contents were drawn randomly from the set of known proteins. Here G is the size of the set of known proteins, of which n ≤ G have domain X. To account for the fact that we perform multiple tests, we apply the Bonferroni correction and compute a corrected p-value p_c = m·p, where m is the number of annotations tested for a power node.
Gene and Protein Database Identifiers
The biological functions and complex assignments for the examples were obtained from the SGD online database [44]. Table 4 lists the names, descriptions and database identifiers of the proteins mentioned in the text.
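The rewiring procedure from the Random Network Rewiring paragraph can be sketched in Python as follows. The sketch makes these assumptions:

- the input is an undirected simple graph given as a list of edges;
- "16 times" is read as 16 attempted steps per edge, counting rejected attempts;
- the endpoints of the second edge are randomly flipped so that both ways of pairing them are tried.

The function names `rewire` and `degrees` are hypothetical.

```python
import random
from collections import Counter


def rewire(edges, swaps_per_edge=16, seed=None):
    """Degree-preserving randomisation of an undirected simple graph.

    Repeatedly picks two edges (u, v) and (w, t) and replaces them with
    (u, t) and (w, v), skipping any step that would create a self-loop or an
    edge that already exists. swaps_per_edge * len(edges) steps are attempted.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    if len(edge_list) < 2:
        return edge_list
    present = {frozenset(e) for e in edge_list}
    for _ in range(swaps_per_edge * len(edge_list)):
        i, j = rng.sample(range(len(edge_list)), 2)
        (u, v), (w, t) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            w, t = t, w  # edges are undirected: try both ways of pairing endpoints
        if len({u, v, w, t}) < 4:
            continue  # a shared endpoint would give a self-loop or an existing edge
        new1, new2 = frozenset((u, t)), frozenset((w, v))
        if new1 in present or new2 in present:
            continue  # the new edges must not already be present
        present -= {frozenset((u, v)), frozenset((w, t))}
        present |= {new1, new2}
        edge_list[i], edge_list[j] = (u, t), (w, v)
    return edge_list


def degrees(edges):
    """Degree of every node, used to check that rewiring preserves it."""
    return Counter(x for e in edges for x in e)
```

Each accepted step removes one edge from and adds one edge to each of u, v, w and t, which is why the degree of every node, and hence the degree distribution, is unchanged.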
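The hypergeometric enrichment test and the Bonferroni correction from the Hypergeometric Test paragraph translate into a few lines of Python. In this sketch the number of tested annotations is called m, to keep it distinct from n, the number of known proteins carrying X. The function names are our own, and capping the corrected value at 1 is a common convention that the text does not state.

```python
from math import comb


def enrichment_pvalue(k, C, n, G):
    """P(X >= k) for X ~ Hypergeometric: a cluster of C proteins drawn at
    random from G known proteins, n of which carry annotation X."""
    if not (0 <= k <= C <= G and 0 <= n <= G):
        raise ValueError("need 0 <= k <= C <= G and 0 <= n <= G")
    # Sum the probabilities of observing exactly i annotated proteins, i >= k.
    tail = sum(comb(n, i) * comb(G - n, C - i) for i in range(k, min(C, n) + 1))
    return tail / comb(G, C)


def bonferroni(p, m):
    """Bonferroni-corrected p-value for m tests, capped at 1."""
    return min(1.0, m * p)
```

For example, consider a power node of C = 5 proteins that contains all 3 of the G = 10 known proteins with domain X. Its p-value is enrichment_pvalue(3, 5, 3, 10) = 21/252 ≈ 0.083. With k = 0 the function returns exactly 1, as the probabilities over all possible outcomes sum to one.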
Acknowledgments
Thanks to Christof Winter for detailed feedback and discussions on the biological relevance of power graphs, and to Andreas Henschel, Frank Dressel and Annalisa Marsico for critique and feedback. Thanks also go to Andreas Beyer for critique and for suggesting the analysis of his transcription-factor-to-target-gene network [65]. Many thanks also to the editors and reviewers for their insights and suggestions.
Footnotes
The authors have declared that no competing interests exist. This work was supported by the EU project SEALIFE.
References
1. Fields S, Song O. A novel genetic system to detect protein-protein interactions. Nature. 1989;340:245–246. Available: http://dx.doi.org/10.1038/340245a0.
2. Rigaut G, Shevchenko A, Rutz B, Wilm M, Mann M, et al. A generic protein purification method for protein complex characterization and proteome exploration. Nat Biotechnol. 1999;17:1030–1032. Available: http://dx.doi.org/10.1038/13732.
3. Mann M, Hendrickson RC, Pandey A. Analysis of proteins and proteomes by mass spectrometry. Annu Rev Biochem. 2001;70:437–473. Available: http://dx.doi.org/10.1146/annurev.biochem.70.1.437.
4. Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, et al. Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006;440:631–636. Available: http://dx.doi.org/10.1038/nature04532.
5. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci U S A. 2001;98:4569–4574. Available: http://dx.doi.org/10.1073/pnas.061034498.
6. Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006;440:637–643. Available: http://dx.doi.org/10.1038/nature04670.
7. Li S, Armstrong CM, Bertin N, Ge H, Milstein S, et al. A map of the interactome network of the metazoan C. elegans. Science. 2004;303:540–543. Available: http://dx.doi.org/10.1126/science.1091403.
8. Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, et al. A protein interaction map of Drosophila melanogaster. Science. 2003;302:1727–1736. Available: http://dx.doi.org/10.1126/science.1090289.
9. Stanyon CA, Liu G, Mangiola BA, Patel N, Giot L, et al. A Drosophila protein-interaction map centered on cell-cycle regulators. Genome Biol. 2004;5:R96. Available: http://dx.doi.org/10.1186/gb-2004-5-12-r96.
10. Rain JC, Selig L, Reuse HD, Battaglia V, Reverdy C, et al. The protein-protein interaction map of Helicobacter pylori. Nature. 2001;409:211–215. Available: http://dx.doi.org/10.1038/35051615.
11. Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, et al. A human protein-protein interaction network: a resource for annotating the proteome. Cell. 2005;122:957–968. Available: http://dx.doi.org/10.1016/j.cell.2005.08.029.
12. Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, et al. Towards a proteome-scale map of the human protein-protein interaction network. Nature. 2005;437:1173–1178. Available: http://dx.doi.org/10.1038/nature04209.
13. LaCount DJ, Vignali M, Chettier R, Phansalkar A, Bell R, et al. A protein interaction network of the malaria parasite Plasmodium falciparum. Nature. 2005;438:103–107. Available: http://dx.doi.org/10.1038/nature04104.
14. Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science. 2002;298:799–804. Available: http://dx.doi.org/10.1126/science.1075090.
15. Landgraf C, Panni S, Montecchi-Palazzi L, Castagnoli L, Schneider-Mergener J, et al. Protein interaction networks by proteome peptide scanning. PLoS Biol. 2004;2:E14. Available: http://dx.doi.org/10.1371/journal.pbio.0020014.
16. Medini D, Covacci A, Donati C. Protein homology network families reveal step-wise diversification of type III and type IV secretion systems. PLoS Comput Biol. 2006;2:e173. Available: http://dx.doi.org/10.1371/journal.pcbi.0020173.
17. King AD, Przulj N, Jurisica I. Protein complex prediction via cost-based clustering. Bioinformatics. 2004;20:3013–3020. Available: http://dx.doi.org/10.1093/bioinformatics/bth351.
18. Bader GD, Hogue CWV. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics. 2003;4:2.
19. Hollunder J, Beyer A, Wilhelm T. Identification and characterization of protein subcomplexes in yeast. Proteomics. 2005;5:2082–2089. Available: http://dx.doi.org/10.1002/pmic.200401121.
20. Gagneur J, Krause R, Bouwmeester T, Casari G. Modular decomposition of protein-protein interaction networks. Genome Biol. 2004;5:R57. Available: http://dx.doi.org/10.1186/gb-2004-5-8-r57.
21. Andreopoulos B, An A, Wang X, Faloutsos M, Schroeder M. Clustering by common friends finds locally significant proteins mediating modules. Bioinformatics. 2007. Available: http://dx.doi.org/10.1093/bioinformatics/btm064.
22. Li D, Li J, Ouyang S, Wang J, Wu S, et al. Protein interaction networks of Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster: large-scale organization and robustness. Proteomics. 2006;6:456–461. Available: http://dx.doi.org/10.1002/pmic.200500228.
23. Taylor JS, Raes J. Duplication and divergence: the evolution of new genes and old ideas. Annu Rev Genet. 2004;38:615–643. Available: http://dx.doi.org/10.1146/annurev.genet.38.072902.092831.
24. Barabási AL, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512.
25. Morrison JL, Breitling R, Higham DJ, Gilbert DR. A lock-and-key model for protein-protein interactions. Bioinformatics. 2006. Available: http://dx.doi.org/10.1093/bioinformatics/btl338.
26. Thomas A, Cannings R, Monk NAM, Cannings C. On the structure of protein-protein interaction networks. Biochem Soc Trans. 2003;31:1491–1496. Available: http://dx.doi.org/10.1042/
27. Li H, Li J, Wong L. Discovering motif pairs at interaction sites from protein sequences on a proteome-wide scale. Bioinformatics. 2006;22:989–996. Available: http://dx.doi.org/10.1093/bioinformatics/btl020.
28. Kim WK, Park J, Suh JK. Large scale statistical prediction of protein-protein interaction by potentially interacting domain (PID) pair. Genome Inform. 2002;13:42–50.
29. Deng M, Mehta S, Sun F, Chen T. Inferring domain-domain interactions from protein-protein interactions. Genome Res. 2002;12:1540–1548. Available: http://dx.doi.org/10.1101/gr.153002.
30. Ng SK, Zhang Z, Tan SH. Integrative approach for computationally inferring protein domain interactions. Bioinformatics. 2003;19:923–929.
31. Nye TMW, Berzuini C, Gilks WR, Babu MM, Teichmann SA. Statistical analysis of domains in interacting protein pairs. Bioinformatics. 2005;21:993–1001. Available: http://dx.doi.org/10.1093/bioinformatics/bti086.
32. Liu Y, Liu N, Zhao H. Inferring protein-protein interactions through high-throughput interaction data from diverse organisms. Bioinformatics. 2005;21:3279–3285. Available: http://dx.doi.org/10.1093/bioinformatics/bti492.
33. Rhodes DR, Tomlins SA, Varambally S, Mahavisno V, Barrette T, et al. Probabilistic model of the human protein-protein interaction network. Nat Biotechnol. 2005;23:951–959. Available: http://dx.doi.org/10.1038/nbt1103.
34. Patil A, Nakamura H. Filtering high-throughput protein-protein interaction data using a combination of genomic features. BMC Bioinformatics. 2005;6:100. Available: http://dx.doi.org/10.1186/1471-2105-6-100.
35. Riley R, Lee C, Sabatti C, Eisenberg D. Inferring protein domain interactions from databases of interacting proteins. Genome Biol. 2005;6:R89. Available: http://dx.doi.org/10.1186/gb-2005-6-10-r89.
36. Guimaraes KS, Jothi R, Zotenko E, Przytycka TM. Predicting domain-domain interactions using a parsimony approach. Genome Biol. 2006;7:R104. Available: http://dx.doi.org/10.1186/gb-2006-7-11-r104.
37. Jothi R, Cherukuri PF, Tasneem A, Przytycka TM. Co-evolutionary analysis of domains in interacting proteins reveals insights into domain-domain interactions mediating protein-protein interactions. J Mol Biol. 2006;362:861–875. Available: http://dx.doi.org/10.1016/j.jmb.2006.07.072.
38. Nye TMW, Berzuini C, Gilks WR, Babu MM, Teichmann S. Predicting the strongest domain-domain contact in interacting protein pairs. Stat Appl Genet Mol Biol. 2006;5:Article5. Available: http://dx.doi.org/10.2202/1544-6115.1195.
39. Bu D, Zhao Y, Cai L, Xue H, Zhu X, et al. Topological structure analysis of the protein-protein interaction network in budding yeast. Nucleic Acids Res. 2003;31:2443–2450.
40. Bader GD, Hogue CWV. Analyzing yeast protein-protein interaction data obtained from different sources. Nat Biotechnol. 2002;20:991–997. Available: http://dx.doi.org/10.1038/nbt1002-991.
41. Keller DM, Zeng X, Wang Y, Zhang QH, Kapoor M, et al. A DNA damage-induced p53 serine 392 kinase complex contains CK2, hSpt16, and SSRP1. Mol Cell. 2001;7:283–292.
42. Mason PB, Struhl K. The FACT complex travels with elongating RNA polymerase II and is important for the fidelity of transcriptional initiation in vivo. Mol Cell Biol. 2003;23:8323–8333.
43. Kolodrubetz D, Rykowski MC, Grunstein M. Histone H2A subtypes associate interchangeably in vivo with histone H2B subtypes. Proc Natl Acad Sci U S A. 1982;79:7814–7818.
44. [No authors listed] Saccharomyces genome database. 2008. Stanford University. Available: http://www.yeastgenome.org.
45. Moran L, Norris D, Osley MA. A yeast H2A-H2B promoter can be regulated by changes in histone gene copy number. Genes Dev. 1990;4:752–763.
46. Lim J, Hao T, Shaw C, Patel AJ, Szabó G, et al. A protein-protein interaction network for human inherited ataxias and disorders of Purkinje cell degeneration. Cell. 2006;125:801–814. Available: http://dx.doi.org/10.1016/j.cell.2006.03.032.
47. Hazbun TR, Malmström L, Anderson S, Graczyk BJ, Fox B, et al. Assigning function to yeast proteins by integration of technologies. Mol Cell. 2003;12:1353–1365.
48. Kim PM, Lu LJ, Xia Y, Gerstein MB. Relating three-dimensional structures to protein networks provides evolutionary insights. Science. 2006;314:1938–1941. Available: http://dx.doi.org/10.1126/science.1136174.
49. Gunsalus KC, Ge H, Schetter AJ, Goldberg DS, Han JDJ, et al. Predictive models of molecular machines involved in Caenorhabditis elegans early embryogenesis. Nature. 2005;436:861–865. Available: http://dx.doi.org/10.1038/nature03876.
50. Ewing RM, Chu P, Elisma F, Li H, Taylor P, et al. Large-scale mapping of human protein-protein interactions by mass spectrometry. Mol Syst Biol. 2007;3:89. Available: http://dx.doi.org/10.1038/msb4100134.
51. Ito T, Tashiro K, Muta S, Ozawa R, Chiba T, et al. Toward a protein-protein interaction map of the budding yeast: a comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc Natl Acad Sci U S A. 2000;97:1143–1147.
52. Butland G, Peregrin-Alvarez JM, Li J, Yang W, Yang X, et al. Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature. 2005;433:531–537. Available: http://dx.doi.org/10.1038/nature03239.
53. Arifuzzaman M, Maeda M, Itoh A, Nishikata K, Takita C, et al. Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res. 2006;16:686–691. Available: http://dx.doi.org/10.1101/gr.4527806.
54. The Gene Ontology (GO) project in 2006. Nucleic Acids Res. 2005;34:D322–D326. Available: http://dx.doi.org/10.1093/nar/gkj021.
55. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, et al. Network motifs: simple building blocks of complex networks. Science. 2002;298:824–827. Available: http://dx.doi.org/10.1126/science.298.5594.824.
56. Kashtan N, Itzkovitz S, Milo R, Alon U. Topological generalizations of network motifs. Phys Rev E Stat Nonlin Soft Matter Phys. 2004;70:031909.
57. Wuchty S, Oltvai ZN, Barabási AL. Evolutionary conservation of motif constituents in the yeast protein interaction network. Nat Genet. 2003;35:176–179. Available: http://dx.doi.org/10.1038/ng1242.
58. Khanin R, Wit E. How scale-free are biological networks. J Comput Biol. 2006;13:810–818. Available: http://dx.doi.org/10.1089/cmb.2006.13.810.
59. Stumpf MPH, Wiuf C, May RM. Subnets of scale-free networks are not scale-free: sampling properties of networks. Proc Natl Acad Sci U S A. 2005;102:4221–4224. Available: http://dx.doi.org/10.1073/pnas.0501179102.
60. Han JDJ, Dupuy D, Bertin N, Cusick ME, Vidal M. Effect of sampling on topology predictions of protein-protein interaction networks. Nat Biotechnol. 2005;23:839–844. Available: http://dx.doi.org/10.1038/nbt1116.
61. Przulj N, Corneil DG, Jurisica I. Modeling interactome: scale-free or geometric? Bioinformatics. 2004;20:3508–3515. Available: http://dx.doi.org/10.1093/bioinformatics/bth436.
62. Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, et al. New developments in the InterPro database. Nucleic Acids Res. 2007;35:D224–D228. Available: http://dx.doi.org/10.1093/nar/gkl841.
63. Available: http://www.ebi.ac.uk/interpro/
64. Teichmann SA, Babu MM. Gene regulatory network growth by duplication. Nat Genet. 2004;36:492–496. Available: http://dx.doi.org/10.1038/ng1340.
65. Beyer A, Workman C, Hollunder J, Radke D, Moller U, et al. Integrated assessment and prediction of transcription factor binding. PLoS Comput Biol. 2006;2:e70. Available: http://dx.doi.org/10.1371/journal.pcbi.0020070.
66. Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, et al. Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. 2000;11:4241–4257.
67. Fernandes L, Rodrigues-Pousada C, Struhl K. Yap, a novel family of eight bZIP proteins in Saccharomyces cerevisiae with distinct biological functions. Mol Cell Biol. 1997;17:6982–6993.
68. Mendizabal I, Rios G, Mulet JM, Serrano R, de Larrinoa IF. Yeast putative transcription factors involved in salt tolerance. FEBS Lett. 1998;425:323–328.
69. Pils B, Schultz J. Evolution of the multifunctional protein tyrosine phosphatase family. Mol Biol Evol. 2004;21:625–631. Available: http://dx.doi.org/10.1093/molbev/msh055.
70. Bru C, Courcelle E, Carrere S, Beausse Y, Dalmar S, et al. The ProDom database of protein domain families: more emphasis on 3D. Nucleic Acids Res. 2005;33:D212–D215. Available: http://dx.doi.org/10.1093/nar/gki034.
71. Pati A, Vasquez-Robinet C, Heath LS, Grene R, Murali TM. XcisClique: analysis of regulatory bicliques. BMC Bioinformatics. 2006;7:218. Available: http://dx.doi.org/10.1186/1471-2105-7-218.
72. Berge C. Hypergraphs. Elsevier; 1989.
73. Ramadan E, Tarafdar A, Pothen A. A hypergraph model for the yeast protein complex network. Proceedings of the Sixth IEEE International Workshop on High Performance Computational Biology. 2004.
74. Hu Z, Mellor J, Wu J, Kanehisa M, Stuart JM, et al. Towards zoomable multidimensional maps of the cell. Nat Biotechnol. 2007;25:547–554. Available: http://dx.doi.org/10.1038/nbt1304.
75. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504. Available: http://dx.doi.org/10.1101/gr.1239303.
76. Batagelj V, Mrvar A. Pajek: analysis and visualization of large networks. In: Graph Drawing Software. Springer; 2003. pp. 77–103.
77. Breitkreutz BJ, Stark C, Tyers M. Osprey: a network visualization system. Genome Biol. 2003;4:R22.
78. Motamed-Khorasani A, Jurisica I, Letarte M, Shaw PA, Parkes RK, et al. Differentially androgen-modulated genes in ovarian epithelial cells from BRCA mutation carriers and control patients predict ovarian cancer survival and disease progression. Oncogene. 2007;26:198–214. Available: http://dx.doi.org/10.1038/sj.onc.1209773.
79. Iragne F, Nikolski M, Mathieu B, Auber D, Sherman D. ProViz: protein interaction visualization and exploration. Bioinformatics. 2005;21:272–274. Available: http://dx.doi.org/10.1093/bioinformatics/bth494.
80. Bosman D, Blom E, Ogao P, Kuipers O, Roerdink J. Move: a multi-level ontology-based visualization and exploration framework for genomic networks. In Silico Biol. 2007;7:35–59.
81. Gansner ER, North SC. An open graph visualization system and its applications to software engineering. Software–Practice and Experience. 2000;30:1203–1233. Available: http://citeseer.ist.psu.edu/gansner99open.html.
82. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998;95:14863–14868.
83. Jaccard P. Bulletin de la Société vaudoise des sciences naturelles. 1901;37:241–272.
84. Cormen TH, Leiserson CE, Rivest RL, Stein C. Introduction to Algorithms. MIT Press; 2001. p. 1128.
85. Gallai T. Transitiv orientierbare Graphen. Acta Mathematica Academiae Scientiarum Hungaricae. 1967;18:25–66. Available: http://www.ams.org/mathscinet-getitem?mr=0221974.
86. Pothen A, Simon H, Liou K. Partitioning sparse matrices with eigenvectors of graphs. SIAM Journal on Matrix Analysis and Applications. 1990;11:430–452.
87. Yosef N, Yakhini Z, Tsalenko A, Kristensen V, Borresen-Dale AL, et al. A supervised approach for identifying discriminating genotype patterns and its application to breast cancer data. Bioinformatics. 2007;23:e91–e98. Available: http://dx.doi.org/10.1093/bioinformatics/btl298.
88. Erdős P, Rényi A. Random Graphs. Publ Math Inst Hung Acad Sci. 1960;5.
89. Maslov S, Sneppen K. Specificity and stability in topology of protein networks. Science. 2002;296:910–913. Available: http://dx.doi.org/10.1126/science.1065103.
90. Ye J, McGinnis S, Madden TL. BLAST: improvements for better sequence analysis. Nucleic Acids Res. 2006;34:W6–W9. Available: http://dx.doi.org/10.1093/nar/gkl164.