A Factor Graph Nested Effects Model To Identify Networks from Genetic Perturbations

1 Biomolecular Engineering Department, University of California Santa Cruz, Santa Cruz, California, United States of America
2 Department of Pharmacology and Physiology, The George Washington University Medical Center, Washington, D.C., United States of America
3 Institute of Statistical Science, Academia Sinica, Nankang, Taipei, Taiwan, Republic of China

Anand Asthagiri, Editor, California Institute of Technology, United States of America

* E-mail: chyeang/at/ias.edu (C-HY); phmnhl/at/gwumc.edu (NHL); jstuart/at/soe.ucsc.edu (JMS)

Conceived and designed the experiments: CJV CHY NHL JMS. Performed the experiments: CJV CH TL BF NHL JMS. Analyzed the data: CJV CHY NHL JMS. Contributed reagents/materials/analysis tools: CJV CH TL BF NHL JMS. Wrote the paper: CJV CHY NHL JMS.

Received June 30, 2008; Accepted December 12, 2008.

Copyright Vaske et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

Complex phenotypes such as the transformation of a normal population of cells into cancerous tissue result from a series of molecular triggers gone awry. We describe a method that searches for a genetic network consistent with expression changes observed under the knock-down of a set of genes that share a common role in the cell, such as a disease phenotype. The method extends the Nested Effects Model of Markowetz et al. (2005) by using a probabilistic factor graph to search for a network representing interactions among these silenced genes. The method also expands the network by attaching new genes at specific downstream points, providing candidates for subsequent perturbations to further characterize the pathway.
We investigated an extension provided by the factor graph approach in which the model distinguishes between inhibitory and stimulatory interactions. We found that the extension yielded significant improvements in recovering the structure of simulated and Saccharomyces cerevisiae networks. We applied the approach to discover a signaling network among genes involved in a human colon cancer cell invasiveness pathway. The method predicts several genes with new roles in the invasiveness process. We knocked down two genes identified by our approach and found that both knock-downs produce loss of invasive potential in a colon cancer cell line. Nested effects models may be a powerful tool for inferring regulatory connections and genes that operate in normal and disease-related processes.

Author Summary

Biological processes are the result of the actions and interactions of many genes and the proteins that they encode. Our knowledge of interactions for many biological processes is limited, especially for cancer where genomic alterations may create entirely novel pathways not present in normal tissue. Perturbing gene expression (for example, by deleting a gene) has long been used as a tool in molecular biology to elucidate interactions but is very expensive and labor intensive. The search for new genes that may participate can be a daunting “fishing expedition.” We have devised a tool that automatically infers interactions using high-throughput gene expression data. When a gene is silenced, it causes other genes to be switched on or off, which provides clues about the pathway(s) in which the gene acts. Our method uses the genome-wide on/off states as a fingerprint to detect interactions among a set of silenced genes. We were able to elucidate a network of interactions for several genes implicated in metastatic colon cancer. Genes newly connected to the network were found to operate in cancer cell invasion in human cells, validating the approach.
Thus, the method enables an efficient discovery of the networks that underlie biological processes such as carcinogenesis.

Introduction

Carcinogenesis involves a host of cell-cell communication breakdowns that include the loss of contact inhibition, an increased potential to proliferate, and the ability to invade and spread into foreign tissue [1]. The molecular events involved in this transformation are still poorly understood. New systematic methods are needed to infer the key events responsible for these disease processes. The ability to measure gene expression changes for the entire genome in the presence of molecular perturbations, such as specific gene knock-downs, provides a new opportunity to infer gene networks in a data-driven manner.

Our goal is to identify the genetic mechanisms underlying a phenotype, such as cancer cell deregulation. We take a network-based approach to the problem, starting with a set of signaling genes, or S-genes, known to act in a common pathway. The input to the method is a matrix in which gene expression has been measured under the knock-down of each of the S-genes. Genes exhibiting differential expression across the knock-downs, here referred to as effect genes or E-genes, are used to predict a set of interactions among the S-genes, and to expand the pathway by identifying newly implicated frontier genes based on their expression changes. We hypothesize that using a structured model of the interactions among the S-genes will improve the identification of frontier genes for inclusion in the network for subsequent rounds of investigation.

Previous approaches for pathway expansion have used methods based on expression correlations to a phenotype of interest. These methods search for genes with expression profiles that are highly correlated with a particular phenotype or disease state and have led to promising results [2]–[5].
Methods using Analysis of Variance [6], false-discovery [7], and non-parametric methods [8] also have been proposed. For example, one method is to measure the correlation of gene expression levels with an idealized vector representing the phenotype (e.g. indicator variables with zeroes for disease and ones for lack of disease) [9]. One disadvantage of these methods is that they make no explicit use of the known members of a pathway or how these members interact with each other. More recently, several approaches have demonstrated learning a structured model from perturbation experiments [10]–[13]. Approaches based on Bayesian Networks have also been proposed [11],[12]. However, these approaches attempt to identify networks over the E-genes rather than the S-genes and therefore require many replicated microarray experiments to distinguish signal from noise. Instead, perturbing genes of interest and constructing networks from observations of downstream changes allows powerful interventional reasoning, as well as reconstruction of interactions not directly reflected in expression levels, such as phosphorylation. In one approach, Carter et al. (2007) [14] decompose the matrix of expression changes under single- and double-gene deletions to infer a transcriptional regulation network from which phenotypes and gene expression responses following knock-downs can be predicted. An alternative approach is the Nested Effects Model (NEM) of Markowetz et al. (2005, 2007) [10],[15], which has been used to predict interactions, including non-transcriptional interactions. Rather than searching for genetic networks that explain observational data, as several Bayesian Network approaches have done [11],[16], NEMs are useful in situations in which perturbations have been carried out on a focused set of genes. 
In this case, NEMs assume the interest is in a finer description of the interactions among the silenced genes rather than identifying a network of unrestricted connections between potentially additional genes.

The NEM approach takes as input a matrix of expression changes, X. A column of X corresponds to a single gene knock-down (or knock-out) of one S-gene; a row corresponds to the response of an E-gene to all of the knock-downs. The method searches for approximate subset relations among the expression changes of the E-genes to organize the S-genes into a network. To do this it assumes, for example, that S-gene A is above S-gene B if the set of E-genes that change under gene A's knock-down is an approximate superset of the affected genes under B's knock-down.

The current NEM approach uses binary set membership relations to identify a network and thus the exact nature of interaction between S-genes (e.g. activation or inhibition) is not determined. However, an appreciable extent of inhibition occurs in real genetic networks. To estimate the amount of inhibition present in living cells, we estimated the proportion of genes up-regulated in deletion mutants relative to wild-type from a yeast knock-out compendium [17]. Over half of the genes had increasing expression changes across the deletion strains, consistent with a high degree of inhibitory interactions in the yeast genetic network (see Figure S1). Thus, the inability to distinguish between stimulatory and inhibitory interactions may be a critical shortcoming of current NEM approaches. To address this limitation, we developed a generalization of the NEM approach using a probabilistic graphical model called a factor graph that allows a broader set of S-gene interactions to be recovered from the secondary effects of E-gene expression.

This paper offers three methodological contributions.
First, we present a factor graph formulation called FG-NEM that allows for an efficient search over all possible NEM structures for a high-scoring model. Second, we show how FG-NEMs extend the NEM approach for expanding the network beyond the current set of S-genes. Third, we show that FG-NEMs can model a more general class of S-gene interactions than NEMs, which increases the accuracy of network identification over an approach that considers a more restricted set of interactions. We demonstrate the usefulness of FG-NEMs on both simulated and biologically relevant signaling networks that contain both inhibition and activation. We apply FG-NEMs to identify novel genes not previously implicated in colon cancer cell invasiveness. Finally, we experimentally test FG-NEM predictions and report that knock-downs of the top-scoring genes lead to a loss-of-invasion phenotype, validating the approach. Source code is available as an R library from our website: http://sysbio.soe.ucsc.edu/projects/fgnem.

Methods

We first describe the Nested Effects Model, derive a maximum a posteriori objective function to identify highly probable networks, and then describe how to recode the search for a network as inference on a factor graph. We then discuss how we expand the frontier of the network by identifying new genes that have high attachment probability using modified NEM attachment scoring. Finally, we describe our method for validating the involvement of these frontier genes using directed knock-down and phenotypic assays.

The Nested Effects Model

Our goal is to automatically identify genetic interactions among a set of signaling genes from gene expression changes observed under their knock-down. The signaling genes represent a set of genes that prior experimental evidence suggests participate in a common pathway. To infer a network, we use an extension of the Nested Effects Model (NEM) introduced by Markowetz et al. (2005) [10].
The set of silenced genes is denoted as the set S (or S-genes). An NEM is a probabilistic formulation that measures how consistent a directed graph of the S-genes is with expression changes collected under the separate silencing of each S-gene (i.e. only single knock-downs are considered in NEM). While the method can make use of either complete deletion mutants or genes that may be partially silenced, here we use the term knock-down to refer to either case. We denote the knock-down of S-gene A as ΔA. We also refer to a set of effect genes as the set E (or E-genes), for which gene expression data is available. The expression of an E-gene e is assumed to be influenced by at most one S-gene.

The key assumption of NEMs is that the expression changes observed under ΔA are an approximate superset of the changes observed under ΔB if gene A acts upstream of gene B in a pathway. We use the shorthand A>B to represent this generic directed interaction. In addition to identifying A>B, the E-gene expression changes on the microarray can be used to infer the “sign” of the interaction, either activating or inhibiting. In our framework, we extend the interactions so that an upstream gene can have either an inhibitory or stimulatory effect on downstream genes (see Figure 1A).
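The nesting assumption can be illustrated with a small deterministic sketch in Python. This is not the probabilistic NEM scoring: the threshold, the overlap tolerance, and the helper names `effect_set`, `is_upstream`, and `interaction_sign` are all hypothetical choices made for illustration, and the expression values are invented toy numbers.

```python
def effect_set(changes, threshold=1.0):
    """E-genes whose absolute expression change reaches the (assumed) threshold."""
    return {gene for gene, value in changes.items() if abs(value) >= threshold}


def is_upstream(effects_a, effects_b, tolerance=0.8):
    """True if effects_a is an approximate, strictly larger superset of effects_b."""
    if not effects_b:
        return False
    overlap = len(effects_a & effects_b) / len(effects_b)
    return overlap >= tolerance and len(effects_a) > len(effects_b)


def interaction_sign(changes_a, changes_b, shared):
    """+1 if shared E-genes move the same way under both knock-downs, else -1."""
    agreement = sum(1 if changes_a[g] * changes_b[g] > 0 else -1 for g in shared)
    return 1 if agreement >= 0 else -1


# Toy expression changes (log-ratios relative to wild-type) under two knock-downs.
delta_a = {"e1": -2.0, "e2": -1.5, "e3": 1.8, "e4": 2.2, "e5": 0.1}
delta_b = {"e1": 0.2, "e2": 0.1, "e3": -1.7, "e4": -2.0, "e5": 0.0}

effects_a = effect_set(delta_a)
effects_b = effect_set(delta_b)
a_above_b = is_upstream(effects_a, effects_b)
b_above_a = is_upstream(effects_b, effects_a)
sign = interaction_sign(delta_a, delta_b, effects_a & effects_b)
```

In this toy example the effects of ΔB are nested inside those of ΔA, and the shared E-genes respond in opposite directions, which is the pattern the text associates with A acting upstream of B through an inhibitory interaction.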
The E-gene expression changes are available in a data matrix X where each column gives the difference in expression of each E-gene under the deletion of a single S-gene relative to wild-type. X may also contain replicates in the form of repeated S-gene knock-downs. The entry XeAr represents e's expression change under the rth replicate of ΔA. Furthermore, we assume that an unknown expression “state” for each E-gene under each knock-down determines its set of expression changes observed across the {XeAr} replicates in the microarray data. The matrix Y records a hidden state for each E-gene under each knock-down, where entry YeA is the state of E-gene e under ΔA. We allow the states to be ternary-valued {+1, −1, 0}, representing whether e is up-regulated, down-regulated, or unchanged under ΔA relative to wild-type, respectively.

Nested effects models include two sets of parameters. The parameter set Φ records all pair-wise interactions among the S-genes and the parameter set Θ describes how each E-gene is attached to the network of S-genes. In the original NEM formulations [10],[15],[18], Φ is a binary matrix with entry φAB set to one if S-gene A acts above S-gene B and zero otherwise. If φAB = φBA = 1 then the S-genes are assumed to operate at an equivalent position in the pathway. Note that indirect interactions are also represented in Φ, so that φAB = 1 and φBC = 1 imply φAC = 1. A parsimonious network among the S-genes is obtained by computing the transitive reduction of Φ.

To allow for both stimulatory and inhibitory interactions in our formulation, φAB can assume six possible values for each unique unordered S-gene pair {A, B}. We refer to these values as interaction modes.
The possible values are: (i) A activates B, A→B; (ii) A inhibits B, A⊣B; (iii) A is equivalent to B, A = B; (iv) A does not interact with B, A≠B; (v) B activates A, B→A; and (vi) B inhibits A, B⊣A.

Plotting the response of E-genes under ΔA and ΔB yields a scatter-plot that may provide a signature for the type of interaction between A and B (see Figure 1B). Note that two genes are equivalent if their knock-downs lead to significantly similar expression changes, which may predict, for example, that they form a complex (see Figure 1C).

For comparison purposes, a predicted unsigned interaction was treated as activation. In the FG-NEM AVT variant, FG-NEM is run on the absolute value of the data. In the uFG-NEM method, we remove the component of FG-NEM which models induced expression, resulting in interaction modes where the top and right five regions are disallowed in all interaction modes.

Probabilistic Formulation of NEMs

Our goal is to find a structure among the S-genes that provides a compact description of X. To find a network that best “fits” the data, we take a maximum a posteriori approach as in [15],[18] and jointly identify Φ and Θ that maximize the posterior:
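A sketch of the maximum a posteriori objective this sentence describes, written from the surrounding definitions of Φ, Θ, and X rather than copied from the original typeset equation. The second form follows from Bayes' rule because P(X) does not depend on Φ or Θ; how the joint prior is further factorized is not assumed here.

```latex
(\hat{\Phi}, \hat{\Theta})
  = \arg\max_{\Phi,\,\Theta} \; P(\Phi, \Theta \mid X)
  = \arg\max_{\Phi,\,\Theta} \; P(X \mid \Phi, \Theta)\, P(\Phi, \Theta)
```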
As in previous NEM formulations, we assume that each E-gene is attached to a single S-gene and that each E-gene observation vector across the knock-downs is independent of other E-gene observations. The maximization function can then be written:
Previous approaches decompose Le over the knock-downs, which assumes the S-gene observations are independent given the network and attachments (see [18] for an example of such a derivation). To facilitate scoring the expanded set of interaction modes mentioned earlier, we replace Le with Le′, a function proportional to Le. Le′ is defined as a product of pair-wise S-gene terms:
The interaction variables φAB are indexed by the unordered pair {A, B}, so that φAB and φBA refer to the same variable. We refer to θeAB as e's local attachment, which can take on five possible values from the set {A, −A, B, −B, 0}, representing that e is attached to and either up- or down-regulated by A, attached to and either up- or down-regulated by B, or not affected by either S-gene. φAB defines the mode of interaction between S-genes A and B. Assuming the replicates are independent given the E-gene states, P(XeA | YeA) can be written as a product over replicate terms: P(XeA | YeA) = ∏r P(XeAr | YeA), where P(XeAr | YeA) is modeled with a Gaussian distribution having mean μ and standard deviation σ estimated from the data (see Text S1).

Substituting Le′ for Le into Eq. (7) and distributing the maximization over attachment points, we obtain the maximizing function used in our approach:
The interaction factors, which take φAB and θeAB among their arguments, have a value of one if the E-gene e is attached to either A or B and e's state is consistent with the interaction mode between A and B. If e's state is inconsistent with the interaction and attachment, then the factor has value zero. We used hard constraints to model consistent and inconsistent expression changes, corresponding to the rigid boundaries of the regions drawn in Figure 1C.

An Interaction Transitivity Prior

The prior over interactions, P(Φ), can represent preferences over specific interactions in the S-gene graph, allowing the incorporation of biologically motivated constraints to guide network search. For example, the interaction priors for genes in a common pathway or genes whose products have been detected to interact in protein-protein interaction screens could be set higher than the priors for arbitrary pairs of S-genes. In this study, we chose to test the approach both with and without external biological information. Without external biological information, the prior encodes a basic property of the S-gene graph: that it should exhibit transitivity to force pair-wise interaction modes to be consistent among all triples. Using transitivity, all paths between any two genes, A and B, are guaranteed to have the same overall effect; i.e. the products of the signs of the individual links along different paths between A and B are equal. In order to preserve the transitivity of identified interaction modes, the prior is decomposed over interaction configurations into transitivity constraints on all triples of S-genes; i.e.:
For example, if A⊣B and B⊣C, then A→C. A result of modeling transitivity is that a directed cycle of stimulatory interactions will also imply activation between any pair of S-genes in the cycle, in both directions. Therefore, the method clusters such S-genes into equivalence interactions. The product over ρ factors in Eq. (10) encodes evidence from high-throughput assays, such as protein-protein binding and protein-DNA binding interactions (see “Physical Structure Priors” in Text S1).

While network structures are constrained to reflect more intuitive models, the decomposition introduces interdependencies among the interactions, adding complexity to the search for high-scoring networks. Importantly, max-sum message passing in a factor graph [19] provides an efficient means for estimating highly probable S-gene configurations. We next describe how the problem is recoded into message passing on a factor graph.

Inference on Factor Graphs to Search for Candidate S-Gene Networks

The formulation above provides a definition of the objective function to be maximized but says nothing about how to search for a good network. The search space of networks is very large, making exhaustive search [10] intractable for networks larger than five S-genes. To apply the method to larger networks, we require a fast, heuristic approach. Markowetz et al. (2007) introduced a bottom-up technique to infer an S-gene graph. They identify sub-graphs of S-genes (pairs and triples) and then merge the sub-graphs together into a final parsimonious graph. Fröhlich et al. (2008) [18] use hierarchical clustering to first identify modules, subsets of S-genes with correlated expression changes. Networks among the modules are exhaustively searched and a final network is identified by greedily introducing interactions across modules that increase the likelihood.

Here, we introduce the use of a graphical model called a factor graph to represent all possible NEM structures simultaneously.
The parameters that determine the S-gene interactions, Φ, are explicitly represented as variables in the factor graph. Identifying a high-scoring S-gene network is therefore converted to the task of identifying likely assignments of the Φ variables in the factor graph.

A factor graph is a probabilistic graphical model whose likelihood function can be factorized into smaller terms (factors) representing local constraints or valuations on a set of random variables. Other graphical models, such as Bayesian networks and Markov random fields, have straightforward factor graph analogs. A factor graph can be represented as an undirected, bipartite graph with two types of nodes: variables and factors. A variable is adjacent to a factor if the variable appears as an argument of the factor. Factor graphs generalize probability mass functions, as the joint likelihood function requires no normalization and the factors need not be conditional probabilities. Each factor encodes a local constraint pertaining to a few variables.

The Factor Graph for Nested Effects

In the factor graph (Figure 2), each pair of S-genes has an interaction variable, φAB, that takes on values equal to one of the previously mentioned interaction modes.
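The transitivity constraint described earlier can be sketched as a hard 0/1 factor over a triple of interaction variables. The Python below is a simplified illustration, not the factor used in FG-NEM: it only covers the forward chain A-over-B, B-over-C, the mode names are hypothetical labels, and treating "equivalent" as a positive sign is an assumption.

```python
# Sign carried by each directed mode; "equivalent" is treated as positive (assumption).
SIGN = {"activates": 1, "inhibits": -1, "equivalent": 1}


def implied_mode(ab, bc):
    """Mode implied for the pair (A, C) by the chain A-B, B-C, or None if unconstrained."""
    if ab == "none" or bc == "none":
        return None
    if ab == "equivalent" and bc == "equivalent":
        return "equivalent"
    # The overall effect of a path is the product of the signs along it.
    return "activates" if SIGN[ab] * SIGN[bc] > 0 else "inhibits"


def transitivity_factor(ab, bc, ac):
    """Return 1 if the triple of modes is transitive, 0 otherwise."""
    expected = implied_mode(ab, bc)
    if expected is None:
        return 1
    return 1 if ac == expected else 0


double_inhibition = transitivity_factor("inhibits", "inhibits", "activates")
inconsistent = transitivity_factor("inhibits", "inhibits", "inhibits")
mixed_chain = transitivity_factor("activates", "inhibits", "inhibits")
unconstrained = transitivity_factor("none", "activates", "inhibits")
```

In a max-sum setting such a factor would contribute log(1) = 0 for consistent triples and a large negative penalty (or negative infinity) for inconsistent ones; that scoring detail is likewise an assumption of this sketch.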
Inference with message passing

The Φ that maximizes the posterior is found using max-sum message passing using all terms from Eqs. (9–10) in log space. For acyclic graphs, the marginal, max-marginal and conditional probabilities of single or multiple variables can be exactly calculated by the max-sum algorithms [19]. Message-passing algorithms demonstrate excellent empirical results in various practical problems even on graphs containing cycles such as feed-forward and feed-back loops [20]–[23]. Here, the message passing schedule performs inference in two steps. In the first step, messages from observation nodes XeAr are passed through the expression factors and hidden E-gene state variables to calculate all messages μ(YA→φAB) in a single upward pass. In the second step, messages are passed between only the interaction variables and transitivity factors until convergence (see Text S1). In the example shown in Figure 2, interaction variables are shaded by inferred mode (red, e.g. φAB and φBC; green for inhibition, e.g. φBD and φAD; unshaded for non-interaction), and the result matches the NEM structure from Figure 1A.

Pathway expansion with FG-NEMs

Once a signaling network is identified using the message passing inference procedure above, the network can be used to search for new genes that may be part of the pathway. The NEM and FG-NEM frameworks predict new members that act in the pathway by “attaching” E-genes to S-genes in the network, or leaving them detached if their expression data does not fit the model. Attaching E-gene e to S-gene s asserts that the expression changes of e over all knock-downs are best explained by a network in which e is directly downstream of s. The E-genes attached to the network are collectively referred to as the frontier. Frontier genes may be good candidates for further characterization (e.g. knock-down and expression profiling) in subsequent experiments. To gain a global picture of where e is connected, we use a modified NEM scoring from Markowetz et al. (2005).
The pair-wise attachments for a single E-gene, the connection variables θeAB, provide local “best guesses” for e's attachment. Rather than aggregate e's collection of local attachments, we use NEM scoring, modified to incorporate both stimulatory and inhibitory attachments, to estimate the attachment point using the full network learned in the previous step (see Text S1). We calculate a log-likelihood ratio that measures the degree to which e's expression data is explained by the network if it is attached to one of the S-genes compared to being disconnected from the network, i.e. its data being generated entirely by the background Gaussian distribution. For E-gene e, we compute the log-likelihood of attachment ratio (LAR):
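One way to write a ratio of this kind, offered as a sketch consistent with the description above rather than as the original equation. The notation is assumed: X_e is e's row of expression data, \hat{\Phi} is the network learned in the previous step, θ_e = ±s denotes a stimulatory or inhibitory attachment to S-gene s, and θ_e = 0 denotes no attachment. Taking the maximum over attachment points (rather than, say, a sum) is also an assumption.

```latex
\mathrm{LAR}(e) \;=\;
  \log \frac{\max_{s \in S,\; \theta_e \in \{+s,\,-s\}} \; P\!\left(X_e \mid \hat{\Phi},\, \theta_e\right)}
            {P\!\left(X_e \mid \theta_e = 0\right)}
```

Under this reading, a positive LAR means some attachment explains e's data better than the background distribution alone.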
Experimental Validation Procedure for Newly Predicted Cancer Invasion Genes

To validate the involvement of predicted invasiveness frontier genes, HT29 colon cancer cells were resuspended in DMEM medium containing 0.1% FBS and seeded into the top wells (2×10^5 per well) containing individual Matrigel inserts (BD Biosciences, San Jose, CA) according to the manufacturer's protocol. The lower wells were filled with 800 µl medium with 10% fetal bovine serum as chemoattractant. Six to ten hours following seeding, the cells in the upper wells were transfected with the appropriate shRNA-expressing pSuper constructs [25] using Lipofectamine 2000 (Invitrogen, Carlsbad, CA). Final concentration of pSuper constructs was 1.6 µg/ml. The transfected cells were incubated at 37°C for 48 hours before assaying for invasion. Media was aspirated from the top wells, non-invading cells were scraped from the upper side of the inserts with a cotton swab, and invading cells on the lower side were fixed and stained using DiffQuick (IMEB, Inc. San Marcos, CA). The total number of invading cells was counted for each insert using a light microscope. Invasion was assessed in quadruplicate and independently repeated at least five times.

The shRNA-expressing portion of the construct was designed using the siRNA Selection Program of the Whitehead Institute for Biomedical Research (http://jura.wi.mit.edu/bioc/siRNAext/), synthesized by Invitrogen and subcloned into the XhoI and BamHI sites of the pSuper plasmid. Sequences for shRNA constructs are available in Text S1. shRNA construct MYO1G targets the myosin 1G mRNA (GenBank accession number NM_033054). shRNA construct BMPR1A targets the bone morphogenetic protein receptor, type IA mRNA (NM_004329). shRNA construct COLEC12 targets the collectin sub-family member 12 mRNA (NM_130386). shRNA construct AA099748 targets an expressed sequence tag mRNA (AA099748). shRNA construct CAPN12 targets the calpain 12 mRNA (NM_144691).
shRNA construct scrambled serves as a nonsense sequence negative control.

Results

Results on Artificial Networks

Data

We evaluated FG-NEM's ability to recover artificial networks from simulated data. Data was generated by propagating signals in networks containing simulated knock-downs and then sampling expression data from activated, inhibited, or unaffected expression change distributions (see Text S1 and Figure S3). We focused on how the FG-NEM approach increased recovery of networks that contain both activation and inhibition. Because FG-NEMs explicitly incorporate inhibition, we hypothesized that they would recover networks containing an appreciable amount of inhibition more accurately than an approach lacking separate modes for inhibition and activation.

We implemented a version of FG-NEM in which inhibition encoded in the FG-NEM model was removed (see Methods). We refer to this version as the “unsigned” FG-NEM (uFG-NEM). We compared uFG-NEM to the original NEM approach and found that the results were comparable on small synthetic networks of four S-genes and their associated data (see Figure S2). We therefore used uFG-NEMs as a surrogate for NEMs for the tests on larger networks on which NEM was not efficient enough to run.

To make the comparison of FG-NEM to uFG-NEM fair, we measured network recovery in two ways. 1) We calculated a measure of structure recovery: a predicted interaction was called correct if it matched an interaction (of either sign) in the simulated network. In this case, whether the interaction was inhibitory or stimulatory was ignored. 2) We measured sign recovery: a predicted interaction was recorded as correct if it matched an interaction in the simulated network and had the matching sign.
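The two recovery measures can be sketched in a few lines of Python. The edge representation (a dictionary from an `(upstream, downstream)` pair to a sign of +1 or −1), the function name `recovery_counts`, and the toy networks are all illustrative assumptions.

```python
def recovery_counts(predicted, truth):
    """Count structure hits (edge present, sign ignored) and sign hits (edge and sign match)."""
    structure_hits = sum(1 for edge in predicted if edge in truth)
    sign_hits = sum(1 for edge, sign in predicted.items() if truth.get(edge) == sign)
    return structure_hits, sign_hits


# Toy simulated network and prediction: +1 is activation, -1 is inhibition.
truth = {("A", "B"): 1, ("B", "C"): -1, ("A", "C"): -1}
predicted = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1}

structure_hits, sign_hits = recovery_counts(predicted, truth)
structure_precision = structure_hits / len(predicted)
sign_precision = sign_hits / len(predicted)
```

Here the edge B–C counts toward structure recovery but not sign recovery, because it was predicted as activation while the simulated network has inhibition. An unsigned method whose predictions are all treated as activation would be scored the same way.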
Influence of inhibition extent on network recovery

We tested the ability of FG-NEMs and uFG-NEMs to recover the structure of networks simulated with varying fractions of inhibition, 0≤λ≤0.75, for both the amount of inhibitory connections between S-genes and inhibitory attachments of E-genes. We simulated and predicted 500 networks, calculated the area under the precision-recall curve (AUC) for each predicted network (see Text S1), and recorded the mean and standard deviation of these AUCs. As expected, when no inhibition was present, FG-NEM and uFG-NEM were equivalent in terms of AUC when run on non-transformed data (Figure 3A).
We repeated the experiment of varying inhibition to match our expectations for application to the cancer invasion network discussed subsequently. In the invasion network the known S-genes were recovered in such a way that only activating S-gene connections were identified. To simulate this situation, we created networks containing only activating S-gene interactions but varied the proportion of inhibiting E-gene attachments. Even in this situation where all of the known S-genes have activating interactions, FG-NEM's performance begins to significantly surpass uFG-NEM's performance when 40–60% of the E-genes are connected with inhibitory attachments (see Figure S4C). Thus, according to our simulations, even in cases where activation predominates the S-gene interactions, incorporating sign in the model for E-gene changes can lead to higher network recovery accuracies. We expect the signed FG-NEM to also perform well for the invasion network where 40–60% of the expression changes are consistent with inhibited E-gene attachments.

Expansion of artificial networks compared to Template Matching

Because our goal is to elucidate the network of genes involved in the colon cancer invasiveness pathway, we measured the ability of our method to expand the network to new genes involved in the pathway compared to a correlation-based method we refer to as Template Matching (TM) used by Irby et al. (2005) [26]. Briefly, Template Matching [9] ranks genes based on the correlation of their expression profiles to an idealized profile/template that reflects a phenotype of interest. TM has been used in several studies to identify genes with expression patterns that follow a series of phenotypes [27],[28].
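A minimal sketch of Template Matching as described here, ranking genes by the Pearson correlation of their expression profiles with an idealized template. The choice of Pearson correlation, ranking by signed rather than absolute correlation, and the names `pearson` and `template_match` are assumptions for illustration; the profiles are invented toy values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def template_match(profiles, template):
    """Rank gene names by decreasing correlation with the template."""
    scores = {gene: pearson(profile, template) for gene, profile in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Idealized template: phenotype present in the first two conditions only.
template = [1, 1, 0, 0]
profiles = {
    "g1": [2.1, 1.9, 0.2, 0.1],   # follows the template
    "g2": [0.1, 0.3, 1.8, 2.2],   # anti-correlated with the template
    "g3": [1.0, 0.0, 1.0, 0.0],   # unrelated to the template
}
ranking = template_match(profiles, template)
```

Note that the template uses only the phenotype labels; it carries no information about how the perturbed genes relate to one another, which is the contrast with the structured model drawn in the text.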
We found that FG-NEMs significantly outperform TM when used to expand artificial networks (Figure 3B).

Network Expansion on a Yeast Knock-Out Expression Compendium

We hypothesized that an estimate of genetic pathway structure based on modeling observed expression changes could facilitate the identification of new pathway members. To test this, we evaluated the ability of FG-NEMs, uFG-NEMs, and TM to identify genes involved in a diverse set of pathways in S. cerevisiae using the well-studied gene expression dataset from the Hughes et al. (2000) knock-out compendium elucidated by Rosetta [17]. This compendium contains whole-genome expression profiles of 276 yeast gene-deletion mutants and p-values for differential gene expression.

Data

In each deletion strain, gene expression changes with a p-value smaller than 0.05 were selected, and then labeled as activated or inhibited according to the sign of their expression log-ratio. p-values were converted to continuous expression values using the method of Yeang et al. (2004) [13]. The method replaces a p-value with a value obtained by inverting a Chi-square distribution. The value can be interpreted as a log-likelihood ratio reflecting the probability that an E-gene is expressed in the affected distribution compared to a background distribution.

Gene sets, representing proxies for pathways, were taken from Gene Ontology (GO) [29], KEGG [30] and Reactome [31] information. 25 non-redundant pathways were selected that had at least 5 genes included as knock-outs in the knock-out compendium. The largest pathway, chromosome organization and biogenesis, contained 45 S-genes. On a 2.83 GHz processor, factor graph inference using 5046 E-genes took a total of 1828 seconds. A pathway with 12 genes, such as nitrogen compound metabolism, took 38 seconds for network inference.

The factor graph approach allows prior information to be incorporated.
We tested a supervised variant of FG-NEMs (sFG-NEM) in which additional factors were incorporated to reward models that included known interactions. Three classes of physical data were downloaded for use as interaction priors: protein-DNA interactions, phosphorylation target data, and protein-protein interactions (PPI). Protein-DNA interactions with a p-value less than 0.001 were selected from the study of Lee et al. (2002) [32]. Data describing kinase targets were taken from the study of Ptacek et al. (2005) [33]. PPI data were downloaded from the BioGRID database [34] on July 30, 2008. For each GO category under study, we selected any interaction between S-genes in that category, resulting in 27 protein-DNA interactions, 4 phosphorylation interactions, and 64 PPIs for the GO sets discussed in this paper. For each unique physical interaction, we added a factor to the corresponding interaction variable to increase the likelihood of consistent interaction modes and decrease the likelihood of inconsistent modes (see Text S1).

Pathway expansion performance

We measured the accuracy of FG-NEMs for expanding each pathway to include new genes. The likelihood of attachment ratio (LAR) score was calculated for each gene in the genome, and the area under the precision-recall curve (AUC) was computed (see Methods). For each pathway, an AUC ratio was then calculated by dividing each method's AUC by the AUC obtained from randomly guessing E-genes for attachment to the network. Pathways sharing 25% or more of their genes with another pathway of higher AUC were ignored. Five non-redundant pathways were found that had AUCs significantly better than random guessing for at least one of the methods. While the precision advantage of FG-NEM over uFG-NEM was not significant at any specific recall range, its overall higher precision across a broad range of recalls reflects a systematic improvement (Figure 4A).
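As an illustration of the AUC ratio used here, the following minimal sketch ranks genes by a score, computes a step-wise area under the precision-recall curve, and divides it by a random-guessing baseline. Two simplifications are assumptions of this sketch and not details from the Methods: the area is computed as average precision, and the random baseline is approximated by the fraction of scored genes that belong to the pathway. The function names and the example scores are hypothetical.

```python
def pr_auc(scores, positives):
    """Step-wise area under the precision-recall curve (average precision).

    scores: {gene: score}, higher means more likely attached (e.g. an LAR score).
    positives: set of genes annotated to the pathway.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_pos = sum(1 for gene in ranked if gene in positives)
    if n_pos == 0:
        return 0.0
    hits, area = 0, 0.0
    for rank, gene in enumerate(ranked, start=1):
        if gene in positives:
            hits += 1
            area += hits / rank  # precision at this recall step
    return area / n_pos

def auc_ratio(scores, positives):
    """AUC divided by an approximate random baseline (the fraction of positives).

    Assumes at least one scored gene is a positive, so the baseline is non-zero.
    """
    n_pos = sum(1 for gene in scores if gene in positives)
    baseline = n_pos / len(scores)
    return pr_auc(scores, positives) / baseline

# Hypothetical attachment scores for five genes, two of which are pathway members.
scores = {"g1": 3.2, "g2": 2.5, "g3": 0.4, "g4": -0.1, "g5": -1.7}
pathway_genes = {"g1", "g3"}

auc = pr_auc(scores, pathway_genes)
ratio = auc_ratio(scores, pathway_genes)
```

A ratio above 1 indicates a ranking that places pathway members earlier than random guessing would. An empirical baseline obtained by repeatedly shuffling the ranking could be substituted for the analytic approximation.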
Except for ribosome biogenesis, FG-NEMs performed comparably to or better than uFG-NEMs and TM (Figure 4B). Incorporating physical interaction priors showed little effect on network expansion performance. For most of the pathways, the performance of sFG-NEMs was indistinguishable from that of its unsupervised counterpart. A slight improvement was seen for the nitrogen metabolism pathway. Incorporation of structural priors adds activation from GLN3 to YEA4, and from ARG80 to ARG5,6, and slightly boosts the predictive power of the network. Thus, FG-NEM can usually identify new pathway genes in the unsupervised setting as well as when known interactions are provided. Interestingly, the largest change in performance resulting from the use of prior information was a small drop observed for predicting genes involved in the sexual reproduction pathway. We investigated this decrease and found that using protein-DNA priors forced the placement of the transcription factor STE12 at the top of the pathway, whereas placement toward the bottom seemed to better fit the expression changes. Consequently, FG-NEM ranks the sexual reproduction E-genes higher than sFG-NEM does. On average, physical interaction priors increase the compatibility of FG-NEM predictions with high-throughput physical data. A leave-one-out analysis was used to test the ability of physical interaction data to improve pair-wise interaction predictions. To compare improvement in network structure prediction, we calculated the margin of compatibility (MOC) to reflect how well predicted interactions match held-out physical evidence (see Methods). Negative MOCs are assigned to predicted interactions that are incompatible with the physical evidence, while positive MOCs are assigned to compatible predictions. For each held-out physical interaction, a network was computed using all other physical interaction data (Figure 4C). Of the 163 physical interactions, 104 (63%) have higher MOC while 43 (26%) have lower MOC in sFG-NEM than in FG-NEM.
Of these 43, 33 have positive MOCs for both approaches (i.e. both agree with the physical evidence). Notably, of the 93 that achieved higher compatibilities in sFG-NEM, 38 (23%) became compatible only when the physical evidence was included. One example is the interaction between CDC42 and FAR1 in the sexual reproduction pathway. FAR1 acts downstream of CDC42 in the pheromone response signal cascade. The FAR1 gene deletion shows little expression change, and FG-NEM does not place FAR1 downstream of CDC42 even though it places CDC42 at the top of the signaling cascade. With the inclusion of other structural priors, FAR1 is correctly placed downstream of CDC42. Thus, incorporating known interactions, even from possibly noisy high-throughput sources, can increase the likelihood of finding other interactions. However, the caveat is that such information may force a poorer fit to the observed expression data, which could decrease the accuracy of frontier expansion.

Predicted inhibition in ion homeostasis pathway

FG-NEMs achieved significant improvement over the unsigned variant on the ion homeostasis pathway. To gain insights into the structural predictions underlying the difference in performance of the methods, we compared the predicted S-gene networks of the FG-NEM and uFG-NEM methods for this pathway (Figure 4D). Both the FG-NEM and uFG-NEM correctly predicted the equivalence of CKA2 and CKB2, which together form a complex. Of the top fifteen frontier genes predicted by FG-NEM, eight are annotated by GO as involved in ion homeostasis (Table S2); in addition, FRE2 is involved in ion transport, YGL039W is an oxidoreductase, and ARO9 is involved in amino acid catabolism. In contrast, only one of the top uFG-NEM frontier genes, GRX4, is annotated by GO as involved in ion homeostasis. Of the top 20 true positives predicted to be attached by FG-NEM, 19 were predicted to be repressed by their S-gene.
These true positives were not predicted to be attached to the network by uFG-NEM. Thus, the inability to make use of the explicit repression of E-genes may contribute to the poorer performance of the unsigned method.

Application to Colon Cancer Invasiveness

We applied the FG-NEM approach to a human colon cancer invasiveness network elucidated by Irby et al. (2005) [26]. In this work, the authors identified several “tiers” of genes implicated in the invasion process under the control of SRC kinase. Genes were included in a tier if their knock-downs were found to produce a significant drop in the invasive potential of HT29 colon cancer cells as defined by invasion through Matrigel. To identify additional genes involved in the invasion process, the authors measured gene expression under an RNA interference knock-down of each gene in the tier. Genes whose expression was lower in the knock-downs producing loss-of-invasiveness, and higher in knock-downs that did not produce loss-of-invasiveness, were considered candidates for inclusion in the next tier. In this fashion, each tier was formed by knocking down each candidate gene and assaying for loss-of-invasion in Matrigel.

Data

We applied FG-NEMs to the five S-genes from the second tier of Irby et al. (2005). These five human genes are cytokeratin 20 (KRT20), transcription factor Dp-1 (TFDP1), DEAH (Asp-Glu-Ala-His) box polypeptide 32 (DHX32), ribosomal protein L32 (RPL32), and glutaminase (GLS). Knock-down of each second-tier S-gene has been demonstrated to significantly reduce the invasion phenotype of HT29 colon cancer cells (Irby et al., 2005). KRT20 has historically served as a diagnostic marker for colorectal carcinoma [39], whereas high expression of ribosomal protein L32, glutaminase, and DEAD/H box polypeptides has been associated with various cancers and metastatic lesions [40],[41].
For this study, S-genes from the first tier were excluded because the expression profiles from their knock-down experiments were collected on a different microarray platform, and cross-platform normalization issues could therefore affect the results. The Expression Factor parameters were estimated from genes found to be up- or down-regulated by running the Significance Analysis of Microarrays algorithm (SAM) [5], with a False Discovery Rate of 1%, on gene expression data collected on a panel of knock-downs. Using the differentially expressed genes yielded an estimate of 1.75 for the mean log2 ratio of the inhibited E-gene distribution (−1.75 for the activated E-gene distribution), and a standard deviation of 0.5 for the Gaussian mixture model (see Methods). Several of these knock-downs led to loss-of-invasiveness while others produced invasive growth in the Matrigel assay as reported by Irby et al. (2005). The hybridization data and associated normalization information can be accessed from the Gene Expression Omnibus (GEO) database [42] under the series accession number GSE11848 and associated platform accession number GPL6978. A subset of these data containing the SAM-selected E-genes can be obtained from Dataset S1.

Cancer invasion network identification

We applied FG-NEMs to recover a network for the second-tier genes. We included E-genes that demonstrate a robust and significant effect under at least two of the knock-downs included in the Irby et al. (2005) study. We selected genes whose log2 ratios differ by less than 0.5 in replicate arrays and that had an absolute log2 expression change at least equal to the mean absolute level of the activated distribution (1.75) in at least two arrays. Using these criteria, we identified 185 E-genes to use for model inference (Figure 5A).
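The E-gene selection criteria can be written down as a short filter. The sketch below is one possible reading of the text and rests on assumptions: each knock-down is taken to have a list of replicate log2 ratios, replicate agreement is checked within each knock-down, and arrays with a large change are counted across all knock-downs. The function name, data layout, and values are hypothetical; only the two thresholds (0.5 and 1.75) come from the text.

```python
MEAN_EFFECT = 1.75        # mean absolute log2 ratio of the affected distribution (from the text)
MAX_REPLICATE_GAP = 0.5   # replicate log2 ratios must differ by less than this (from the text)

def is_robust_egene(replicates_by_knockdown):
    """Decide whether a gene passes the selection criteria.

    replicates_by_knockdown: {knock-down name: [log2 ratio in each replicate array]}
    (an assumed layout, chosen for illustration).
    """
    strong_arrays = 0
    for ratios in replicates_by_knockdown.values():
        # Reject the gene if replicates of the same knock-down disagree.
        if max(ratios) - min(ratios) >= MAX_REPLICATE_GAP:
            return False
        # Count arrays whose change is at least as large as the mean effect.
        strong_arrays += sum(1 for r in ratios if abs(r) >= MEAN_EFFECT)
    return strong_arrays >= 2

# Hypothetical measurements for three genes under two knock-downs.
candidates = {
    "geneX": {"KRT20": [-1.9, -2.1], "GLS": [0.1, 0.2]},   # consistent and strong
    "geneY": {"KRT20": [-1.9, -0.9], "GLS": [0.0, 0.1]},   # replicates disagree
    "geneZ": {"KRT20": [-1.8, -1.6], "GLS": [0.3, 0.1]},   # only one strong array
}

selected = [gene for gene, data in candidates.items() if is_robust_egene(data)]
```

Under this reading only the first hypothetical gene is retained. A stricter variant would require the strong changes to occur in two different knock-downs, not just in any two arrays.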
FG-NEM recovered the network shown in Figure 5B. [...] (P = 0.006), although a second knock-down experiment (using a silencing RNA differing from the first series that targets a different region of the KRT20 mRNA) resulted in a weaker connection (P = 0.534). Consequently, one could designate this link as deserving of follow-up functional studies (e.g. promoter analysis or chromatin immunoprecipitation). Though GLS is connected to the network, the likelihood of interaction was not strong enough to be significant (Figure 5B). The FG-NEM model predicts that TFDP1 is at the bottom of the signaling cascade, which may reflect its role as part of the E2F transcriptional complex in targeting the expression of downstream genes that promote cell proliferation and invasion [43],[44]. The ribosomal subunit RPL32 is curiously placed upstream of the DP1 transcription factor (TFDP1) and at an equivalent level with the structural molecule KRT20. Aberrant expression of ribosomal proteins has been noted in a variety of cancers, although the molecular consequence of these expression changes is unknown [45]. It has been postulated that ribosomal proteins may play an important extraribosomal role (i.e. beyond translation) in the oncogenic transformation process [45]. Because the number of S-genes in the second tier is small, we compared the heuristic pair-wise search employed by FG-NEM to an exhaustive model search. If the heuristic approach is reasonable, it should identify network models that are among the highest scoring models identified by brute-force enumeration. To approximate a brute-force search, we generated 1000 random networks among the five second-tier genes. For each network, we calculated the data likelihood using message passing. The recovered network for the second-tier genes had a likelihood higher than 997 of the 1000 randomly generated networks.
Interestingly, all three of the random networks with higher scores had structures identical to the network recovered by FG-NEM except for their attachment of DHX32 and GLS. This result demonstrates that the pair-wise heuristic search employed by FG-NEM successfully identifies high-scoring networks in the space of all networks. While we need to test the trend for increasing network sizes, these results are promising for scaling up to larger networks in which exhaustive search will not be feasible.

Cancer invasion frontier expansion

We used the highest-scoring model recovered by the FG-NEM to search for additional genes involved in colon cancer invasiveness by sorting each gene by its LAR score (see Methods). We found 19 positive and 31 negative attachments with significant probabilities (Table 1 and Table S3). Significance of the attachments was assessed by permuting each E-gene's observations, relearning an FG-NEM network, and computing its LAR score to construct an empirical null distribution of LARs. The E-genes with the highest attachment probabilities and positive LAR scores found to be significant via permutation testing are shown in Table 1. Many of the genes in Table 1 have roles consistent with cancer cell invasion. For example, three E-genes encode proteases, including the metalloproteases ADAM9 and ADAM19. The metalloproteases represent a class of transmembrane proteins that are known facilitators of cell migration and invasion by proteolytic cleavage of extracellular matrix components [46]. Interestingly, ADAM21 is included among the first tier genes of Irby et al. (2005). This demonstrates that FG-NEM is able to identify two additional family members of this first tier gene even though it was not included in the S-gene set used in network learning. Glial fibrillary acidic protein (GFAP) and testes-specific protease 50 (TSP50) are also included in Table 1.
GFAP is known to interact with the oncogenic tyrosine kinase SRC [47] and is involved in astrocyte tumor invasiveness [48], while TSP50 has been shown to be differentially regulated in both breast and testicular cancer [49],[50]. Thus, FG-NEMs predict that an expanded set of proteases may play a role in the colon cancer invasion process. Also included among the set of genes in our expanded invasion network is a second keratin family member, keratin 13 (KRT13), which is consistent with the previous identification of KRT20 in the second tier and may reflect a structural underpinning needed for invasion. Several of the genes in Table 1 represent novel connections of genes to the colon cancer invasiveness pathway. For example, STK24 is a highly conserved protein whose homolog in S. cerevisiae, STE20, is involved in signal transduction of pseudo-hyphal growth [51]. It is intriguing to consider the possibility that the invasiveness pathway could be due in part to the aberrant regulation of an ancient cell migration process that dates back to single-cellular organisms. The E-genes with positive LAR scores constitute the network “frontier” of the cancer invasiveness pathway in that they are predicted to directly interact with the second-tier genes. From among the 38 genes with positive and significant LAR scores, two were arbitrarily selected to test for a loss-of-invasiveness phenotype in HT29 cells as defined by invasion in Matrigel. We selected CAPN12 and expressed sequence tag AA099748 from Table 1 for gene knock-down experiments. CAPN12 is a member of the calpain gene family, which has been shown to have fibrillin activity. GenBank EST accession AA099748 aligns to the genome 3′ to the gene CHMP4C, along with the EST AW440175, both from cancer tissues. Additionally, the amino acid translations of these ESTs align to the N-terminus of CHMP4C with 48% identity.
The C-terminal tail of CHMP4C was recently shown [52] to be bound by the apoptosis inhibitor PDCD6IP, suggesting that the cancer-specific splice form of CHMP4C may have altered binding behavior with PDCD6IP. PDCD6IP also has been implicated in a broad array of membrane associated processes, including cell adhesion [53]. Serving as negative controls, we performed knock-down experiments for three E-genes that had low attachment probabilities, namely MYO1G, BMPRIA and COLEC12. As correctly predicted by FG-NEM, both E-genes with high LAR scores produced significant loss of invasion while all three E-genes with low LAR scores did not lead to loss-of-invasion in the Matrigel assay (Figure 5C).

Discussion

The factor graph nested effects model (FG-NEM) provides a general methodology for inferring networks from knock-down phenotypes. Our results extend the nested effects models in three significant ways: 1) we provide a means for efficiently searching for large S-gene networks using inference on a factor graph that can also incorporate prior information; 2) our method distinguishes activating from inhibiting interactions; and 3) we show that NEM attachment can be used successfully to expand the network to new pathway members. Our results on simulated and yeast networks suggest that explicitly modeling inhibition and activation, rather than treating them as generic interactions or effects, leads to higher accuracies for recovering known interaction networks and identifying members of a pathway. Applying FG-NEM predictions to a series of follow-up experiments in an HT29 colon cancer cell line model has identified new gene members of the tumor invasiveness pathway. Specifically, shRNA-mediated knock-down of two genes predicted to be connected to the original rudimentary network of Irby et al. [26] led to a significant loss of invasiveness, whereas knock-down of three genes predicted not to be connected did not result in a loss of the invasive phenotype.
Our results suggest FG-NEM improves upon the iterative strategy followed by Irby et al. [26]. The iterative procedure of Irby et al. produces a graph in which genes in a tier are connected only to genes in the next tier. The graph does not necessarily reflect the signaling events underlying invasion. Rather, it encodes the chronological order by which the genes were elucidated. In contrast, FG-NEM seeks a structured model that relates the genes within and across tiers, which may provide a better understanding of the signaling and regulatory events leading to cancer cell invasion. In addition, rather than using differential expression as a criterion to expand the network, FG-NEMs search for genes that have expression changes coherent with the dependencies encoded in the learned structure. FG-NEMs were able to identify two confident relationships among the genes in the second tier that the previous iterative strategy of Irby et al. (2005) could not identify. The equivalence of RPL32 and KRT20 as well as the downstream relation of TFDP1 and DHX32 to these two genes is a first step toward refining the architecture of the colon cancer invasiveness network. Moreover, these findings suggest that RPL32 may play an important extraribosomal function by regulating TFDP1 mRNA expression. We envision applying the FG-NEM approach within an iterative computational-experimental framework. As a network is expanded, the frontier genes of one round of investigation can be included as S-genes in subsequent rounds. Iteration will therefore provide larger sets of S-genes on which to infer networks. While the primary data used for such network expansion is based on gene expression data, it will be intriguing to investigate whether a variety of transcriptional and non-transcriptional interactions can be recovered with this approach. There are many examples of coupling between transcription and non-transcriptional interactions in biological systems. 
An E-gene e attached to S-gene A does not necessarily imply the signaling between A and e is transcriptional in nature. Consider a metabolic cascade in which A's product produces substrate s1, which is converted to s2 by e, which is a substrate of an enzyme encoded by a second S-gene B: A→s1→e→s2→B. Furthermore, assume that the cell has a mechanism to “sense” the amount of s1 and that this mechanism controls the transcription of e. Deletion of A in this scenario will lead to a decrease in s1 which will cause e's expression to decrease. Thus, promotion of e to the network in this case could reveal a new gene involved in “signaling” via metabolic transformation. The lac operon in bacteria uses a similar coupling between the expression of the enzymes in the pathway to sense cellular concentrations of lactose [54]. As another example, consider the metazoan phosphorylation cascade in which signaling between S-genes is coupled to their own mRNA production. Phosphorylation of the transcription factor heterodimer Jun and Atf2 by Jnk then promotes transcription of the JUN gene [55]. More Jun protein is made, leading to dimerization with another protein, Fos, which activates transcription of other downstream genes. Knock-down of JNK results in transcriptional down-regulation of JUN. Thus, promotion of JUN from an E-gene to the network would reveal a member of the pathway involved in post-translational signaling even though it was detected through transcriptional perturbation. Several aspects of the method could be improved upon in the future. The method could be extended to use over-expression of S-genes in addition to knock-downs. Over-expression of an S-gene would be expected to have an opposite effect on downstream E-genes compared to the E-gene effects observed under the S-gene's knock-down. 
Thus, the E-gene responses could be compared to an expanded list of interaction modes, derived by flipping the scatter-plots in Figure 1. In this study of the colon cancer invasiveness pathway, S-gene interaction configurations were forced to reflect transitive connections but did not incorporate any external biological information. Additional knowledge, such as gene coexpression groups or protein-protein interaction potentials, could be incorporated into the prior for making inferences about the cancer invasiveness pathway. For example, several gene expression experiments on invasive colon cancer cell lines are available in GEO [42]. It would be interesting to extract sets of genes that are up- or down-regulated in invasive versus non-invasive cancer cells consistently across multiple studies. Any S-genes present in such recurrent sets could be associated with higher pair-wise interaction priors than arbitrary S-gene pairs. However, since we observed a decrease in performance for pathway expansion on the yeast networks, we chose not to attempt this at this time. We modeled transitivity using deterministic factors. While this provides an intuitive interpretation of such constraints and increases the speed of convergence of message passing, relaxing these constraints to general belief potentials could allow a broader exploration of the search space. Imposing transitivity in the current framework disallows cycles of inhibitory links. However, it is possible to extend our method to incorporate such cycles, in which new interaction modes are introduced. For example, the cycle A→B⊣C→A would imply B⊣A, which could be modeled using a new type of interaction mode capturing A's activation on B and B's inhibition on A. The methods could be extended to incorporate richer information such as degrading signals and higher-order knock-downs (single, double, triple, etc.) as in Carter et al. (2007) [14].
Our formulation assumes that the effects of a knock-down do not degrade along a pathway and also neglects combinatorial interactions of multiple genes. FG-NEMs allow higher-order knock-down combinations to be incorporated into a search for high-scoring networks. Using only single knock-downs, it may be impossible to identify certain relationships such as the synthetic effects of two parallel pathways converging to one gene. In principle, FG-NEM can handle higher-order relations by extending the pair-wise likelihood term to contain three or more genes. However, the large numbers of possible combinatorial relations and combinations of knock-down experiments required to elucidate the relations, as well as the propagation of complexity along the pathways, would make the problem more difficult. In our network expansion approach, we assumed genes whose expression levels are well-explained by the model are of more interest for subsequent rounds of experimentation, although there are other ways to approach this question from an experimental design perspective. For example, it would be conceivable to test whether selecting genes based on reducing a measure of uncertainty across models leads to better gene selection as in [13]. An “active learning” approach prioritizes knock-down experiments based on the reduction of expected entropy of high-scoring models. The “informative” experiments would effectively disambiguate the models which explain the existing data. Fewer experiments might then be needed to narrow down a unique model of the underlying system [56],[57]. Finally, the approach could be applied to the unsupervised discovery of regulatory interactions among E-genes rather than S-genes. In recent work, Sahoo et al. (2008) [58] applied a pair-wise scoring approach for detecting Boolean implications based on gene expression changes observed across hundreds of microarray studies. 
Similarly, FG-NEMs could use the expression changes measured across a diverse array of conditions to score gene pairs against interaction mode templates (Figure 1C).

Conclusions

We applied FG-NEMs to discover a human signaling network among genes involved in colon cancer cell invasiveness. The method formalizes and extends analysis of genetic interactions using high-dimensional quantitative phenotype data in the form of gene expression changes observed under specific perturbations. It makes explicit use of the knock-downs of known members of a pathway to identify how the members interact with one another and for identifying new members. The method predicts several genes with new roles in the cancer invasiveness process, two of which were verified to act in the pathway based on an ex vivo invasion assay. Thus, the FG-NEM approach may be a powerful tool for inferring regulatory connections and for identifying new partners of genes known to operate in a process of interest. The application of structured causal models for pathway identification and expansion promises to greatly accelerate the discovery of genetic pathways from genetic knock-downs and other intervention-based experiments.

Dataset S1 Colon cancer invasion data for SAM-selected E-genes. Sheet 1. Selected E-genes and their expression. Sheet 2. Input to SAM for determining parameters of the Gaussian mixture in the Expression Factors. Sheet 3. SAM results used for determining the parameters of the Gaussian mixtures in the Expression Factors. (6.09 MB XLS)
Figure S1 Observed inhibitory effects and signaling in yeast compendiums. Evidence for inhibition from measured responses of knockdown, and from annotation in curated pathways. (0.09 MB PDF)
Figure S2 Comparison of uFG-NEM and exhaustive NEM model search for structure recovery.
(0.07 MB PDF)
Figure S3 Estimating difference in gene expression between activation and inhibition. (0.08 MB PDF)
Figure S4 Accuracy of network recovery as a function of S-gene knowledge and number of microarray replicates, and E-gene inhibition. (0.10 MB PDF)
Table S1 Yeast Knockout Compendium Pathway AUC. AUC and AUC-ratios for expansion of Yeast pathways. (0.03 MB XLS)
Table S2 Ion Homeostasis Frontier Genes. Genes most likely to be attached to the ion homeostasis network for both the FG-NEM and uFG-NEM methods. Genes are sorted by LAR. (2.75 MB XLS)
Table S3 Invasiveness E-gene LAR Scores. Connection point, connection strength, and connection significance of E-genes in colon cancer network. (0.10 MB XLS)
Text S1 Supplemental methods and results. (0.12 MB DOC)

Acknowledgments

We would like to thank M.T. Weirauch for helpful discussions regarding early drafts of the manuscript.

Footnotes

The authors have declared that no competing interests exist. CJV was supported by a National Science Foundation (NSF) Graduate Research Fellowship. JMS was supported by a grant from the NSF's Division of Biological Infrastructure DBI-0543197 and by a fellowship from the Alfred P. Sloan Foundation. JMS and NHL were supported by National Institutes of Health grant CA120316. CHY was supported by the Leon Levy Foundation and the Simons Foundation at the Institute for Advanced Study.

References

1. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000;100:57–70. [PubMed] 2. Allison DB, Cui X, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 2006;7:55–65. [PubMed] 3.
Barrier A, Lemoine A, Boelle PY, Tse C, Brault D, et al. Colon cancer prognosis prediction by gene expression profiling. Oncogene. 2005;24:6155–6164. [PubMed] 4. Dudoit S, Yang Y, Callow M, Speed T. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Stat Sin. 2002;97:111–139. 5. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001;98:5116–5121. [PubMed] 6. Kerr MK, Martin M, Churchill GA. Analysis of variance for gene expression microarray data. J Comput Biol. 2000;7:819–837. [PubMed] 7. Storey JD, Tibshirani R. Statistical methods for identifying differentially expressed genes in DNA microarrays. Methods Mol Biol. 2003;224:149–157. [PubMed] 8. Troyanskaya OG, Garber ME, Brown PO, Botstein D, Altman RB. Nonparametric methods for identifying differentially expressed genes in microarray data. Bioinformatics. 2002;18:1454–1461. [PubMed] 9. Pavlidis P, Noble WS. Analysis of strain and regional variation in gene expression in mouse brain. Genome Biol. 2001;2:RESEARCH0042. [PubMed] 10. Markowetz F, Bloch J, Spang R. Non-transcriptional pathway features reconstructed from secondary effects of RNA interference. Bioinformatics. 2005;21:4026–4032. [PubMed] 11. Pe'er D, Regev A, Elidan G, Friedman N. Inferring subnetworks from perturbed expression profiles. Bioinformatics. 2001;17:S215–224. [PubMed] 12. Sachs K, Perez O, Pe'er D, Lauffenburger DA, Nolan GP. Causal protein-signaling networks derived from multiparameter single-cell data. Science. 2005;308:523–529. [PubMed] 13. Yeang CH, Ideker T, Jaakkola T. Physical network models. J Comput Biol. 2004;11:243–262. [PubMed] 14. Carter GW, Prinz S, Neou C, Shelby JP, Marzolf B, et al. Prediction of phenotype and gene expression for combinations of mutations. Mol Syst Biol. 2007;3:96. [PubMed] 15. Markowetz F, Kostka D, Troyanskaya OG, Spang R. 
Nested effects models for high-dimensional phenotyping screens. Bioinformatics. 2007;23:i305–i312. [PubMed] 16. Friedman N. Inferring cellular networks using probabilistic graphical models. Science. 2004;303:799–805. [PubMed] 17. Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, et al. Functional discovery via a compendium of expression profiles. Cell. 2000;102:109–126. [PubMed] 18. Fröhlich H, Fellmann M, Sultmann H, Poustka A, Beissbarth T. Estimating Large scale signaling networks through nested effect models with intervention effects from microarray data. Bioinformatics. 2008;24:2650–2656. [PubMed] 19. Kschischang FR, Frey BJ, Loeliger H. Factor graphs and the sum-product algorithm. IEEE Trans Inf Theory. 2001;47:498–519. 20. Frey B, MacKay D. A revolution: belief propagation in graphs with cycles. Adv Neural Inf Process Syst. 1997;10:479–485. 21. Frey BJ, Dueck D. Clustering by passing messages between data points. Science. 2007;315:972–976. [PubMed] 22. MacKay D, Neal R. Good error-correcting codes based on very sparse matrices. Cryptography and Coding. 1995 23. Yedidia JS, Freeman WT, Weiss Y. Generalized belief propagation. Adv Neural Inf Process Syst. 2000;13:689–695. 24. Wagner A. Reconstructing pathways in large genetic networks from genetic perturbations. J Comput Biol. 2004;11:53–60. [PubMed] 25. Brummelkamp TR, Bernards R, Agami R. A system for stable expression of short interfering RNAs in mammalian cells. Science. 2002;296:550–553. [PubMed] 26. Irby RB, Malek RL, Bloom G, Tsai J, Letwin N, et al. Iterative microarray and RNA interference-based interrogation of the SRC-induced invasive phenotype. Cancer Res. 2005;65:1814–1821. [PubMed] 27. Balasenthil S, Gururaj AE, Talukder AH, Bagheri-Yarmand R, Arrington T, et al. Identification of Pax5 as a target of MTA1 in B-cell lymphomas. Cancer Res. 2007;67:7132–7138. [PubMed] 28. Letwin NE, Kafkafi N, Benjamini Y, Mayo C, Frank BC, et al. 
Combined application of behavior genetics and microarray analysis to identify regional expression themes and gene-behavior associations. J Neurosci. 2006;26:5277–5287. [PubMed] 29. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–29. [PubMed] 30. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, et al. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 1999;27:29–34. [PubMed] 31. Joshi-Tope G, Gillespie M, Vastrik I, D'Eustachio P, Schmidt E, et al. Reactome: a knowledgebase of biological pathways. Nucleic Acids Res. 2005;33:D428–D432. [PubMed] 32. Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science. 2002;298:799–804. [PubMed] 33. Ptacek J, Devgan G, Michaud G, Zhu H, Zhu X, et al. Global analysis of protein phosphorylation in yeast. Nature. 2005;438:679–684. [PubMed] 34. Breitkreutz BJ, Stark C, Reguly T, Boucher L, Breitkreutz A, et al. The BioGRID Interaction Database: 2008 update. Nucleic Acids Res. 2008;36:D637–D640. [PubMed] 35. Cook JG, Bardwell L, Kron SJ, Thorner J. Two novel targets of the MAP kinase Kss1 are negative regulators of invasive growth in the yeast Saccharomyces cerevisiae. Genes Dev. 1996;10:2831–2848. [PubMed] 36. de Nadal E, Clotet J, Posas F, Serrano R, Gomez N, et al. The yeast halotolerance determinant Hal3p is an inhibitory subunit of the Ppz1p Ser/Thr protein phosphatase. Proc Natl Acad Sci U S A. 1998;95:7357–7362. [PubMed] 37. Stathopoulos-Gerontides A, Guo JJ, Cyert MS. Yeast calcineurin regulates nuclear localization of the Crz1p transcription factor through dephosphorylation. Genes Dev. 1999;13:798–803. [PubMed] 38. Kafadar KA, Zhu H, Snyder M, Cyert MS. Negative regulation of calcineurin signaling by Hrr25p, a yeast homolog of casein kinase I. Genes Dev. 2003;17:2698–2708. [PubMed] 39. Moll R. 
Cytokeratins in the histological diagnosis of malignant tumors. Int J Biol Markers. 1994;9:63–69. [PubMed] 40. Causevic M, Hislop RG, Kernohan NM, Carey FA, Kay RA, et al. Overexpression and poly-ubiquitylation of the DEAD-box RNA helicase p68 in colorectal tumours. Oncogene. 2001;20:7734–7743. [PubMed] 41. Zacharias DP, Lima MM, Souza AL, Jr, de Abranches Oliveira Santos ID, Enokiara M, et al. Human cutaneous melanoma expresses a significant phosphate-dependent glutaminase activity: a comparison with the surrounding skin of the same patient. Cell Biochem Funct. 2003;21:81–84. [PubMed] 42. Barrett T, Edgar R. Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol. 2006;411:352–369. [PubMed] 43. Iaquinta PJ, Lees JA. Life and death decisions by the E2F transcription factors. Curr Opin Cell Biol. 2007;19:649–657. [PubMed] 44. Zhang SY, Liu SC, Johnson DG, Klein-Szanto AJ. E2F-1 gene transfer enhances invasiveness of human head and neck carcinoma cell lines. Cancer Res. 2000;60:5972–5976. [PubMed] 45. Naora H. Involvement of ribosomal proteins in regulating cell growth and apoptosis: translational modulation or recruitment for extraribosomal activity? Immunol Cell Biol. 1999;77:197–205. [PubMed] 46. Bauvois B. Transmembrane proteases in cell growth and invasion: new contributors to angiogenesis? Oncogene. 2004;23:317–329. [PubMed] 47. Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, et al. Towards a proteome-scale map of the human protein-protein interaction network. Nature. 2005;437:1173–1178. [PubMed] 48. Chen MH, Yang WK, Whang-Peng J, Lee LS, Huang TS. Differential inducibilities of GFAP expression, cytostasis and apoptosis in primary cultures of human astrocytic tumours. Apoptosis. 1998;3:171–182. [PubMed] 49. Xu HP, Yuan L, Shan J, Feng H. Localization and expression of TSP50 protein in human and rodent testes. Urology. 2004;64:826–832. [PubMed] 50. 
Yuan L, Shan J, De Risi D, Broome J, Lovecchio J, et al. Isolation of a novel gene, TSP50, by a hypomethylated DNA fragment in human breast cancer. Cancer Res. 1999;59:3215–3221. [PubMed] 51. Dan I, Watanabe NM, Kusumi A. The Ste20 group kinases as regulators of MAP kinase cascades. Trends Cell Biol. 2001;11:220–230. [PubMed] 52. McCullough J, Fisher RD, Whitby FG, Sundquist WI, Hill CP. ALIX-CHMP4 interactions in the human ESCRT pathway. Proc Natl Acad Sci U S A. 2008;105:7687–7691. [PubMed] 53. Schmidt MH, Chen B, Randazzo LM, Bogler O. SETA/CIN85/Ruk and its binding partner AIP1 associate with diverse cytoskeletal elements, including FAKs, and modulate cell adhesion. J Cell Sci. 2003;116:2845–2855. [PubMed] 54. Wilson CJ, Zhan H, Swint-Kruse L, Matthews KS. The lactose repressor system: paradigms for regulation, allosteric behavior and protein folding. Cell Mol Life Sci. 2007;64:3–16. [PubMed] 55. Karin M. The regulation of AP-1 activity by mitogen-activated protein kinases. J Biol Chem. 1995;270:16483–16486. [PubMed] 56. King RD, Whelan KE, Jones FM, Reiser PG, Bryant CH, et al. Functional genomic hypothesis generation and experimentation by a robot scientist. Nature. 2004;427:247–252. [PubMed] 57. Yeang CH, Mak HC, McCuine S, Workman C, Jaakkola T, et al. Validation and refinement of gene-regulatory pathways on a network of physical interactions. Genome Biol. 2005;6:R62. [PubMed] 58. Sahoo D, Dill DL, Gentles AJ, Tibshirani R, Plevritis SK. Boolean implication networks derived from large scale, whole genome microarray datasets. Genome Biol. 2008;9:R157. [PubMed]