Copyright Lee et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

An Improved, Bias-Reduced Probabilistic Functional Gene Network of Baker's Yeast, Saccharomyces cerevisiae

1 Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, United States of America
2 Department of Chemistry and Biochemistry, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, United States of America

Andrea Califano, Academic Editor
Columbia University, United States of America

* To whom correspondence should be addressed. E-mail: marcotte/at/icmb.utexas.edu

Conceived and designed the experiments: EM IL ZL. Performed the experiments: IL ZL. Analyzed the data: IL. Wrote the paper: EM IL.

Received June 15, 2007; Accepted September 10, 2007.

Abstract

Background
Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations.

Methodology/Principal Findings
We report a significantly improved version (v. 2) of a probabilistic functional gene network [1] of the baker's yeast, Saccharomyces cerevisiae.
We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pair-wise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA co-expression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis. Conclusions/Significance YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome). YeastNet is available from http://www.yeastnet.org. Introduction Gene networks provide a simple basis for organizing thousands of cellular components and their associations with each other, as well as for generating testable hypotheses about the components and the system as a whole. A number of research efforts have demonstrated that heterogeneous functional genomics and proteomics data can be integrated into gene (or protein) networks (e.g., [1]–[12]), thus organizing and relating highly complex data sets, as well as simplifying the prediction of new gene functions and associations on basis of the network connections. In such network integration approaches, relationships between genes are detected by various experimental or computational methods, and then combined in a bottom-up fashion in order to build a network model. As high-throughput biological experiments advance, we expect corresponding gains in network models derived from these data. Such improvements, however, are often tempered by the already extreme and growing complexity of the biological data. There are three major problems in integrating diverse genomics data into network models. 
First, the genomics data are heterogeneous in their sensitivity and specificity for relationships between genes. For example, experimental methods such as mass spectrometry preferentially observe abundant proteins, while comparative genomics methods apply only to evolutionarily conserved genes. Increasing the sensitivity of detection usually carries a cost of increasing false positive identifications. Thus, the systematic bias for each method should be understood and considered during data integration. Second, genomics data sets vary widely in their utility for reconstructing gene networks. Thus, we need robust benchmarking methods that can evaluate each data set and allow comparison of their relative merits. Third, data sets are often correlated, complicating integration. However, the correlation can be difficult to measure because of both data incompleteness (a common problem) and sampling biases. Probabilistic functional gene networks represent a class of gene network models that attempt to solve these problems, allowing integrative network models to be built from heterogeneous genomics data (e.g., [1], [3], [8], [10]–[13]). One key idea of such network models is the reinterpretation of genomics data as providing evidence for “functional coupling” between genes [1]. This non-mechanistic, but nonetheless useful, high level notion of gene association enables the integration of many different types of data, capturing diverse types of associations (e.g., direct physical interactions, regulatory interactions, membership in the same physical protein complex, etc.) precisely because the definition of gene association is inclusive. Such associations can be discovered using Bayesian statistical methods which allow robust evaluations to be made of functional associations between genes in a supervised learning framework, such as by measuring known pathways and cellular systems for their recapitulation by the data sets being analyzed. 
We previously reported such a probabilistic genome-wide gene network for yeast genes (dubbed YeastNet v.1) [1]. Here, we present optimized methods that improve our probabilistic functional gene network models. Table 1 summarizes the major improvements. In particular, optimization of three major areas is highlighted, illustrating their effects on network quality. First, we reduced functional bias toward the dominant gold standard reference annotation during training. For example, most yeast gene functional annotation sets show biases towards genes of “protein biosynthesis” or “ribosomal proteins” [14], [15]. This bias inflates scores in a manner that does not generalize for other functions. Second, we apply a simple probability model for calculating confidence in protein physical interaction and genetic interaction data sets. We find the hypergeometric probability of an interaction occurring at random chance provides an excellent error confidence model for the interactions and simplifies their integration. Third, we introduce two thresholds that significantly improve the derivation of functional linkages from DNA microarray experiments. The combination of these improvements with additional data results in a markedly improved overall yeast gene network, spanning 95% of the validated yeast protein-coding genes. We demonstrate the network topology is predictive of essential genes, and apply the network to predict, then experimentally confirm, the function of the yeast gene PUF6 in 60S ribosomal subunit biogenesis.
Results and Discussion We incorporated three major improvements to the yeast probabilistic gene network, beyond inclusion of additional data sets: the reduction of bias in the reference training set, the introduction of probabilistic scores for physical and genetic interactions, and the introduction of filters to remove false-positive linkages from analysis of mRNA co-expression. We first discuss each of these improvements in turn, before demonstrating the overall quality of the network. Effect of a functionally biased reference set in learning a gene network from functional genomics data The derivation of a probabilistic functional network from functional genomics and proteomics data using the log-likelihood strategy is an example of a supervised learning approach, distinguishing positive functional associations from negative associations on the basis of the performance of training associations in the data sets under analysis. The learning efficiency, however, is contingent upon the quality of the reference training sets, although the algorithms we employ are chosen for their robustness to false examples in the references. Learning efficiency also correlates with the extent of reference examples, as we cannot learn effectively using only a few examples. A third important characteristic of reference sets affecting supervised learning is the systematic bias among examples. In agreement with previous observations of yeast gene annotation [15], [16], we found that this last issue in particular was important for reconstructing a functional yeast gene network. The most comprehensive and reliable functional annotation currently available for yeast is the Gene Ontology [17] annotation set. More than 70% of validated yeast protein-encoding genes are annotated by at least one of over 1,000 Gene Ontology “biological process” terms with support derived from reliable small-scale experimental evidence. 
Therefore, yeast Gene Ontology “biological process” annotation meets the first two requirements of a good reference set for efficient learning. However, the frequency distribution of annotation terms is heavily biased toward the single term “protein biosynthesis” (GO:0006412). This term alone is responsible for >27% of the total reference gene pairs (Figure 1A).
There are many possible reasons for such biased annotation. These range from bias in scientific interest (yeast has historically been a major model for studying many core cellular processes, including eukaryotic protein biosynthesis), to bias in technological feasibility (it is generally easier to study highly expressed proteins such as ribosomal proteins), to intrinsic bias in the cellular systems themselves (core molecular machines such as the ribosome legitimately incorporate more genes than many other cellular systems). We suspect that such bias is inevitable; nonetheless, we need to minimize its adverse effects on network reconstruction. We examined the consequences of this bias by “masking” this dominant term in the annotation reference set, thereby removing all reference gene pairs linked via this term, and then testing data sets for their performance on the full and masked reference sets. For example, mRNA co-expression relationships between yeast genes across various heat-shock treatments [19] appear to strongly predict functional associations when benchmarked using the full, biased reference set (Figure 1B).

Probabilistic inference of gene functional associations from physical protein-protein interactions and genetic interactions

Because of the generally strong correlation between protein physical or genetic interactions and functional associations, a map of such interactions among proteins is an invaluable source for learning about protein functions and pathways. Among the many techniques for mapping physical protein interactions, yeast two hybrid assays and affinity purification followed by mass spectrometry have proved the most popular because of their scalability. Two major genome-scale yeast two hybrid screens reported more than 4,000 binary interactions [20], [21]. While these interactions passed minimum quality criteria, we might not expect all to be equally informative for inferring functional associations.
The original confidence measure, which divides interactions into a more reproducible “core” set and a less reproducible “non-core” set [20], is coarse-grained and may often miss functionally informative interactions. Mass-spectrometry-derived interaction data, usually provided as lists of affinity-purification baits and their identified preys, are even more complicated to use for inferring binary physical or functional associations. Two different models for inferring binary interactions from the lists of identifications have been widely used: the spoke and matrix models [22]. The spoke model allows pair-wise relationships only between baits and preys in the same complexes, whereas the matrix model includes additional relationships inferred by pairing preys in the same complexes. These interpretative models exhibit different trade-offs between completeness and accuracy: the spoke model achieves high accuracy at the cost of incompleteness, whereas the matrix model is more complete but relatively inaccurate because it pairs all prey proteins from a given bait with each other. A probabilistic model of protein-protein interactions should bypass the limitations of these coarse descriptive models, while providing the higher-resolution scoring important for data integration. We found that calculating the hypergeometric probability of the protein interactions occurring by random chance in a given data set generates a very well-behaved ranking of interaction accuracy in recall-precision analyses (Figure 2A&B).
Interestingly, we also observed that the hypergeometric probability-based confidence scores effectively rank genetic interactions according to their utility for functional inferences (Figure 2C).

Optimized method of inferring functional links by co-expression analysis

For inferring functional linkages from DNA microarray evidence, we employ the divide-test-integrate approach [1] for discovering functionally informative cases of mRNA co-expression. This method contrasts with simply concatenating the results of all DNA microarray experiments to create a single, monolithic expression vector for each gene, then measuring correlation between these vectors. A co-expression network derived from such monolithic vectors does show a robust correlation between the extent of expression correlation and the degree of functional association, in part because of its high dimensionality. However, it generally works as a useful model only for limited groups of genes, such as consistently co-expressed housekeeping genes. The problem facing the monolithic method is that context-specific co-expression patterns evident in only a subset of experiments are overwhelmed by stochastic or uncorrelated expression changes across the remaining experiments. For example, consider the case of combining several experiments designed to detect expression dynamics during heat shock with a large number of unrelated experiments. Linkages between genes that respond coordinately under heat shock but not in the remaining experiments are unlikely to be detected in an analysis of the monolithic expression vectors. In contrast, linkages among these genes might be detected by the divide-test-integrate approach, in which each group of biologically coherent experiments is analyzed separately for co-expression linkages, followed by integration of linkages across the sets of experiments.
In practice, however, this approach often yields more false positive co-expression linkages, because the expression vectors have lower dimension and the probability of observing such correlations at random is correspondingly higher. For robust as well as sensitive co-expression linkage detection, we introduced two new parameters to filter false positive co-expression linkages. These filters operate by removing genes from the co-expression analysis that fail to show a minimum ratio of expression change (R) in a minimum number of microarray experiments (M), thereby eliminating the genes most likely to be unresponsive in the array set being analyzed. We optimized the choice of these two parameters for each set of array experiments by maximizing the area under a recall-precision curve (Table 2).
Beyond filtering genes, we also removed entire data sets that proved uninformative for reconstructing a functional network: We measured the relationship between the degree of co-expression between two genes, measured as the Pearson correlation coefficient (PCC) of their expression levels across the arrays under consideration, and the likelihood of their functional association, measured by the log likelihood of belonging to the same pathway (LLS, see Methods) between the genes in each successive bin of 1000 gene pairs ranked in descending order by PCC. Across 18 total sets of DNA microarrays from SMD [26], containing 581 individual array experiments, we found 14 sets showed a significant relationship (e.g., Cell cycle; Figure 3A
Assessment of YeastNet version 2 as a predictive model

In total, ten types of functional genomics, proteomics, and comparative genomics data sets are integrated into the network (Table 4), as described in the Methods section (see the pseudo-code for an overview of the procedure). Approximately 1,800,000 individual experimental observations were integrated into the network model, optimizing a total of ~155 free parameters in order to construct the network. Using a permissive scoring threshold corresponding to the log likelihood score (LLS > 0.916) of non-core genome-wide Y2H screens [20], YeastNet v. 2 contains a total of 102,803 linkages covering 5,483 yeast proteins (>95% of the validated yeast proteome).
The integrated model, along with the various sets of linkages derived from individual data sets, was assessed on an independent test set of gene functional linkages derived from the MIPS protein function annotation set, calculating recall and precision of the MIPS reference linkages (Figure 4
We compared the overall performance of YeastNet version 2 to that of YeastNet version 1 by recall-precision analysis on the independent test sets. We previously defined a confident sub-network by taking only the top 34,000 functional linkages (covering 4,681 yeast proteins) [1] and used that for detailed biological interpretation. We therefore selected the top 34,000 linkages of both versions of YeastNet in order to perform a fair comparison. This subset of YeastNet v. 2 covers 4,649 yeast proteins (>80% of the validated yeast proteome). In tests of the MIPS functional linkage reference set that included linkages derived using the functional category “protein synthesis”, the precision of the two networks is comparable, while coverage—for both genes and reference linkages—is significantly improved for the new network model (Figure 5A Another aspect of the predictive quality of a gene network relates to an observed correlation between a gene's tendency to be essential [27] and its centrality in a network, measured as the number of interactions in which the gene participates. This correlation was initially observed for the yeast physical protein-protein interaction network [28]. Consistent with the original observation, the high quality physical protein-protein interactions derived from small-scale experiments (here, collected from bioGRID [29] and DIP [30]) show a strong correlation between degree centrality and lethality (Spearman rank correlation (rs) = 0.94; Figure 6A = 0.95) while covering nearly all (99%) of the experimentally identified essential yeast genes (Figure 6C
Experimental validation of the top ribosome biogenesis prediction, PUF6 In addition to the above computational validation, we also experimentally validated predictions arising from the new gene network. Using the new network, we predicted new genes to be involved in the process of ribosomal biogenesis, which is a fundamental process critical for cells and widely conserved across eukaryotes. New ribosomal biogenesis genes were inferred by identifying close network neighbors to the known ribosomal biogenesis genes. Specifically, we generated a seed set of known ribosome biogenesis genes based on their Gene Ontology biological process annotation (n = 238 yeast genes annotated by the terms “ribosome assembly”, “rRNA”, or “35S”), then prioritized their network neighbors by the sum of their LLS scores to genes of the seed set. This list of genes was filtered to remove known ribosomal proteins. Table 5 lists the top 5 predictions. Two of the top 5 genes, CIC1 and ESF2 have been verified in the literature [31], [32] but had not yet been included in the ribosome biogenesis annotation set we employed, and thus can be considered true predictions already verified by published studies. Moreover, these predictions are also supported by multiple lines of evidence including inferred functional linkages based on high-throughput data (e.g., co-expression and mass spectrometry analysis; Table 5). All five genes are known to be localized to the nucleolus [33], strongly supporting a possible role in ribosome biogenesis.
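The neighbor prioritization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `rank_candidates`, the edge-list layout, and the toy gene names are all hypothetical, and the network is assumed to be an undirected list of (gene, gene, LLS) triples.

```python
from collections import defaultdict


def rank_candidates(edges, seed_genes, exclude=()):
    """Rank non-seed genes by the summed LLS of their links to seed genes.

    edges      -- iterable of (gene_a, gene_b, lls) triples (undirected network)
    seed_genes -- genes already annotated with the process of interest
    exclude    -- genes to filter out (e.g., known ribosomal proteins)
    """
    seeds = set(seed_genes)
    skip = set(exclude)
    totals = defaultdict(float)
    for gene_a, gene_b, lls in edges:
        # The network is undirected, so consider both orientations of each edge.
        for gene, partner in ((gene_a, gene_b), (gene_b, gene_a)):
            if partner in seeds and gene not in seeds and gene not in skip:
                totals[gene] += lls
    # Highest summed score first; ties broken alphabetically for determinism.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Toy network with hypothetical gene names and scores.
toy_edges = [
    ("SEED1", "CAND_A", 2.0),
    ("SEED2", "CAND_A", 1.5),
    ("SEED1", "CAND_B", 3.0),
    ("CAND_B", "CAND_C", 4.0),  # no seed involved, so it contributes nothing
    ("SEED2", "RIBO_X", 5.0),   # removed by the exclusion filter
]
ranking = rank_candidates(toy_edges, {"SEED1", "SEED2"}, exclude={"RIBO_X"})
print(ranking)
```

Summing scores rewards candidates supported by several seed genes, so a gene with two moderate links can outrank a gene with one strong link, as in the toy example.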
We selected the top-ranked prediction, PUF6, for experimental validation. PUF6 encodes an RNA-binding protein previously known to be involved in mating-type determination via its translational repression of ASH1 mRNA prior to ASH1 mRNA localization to the bud tip [34]. While previous computational evidence associates PUF6 with ribosomal biogenesis [35], there is not yet direct experimental support for its involvement. We therefore experimentally tested PUF6 for its participation in ribosomal biogenesis. We might expect yeast strains defective in ribosomal biogenesis to show a slow growth phenotype; we tested a puf6Δ deletion strain [36] and indeed observed significant growth retardation compared to the wild-type strain when cultured at 20 °C (Figure 7A
Conclusions In this study, we present several optimizations that significantly improve the predictive power of a probabilistic functional gene network of yeast. There are three major aspects worth noting. First, our current functional genomics knowledge is severely biased. This bias leads to biased learning unless appropriately taken into account, as the effect of reference linkages from the dominant GO term “protein biosynthesis” is quite strong (Figures 1 We describe applications of the gene network for functional prediction (prediction of ribosomal biogenesis genes) and prediction of essential genes. In order to perform similar analyses of YeastNet v. 2, we have established a web site (http://www.yeastnet.org) where the network can be downloaded in full. We anticipate posting future updates of the network to this site as new data sets become available. Materials and Methods Saccharomyces cerevisiae gene set YeastNet version 2 is based on the verified 5,794 protein encoding open reading frames (ORFs) of the yeast genome downloaded from Saccharomyces cerevisiae Genome Database (SGD) [42] on March 2005. All linkages and calculations of genome coverage are based on this gene set. Reference and benchmark sets In order to benchmark the assigned functional linkages in this study, three different reference sets were used. As a major reference set for benchmarking, we used the Gene Ontology (GO) annotation, downloaded from the Saccharomyces cerevisiae Genome Database (SGD) [17] on March 2005. The GO schema lists three hierarchies of function describing “biological process” (i.e., pathways and systems), “molecular function” (i.e., biochemical activities), and “cellular component” (i.e., subcellular localization). For training the network, we used the Saccharomyces cerevisiae GO “biological process” annotation, which contains up to 14 different levels of information under the term “biological process” within the hierarchy. We used terms belonging to levels 2 through 10. 
We also excluded the term “protein biosynthesis” because it annotates so many genes as to significantly bias the benchmarking. To construct the reference set of linkages, we considered all gene pairs as functionally linked if they shared annotation from this set of GO terms. These pairs comprised our positive reference set for training network models. Negative examples were constructed as pairs of annotated genes not sharing any annotation terms, i.e., all other links among this annotated set of genes. Specifically, 66,174 positive reference pairs were employed, representing all gene pairs sharing any GO biological process terms between levels 2–10 (except for the biased term “protein biosynthesis”). These pairs are provided on the supporting web site (http://www.yeastnet.org). All other pairs of these genes were implicitly defined as the negative reference pairs. For example, the genes NOP1 and SIK1 represent a positive example, sharing the GO terms ‘rRNA modification’, ‘35S primary transcript processing’, ‘processing of 20S pre-rRNA’. The genes BUD5 (‘bud site selection’, ‘pseudohyphal growth’, ‘small GTPase mediated signal transduction’) and NOG1 (‘ribosome-nucleus export’) are annotated, but do not share terms, and represent a negative example. We also employed two independent functional linkage reference sets for testing functional linkages. One was derived from the Munich Information Center for Protein Sequences (MIPS) [43] protein function annotation. We used the 11 major categories from the top level MIPS functional category annotation. The second reference set was derived from the clusters of orthologous group (COG) annotation [44], which is based on reconstructing homologous groups of proteins in such a manner as to considerably enrich for orthologous proteins within each group, with the functions of genes assigned within 23 broad categories (such as “Transcription” and “Signal Transduction Mechanisms”) based on the well-annotated proteins with each COG. 
We use the version of COG that includes multicellular eukaryotic genomes (named eukaryotic orthologous groups, or KOG) [45]. Positive and negative linkage sets were constructed from each of these reference sets as for the GO set. Benchmarking and integrating heterogeneous functional genomics data Different types of genomics data sets differ considerably in their utility for inferring functional linkages. We standardized the contributions from heterogeneous genomic data sets by scoring using the log likelihood score (LLS) scheme previously described in [1]. In this scheme, the score for each data set (or subset; e.g., a set of gene pairs co-expressed to a certain extent) is calculated as
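A minimal sketch of this scoring, assuming the LLS takes the form described in [1], LLS = ln[(P(L|E)/P(¬L|E)) / (P(L)/P(¬L))], where P(L|E) and P(¬L|E) are the frequencies of positive and negative reference pairs among the pairs supported by a data set E, and P(L) and P(¬L) are the corresponding frequencies among all reference pairs. The function names and the "observed" pairs are hypothetical; the gene annotations are taken from the example given earlier in this section (abbreviated to one or two terms per gene).

```python
import math
from itertools import combinations


def reference_pairs(annotations):
    """Split all pairs of annotated genes into positive and negative sets.

    A pair is positive if the two genes share at least one annotation term,
    and negative otherwise. Pairs are stored as alphabetically sorted tuples.
    """
    positive, negative = set(), set()
    for gene_a, gene_b in combinations(sorted(annotations), 2):
        if annotations[gene_a] & annotations[gene_b]:
            positive.add((gene_a, gene_b))
        else:
            negative.add((gene_a, gene_b))
    return positive, negative


def log_likelihood_score(data_pairs, positive, negative):
    """LLS of a data set: log of posterior odds of linkage over prior odds."""
    pos_hits = len(data_pairs & positive)
    neg_hits = len(data_pairs & negative)
    if pos_hits == 0 or neg_hits == 0:
        raise ValueError("need at least one positive and one negative pair")
    posterior_odds = pos_hits / neg_hits
    prior_odds = len(positive) / len(negative)
    return math.log(posterior_odds / prior_odds)


annotations = {
    "NOP1": {"rRNA modification", "35S primary transcript processing"},
    "SIK1": {"rRNA modification", "35S primary transcript processing"},
    "BUD5": {"bud site selection", "pseudohyphal growth"},
    "NOG1": {"ribosome-nucleus export"},
}
positive, negative = reference_pairs(annotations)

# Hypothetical pairs reported by some data set (sorted tuples, as above).
observed = {("NOP1", "SIK1"), ("BUD5", "NOG1")}
score = log_likelihood_score(observed, positive, negative)
print(round(score, 3))
```

A score above zero means the data set links same-term genes more often than expected from the reference set alone; the sketch omits the bootstrapping used for the real calculations.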
To avoid overtraining, we employed 0.632 bootstrapping [46], [47] for all LLS calculations. 0.632 bootstrapping has been shown to provide a robust estimate of classifier accuracy, out-performing cross-validation [48], especially for very small data sets (e.g., see [49]), and is thus appropriate even for more poorly annotated genomes. Unlike cross-validation, which uses sampling without replacement for constructing test and training data sets, 0.632 bootstrapping constructs the training set by sampling the data with replacement and the test set from the remaining data that were not sampled. In each of the n draws, a given linkage has a probability of 1-1/n of not being selected, so after n draws it remains unsampled with probability (1-1/n)^n ≈ 1/e ≈ 0.368, resulting in ~63.2% of the data in the training set and ~36.8% in the test set [50]. The overall LLS is the weighted average of results on the two sets, equal to 0.632*LLStest + (1-0.632)*LLStrain.

For data sets in which each gene pair is associated with a continuous score (e.g., correlation coefficient, mutual information, etc.), we calculated LLS scores for bins containing equal numbers of gene pairs. Those LLS scores and their corresponding data scores (the mean data score for each bin) were used to calculate regression models (see Figure 1B).

For integrating LLS scores from different data sets, we employed the weighted sum method [1] in order to take into account correlations among the data sets. The published weighted sum method was modified by using linearly decaying weights for additional data sets, and by including a new free parameter, T, which represents a minimum LLS threshold on the data sets being integrated. The weighted sum (WS) integrating multiple likelihood scores of functional association for a gene pair was calculated as:
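A minimal sketch of one way such a weighted sum could be computed. It assumes that "linearly decaying weights" means the best score is taken at full weight and the i-th ranked additional score is divided by D·i, where D is a free parameter for the dependence among data sets. The exact functional form, the name `decay` for D, and the treatment of pairs with no score above T are assumptions made for illustration.

```python
def weighted_sum(scores, decay=1.0, threshold=0.0):
    """Integrate the LLS scores for one gene pair.

    scores    -- LLS values for the pair, one per supporting data set
    decay     -- assumed free parameter D controlling how fast weights fall off
    threshold -- minimum LLS (T); scores below it are ignored
    """
    kept = sorted((s for s in scores if s >= threshold), reverse=True)
    if not kept:
        return 0.0  # assumption: no usable evidence gives a score of zero
    best, rest = kept[0], kept[1:]
    # The i-th additional score (i = 1, 2, ...) is down-weighted by decay * i.
    return best + sum(s / (decay * rank) for rank, s in enumerate(rest, start=1))


# Four data sets support the pair; the weakest falls below the threshold.
combined = weighted_sum([3.0, 2.0, 1.0, 0.2], decay=2.0, threshold=0.5)
print(combined)  # 3.0 + 2.0/2 + 1.0/4
```

With a large `decay`, the result approaches the single best score (fully dependent evidence); with `decay` near 1, secondary evidence contributes more strongly.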
Regarding the choice of linear versus exponential decay of confidence in secondary evidence, we observe better performance (measured by recall-precision analysis) using the linear model when accompanied by more extensive secondary evidence and improved filtering of false positive linkages prior to integration. In YeastNet v.1, more low-scoring false-positive linkages were incorporated, and their contributions as secondary evidence were more strongly down-weighted under the exponential model. However, in YeastNet v. 2, new filters (in particular, new probabilistic scores for protein interactions and the introduction of thresholds for DNA microarray data) down-weight or remove many false positive associations prior to integration. The addition of new data sets also has the effect of increasing the quantity of secondary evidence. Thus we empirically observe that as secondary lines of evidence become more available and informative, the linear dependency model performs better. Inferring gene functional linkages from mRNA expression data Gene functional linkages were inferred from mRNA expression data deposited in the Stanford Microarray Database (SMD) by July 2005 [26]. Co-expression relationships were measured as the Pearson correlation coefficient (PCC) between pairs of genes' mRNA expression vectors, accepting only PCC values statistically significant at the 99% confidence level by t-test. From the set of gene pairs with significant PCC scores, we excluded pairs with cDNA sequence homology (defined as a BLAST E-value<10−4 and percentage nucleotide sequence identity >70% over the aligned regions [51]) in order to reduce false positive co-expression linkages caused by cross-hybridization on the DNA microarrays. As demonstrated previously [1], overall recall/precision of expression-derived linkages can be improved by analyzing subsets of arrays independently, rather than as a single composite expression vector. 
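The co-expression measure described above can be sketched in plain Python. The function names are illustrative, and the expression vectors are toy values. The t statistic shown is the standard test statistic for a Pearson correlation, with n-2 degrees of freedom; comparing it against the critical value for the 99% confidence level requires a t distribution table or a statistics library and is left out of this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("vectors must have equal length of at least 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing whether a correlation r over n points is zero."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Toy expression vectors for two genes across five arrays.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson(gene_a, gene_b)
t = t_statistic(r, len(gene_a))
print(round(r, 3), round(t, 3))
```

Short vectors need a much higher correlation to reach significance, which is one reason small array sets produce more chance co-expression linkages.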
We tested a total of 581 DNA microarray experiments comprising 18 sets, as defined by SMD (Tables 2 and 3). We found that 14 SMD sets, containing a total of 500 array experiments, exhibited a significant correlation between PCC and the log likelihood score; we considered only these data sets further. We introduced two additional parameters to improve co-expression inferences: a threshold for the minimum observed change in mRNA levels across the set of array experiments (R in Table 2), and a threshold for the minimum number of microarray experiments with expression values greater than R (M in Table 2). Thus, only genes that are differentially expressed by at least R-fold (in either direction) on at least M microarrays in the given data set will be considered for co-expression linkages. These parameters considerably reduce the linkage false positive rate by removing genes that do not vary across the set of arrays being analyzed, under the premise that genes that are expressed at a constant level across the tested conditions are not likely to be relevant to the conditions of the experiments or to participate in strong co-expression relationships. These filters therefore remove false positive linkages derived from experimental noise and drift in otherwise unchanging baseline expression levels. We optimized the two thresholds for each set of SMD arrays, maximizing the area under a curve plotting the number of genes incorporated in the inferred linkages versus cumulative log likelihood score of the linkages (Table 2). In order to include otherwise robust co-expression linkages missed by these analyses, we also concatenated all 500 experiments derived from the 14 selected SMD data sets and derived co-expression linkages from these concatenated expression vectors. These linkages plus those from each of the 14 SMD subsets were integrated by the weighted sum method. 
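The R and M filters described above can be sketched as follows. This assumes, for illustration only, that expression values are stored as log2 ratios and that missing measurements are recorded as `None`; the function name, parameter names, and toy values are hypothetical.

```python
import math


def responsive_genes(log2_ratios, min_fold=2.0, min_arrays=3):
    """Return genes changed at least min_fold (R) on at least min_arrays (M) arrays.

    log2_ratios -- dict mapping gene name to a list of log2 expression ratios,
                   with None marking a missing measurement
    """
    cutoff = math.log2(min_fold)  # an R-fold change in either direction
    kept = []
    for gene, values in log2_ratios.items():
        n_changed = sum(1 for v in values if v is not None and abs(v) >= cutoff)
        if n_changed >= min_arrays:
            kept.append(gene)
    return sorted(kept)


toy_set = {
    "GENE_A": [1.5, -1.2, 0.1, 2.0],   # changes 2-fold or more on three arrays
    "GENE_B": [0.2, -0.1, 0.3, 0.0],   # essentially flat: filtered out
    "GENE_C": [1.0, None, -1.0, 0.5],  # changes 2-fold on exactly two arrays
}
selected = responsive_genes(toy_set, min_fold=2.0, min_arrays=2)
print(selected)
```

Only the genes that pass the filter would then be paired and scored for co-expression, so flat genes cannot contribute noise-driven correlations.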
Inferring gene functional linkages from experimental protein-protein interaction data

Physical protein-protein interactions (PPI) and genetic interactions (GI) were collected from the Database of Interacting Proteins (DIP) small-scale experiment set (downloaded March 2003) [30], from BioGRID (downloaded in June 2006), in which all interactions are supported by literature curation [29], and from the literature collection by MIPS [43]. These interactions are highly confident, because genetic interaction screens inherently provide low false-positive rates (Type I errors), and all physical interactions in these sets are derived from small-scale studies. Additional physical interactions were collected from published genome-scale screens using mass spectrometry analyses of affinity-purified protein complexes [52]–[54] or high throughput yeast two hybrid (Y2H) assays [20], [21], [55]–[57]. We applied a quantitative error model developed for PPI data sets [41], [58] in order to assign probabilistic confidence scores to each PPI or GI gene pair. Instead of modeling simple binary bait-prey interactions for yeast two hybrid assays, inferred binary interactions from mass spectrometry analysis of affinity-purified protein complexes [22], or binary genetic interactions, we calculated the hypergeometric probability of interaction between two proteins by random chance, assigning a probability (p-value) to the pair as:
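A minimal sketch of such a calculation, assuming the p-value is the upper tail of a hypergeometric distribution: the probability of seeing the pair together at least k times by chance, given that the two proteins take part in n and m interactions respectively out of N interactions in the data set. The parameterization and the names k, n, m, and N are assumptions made for illustration.

```python
from math import comb


def hypergeom_tail(k, n, m, N):
    """P(X >= k) for a hypergeometric variable X.

    k -- number of times the pair was observed together
    n -- total interactions involving the first protein
    m -- total interactions involving the second protein
    N -- total interactions in the data set
    """
    if not (0 <= n <= N and 0 <= m <= N and k >= 0):
        raise ValueError("require 0 <= n, m <= N and k >= 0")
    upper = min(n, m)
    # Count the ways to overlap in exactly i interactions, for i = k..upper.
    favorable = sum(comb(n, i) * comb(N - n, m - i) for i in range(k, upper + 1))
    return favorable / comb(N, m)


p_value = hypergeom_tail(k=2, n=3, m=4, N=10)
print(round(p_value, 4))
```

Smaller p-values indicate pairs seen together more often than their individual interaction counts would predict, so promiscuous proteins are penalized relative to specific partners.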
Inferring gene functional linkages from genome context
We employed three genome context methods for inferring functional linkages from genome sequences: phylogenetic profiling (PG) [59]–[61], the Rosetta Stone protein (RS) (or gene-fusion) method [59], [62]–[64], and gene neighbors [3], [65], [66]. Linkages for each method were derived from analysis of a database of 149 genomes (117 bacteria, 16 archaea, and 16 eukaryotes). Briefly, each yeast protein sequence was compared to every other sequence using the program BLASTP with default settings [67]. Rosetta Stone linkages and gene neighbor linkages were calculated from these comparisons as in [68] and [3], respectively. Phylogenetic profiles were constructed from these comparisons and analyzed as in [69] with the following modifications. We found that the profiles corresponding to major phylogenetic groups of organisms varied widely in their utility for deriving functional gene associations. In particular, inclusion of eukaryotic and archaeal genomes did not significantly improve performance. Instead, we found the best performance (measured by maximizing the area under a plot of LLS versus the number of genes participating in the linkages) by inferring functional linkages from a profile constructed only from bacterial genomes. For discretizing BLAST E-values prior to calculation of mutual information between phylogenetic profiles, we binned by equal numbers of examples rather than by equal intervals of E-values, accounting for the non-uniform distribution of BLAST E-values. We observed the best results using 3 bins.
Inferring gene functional linkages from literature mining
We identified functional linkages by mining the scientific literature (specifically, Medline abstracts) using the co-citation approach [70], [71] as in [1].
We analyzed a set of N = 29,135 Medline abstracts that included the phrase “Saccharomyces cerevisiae” in the abstract for perfect matches to either the standardized names or common names (or their synonyms) of 5,794 yeast genes.
Inferring gene functional linkages from protein tertiary structure
Functional linkages were also inferred from physical interactions predicted between protein pairs based upon modeling their 3-dimensional structures onto X-ray crystal structures of homologous protein complexes. We used the tertiary structure predictions reported by Aloy and Russell [72], using the reported P-values as the internal measure of confidence in the interactions.
Summary of integration
The final integrated gene network incorporates 10 fairly distinct types of data: 1) small-scale protein physical interactions from literature curation, 2) co-citation evidence, 3) mRNA co-expression, 4) genetic interactions, 5) protein complexes derived from affinity purification followed by mass spectrometry, 6) high-throughput yeast two-hybrid analyses, 7) gene neighbors, 8) phylogenetic profiles, 9) Rosetta Stone protein linkages, and 10) inferred interactions from tertiary structural modeling (Table 4). The following pseudo-code summarizes the benchmarking and integration of these data:
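As a hedged illustration only, benchmarking and integration might be sketched as below. The log likelihood score is assumed to be the natural log of the odds of linkage given the data divided by the prior odds, and the weighted sum is assumed to take a rank-weighted form with a free parameter D and a score threshold T. These forms, the function names, and the default values are assumptions for illustration and should not be read as the exact procedure used to build the network.

```python
from math import log


def log_likelihood_score(pos_in_data, neg_in_data, pos_total, neg_total):
    """Assumed LLS: ln of (odds of linkage given the data / prior odds).

    pos_in_data, neg_in_data: reference-positive and reference-negative gene
        pairs supported by the data set being benchmarked.
    pos_total, neg_total: positive and negative pairs in the whole reference set.
    """
    return log((pos_in_data / neg_in_data) / (pos_total / neg_total))


def weighted_sum(scores, d=2.0, t=0.0):
    """Combine the per-data-set scores available for one gene pair.

    Assumed form: WS = L0 + sum_i L_i / (d * i), where L0 is the largest score
    and L_1, L_2, ... are the remaining scores in descending order. Scores
    below the threshold t are ignored. d and t are hypothetical defaults.
    """
    kept = sorted((s for s in scores if s >= t), reverse=True)
    if not kept:
        return 0.0
    best, rest = kept[0], kept[1:]
    return best + sum(s / (d * i) for i, s in enumerate(rest, start=1))
```

In this sketch a larger D down-weights supporting evidence more strongly, which corresponds to treating the data sets as less independent of one another.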
Experimental validation of yeast ribosomal biogenesis genes
Yeast strains were cultured in YPD (1% yeast extract, 2% peptone, 2% dextrose) at either 20°C or 30°C. The puf6Δ haploid MATa deletion strain [36] and PUF6, NMD3, and TDH1 TAP-tagged haploid MATa strains [40] were obtained from Open Biosystems. For polysome profile analysis, yeast strains were cultured to OD600 0.3–0.5, and 100 µg/ml cycloheximide (Sigma) was added to each culture. Cultures were immediately cooled with ice, and all subsequent steps were performed on ice or at 4°C. Each cell pellet was washed once with lysis buffer (20 mM Tris pH 7.4, 20 mM KCl, 5 mM MgCl2, 100 µg/ml cycloheximide, 12 mM β-mercaptoethanol). The cells were pelleted, resuspended in one volume lysis buffer with protease inhibitors (2 µg/ml leupeptin, 2 µg/ml aprotinin, 1 µg/ml bestatin, 1 µg/ml pepstatin A; obtained from MP Biomedicals Inc.), and lysed with glass beads. Crude lysates were centrifuged at 15,000 g for 10 minutes. Fifteen OD260 units of each supernatant were loaded onto continuous 12 ml 7 to 47% sucrose gradients in lysis buffer without protease inhibitors, as in [73]. After a 2.5-h spin at 40,000 rpm in a Beckman SW40 rotor, the sucrose gradient was fractionated and absorbance at 254 nm was measured. For TAP-tagged strains, fractions were collected, and proteins were precipitated with 10% cold trichloroacetic acid and washed with 100% cold acetone. For analysis of co-sedimentation with ribosomes, precipitated proteins were resuspended in 20 µl Laemmli buffer, and 2 µl of each sample was deposited onto a nitrocellulose membrane. TAP-tagged proteins were detected with a PAP antibody (Rockland Immunochemicals, Inc.) and electrochemiluminescence (ECL; GE Amersham).
Acknowledgments
We thank Arlen Johnson for assistance with polysome profile analysis, and Orly Alter and Vishy Iyer for critical discussion of microarray analyses.
Footnotes
Competing Interests: The authors have declared that no competing interests exist.
Funding: This work was supported by grants from the N.S.F. (IIS-0325116, EIA-0219061), N.I.H. (GM06779-01, GM076536-01), Welch (F-1515), and a Packard Fellowship (EMM). These agencies were not involved in the design and conduct of the study, in the collection, analysis, and interpretation of the data, or in the preparation, review, or approval of the manuscript.
References
1. Lee I, Date SV, Adai AT, Marcotte EM. A probabilistic functional network of yeast genes. Science. 2004;306:1555–1558.
2. Bader GD, Donaldson I, Wolting C, Ouellette BF, Pawson T, et al. BIND–The Biomolecular Interaction Network Database. Nucleic Acids Res. 2001;29:242–245.
3. Bowers PM, Pellegrini M, Thompson MJ, Fierro J, Yeates TO, et al. Prolinks: a database of protein functional linkages derived from coevolution. Genome Biol. 2004;5:R35.
4. Gunsalus KC, Ge H, Schetter AJ, Goldberg DS, Han JD, et al. Predictive models of molecular machines involved in Caenorhabditis elegans early embryogenesis. Nature. 2005;436:861–865.
5. Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, et al. A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science. 2003;302:449–453.
6. Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D. A combined algorithm for genome-wide prediction of protein function. Nature. 1999;402:83–86.
7. Mellor JC, Yanai I, Clodfelter KH, Mintseris J, DeLisi C. Predictome: a database of putative functional links between proteins. Nucleic Acids Res. 2002;30:306–309.
8. Myers CL, Robson D, Wible A, Hibbs MA, Chiriac C, et al. Discovery of biological networks from diverse functional genomic data. Genome Biol. 2005;6:R114.
9. Rhodes DR, Tomlins SA, Varambally S, Mahavisno V, Barrette T, et al. Probabilistic model of the human protein-protein interaction network. Nat Biotechnol. 2005;23:951–959.
10. von Mering C, Huynen M, Jaeggi D, Schmidt S, Bork P, et al. STRING: a database of predicted functional associations between proteins. Nucleic Acids Res. 2003;31:258–261.
11. Zhong W, Sternberg PW. Genome-wide prediction of C. elegans genetic interactions. Science. 2006;311:1481–1484.
12. Nariai N, Kolaczyk ED, Kasif S. Probabilistic protein function prediction from heterogeneous genome-wide data. PLoS ONE. 2007;2:e337.
13. Fraser AG, Marcotte EM. A probabilistic view of gene function. Nature Genetics. 2004;36:559–564.
14. Lee I, Marcotte EM. Effects of functional bias on supervised learning of a gene network model. In: McDermott J, Samudrala R, Bumgarner R, editors. Methods in Molecular Biology: Computational Systems Biology. Totowa: The Humana press Inc; 2007.
15. Myers CL, Barrett DR, Hibbs MA, Huttenhower C, Troyanskaya OG. Finding function: evaluation methods for functional genomic data. BMC Genomics. 2006;7:187.
16. Huttenhower C, Hibbs M, Myers C, Troyanskaya OG. A scalable method for integration and functional analysis of multiple microarray datasets. Bioinformatics. 2006;22:2890–2897.
17. Dwight SS, Harris MA, Dolinski K, Ball CA, Binkley G, et al. Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO). Nucleic Acids Res. 2002;30:69–72.
18. Kanehisa M, Goto S, Kawashima S, Nakaya A. The KEGG databases at GenomeNet. Nucleic Acids Res. 2002;30:42–46.
19. Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, et al. Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. 2000;11:4241–4257.
20. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci U S A. 2001;98:4569–4574.
21. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000;403:623–627.
22. Bader GD, Hogue CW. Analyzing yeast protein-protein interaction data obtained from different sources. Nat Biotechnol. 2002;20:991–997.
23. Kelley R, Ideker T. Systematic interpretation of genetic interactions using protein networks. Nat Biotechnol. 2005;23:561–566.
24. Sangster TA, Lindquist S, Queitsch C. Under cover: causes, effects and implications of Hsp90-mediated genetic capacitance. Bioessays. 2004;26:348–362.
25. Boone C, Bussey H, Andrews BJ. Exploring genetic interactions and networks with yeast. Nat Rev Genet. 2007;8:437–449.
26. Gollub J, Ball CA, Binkley G, Demeter J, Finkelstein DB, et al. The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res. 2003;31:94–96.
27. Giaever G, Chu AM, Ni L, Connelly C, Riles L, et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002;418:387–391.
28. Jeong H, Mason SP, Barabasi AL, Oltvai ZN. Lethality and centrality in protein networks. Nature. 2001;411:41–42.
29. Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hon GC, et al. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. J Biol. 2006;5:11.
30. Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM, et al. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res. 2002;30:303–305.
31. Hoang T, Peng WT, Vanrobays E, Krogan N, Hiley S, et al. Esf2p, a U3-associated factor required for small-subunit processome assembly and compaction. Mol Cell Biol. 2005;25:5523–5534.
32. Horsey EW, Jakovljevic J, Miles TD, Harnpicharnchai P, Woolford JL., Jr Role of the yeast Rrp1 protein in the dynamics of pre-ribosome maturation. Rna. 2004;10:813–827.
33. Huh WK, Falvo JV, Gerke LC, Carroll AS, Howson RW, et al. Global analysis of protein localization in budding yeast. Nature. 2003;425:686–691.
34. Gu W, Deng Y, Zenklusen D, Singer RH. A new yeast PUF family protein, Puf6p, represses ASH1 mRNA translation and is required for its localization. Genes Dev. 2004;18:1452–1465.
35. Wade CH, Umbarger MA, McAlear MA. The budding yeast rRNA and ribosome biosynthesis (RRB) regulon contains over 200 genes. Yeast. 2006;23:293–306.
36. Winzeler EA, Shoemaker DD, Astromoff A, Liang H, Anderson K, et al. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science. 1999;285:901–906.
37. Rotenberg MO, Moritz M, Woolford JL., Jr Depletion of Saccharomyces cerevisiae ribosomal protein L16 causes a decrease in 60S ribosomal subunits and formation of half-mer polyribosomes. Genes Dev. 1988;2:160–172.
38. Ho JH, Johnson AW. NMD3 encodes an essential cytoplasmic protein required for stable 60S ribosomal subunits in Saccharomyces cerevisiae. Mol Cell Biol. 1999;19:2389–2399.
39. Adams CC, Jakovljevic J, Roman J, Harnpicharnchai P, Woolford JL., Jr Saccharomyces cerevisiae nucleolar protein Nop7p is necessary for biogenesis of 60S ribosomal subunits. Rna. 2002;8:150–165.
40. Ghaemmaghami S, Huh WK, Bower K, Howson RW, Belle A, et al. Global analysis of protein expression in yeast. Nature. 2003;425:737–741.
41. Hart GT, Lee I, Marcotte EM. A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality. BMC Bioinformatics. 2007;8:236.
42. Cherry JM, Adler C, Ball C, Chervitz SA, Dwight SS, et al. SGD: Saccharomyces Genome Database. Nucleic Acids Res. 1998;26:73–79.
43. Mewes HW, Frishman D, Guldener U, Mannhaupt G, Mayer K, et al. MIPS: a database for genomes and protein sequences. Nucleic Acids Res. 2002;30:31–34.
44. Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science. 1997;278:631–637.
45. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003;4:41.
46. Efron B. Estimating the error rate of a prediction rule: improvement on cross-validation. J Am Stat Assoc. 1983;78:316–331.
47. Efron B, Tibshirani R. New York: Chapman & Hall; 1993. An introduction to the bootstrap. p. 439.
48. Sima C, Braga-Neto U, Dougherty ER. Superior feature-set ranking for small samples using bolstered error estimation. Bioinformatics. 2005;21:1046–1054.
49. Braga-Neto UM, Dougherty ER. Is cross-validation valid for small-sample microarray classification? Bioinformatics. 2004;20:374–380.
50. Witten IH, Frank E. 2005. Data Mining: Practical Machine Learning Tools and Techniques: Morgan Kaufmann. p. 560.
51. Carlson MW. University of Texas; 2002. Surveying yeast genomics diversity using cDNA microarrays. MS thesis.
52. Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, et al. Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006;440:631–636.
53. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002;415:180–183.
54. Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006;440:637–643.
55. Fromont-Racine M, Mayes AE, Brunet-Simon A, Rain JC, Colley A, et al. Genome-wide protein interaction screens reveal functional networks involving Sm-like proteins. Yeast. 2000;17:95–110.
56. Newman JR, Wolf E, Kim PS. A computationally directed screen identifying interacting coiled coils from Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 2000;97:13203–13208.
57. Tong AH, Drees B, Nardelli G, Bader GD, Brannetti B, et al. A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science. 2002;295:321–324.
58. Lee I, Narayanaswamy R, Marcotte EM. Bioinformatic prediction of yeast gene function. In: Stansfield I, editor. Yeast Gene Analysis: Elsevier Press; 2007. pp. 597–628.
59. Huynen M, Snel B, Lathe W, 3rd, Bork P. Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res. 2000;10:1204–1210.
60. Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci U S A. 1999;96:4285–4288.
61. Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV. Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context. Genome Res. 2001;11:356–372.
62. Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA. Protein interaction maps for complete genomes based on gene fusion events. Nature. 1999;402:86–90.
63. Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, et al. Detecting protein function and protein-protein interactions from genome sequences. Science. 1999;285:751–753.
64. Yanai I, Derti A, DeLisi C. Genes linked by fusion events are generally of the same functional category: a systematic analysis of 30 microbial genomes. Proc Natl Acad Sci U S A. 2001;98:7940–7945.
65. Dandekar T, Snel B, Huynen M, Bork P. Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci. 1998;23:324–328.
66. Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N. The use of gene clusters to infer functional coupling. Proc Natl Acad Sci U S A. 1999;96:2896–2901.
67. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
68. Marcotte CJ, Marcotte EM. Predicting functional linkages from gene fusions with confidence. Appl Bioinformatics. 2002;1:93–100.
69. Date SV, Marcotte EM. Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages. Nat Biotechnol. 2003;21:1055–1062.
70. Jenssen TK, Laegreid A, Komorowski J, Hovig E. A literature network of human genes for high-throughput analysis of gene expression. Nat Genet. 2001;28:21–28.
71. Stapley BJ, Benoit G. Biobibliometrics: information retrieval and visualization from co-occurrences of gene names in Medline abstracts. Pac Symp Biocomput. 2000:529–540.
72. Aloy P, Russell RB. InterPreTS: protein interaction prediction through tertiary structure. Bioinformatics. 2003;19:161–162.
73. Baim SB, Pietras DF, Eustice DC, Sherman F. A mutation allowing an mRNA secondary structure diminishes translation of Saccharomyces cerevisiae iso-1-cytochrome c. Mol Cell Biol. 1985;5:1839–1846.
74. Tong AH, Lesage G, Bader GD, Ding H, Xu H, et al. Global mapping of the yeast genetic interaction network. Science. 2004;303:808–813.
75. Shapira M, Segal E, Botstein D. Disruption of yeast forkhead-associated cell cycle transcription by oxidative stress. Mol Biol Cell. 2004;15:5659–5669.
76. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998;95:14863–14868.
77. Brauer MJ, Saldanha AJ, Dolinski K, Botstein D. Homeostatic adjustment and metabolic remodeling in glucose-limited yeast cultures. Mol Biol Cell. 2005;16:2503–2517.