Nucleic Acids Res. 2006; 34(7): 2137–2150.
Published online Apr 26, 2006. doi:  10.1093/nar/gkl219
PMCID: PMC1449908

Prediction of yeast protein–protein interaction network: insights from the Gene Ontology and annotations

Abstract

A map of protein–protein interactions provides valuable insight into the cellular function and machinery of a proteome. By measuring the similarity between two Gene Ontology (GO) terms with a relative specificity semantic relation, we propose a new method of reconstructing a yeast protein–protein interaction map that is based solely on the GO annotations. The method was validated for its effectiveness using high-quality interaction datasets. Based on a Z-score analysis, a positive dataset and a negative dataset for protein–protein interactions were derived. Moreover, a gold standard positive (GSP) dataset with the highest level of confidence that covered 78% of the high-quality interaction dataset and a gold standard negative (GSN) dataset with the lowest level of confidence were derived. In addition, we assessed four high-throughput experimental interaction datasets using the positives and the negatives as well as GSPs and GSNs. Our predicted network reconstructed from GSPs consists of 40 753 interactions among 2259 proteins, and forms 16 connected components. We mapped all of the MIPS complexes except for homodimers onto the predicted network. As a result, ~35% of complexes were found to be interconnected. For seven complexes, we also identified some nonmember proteins that may be functionally related to the complexes concerned. This analysis is expected to provide a new approach for predicting the protein–protein interaction maps from other completely sequenced genomes with high-quality GO-based annotations.

INTRODUCTION

One of the main goals of functional genomics is to determine the function of genes predicted from the completely sequenced genomes. In the past decade, massive amounts of biological data have been accumulated from genome sequencing as well as from transcriptomes, proteomes and interactomes. It is a challenging task to integrate such relevant data sources to represent the comprehensive knowledge of genes within and between genomes, which provide specialized information to describe the biological roles of the products of genes. The Gene Ontology (GO) (1) is one such resource that is becoming the de facto standard for facilitating information search tasks across databases and for annotating gene products (2). It has been successfully used in protein classification, such as in Photobacterium profundum (3), Plasmodium falciparum (4), Drosophila, Anopheles (5–7), Oryza sativa (8), as well as Pan troglodytes and Homo sapiens (9,10). It can also be used in describing gene expression clustering results to explain why a cluster of genes shares a similar expression pattern (11).

The GO has been developed to offer controlled vocabularies for aiding in the annotation of molecular attributes for different model organisms. Three structured ontologies have been proposed, which allow the description of molecular function (MF), biological process (BP) and cellular component (CC). Each ontology is structured as a directed acyclic graph (DAG), which differs from hierarchies in that a ‘child’ (more specialized term) can have many ‘parents’ (less specialized terms or more general terms) and child terms are instances or components of parent terms. Thus, the information derived from the GO should be useful in developing new predictive systems, which may be integrated with other models in large-scale genomic research. Currently, originating from the GO, several functional association predictors have been constructed, which can be roughly grouped into two categories. The techniques in the first category are used to assess the functional associations between proteins in terms of the shared GO terms in a controlled vocabulary system (12–15). However, they are restricted to protein pairs with the same annotations. Techniques from the second category assess the functional associations between proteins using the semantic similarity measures of pairs of terms assigned to them based on either information content (16) or GO structures (17). These two methods in the second category use very similar definitions for the similarity measure for GO annotations, although they treat the specificity of the most recent common ancestor (MRCA) of two GO terms in different ways (17). Motivated by the two methods in the latter class, in this work, we constructed a new functional predictor to systematically predict the map of potential physical interactions between yeast proteins by fully exploring the knowledge buried in two GO annotations for the yeast genome, namely, the BP and CC annotations. 
Our method is explicitly based upon Wu's similarity measure for GO annotations (17) and is extended to take the relative specificities of GO annotations into account within a given GO structure (see Materials and Methods). Our premise follows directly from two observations: (i) interacting proteins often function in the same biological process, which assumes that two proteins acting in the same biological process are more likely to interact than two proteins involved in different processes, and moreover, proteins functioning in specific biological processes should be more likely to interact than proteins functioning in general processes (14,18–20); (ii) to interact physically, proteins must exist in close proximity, at least transiently, which suggests that co-localization may serve as a useful predictor for protein interactions (19,21).

Since proteins perform their functions by interacting with one another and with other biomolecules, reconstructing a map of the protein–protein interactions of a cell is an important first step toward understanding protein function and cellular behavior (22,23). Recently, genome-scale protein interaction networks have been experimentally determined for Caenorhabditis elegans (24), Drosophila melanogaster (25), Helicobacter pylori (26), H.sapiens (12,14,27), and Saccharomyces cerevisiae (28–31). Although these experimental techniques have drastically improved our knowledge of protein interactions, the datasets generated from these studies are often noisy and incomplete (32,33). The experiments are also labor-intensive, time-consuming and tedious. In addition, the number of possibly interacting protein pairs within one cell will be enormous, which makes complete experimental verification impractical. Therefore, computational methods are constantly needed to complement existing experimental approaches. Several prediction studies have been carried out by deriving information from the vast amount of biological data contained in the genomic datasets, such as gene neighborhood (34–36), gene fusion events (37,38), gene co-occurrences or phylogenetic profiles (39–41) and correlated mRNA expression patterns (42,43). In addition, protein interactions can also be extracted from the literature (44–46). A comprehensive overview of these methods can be found elsewhere (47,48). Recently, in order to gain a more comprehensive understanding of the interactome, based on a single probabilistic framework, different genomic features were integrated to make large-scale predictions of protein–protein interactions in yeast (13,49) and human (14). As expected, prediction should be improved with the integration of more independent genomic features even if each one is a weak predictor of protein interactions (50). 
However, it is known that any delicate dependencies between features can confound the strength of the prediction in these integrated frameworks, although there may be no appreciable statistical dependence between the many possible pairs of these features (51).

Once protein–protein interaction networks have been reconstructed, either experimentally or computationally, they are usually analyzed to relate structural properties of networks with protein properties on a global or local topology view (48,52). In this study, we focused mainly on identifying the structural relationships among members of protein complexes. It is realized that no protein is an isolated island, but instead most seem to function by binding together in complexes (53); many important cellular functions are actually carried out by protein complexes that act as molecular ‘machines’ (54). Moreover, there may exist a higher-order organization of interacting complexes for the coordination of cellular functions (30). Meanwhile, it has been shown that many complexes in yeast and humans are nearly identical, which provides an understanding that, rather than at the protein level, they are conserved at the machine level during the course of evolution (55).

In this paper, we define a new metric for semantic similarity to score the degree of the functional association between two different proteins by comparing the relative specificity of the pairs of GO terms assigned to them within a GO DAG. As mentioned above, both the CC and BP ontologies and their respective annotations were used in this study. To evaluate the method, an integrated high-quality interaction dataset was applied. Based on the evaluation, a positive and a negative dataset were selected, and then used to assess the four large-scale experiments mentioned above (28–31). The result of the assessment is in agreement with that of the previous studies (32,33). In addition, we used the map reconstructed from the GSPs, which carry the highest confidence among the positives, to analyze the possible internal interacting relationships among the partners in the MIPS complexes. This reveals that our method may be less biased and that the predicted map seems to be more comprehensive than those obtained from the other approaches mentioned above. Accordingly, our method may also be applied to the other completely sequenced genomes that are well annotated with the GO schemes, such as the human genome.

MATERIALS AND METHODS

The GO and yeast annotations

Yeast protein annotations were downloaded from the Organelle DB (56) and for the compatibility of computation, the September 2004 release of the GO was used. Organelle DB is the first on-line resource devoted to the identification and presentation of eukaryotic proteins localized to organelles and subcellular structures. In the simple eukaryote S.cerevisiae (yeast), Organelle DB collects and presents several large-scale protein localization projects (21,57,58) and the localization data that has been generated piecemeal from independent small-scale studies. Furthermore, to facilitate data interoperability, proteins in Organelle DB have been annotated using the three controlled vocabularies (BP, CC and MF) from the GO consortium.

Data filtering criteria

For computational efficiency and clarity, we excluded the following GO terms from the analysis:

  1. GO terms that are defined as ‘biological_process unknown’ (GO: 0000004) (including 641 annotations) in BP ontology;
  2. For the CC ontology, there are six terms descending directly from its root (GO: 0005575), namely, ‘cellular_component unknown’ (GO: 0008372), ‘unlocalized’ (GO: 0005941), ‘virion’ (GO: 0019012), ‘immunoglobulin complex’ (GO: 0019814), ‘extracellular’ (GO: 0005576) and ‘cell’ (GO: 0005623). Only the term ‘cell’ was used in the analysis because we focused on the proximity of co-localized yeast protein pairs. In addition, we found that there were only eight proteins annotated with the term ‘extracellular’ and no proteins were assigned to its descendant terms. These eight proteins were also annotated with other terms descending from the term ‘cell’ (Supplementary Table S1). Therefore, the term ‘cell’ was then set to be the root of the GO cellular component in this study.

The distributions of GO terms and the respective yeast protein annotations before and after applying the procedure of filtering are listed in Supplementary Table S2. Our analysis was thereafter based on the filtered datasets.

Seven known protein–protein interaction datasets

Seven existing protein–protein interaction datasets were used for validation of our method (D5–7) and for assessment of their accuracy by our predicted interaction dataset (D1–4). They are:

D1–2: datasets ‘Gavin’ and ‘Ho’; both of them consist of binary interactions converted from the data inferred from mass spectrometry of coimmunoprecipitated complexes (30,31) using the spoke model, which has been shown to be more reliable than the matrix model in this case (59,60).

D3–4: datasets ‘Ito’ and ‘Uetz’, each from a different independent genome-scale yeast two-hybrid experiment (28,29).

D5: ‘MIPS complexes’ dataset, which comprises binary interactions converted from MIPS complexes (61) without topological information using a matrix model.

D6: ‘MIPS interactions’ dataset which is composed of those MIPS physical interactions (61) that have been inferred from small-scale experiments.

D7: ‘de Lichtenberg’ dataset, which refers to those integrated interactions involved in the processes during the yeast cell cycle (62).

The numbers of proteins and interactions of the seven known protein–protein interaction datasets are listed in Supplementary Table S3.
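
The difference between the spoke model (used for D1–2) and the matrix model (used for D5) can be illustrated with a minimal Python sketch. The function names and the toy complex are hypothetical and are not taken from the original study; the sketch only assumes that the spoke model pairs the bait with each co-purified prey, whereas the matrix model pairs every member with every other member.

```python
from itertools import combinations


def matrix_model(members):
    """All unordered pairs among the members of one complex."""
    return {frozenset(pair) for pair in combinations(sorted(set(members)), 2)}


def spoke_model(bait, preys):
    """Only the pairs formed by the bait and each of its preys."""
    return {frozenset((bait, prey)) for prey in set(preys) if prey != bait}


# Hypothetical purification: bait "A" pulled down with preys "B", "C" and "D".
matrix_pairs = matrix_model(["A", "B", "C", "D"])
spoke_pairs = spoke_model("A", ["B", "C", "D"])
```

For a complex of n members the matrix model yields n(n − 1)/2 binary interactions, while the spoke model with a single bait yields n − 1, so every spoke pair is also a matrix pair.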

Relative Specificity Similarity (RSS) of two proteins annotated in a GO

Since each GO is structured as a DAG, wherein one term is a child of one or multiple parents, and child terms are instances (is-a relationship) or components (part-of relationship) of parent terms, there is often more than one path from a GO term up to the topmost level of the GO, namely, the root term of the GO. In this paper, the topmost level of a GO indicates the root term ‘biological_process’ (GO: 0008150) of the BP ontology or the term ‘cell’ (GO: 0005623) of the CC ontology. As stated in Wu's definition, the collection of paths with each one corresponding to a complete trace from the concerned GO term to the root term of the GO can be represented as a graph induced from the concerned term (17). For a given GO, let term_i and term_j be two terms, Paths(term_i) and Paths(term_j) be the paths in the graphs induced from term_i and term_j respectively, and dist(u, v) be the number of edges along the shortest path between term u and term v, so that its value equals zero if u and v are the same term. Three different configurations may exist for two terms, term_i and term_j, from a given GO (Figure 1). In each configuration, the RSS of two GO terms consists of three different components. They are denoted α, β and γ, respectively. Component α is defined in formula 1; it measures how specific the MRCA of the two terms is according to the structure of the GO and is equivalent to the definition of S in Wu's work (17),

$$\alpha = \max_{\mathrm{path}_m \in \mathrm{Paths}(\mathrm{term}_i),\ \mathrm{path}_n \in \mathrm{Paths}(\mathrm{term}_j)} \{\text{the number of common terms between } \mathrm{path}_m \text{ and } \mathrm{path}_n\} - 1 \tag{1}$$

Component β measures how relatively general term_i and term_j are in the GO and is defined in formula 2. The generality of a term is defined as the minimum distance between the term and all of the leaf terms descending from it. Leaf terms in a GO are those terms without any descendant. Obviously, the larger the distance, the more general is a term.

$$\beta = \max\Big\{ \min_{u \in U} \{\mathrm{dist}(\mathrm{term}_i, u)\},\ \min_{v \in V} \{\mathrm{dist}(\mathrm{term}_j, v)\} \Big\} \tag{2}$$

where U = {all leaf nodes descending from term_i} and V = {all leaf nodes descending from term_j}.

Figure 1
Three different configurations describing two terms in a given DAG. The MRCA, the most recent common ancestor of term_i and term_j, represents the most specific of all common ancestors of the term pair. (A) Two terms overlap; (B) term_j is a descendant ...

Component γ measures the local distance between two terms relative to the MRCA and is defined as follows:

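$$\gamma = \mathrm{dist}(\mathrm{MRCA}, \mathrm{term}_i) + \mathrm{dist}(\mathrm{MRCA}, \mathrm{term}_j) \tag{3}$$

If γ is small, it implies term_i and term_j share much similarity locally relative to the MRCA.

Then, the RSS between two terms of a given GO, term_i and term_j, can be quantified by combining α, β and γ in formula 4,

$$\mathrm{RSS}(\mathrm{term}_i, \mathrm{term}_j) = \frac{\mathrm{maxDepth}_{GO}}{\mathrm{maxDepth}_{GO} + \gamma} \cdot \frac{\alpha}{\alpha + \beta} \tag{4}$$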

where maxDepth_GO is the maximum distance from the root term of the GO to the leaf terms (i.e. the number of edges along the longest path in the GO). From the definition, the values of RSS are between 0 and 1. Clearly, RSS = 0 (α = 0) indicates that the MRCA of term_i and term_j is the root of the GO, which means that the two terms share no commonality in describing protein properties; on the other hand, RSS = 1 (γ = 0 and β = 0) indicates that term_i and term_j are the same leaf term, which means that the two terms are most specific in describing protein attributes.

Based on the definition of RSS between the two terms, we can formalize a metric for measuring the relationship strength, including the functional association or the location proximity, between two different proteins annotated in the GO. Let P and Q be two proteins of interest, and terms(P) and terms(Q) the sets of all the GO terms assigned to protein P and Q, respectively. We define the relationship strength between P and Q, RSS_GO(P, Q), as the maximum RSS of all possible term pairs from terms(P) and terms(Q), respectively, namely,

$$\mathrm{RSS}_{GO}(P, Q) = \max_{u \in \mathrm{terms}(P),\ v \in \mathrm{terms}(Q)} \{\mathrm{RSS}(u, v)\} \tag{5}$$
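
Formulas 1–5 can be traced on a small example. The following Python sketch is only an illustration under stated assumptions, not the implementation used in this study: the ontology is a hypothetical seven-term DAG rather than the real GO, a leaf term is treated as its own nearest leaf (so its generality is 0), the MRCA is taken as the most specific shared term on the best-scoring pair of paths, and RSS is returned as 0 when α = 0 so that the undefined ratio 0/0 never arises.

```python
from collections import deque
from itertools import product

# Hypothetical toy ontology (child -> parents); these are not real GO terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A", "B"],  # a term with two parents, as a DAG allows
    "A21": ["A2"],
    "B1": ["B"],
}
ROOT = "root"

CHILDREN = {term: [] for term in PARENTS}
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].append(child)


def paths_to_root(term):
    """All paths from `term` up to the root, each listed from term to root."""
    if term == ROOT:
        return [[ROOT]]
    return [[term] + rest for p in PARENTS[term] for rest in paths_to_root(p)]


def bfs_distance(start, neighbours, is_goal):
    """Fewest edges from `start` to the nearest node satisfying `is_goal`."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, d = queue.popleft()
        if is_goal(node):
            return d
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    raise ValueError("goal not reachable")


def generality(term):
    """Minimum distance from `term` down to a leaf (0 if `term` is a leaf)."""
    return bfs_distance(term, CHILDREN, lambda t: not CHILDREN[t])


def depth(term):
    """Longest distance in edges from the root down to `term`."""
    return 0 if term == ROOT else 1 + max(depth(p) for p in PARENTS[term])


MAX_DEPTH = max(depth(t) for t in PARENTS)  # maxDepth_GO in formula 4


def rss(term_i, term_j):
    """Relative specificity similarity of two terms (formulas 1-4)."""
    best, mrca = 0, ROOT
    for pm, pn in product(paths_to_root(term_i), paths_to_root(term_j)):
        shared = set(pm) & set(pn)
        if len(shared) > best:
            best = len(shared)
            mrca = next(t for t in pm if t in shared)  # most specific shared term
    alpha = best - 1
    if alpha == 0:  # the MRCA is the root: no commonality
        return 0.0
    beta = max(generality(term_i), generality(term_j))
    gamma = (bfs_distance(term_i, PARENTS, lambda t: t == mrca)
             + bfs_distance(term_j, PARENTS, lambda t: t == mrca))
    return MAX_DEPTH / (MAX_DEPTH + gamma) * alpha / (alpha + beta)


def rss_protein(terms_p, terms_q):
    """Relationship strength of two proteins (formula 5)."""
    return max(rss(u, v) for u in terms_p for v in terms_q)
```

In this toy DAG, two proteins annotated with the same leaf term A21 obtain an RSS of 1, whereas A1 and B1, whose only common ancestor is the root, obtain 0. The pair (A1, A21) shares the ancestor A (α = 1, β = 0, γ = 3) and therefore scores 3/(3 + 3) × 1/(1 + 0) = 0.5.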

Statistical significance of protein pairs falling in various levels of RSS values

In order to determine whether the assignment of protein pairs into categories with different RSS values is statistically significant and to minimize the systematic biases intrinsically raised from the structure of a GO, a Z-score analysis was applied. Firstly, the [0,1] interval was equally divided into 10 categories, namely, {(0.1 × i, 0.1 × (i + 1)], i = 0, 1, …, 9}, plus one additional category for RSS equal to 0, which indicates that the MRCA of all term pairs respectively assigned to two proteins is the root of the GO. Then, Z-score values, defined as (#pairs_annotated − #pairs_random)/SD_random, were calculated from the number of protein pairs from a given GO annotation (BP or CC) and random annotation based on the GO DAG. For computational simplicity and without loss of generality, we randomly assigned each of all distinct proteins to one term of the GO and then calculated the number of protein pairs falling in an RSS category. This process was repeated 1000 times (we also tested 5000 repetitions and found a similar result) and afterwards the corresponding average numbers (#pairs_random) and their standard deviations (SD_random) were calculated. The larger the Z-score value, the less probable it is that the relationship strength of a pair of proteins is due to chance from the structure of the ontology. Therefore, the Z-score value for each RSS category indicates the confidence of the relationship strength of protein pairs measured by our method.
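
The randomization procedure described above can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: `toy_score` is a hypothetical stand-in for the RSS of two terms, the twelve proteins and five terms are invented, the sample standard deviation is an assumption (the text does not say which estimator was used), and only 200 repetitions are run to keep the example fast.

```python
import math
import random
import statistics
from itertools import combinations


def rss_category(value):
    """Return 0 for RSS == 0, else k (1..10) for the interval (0.1*(k-1), 0.1*k]."""
    if value == 0:
        return 0
    return min(10, math.ceil(value * 10 - 1e-9))  # small tolerance for float error


def count_pairs(annotation, score):
    """Count protein pairs per category; each pair takes its best-scoring term pair."""
    counts = [0] * 11
    for p, q in combinations(sorted(annotation), 2):
        best = max(score(u, v) for u in annotation[p] for v in annotation[q])
        counts[rss_category(best)] += 1
    return counts


def z_scores(annotation, terms, score, n_iter=1000, seed=0):
    """Z-score per category: (#pairs_annotated - #pairs_random) / SD_random."""
    rng = random.Random(seed)
    observed = count_pairs(annotation, score)
    samples = []
    for _ in range(n_iter):
        # Random annotation: every protein is assigned to one randomly chosen term.
        shuffled = {p: {rng.choice(terms)} for p in annotation}
        samples.append(count_pairs(shuffled, score))
    z = []
    for k in range(11):
        column = [s[k] for s in samples]
        mean, sd = statistics.mean(column), statistics.stdev(column)
        z.append((observed[k] - mean) / sd if sd > 0 else float("nan"))
    return observed, z


def toy_score(u, v):
    """Hypothetical term similarity standing in for RSS(u, v)."""
    if u == v:
        return 1.0
    return 0.5 if u[0] == v[0] else 0.0


TERMS = ["a1", "a2", "b1", "b2", "c1"]
ANNOTATION = {f"P{i}": {"a1"} for i in range(6)}
ANNOTATION.update({f"P{i}": {"b1"} for i in range(6, 10)})
ANNOTATION.update({f"P{i}": {"c1"} for i in range(10, 12)})

observed, z = z_scores(ANNOTATION, TERMS, toy_score, n_iter=200)
```

Because the toy annotation concentrates proteins on a few terms, the highest category contains more pairs than random assignment produces, which gives a positive Z-score there. Categories that are empty in every randomization have SD_random = 0 and are reported as undefined.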

RESULTS

Distributions of all pairs of annotated proteins according to their RSS values

In all, 3832 proteins are assigned to one or more of 265 CC terms in Organelle DB, which produces 5722 annotations and 7 340 196 (3832×3831/2) pairs of different proteins in the CC ontology. Similarly, there are 5574 annotations and 5 092 836 protein pairs in the BP ontology (Supplementary Table S2). For a given GO DAG, an RSS is assigned to each pair of different proteins according to formula 5. The distributions of all annotated protein pairs in various RSS categories for the two GOs are shown in Figure 2, where the [0,1] interval of RSS values is split into 11 categories (see Materials and Methods). Each blue line represents the cumulative rate of protein pairs along with the 11 categories. To assess the statistical significance of protein pairs falling in various levels of RSS values for the GO, Z-score values for each of the 11 categories were calculated (see Materials and Methods for details). As shown in Figure 3, in each of the two RSS categories, (0.9, 1] and (0.8, 0.9], the number of pairs of proteins annotated on either the CC or BP ontology is more than 142 standard deviations greater than the mean number with randomized annotations. When the RSS values are equal to or less than 0.3 for the CC ontology and are equal to or less than 0.4 for the BP ontology, no statistical significance could be detected. The Z-score values are less than 1.5, indicating that most, if not all, of these pairs may not be functionally associated.

Figure 2
Distributions of the annotated protein pairs with various RSS values in the CC (A) and BP (B) ontologies. The [0,1] interval of the x-axis is equally divided into 11 categories, as defined in Materials and Methods. The histogram (relating to the left ...
Figure 3
Statistical significance of the quality scoring system. For a given GO, which is either CC (A) or BP (B), Z-score values, defined as (#pairs_annotated − #pairs_random)/SD_random, were calculated from the number of protein pairs from the given GO annotation ...

Based on the distribution analyses of Z-scores for CC and BP ontologies, the 11 categories of RSS_CC could roughly be divided into three groups with high confidence (H, 0.8 < RSS_CC ≤ 1), medium confidence (M, 0.3 < RSS_CC ≤ 0.8) and low confidence (L, 0 ≤ RSS_CC ≤ 0.3). Like RSS_CC, RSS_BP can also be split into three groups, with high confidence (H, 0.8 < RSS_BP ≤ 1), medium confidence (M, 0.4 < RSS_BP ≤ 0.8) and low confidence (L, 0 ≤ RSS_BP ≤ 0.4). Therefore, there are nine (3×3) data segments (DSs) (a total of 5 010 195 protein pairs encompassing 3166 proteins) with different combinations of confidences according to the subdivisions of both RSS values. For example, a DS which consists of protein pairs with high-confidence RSS_CC (H) and medium-confidence RSS_BP (M) is called HM DS (Figure 4A).

Figure 4
Nine data segments (DSs) with different confidences related to CC and BP ontologies (A) and the selection of positives and negatives, as well as GSPs and GSNs (B). Nine DSs contain 5 010 195 protein pairs encompassing 3166 proteins in ...

Gold standard positive and negative protein interaction datasets

How likely is one pair of proteins from each of these nine DSs to interact with each other physically? In order to address this issue, the union of three existing protein–protein interaction datasets was chosen as trusted, and thus is called ‘valid experimental interactions (VEIs)’. The three datasets are the binary interactions in a matrix model from the MIPS complexes, the MIPS small-scale physical interactions, and the integrated interactions by de Lichtenberg et al. (see Materials and Methods). The MIPS complexes and the MIPS physical interactions are often used as or as part of ‘gold standard positives’ to validate various prediction methods (13,51,63) and are also used to assess high-throughput interaction datasets (32,33). There are 11 041 unique binary interactions among 1472 proteins in our VEIs, and Supplementary Figure S1 shows the distribution of interactions among the three datasets. The distribution of VEIs among these nine DSs is shown in Figure 5A. Interestingly, 78% (8620 out of 11 041) of interactions fall into the HH DS, suggesting that the HH DS may contain most, if not all, of yeast protein–protein interactions; whereas 0.06% (7 out of 11 041) of interactions are in the LL DS, suggesting that protein pairs in LL DS seem to be much less likely to interact physically. Similar to the process mentioned above, we applied the analysis of statistical significance using Z-score values calculated in each DS (Figure 5B). As a result, we could roughly classify the nine DSs into two groups, one called the ‘positive dataset’ (3101 proteins and 152 944 interactions) including two DSs (HH and MH) with Z-score values larger than 320, and the other called the ‘negative dataset’ (3166 proteins and 4 857 251 interactions) including the remainder of the DSs (HL, LL, LM, MM, ML, LH and HM) with Z-score values ranging from −30 to 42. 
In particular, the HH DS, whose Z-score value reaches 4019, was selected as a gold standard positive dataset (GSPs; 2259 proteins and 40 753 interactions), whereas two DSs (ML and LL), whose Z-score values are lower than −23, were combined as a gold standard negative dataset (GSNs; 3165 proteins and 1 460 378 interactions) in this study (Figure 4).
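
The assignment of a protein pair to a data segment and to the derived datasets can be summarized in a short Python sketch. The function names are hypothetical; the cut-offs (0.3 and 0.8 for CC, 0.4 and 0.8 for BP) and the segment-to-dataset mapping are those stated in the text, with the CC confidence written first and the BP confidence second.

```python
def confidence_level(rss, low_cutoff, high_cutoff=0.8):
    """Map an RSS value to 'H', 'M' or 'L' using the cut-offs quoted in the text."""
    if rss > high_cutoff:
        return "H"
    if rss > low_cutoff:
        return "M"
    return "L"


def data_segment(rss_cc, rss_bp):
    """Two-letter segment label: CC confidence first, BP confidence second."""
    return confidence_level(rss_cc, 0.3) + confidence_level(rss_bp, 0.4)


POSITIVE_SEGMENTS = {"HH", "MH"}  # all other segments form the negative dataset
GSP_SEGMENTS = {"HH"}
GSN_SEGMENTS = {"ML", "LL"}


def dataset_labels(rss_cc, rss_bp):
    """Return the segment of a protein pair and the datasets it belongs to."""
    segment = data_segment(rss_cc, rss_bp)
    labels = {"positive" if segment in POSITIVE_SEGMENTS else "negative"}
    if segment in GSP_SEGMENTS:
        labels.add("GSP")
    if segment in GSN_SEGMENTS:
        labels.add("GSN")
    return segment, labels
```

Every pair falls into exactly one of the positive and negative datasets, which is consistent with the reported totals: 152 944 + 4 857 251 = 5 010 195 protein pairs across the nine DSs.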

Figure 5
Distribution of the numbers of VEIs covered by each of the nine DSs (A), and statistical significance of VEIs in nine DSs using Z-score analysis (B). In each DS, a Z-score value (#PPisevi − #pairsrandom)/SDrandom labeled for each bar was calculated ...

Assessment of four known genome-scale experimental datasets

Four known genome-scale protein interaction datasets (D1–4) were assessed using our positives and negatives (Supplementary Figure S2A). The coverage of two ‘pull-down’ interaction datasets (30,31) in positives is 60% (1404/2324 for the Gavin dataset) and 23% (402/1766 for the Ho dataset), respectively, while the coverage of two genome-scale Y2H interaction datasets (28,29) in positives reaches 35% (126/356 for the Uetz dataset) and 14% (190/1392 for the Ito dataset), respectively. It is worth noting that, assessed by our negative dataset, Gavin, Uetz, Ho and Ito datasets contain very high proportions of false-positive interactions, of 40% (920/2324), 65% (230/356), 77% (1364/1766) and 86% (1202/1392), respectively.

The four genome-scale interaction datasets were also assessed using GSPs and GSNs (Supplementary Figure S2B) and a similar result was obtained. For the Gavin dataset, 42% (984/2324) are covered by GSPs, while only 6% (141/2324) are present as false positives. For Uetz, Ho and Ito datasets, 14% (51/356), 11% (188/1766) and 5% (72/1392) are respectively covered by GSPs, while 14% (49/356), 15% (271/1766) and 27% (371/1392) are found to be false positives, respectively.

Consequently, it appears that the Gavin dataset discovers true interactions at a larger coverage and contains a lower proportion of false-positive interactions, whereas the Uetz, Ho and Ito datasets have smaller coverage of true interactions and are probably populated by more false positives. These observations are in agreement with the assessment result reported in the previous studies (32,33).
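
The coverage and false-positive proportions reported above are simple set overlaps, as the following Python sketch shows. The function name and the toy protein pairs are hypothetical; interactions are treated as unordered pairs, so (A, B) and (B, A) count once.

```python
def assess(experimental_pairs, reference_positive, reference_negative):
    """Fractions of an experimental dataset found in the positive and negative references."""
    pairs = {frozenset(p) for p in experimental_pairs}
    coverage = len(pairs & reference_positive) / len(pairs)
    false_positive_rate = len(pairs & reference_negative) / len(pairs)
    return coverage, false_positive_rate


# Hypothetical toy data; the protein names are placeholders.
positives = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("B", "C")]}
negatives = {frozenset(p) for p in [("A", "D"), ("C", "D")]}
experiment = [("A", "B"), ("B", "A"), ("C", "D"), ("B", "C"), ("D", "E")]

coverage, false_positive_rate = assess(experiment, positives, negatives)
```

Applied to the counts given for the Gavin dataset, the same ratios are 1404/2324 ≈ 0.60 against the positives and 920/2324 ≈ 0.40 against the negatives.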

Initial analysis of the topologies of the MIPS complexes with the predicted network

In order to minimize the error rate of the predicted interactions, only the highest-confidence interaction dataset, the GSPs, was used here in the network analysis. Based on the network reconstructed from GSPs, we could proceed to analyze it using various approaches and algorithms of graph theory to relate its structural properties to protein functions. A good overview of these analyses can be found in Xia et al. (48). Here, we focus on the identification of the topology of each of the MIPS complexes using our predicted network. There are 40 753 interactions encompassing 2259 proteins in GSPs (Figure 4), and the whole interaction network derived from the dataset consists of 16 connected components (Table 1, see also Supplementary Figure S3). Among these 16 components, the largest one (connected component ID: 1 as listed in Table 1) contains 30 899 interactions among 2093 proteins, and seven (connected component ID: 10–16 as listed in Table 1) are each composed of only one interaction between two proteins. In addition, out of 600 periodically expressed (‘dynamic’) proteins from de Lichtenberg et al. (62), we found 319 in our nine DSs and 228 in GSPs (Table 1, see also Supplementary Figure S3 where dynamic proteins are in red). Within interaction networks, a protein complex is ideally identified as a ‘complete subgraph’ where every pair of a complex's members tends to interact with each other (13). Therefore, various clustering techniques are suggested to detect protein complexes (64,65). However, complete subgraphs rarely occur in reality, as in the case of the Arp2/3 complex in yeast (66). Here, we are interested in the analysis of the structures of the MIPS complexes based on the predicted network. After excluding 50 homodimer complexes, 214 MIPS complexes with at least two distinct members were analyzed. There are 120 complexes, each with all members found in the predicted interaction network. 
Consequently, 76 out of the 120 complexes are each interconnected in the network, including 71 within the largest connected component (connected component ID: 1) and the remaining five in the other four small connected components (connected component ID: 3, 5, 9 and 11) (Supplementary Table S4). The topology of each of these 76 complexes is called a connected subgraph. Very interestingly, out of the 120 complexes, we found 27 that each split into two connected subgraphs in the largest connected component. Seven of them can be linked by at least one path of two interactions, with one carrying a highest RSS value (=1, either for the CC or for the BP) (Supplementary Table S5), indicating that the proteins along the paths, which are not members of the seven complexes, might be biologically related to the function of the complexes.
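
Whether the members of a complex form one connected subgraph or split into several can be checked with a breadth-first search restricted to the members. The Python sketch below is an assumed, minimal formulation with hypothetical function names and a toy network; it is not the software used in this study.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of binary interactions."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def induced_components(adjacency, nodes):
    """Connected components of the subgraph induced on `nodes`."""
    nodes, seen, components = set(nodes), set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt in nodes and nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


def complex_topology(adjacency, members):
    """Classify a complex by how its members are connected to one another."""
    if any(m not in adjacency for m in members):
        return "not fully mapped"
    parts = len(induced_components(adjacency, members))
    return "connected subgraph" if parts == 1 else f"split into {parts} subgraphs"


# Hypothetical toy network with two connected components.
network = build_adjacency(
    [("A", "B"), ("B", "C"), ("C", "X"), ("X", "D"), ("D", "E"), ("Y", "Z")]
)
```

In the toy network, the complex {A, B, C} is a connected subgraph, while {A, B, D, E} splits into two subgraphs that are joined only through the nonmember proteins C and X. Calling `induced_components` on all proteins returns the connected components of the whole network.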

Table 1
Numbers of proteins and interactions of 16 connected graphs found in the predicted interaction network constructed from the GSP dataset

Analysis of the five complexes in the four small connected components

Figure 6 shows five complexes each with all members forming a connected subgraph in the four small connected components. They are Aut2p/Aut7p complex (the MIPS identifier: 260.100), dynactin complex (140.30.30.30), F0/F1 ATP synthase (complex V) (420.50), succinate dehydrogenase complex (complex II) (420.20) and cytochrome bc1 complex (420.30).

Figure 6
Five complexes each with all members forming a connected subgraph. They are in four small connected components (connected component ID: 3, 5, 9 and 11) of the interaction network constructed from GSPs. Proteins as members, and the nonmember proteins of ...

Interestingly, in the mapped F0/F1 ATP synthase complex (Figure 6), we found two nonmember proteins, namely, STF1 and STF2. Both are ATPase stabilizing factors. In particular, STF1 stabilizes and facilitates the formation of the complex between mitochondrial ATP synthase and its intrinsic inhibitor protein (67), while STF2 binds to F0-ATPase and facilitates binding of inhibitor and a 9 kDa protein to F1-ATPase. In the mapped succinate dehydrogenase complex, nonmember NDI1 is an NADH2 dehydrogenase (ubiquinone) (68), while nonmember CYT, which is a member of the cytochrome bc1 complex (Figure 6), functions as an electron transporter and transfers electrons within CoQH2-cytochrome c reductase complex activity (69). Therefore, both of them are functionally related to the succinate dehydrogenase complex in terms of cellular processes and biological functions. These findings have two implications. If not new members of a complex, these identified nonmember proteins might interact with the F0/F1 ATP synthase (such as STF1 and STF2) and the succinate dehydrogenase complex (such as NDI1 and CYT). On the other hand, two complexes might interact with each other somewhere, such as the succinate dehydrogenase complex and the cytochrome bc1 complex via the mediator CYT (Figure 6).

Analysis of the split complexes

For the mapped complexes split into two connected subgraphs, we searched for paths of two interactions, at least one of which carries a highest RSS value (=1, either for the CC or for the BP), that link the two connected subgraphs. As a result, we found that seven split complexes could be connected as a whole by adding one or multiple such paths (Supplementary Table S5), and five of them are illustrated in Figure 7. Interestingly, as shown in Figure 7A–C, the nonmember proteins in the paths in the three complexes are also members of other complexes. In Figure 7A, SPC72 is found in the SPB components complex (480.10) and its N-terminal domain interacts with the Tub4p complex on the cytoplasmic side of the SPB (70). In Figure 7B, CDC34 and the F-box protein MET30 are required for degradation of the Cdk-inhibitory kinase Swe1, and both of them as well as CDC53 and SKP1 have been found in the SCF-MET30 complex (445.30) (71). In Figure 7C, CTR9 is required for G1 cyclin expression (72) and is also found in another complex that includes CDC73 (30).
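
The search for linking paths can be sketched in Python as follows. The data layout (a mapping from an unordered protein pair to its CC and BP scores), the function name and the toy scores are assumptions made for illustration; the criterion itself, a path of two interactions through a nonmember protein with at least one interaction scoring RSS = 1 in either ontology, is the one described above.

```python
def linking_paths(edge_scores, part_a, part_b):
    """Paths u - x - v with u in part_a, v in part_b and x a nonmember protein,
    where at least one of the two interactions has RSS = 1 in CC or BP."""
    adjacency = {}
    for pair in edge_scores:
        u, v = tuple(pair)
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    members = set(part_a) | set(part_b)
    paths = []
    for u in part_a:
        for x in adjacency.get(u, ()):
            if x in members:
                continue  # the intermediate protein must be a nonmember
            for v in adjacency[x]:
                if v in part_b:
                    first = max(edge_scores[frozenset((u, x))])
                    second = max(edge_scores[frozenset((x, v))])
                    if first == 1.0 or second == 1.0:
                        paths.append((u, x, v))
    return sorted(paths)


# Hypothetical scores: unordered protein pair -> (RSS_CC, RSS_BP).
edge_scores = {
    frozenset(("A", "B")): (0.9, 0.9),
    frozenset(("D", "E")): (0.9, 0.9),
    frozenset(("B", "X")): (1.0, 0.85),
    frozenset(("X", "D")): (0.9, 0.9),
    frozenset(("A", "Y")): (0.9, 0.9),
    frozenset(("Y", "E")): (0.85, 0.9),
}
paths = linking_paths(edge_scores, {"A", "B"}, {"D", "E"})
```

Here the nonmember X links the two subgraphs because its interaction with B has RSS_CC = 1, whereas the path through Y is rejected because neither of its interactions reaches 1.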

Figure 7
Five complexes whose members are in one connected component of the network and each of which splits into two connected subgraphs that can be linked by at least one path of two interactions with one carrying a highest RSS value (=1, either for the CC or ...

Figure 7D shows the cytochrome c oxidase (complex IV) (420.40), whose two connected subgraphs are linked by 20 paths in our predicted network. Ten of these 20 paths pass through the protein PET100, while the other 10 pass through another protein, COX20. Interestingly, both PET100 and COX20 are found to be essential for the assembly of this complex. PET100 is required for the assembly of yeast cytochrome c oxidase (420.40) into an active holoenzyme (73–75), whereas COX20 acts as a membrane-bound chaperone necessary for the cleavage of pCox2p (the subunit 2 precursor) and for interaction of the mature protein with other subunits of cytochrome c oxidase in a later step of the assembly process (76).

As shown in Figure 7E, LRP1 is implicated in both non-homologous DNA end joining and homologous recombination (77), and therefore the complex DNA ligase IV (510.180.30.20) might have one more partner (LRP1) than identified currently. This is expected to provide a clue for further experimental validation.

In summary, 35% (76/214) of the MIPS complexes are completely identified from our interaction network. Furthermore, 13% (27/214) of complexes, all of whose partners are in the largest connected component of the network, are split into two connected subgraphs each. Five such split complexes are linked by at least one reliable path of two interactions, and the nonmember proteins in the paths actually physically interact with the complexes concerned. We also predict that the complex DNA ligase IV might contain one more partner, which deserves experimental confirmation (Figure 7E).
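The distinction drawn above between a completely identified (interconnected) complex and a split complex comes down to counting the connected subgraphs induced by the complex members. A minimal sketch of that check is given below; the function name `connected_subgraphs` and the toy labels are assumptions for illustration only, and the paper does not state how this step was implemented.

```python
def connected_subgraphs(members, interactions):
    """Partition complex members into connected subgraphs.

    Only interactions whose two partners are both members of the complex
    are used, so paths through nonmember proteins do not count.
    """
    adjacency = {m: set() for m in members}
    for p, q in interactions:
        if p in adjacency and q in adjacency:
            adjacency[p].add(q)
            adjacency[q].add(p)

    seen, parts = set(), []
    for start in sorted(members):
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from `start`.
        stack, part = [start], set()
        while stack:
            node = stack.pop()
            if node in part:
                continue
            part.add(node)
            stack.extend(adjacency[node] - part)
        seen |= part
        parts.append(part)
    return parts


# Toy complex: P1-P2 and P3-P4 interact, but the two pairs are joined only
# through the nonmember X, so the complex counts as split in two.
toy_members = {"P1", "P2", "P3", "P4"}
toy_interactions = [("P1", "P2"), ("P3", "P4"), ("P2", "X"), ("X", "P3")]
toy_parts = connected_subgraphs(toy_members, toy_interactions)
print(len(toy_parts), toy_parts)
```

A complex for which this returns a single subgraph would count as interconnected, and one for which it returns two would be a candidate for the two-interaction path search.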

DISCUSSION

We present here a new method, based on semantic similarity measures, to reconstruct the map of yeast protein–protein interactions by mining the knowledge of functional associations from the GO-based annotations. As a result, a positive dataset and a negative dataset, containing 152 944 and 4 857 251 interactions, respectively, were derived. Moreover, we compiled GSPs with high confidence and GSNs with low confidence of yeast protein–protein interactions, containing 40 753 and 1 460 378 interactions, respectively. It is estimated that the number of actual interactions in yeast ranges from ~30 000 to 100 000 (32,53,59). Interestingly, the number of protein–protein interactions in GSPs falls within this range. In particular, our GSPs contain 78% of the interactions in VEIs, which indicates that GSPs may be less biased than other published datasets. Furthermore, the number of protein pairs in GSNs is 35.8 times larger than the number in GSPs, which is in accordance with the expectation that the estimated number of non-interacting protein pairs is several orders of magnitude higher than the number of positives (78).
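The dataset sizes quoted above can be checked with a few lines of arithmetic. The snippet below uses only the counts given in the text; the variable names are ours.

```python
import math

# Counts as reported in the text.
positives = 152_944
negatives = 4_857_251
gsp = 40_753
gsn = 1_460_378

# Ratio of gold standard negatives to gold standard positives.
gsn_to_gsp = gsn / gsp
print(f"GSN/GSP = {gsn_to_gsp:.1f}")  # prints "GSN/GSP = 35.8"

# The same ratio on a log10 scale, i.e. in orders of magnitude.
print(f"log10(GSN/GSP) = {math.log10(gsn_to_gsp):.2f}")

# Ratio for the larger positive and negative datasets.
print(f"negatives/positives = {negatives / positives:.1f}")

# Does the GSP count fall in the estimated range of ~30 000 to 100 000?
in_estimated_range = 30_000 <= gsp <= 100_000
print(in_estimated_range)
```

On a log scale the GSN/GSP ratio corresponds to roughly one and a half orders of magnitude.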

Regarding the application of GSPs and GSNs, it has been found that the performance of supervised machine learning methods for predicting protein–protein interactions depends strongly not only on how highly reliable interactions (positives) are chosen, but also on how unbiased negative examples are chosen. Until now there have been several strategies for selecting negative datasets, such as choosing random pairs of proteins (79) and selecting pairs of proteins that are known to be localized in different cellular components (13,63). Since two proteins localized distinctly (e.g. in the nucleus and cytoplasm) may sometimes physically interact (19), in this study only protein pairs that are both involved in weakly-related or unrelated biological processes and localized in different cellular components are compiled into our GSNs. Thus, it seems that the resulting GSN dataset is less biased than those constructed using either location proximity criteria or a randomization strategy.
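The difference between a location-only negative criterion and the combined criterion described above can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the cutoff value of 0.4, the function names, the protein labels and the RSS scores are hypothetical, and the actual GSN thresholds used in this study are not reproduced.

```python
def location_only_negative(rss_cc, cc_cutoff=0.4):
    """Negative if the pair is localized in dissimilar cellular components."""
    return rss_cc <= cc_cutoff


def combined_negative(rss_bp, rss_cc, bp_cutoff=0.4, cc_cutoff=0.4):
    """Negative only if BOTH process and localization similarity are low."""
    return rss_bp <= bp_cutoff and rss_cc <= cc_cutoff


# Made-up RSS scores for three hypothetical protein pairs.
toy_pairs = {
    ("P1", "P2"): {"rss_bp": 0.9, "rss_cc": 0.2},  # same process, different location
    ("P3", "P4"): {"rss_bp": 0.1, "rss_cc": 0.3},  # unrelated on both counts
    ("P5", "P6"): {"rss_bp": 0.2, "rss_cc": 0.8},  # same location, different process
}

by_location = sorted(
    pair for pair, s in toy_pairs.items() if location_only_negative(s["rss_cc"])
)
by_both = sorted(
    pair
    for pair, s in toy_pairs.items()
    if combined_negative(s["rss_bp"], s["rss_cc"])
)
print(by_location)
print(by_both)
```

The pair `("P1", "P2")` shows the case the text is concerned with: a location-only rule would label it negative even though the two proteins share a biological process, whereas the combined rule leaves it out of the negative set.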

Our method has several caveats. First, the quality of the predicted positives and negatives is constrained by the accuracy of the yeast GO annotations and by our design approach. For instance, we found seven VEIs in the LL DS (Table 2A); the annotations of the corresponding proteins in the CC and BP ontologies are listed in Table 2B. Six of these interactions are inferred from small-scale experiments; the exception is the one between CAF4 and MOB1, which is inferred from a matrix model that represents a MIPS complex (the CCR4 complex, ‘MIPS identifier: 510.190.110’) of unknown topology. We anticipate that the method will perform better as more accurate GO annotations become available. Second, the elimination of GO terms annotated as ‘unknown biological process’ (641 annotations) or ‘unknown cellular component’ may cause some true-positive interactions to be missing from our predicted network when the biological process or cellular component of the interacting partners is unknown. This problem can be solved in part as biological knowledge of these ‘missing’ proteins accumulates. Third, it has been demonstrated that protein–protein interaction networks in several eukaryotic organisms contain significantly more self-interacting proteins (homodimers) than would be expected if such interactions appeared randomly in the course of evolution (80). However, our method cannot predict interactions of a protein with itself. Therefore, 50 MIPS homodimer complexes, each containing only one protein, are excluded from the analysis in this study. Finally, although the strength of the relationship between two proteins in our predicted interaction dataset shows high significance, all predicted interactions should be validated experimentally.

Table 2A
Seven VEIs in the LL DS
Table 2B
CC and BP annotations of the proteins for the seven VEIs in the LL DS

Cellular functions are likely to be carried out by functional modules that often encompass protein complexes (23). As analyzed in this study, out of 120 complexes whose partners are all mapped in our network, 22% (26/120) contain one or multiple dynamic proteins. These dynamic complexes may be just-in-time synthesized, such as the nucleosomal protein complex (320) (Supplementary Table S4), or just-in-time assembled, such as the replication complex (410.35) (Supplementary Table S4) (62). Functional modules can include transient regulated elements of a relatively distinct process, for example, the various transcriptionally regulated cyclins and inhibitors associated with the Cdc28p module at their specific time of synthesis during the yeast mitotic cell cycle (62). Therefore, it is more informative to analyze the temporal properties (such as modules) of the predicted networks than their static topological properties, because such studies will provide a basis for further prediction of detailed gene functions and of biological pathways (81).

In summary, protein–protein interaction networks in various organisms are increasingly becoming the focus of understanding the functional organization of the proteome. Although only the yeast genome is demonstrated in this study, our method is expected to be applied to other completely sequenced genomes with high-quality annotations based on the GO or other ontologies, where known biological knowledge is stored and well represented, to computationally reconstruct their respective protein–protein interaction maps for functional genomic research.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

Acknowledgments

The authors thank two anonymous reviewers for their valuable comments. This work was supported by the National High-Tech Research and Development Program of China (Grant No. 2003AA231030), the Ministry of Education of China (Grant No. 105011) and Beijing Normal University. Funding to pay the Open Access publication charges for this article was provided by Beijing Normal University.

Conflict of interest statement. None declared.

REFERENCES

1. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., et al. Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genet. 2000;25:25–29.
2. Azuaje F., Wang H., Bodenreider O. Ontology-driven similarity approaches to supporting gene functional assessment. Proceedings of the ISMB'2005 SIG meeting on Bio-ontologies; 2005. pp. 9–10.
3. Vezzi A., Campanaro S., D'Angelo M., Simonato F., Vitulo N., Lauro F.M., Cestaro A., Malacrida G., Simionati B., Cannata N., et al. Life at depth: Photobacterium profundum genome sequence and expression analysis. Science. 2005;307:1459–1461.
4. Hall N., Pain A., Berriman M., Churcher C., Harris B., Harris D., Mungall K., Bowman S., Atkin R., Baker S., et al. Sequence of Plasmodium falciparum chromosomes 1, 3-9 and 13. Nature. 2002;419:527–531.
5. Stolc V., Gauhar Z., Mason C., Halasz G., van Batenburg M.F., Rifkin S.A., Hua S., Herreman T., Tongprasit W., Barbano P.E., et al. A gene expression map for the euchromatic genome of Drosophila melanogaster. Science. 2004;306:655–660.
6. Wittkopp P.J., Haerum B.K., Clark A.G. Evolutionary changes in cis and trans gene regulation. Nature. 2004;430:85–88.
7. Zdobnov E.M., von Mering C., Letunic I., Torrents D., Suyama M., Copley R.R., Christophides G.K., Thomasova D., Holt R.A., Subramanian G.M., et al. Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science. 2002;298:149–159.
8. Yu J., Hu S., Wang J., Wong G.K., Li S., Liu B., Deng Y., Dai L., Zhou Y., Zhang X., et al. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science. 2002;296:79–92.
9. The Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005;437:69–87.
10. Altshuler D., Brooks L.D., Chakravarti A., Collins F.S., Daly M.J., Donnelly P. A haplotype map of the human genome. Nature. 2005;437:1299–1320.
11. Khatri P., Draghici S. Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics. 2005;21:3587–3595.
12. Stelzl U., Worm U., Lalowski M., Haenig C., Brembeck F.H., Goehler H., Stroedicke M., Zenkner M., Schoenherr A., Koeppen S., et al. A human protein–protein interaction network: a resource for annotating the proteome. Cell. 2005;122:957–968.
13. Jansen R., Yu H., Greenbaum D., Kluger Y., Krogan N.J., Chung S., Emili A., Snyder M., Greenblatt J.F., Gerstein M. A Bayesian networks approach for predicting protein–protein interactions from genomic data. Science. 2003;302:449–453.
14. Rhodes D.R., Tomlins S.A., Varambally S., Mahavisno V., Barrette T., Kalyana-Sundaram S., Ghosh D., Pandey A., Chinnaiyan A.M. Probabilistic model of the human protein–protein interaction network. Nat. Biotechnol. 2005;23:951–959.
15. Martin D., Brun C., Remy E., Mouren P., Thieffry D., Jacq B. GOToolBox: functional analysis of gene datasets based on Gene Ontology. Genome Biol. 2004;5:R101.
16. Lord P.W., Stevens R.D., Brass A., Goble C.A. Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics. 2003;19:1275–1283.
17. Wu H., Su Z., Mao F., Olman V., Xu Y. Prediction of functional modules based on comparative genome analysis and Gene Ontology application. Nucleic Acids Res. 2005;33:2822–2837.
18. Letovsky S., Kasif S. Predicting protein function from protein–protein interaction data: a probabilistic approach. Bioinformatics. 2003;19:i197–i204.
19. Schwikowski B., Uetz P., Fields S. A network of protein–protein interactions in yeast. Nat. Biotechnol. 2000;18:1257–1261.
20. Vazquez A., Flammini A., Maritan A., Vespignani A. Global protein function prediction from protein–protein interaction networks. Nat. Biotechnol. 2003;21:697–700.
21. Huh W.K., Falvo J.V., Gerke L.C., Carroll A.S., Howson R.W., Weissman J.S., O'Shea E.K. Global analysis of protein localization in budding yeast. Nature. 2003;425:686–691.
22. Eisenberg D., Marcotte E.M., Xenarios I., Yeates T.O. Protein function in the post-genomic era. Nature. 2000;405:823–826.
23. Hartwell L.H., Hopfield J.J., Leibler S., Murray A.W. From molecular to modular cell biology. Nature. 1999;402:C47–C52.
24. Li S., Armstrong C.M., Bertin N., Ge H., Milstein S., Boxem M., Vidalain P.O., Han J.D., Chesneau A., Hao T., et al. A map of the interactome network of the metazoan C. elegans. Science. 2004;303:540–543.
25. Giot L., Bader J.S., Brouwer C., Chaudhuri A., Kuang B., Li Y., Hao Y.L., Ooi C.E., Godwin B., Vitols E., et al. A protein interaction map of Drosophila melanogaster. Science. 2003;302:1727–1736.
26. Rain J.C., Selig L., De Reuse H., Battaglia V., Reverdy C., Simon S., Lenzen G., Petel F., Wojcik J., Schachter V., et al. The protein–protein interaction map of Helicobacter pylori. Nature. 2001;409:211–215.
27. Rual J.F., Venkatesan K., Hao T., Hirozane-Kishikawa T., Dricot A., Li N., Berriz G.F., Gibbons F.D., Dreze M., Ayivi-Guedehoussou N., et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature. 2005;437:1173–1178.
28. Ito T., Chiba T., Ozawa R., Yoshida M., Hattori M., Sakaki Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl Acad. Sci. USA. 2001;98:4569–4574.
29. Uetz P., Giot L., Cagney G., Mansfield T.A., Judson R.S., Knight J.R., Lockshon D., Narayan V., Srinivasan M., Pochart P., et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature. 2000;403:623–627.
30. Gavin A.C., Bosche M., Krause R., Grandi P., Marzioch M., Bauer A., Schultz J., Rick J.M., Michon A.M., Cruciat C.M., et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002;415:141–147.
31. Ho Y., Gruhler A., Heilbut A., Bader G.D., Moore L., Adams S.L., Millar A., Taylor P., Bennett K., Boutilier K., et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002;415:180–183.
32. von Mering C., Krause R., Snel B., Cornell M., Oliver S.G., Fields S., Bork P. Comparative assessment of large-scale datasets of protein–protein interactions. Nature. 2002;417:399–403.
33. Edwards A.M., Kus B., Jansen R., Greenbaum D., Greenblatt J., Gerstein M. Bridging structural biology and genomics: assessing protein interaction data with known complexes. Trends Genet. 2002;18:529–536.
34. Yanai I., Mellor J.C., DeLisi C. Identifying functional links between genes using conserved chromosomal proximity. Trends Genet. 2002;18:176–179.
35. Overbeek R., Fonstein M., D'Souza M., Pusch G.D., Maltsev N. The use of gene clusters to infer functional coupling. Proc. Natl Acad. Sci. USA. 1999;96:2896–2901.
36. Tamames J., Casari G., Ouzounis C., Valencia A. Conserved clusters of functionally related genes in two bacterial genomes. J. Mol. Evol. 1997;44:66–73.
37. Enright A.J., Iliopoulos I., Kyrpides N.C., Ouzounis C.A. Protein interaction maps for complete genomes based on gene fusion events. Nature. 1999;402:86–90.
38. Marcotte E.M., Pellegrini M., Ng H.L., Rice D.W., Yeates T.O., Eisenberg D. Detecting protein function and protein–protein interactions from genome sequences. Science. 1999;285:751–753.
39. Tatusov R.L., Koonin E.V., Lipman D.J. A genomic perspective on protein families. Science. 1997;278:631–637.
40. Gaasterland T., Ragan M.A. Microbial genescapes: phyletic and functional patterns of ORF distribution among prokaryotes. Microb. Comp. Genomics. 1998;3:199–217.
41. Pellegrini M., Marcotte E.M., Thompson M.J., Eisenberg D., Yeates T.O. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc. Natl Acad. Sci. USA. 1999;96:4285–4288.
42. Qian J., Dolled-Filhart M., Lin J., Yu H., Gerstein M. Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions. J. Mol. Biol. 2001;314:1053–1066.
43. Jansen R., Greenbaum D., Gerstein M. Relating whole-genome expression data with protein–protein interactions. Genome Res. 2002;12:37–46.
44. Friedman C., Kra P., Yu H., Krauthammer M., Rzhetsky A. GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics. 2001;17:S74–S82.
45. Marcotte E.M., Xenarios I., Eisenberg D. Mining literature for protein–protein interactions. Bioinformatics. 2001;17:359–363.
46. Stapley B.J., Benoit G. Biobibliometrics: information retrieval and visualization from co-occurrences of gene names in Medline abstracts. Pac. Symp. Biocomput. 2000:529–540.
47. Valencia A., Pazos F. Computational methods for the prediction of protein interactions. Curr. Opin. Struct. Biol. 2002;12:368–373.
48. Xia Y., Yu H., Jansen R., Seringhaus M., Baxter S., Greenbaum D., Zhao H., Gerstein M. Analyzing cellular biochemistry in terms of molecular networks. Annu. Rev. Biochem. 2004;73:1051–1087.
49. Lee I., Date S.V., Adai A.T., Marcotte E.M. A probabilistic functional network of yeast genes. Science. 2004;306:1555–1558.
50. Marcotte E.M., Pellegrini M., Thompson M.J., Yeates T.O., Eisenberg D. A combined algorithm for genome-wide prediction of protein function. Nature. 1999;402:83–86.
51. Lu L.J., Xia Y., Paccanaro A., Yu H., Gerstein M. Assessing the limits of genomic data integration for predicting protein networks. Genome Res. 2005;15:945–953.
52. Barabasi A.L., Oltvai Z.N. Network biology: understanding the cell's functional organization. Nature Rev. Genet. 2004;5:101–113.
53. Kumar A., Snyder M. Protein complexes take the bait. Nature. 2002;415:123–124.
54. Alberts B. The cell as a collection of protein machines: preparing the next generation of molecular biologists. Cell. 1998;92:291–294.
55. Abbott A. The society of proteins. Nature. 2002;417:894–896.
56. Wiwatwattana N., Kumar A. Organelle DB: a cross-species database of protein localization and function. Nucleic Acids Res. 2005;33:D598–D604.
57. Kumar A., Agarwal S., Heyman J.A., Matson S., Heidtman M., Piccirillo S., Umansky L., Drawid A., Jansen R., Liu Y., et al. Subcellular localization of the yeast proteome. Genes Dev. 2002;16:707–719.
58. Ross-Macdonald P., Coelho P.S., Roemer T., Agarwal S., Kumar A., Jansen R., Cheung K.H., Sheehan A., Symoniatis D., Umansky L., et al. Large-scale analysis of the yeast genome by transposon tagging and gene disruption. Nature. 1999;402:413–418.
59. Bader G.D., Hogue C.W. Analyzing yeast protein–protein interaction data obtained from different sources. Nat. Biotechnol. 2002;20:991–997.
60. Bader J.S., Chaudhuri A., Rothberg J.M., Chant J. Gaining confidence in high-throughput protein interaction networks. Nat. Biotechnol. 2004;22:78–85.
61. Mewes H.W., Amid C., Arnold R., Frishman D., Guldener U., Mannhaupt G., Munsterkotter M., Pagel P., Strack N., Stumpflen V., et al. MIPS: analysis and annotation of proteins from whole genomes. Nucleic Acids Res. 2004;32:D41–D44.
62. de Lichtenberg U., Jensen L.J., Brunak S., Bork P. Dynamic complex formation during the yeast cell cycle. Science. 2005;307:724–727.
63. Patil A., Nakamura H. Filtering high-throughput protein–protein interaction data using a combination of genomic features. BMC Bioinformatics. 2005;6:100.
64. King A.D., Przulj N., Jurisica I. Protein complex prediction via cost-based clustering. Bioinformatics. 2004;20:3013–3020.
65. Bu D., Zhao Y., Cai L., Xue H., Zhu X., Lu H., Zhang J., Sun S., Ling L., Zhang N., et al. Topological structure analysis of the protein–protein interaction network in budding yeast. Nucleic Acids Res. 2003;31:2443–2450.
66. Robinson R.C., Turbedsky K., Kaiser D.A., Marchand J.B., Higgs H.N., Choe S., Pollard T.D. Crystal structure of Arp2/3 complex. Science. 2001;294:1679–1684.
67. Akashi A., Yoshida Y., Nakagoshi H., Kuroki K., Hashimoto T., Tagawa K., Imamoto F. Molecular cloning and expression of a gene for a factor which stabilizes formation of inhibitor-mitochondrial ATPase complex from Saccharomyces cerevisiae. J. Biochem. (Tokyo) 1988;104:526–530.
68. Marres C.A., de Vries S., Grivell L.A. Isolation and inactivation of the nuclear gene encoding the rotenone-insensitive internal NADH: ubiquinone oxidoreductase of mitochondria from Saccharomyces cerevisiae. Eur. J. Biochem. 1991;195:857–862.
69. Ahmad Z., Sherman F. Role of Arg-166 in yeast cytochrome C1. J. Biol. Chem. 2001;276:18450–18456.
70. Knop M., Schiebel E. Receptors determine the cellular localization of a gamma-tubulin complex and thereby the site of microtubule formation. EMBO J. 1998;17:3952–3967.
71. Kaiser P., Sia R.A., Bardes E.G., Lew D.J., Reed S.I. Cdc34 and the F-box protein Met30 are required for degradation of the Cdk-inhibitory kinase Swe1. Genes Dev. 1998;12:2587–2597.
72. Krogan N.J., Kim M., Ahn S.H., Zhong G., Kobor M.S., Cagney G., Emili A., Shilatifard A., Buratowski S., Greenblatt J.F. RNA polymerase II elongation factors of Saccharomyces cerevisiae: a targeted proteomics approach. Mol. Cell. Biol. 2002;22:6979–6992.
73. Church C., Chapon C., Poyton R.O. Cloning and characterization of PET100, a gene required for the assembly of yeast cytochrome c oxidase. J. Biol. Chem. 1996;271:18499–18507.
74. Church C., Poyton R.O. Neither respiration nor cytochrome c oxidase affects mitochondrial morphology in Saccharomyces cerevisiae. J. Exp. Biol. 1998;201:1729–1737.
75. Forsha D., Church C., Wazny P., Poyton R.O. Structure and function of Pet100p, a molecular chaperone required for the assembly of cytochrome c oxidase in Saccharomyces cerevisiae. Biochem. Soc. Trans. 2001;29:436–441.
76. Hell K., Tzagoloff A., Neupert W., Stuart R.A. Identification of Cox20p, a novel protein involved in the maturation and assembly of cytochrome oxidase subunit 2. J. Biol. Chem. 2000;275:4571–4578.
77. Erdemir T., Bilican B., Cagatay T., Goding C.R., Yavuzer U. Saccharomyces cerevisiae C1D is implicated in both non-homologous DNA end joining and homologous recombination. Mol. Microbiol. 2002;46:947–957.
78. Jansen R., Gerstein M. Analyzing protein function on a genomic scale: the importance of gold-standard positives and negatives for network prediction. Curr. Opin. Microbiol. 2004;7:535–545.
79. Ben-Hur A., Noble W.S. Kernel methods for predicting protein–protein interactions. Bioinformatics. 2005;21:i38–i46.
80. Ispolatov I., Yuryev A., Mazo I., Maslov S. Binding properties and evolution of homodimers in protein–protein interaction networks. Nucleic Acids Res. 2005;33:3629–3635.
81. von Mering C., Zdobnov E.M., Tsoka S., Ciccarelli F.D., Pereira-Leal J.B., Ouzounis C.A., Bork P. Genome evolution reveals biochemical networks and functional modules. Proc. Natl Acad. Sci. USA. 2003;100:15428–15433.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press