• We are sorry, but NCBI web applications do not support your browser and may not function properly. More information
Logo of bmcsysbioBioMed Centralsearchsubmit a manuscriptregisterthis articleBMC Systems Biology
BMC Syst Biol. 2010; 4(Suppl 2): S6.
Published online Sep 13, 2010. doi:  10.1186/1752-0509-4-S2-S6
PMCID: PMC2982693

Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces

Abstract

Background

Predicting drug-protein interactions from heterogeneous biological data sources is a key step for in silico drug discovery. The difficulty of this prediction task lies in the rarity of known drug-protein interactions and myriad unknown interactions to be predicted. To meet this challenge, a manifold regularization semi-supervised learning method is presented to tackle this issue by using labeled and unlabeled information which often generates better results than using the labeled data alone. Furthermore, our semi-supervised learning method integrates known drug-protein interaction network information as well as chemical structure and genomic sequence data.

Results

Using the proposed method, we predicted certain drug-protein interactions on the enzyme, ion channel, GPCRs, and nuclear receptor data sets. Some of them are confirmed by the latest publicly available drug targets databases such as KEGG.

Conclusions

We report encouraging results of using our method for drug-protein interaction network reconstruction which may shed light on the molecular interaction inference and new uses of marketed drugs.

Background

Developing a new drug is an expensive and time-consuming process that is subject to a variety of regulations such as drug toxicity monitoring and therapeutic efficacy. Meanwhile, there are thousands of FDA-approved drugs in the market and drugs in later phases of clinical trials. Finding the potential application in other therapeutic categories of those FDA-approved drugs by predicting their targets, known as drug repositioning, is an efficient and time-saving method in drug discovery [1]. Additionally, predicting interactions between drugs and target proteins can help decipher the underlying biological mechanisms. Therefore, there is a strong incentive to develop powerful statistical methods that are capable of detecting these potential drug-protein interactions effectively.

Various methods have been proposed to address the drug-target prediction problems in silico. One common method is to predict the drugs interacting with a single given protein based on the chemical structure similarity in a classic classification framework. Keiser et al. [2,3] proposed a method to predict targets of proteins based on the chemical similarity of their ligands. This kind of approach, however, does not take advantage of the information in the protein domain. Another widely-used method is molecular docking [4] which requires the non-trivial modeling of 3D structure of the target protein. Unfortunately the 3D structures of many proteins are not available [5], e.g., very few GPCRs have been crystallized.

Recently, some new approaches are proposed to perform drug-target prediction using both the chemical (drug chemical structure) and genomic (protein structure) spaces information [3,6,7]. In [6] the two spaces are encoded together by defining a pair wise kernel which is then fed to the support vector machine (SVM) for classification. The drawback of this kernel framework is that there will be a huge number of samples to be classified (i.e., number of drugs multiplies number of proteins) which poses significant computational complexity. Another problem is that the negative drug-protein pairs are selected randomly without experimental confirmation. Yamanishi et al. [7] developed a bipartite graph model where the chemical and genomic spaces as well as the drug-protein interaction network are integrated into a pharmacological space. In the bipartite model, the known interactions in the training data are labeled as +1 while all other unknown drug-protein pairs in the training data are assumed as non-interactions with label 0. Then three different classifiers are available: new drug candidate versus known target protein, known drugs versus new target protein and new drug candidate versus new target protein candidate. More recently, Bleakley and Yamanishi [8] proposed a state-of-the-art bipartite local model (BLM) by transforming edge-prediction problems into well-known binary classification problems. Nevertheless, the first flaw of the bipartite model, like the kernel SVM method [6], is that the unknown interactions of the drugs and proteins in the training data are all assumed non-interaction and cannot be inferred. We also prefer only one classifier to predict whether one drug-protein pair interacts or not. Lastly, all the methods did not utilize a wealth of unlabeled information to assist prediction.

In this paper, a semi-supervised learning method - Laplacian regularized least square (LapRLS) [9] is employed to utilize both the small amount of available labeled data and the abundant unlabeled data together in order to give the maximum generalization ability from the chemical and genomic spaces.

Further, the standard LapRLS is improved by incorporating a new kernel established from the known drug-protein interaction network (NetLapRLS). In our framework, the known interactions are labeled as +1 and all other unknown pairs are labeled as 0 to indicate they are going to be predicted. Two classifiers are trained on the drug and protein domains respectively and then are combined together to give the final prediction. Compared with a naive weighted profiled method, the proposed drug-protein interaction methods based on LapRLS and NetLapRLS obtain better results than using the labeled data alone. And the proposed NetLapRLS which incorporates drug-protein network information provides superior performance than standard LapRLS.

Results and discussion

Cross validation results analysis

The weighted profile method, standard LapRLS and NetLapRLS were evaluated on the four classes of target proteins including enzymes, ion channels, GPCRs and nuclear receptors. We carried out a ten-fold cross-validation by splitting the golden standard interaction dataset into 10 subsets. Each fold was then taken in turn as a test set and the remaining nine folds are used as training set. For example, there are 54 drugs and 26 proteins in the nuclear receptor data set with 90 known interactions. In each cross-validation, the 80 drug-protein pairs are used as the training data while the remaining 1,324 drug-protein pairs including the 10 positive interactions are designated as the testing data set. Thus the training sample is very small compared with the testing data set. This motivates us to employ the semi-supervised method that can utilize the information from the unlabeled samples to predict drug-protein interaction. The performance is evaluated using receiver operating curve (ROC) analysis [10]. For simplicity, we set βd = βp = 0.3, γd1 = γp1 = 1, and γd2 = γp2 = 0.01 for NetLapRLS. These parameters can be better selected by a further cross validation. If γd2 and γp2 are set to be 0, NetLapRLS becomes the standard LapRLS method. Table Table11 shows the AUC (area under the ROC curve), sensitivity and specificity. The sensitivity and specificity are defined as TP/(TP+FN) and TN/(TN+FP), respectively. The cutoff for calculation of sensitivity and specificity is set to select the top pairs with the same number of the test set.

Table 1
Statistics of the prediction performance

From Table Table11 and Figure Figure1,1, we can see that LapRLS and NetLapRLS methods, which use unlabeled information, provided better performance with respect to AUC score and sensitivity. Among the four data sets, the two semi-supervised learning methods provided the highest sensitivity scores in enzyme data set because there are most known interactions. The known interaction number is a key factor of our semi-supervised methods since the testing data set is much larger than training data set in our cross-validation setup. The proposed NetLapRLS which incorporates the drug-protein interaction network information obtained better result than the standard LapRLS, especially with respective to the sensitivity which is dramatically improved. On the four data sets, the sensitivity from NetLapRLS performed better than LapRLS by 42%, 100%, 108% and 31% respectively and, demonstrated the importance of network information. The improvement in sensitivity of NetLapRLS over LapRLS is most significant in ion channel data set because the inner-connection in the ion channel drug-protein interaction network is most complete according to the proportion of unreachable paths between drugs and proteins [7]. Yildirim et al. [11] concluded that there are an overabundance of 'follow-on' drugs from the topological analyses of current drug-protein network, that is, drugs that target already known proteins, i.e., me-too drugs. With the drug-protein network being completed fastly by high-throughput experimental and computational approaches, this network information is becoming critical in drug discovery.

Figure 1
ROC curves of cross validation. ROC curves of the semi-supervised learning and the combining weighted profile methods for four classes target proteins: enzymes, ion channels, GPCRs and nuclear receptors.

Comparison with bipartite local model [8]

Recently, Bleakley and Yamanishi [8] extended Yamanishi's bipartite method [12] to bipartite local model which is considered as state-of-the-art. The predictions from the drug domain and protein domain using SVM are combined together to form a final prediction by a maximum operation. We also employed this kind of integration by a mean operation. However, we used a semi-supervised learning method to handle the classification with small samples labeled which is difficult for traditional supervised classifiers. For instance, in the above cross-validation experiment of the nuclear receptor data set, the semi-supervised classifier is trained on 80 positive samples in order to make predictions on 1,324 unlabeled samples. In the BLM, the ten-fold cross-validation is performed on the drug and protein domains separately. The known interactions between the selected drugs and proteins are labeled as interaction while interactions between the drugs and proteins for training are regarded as non-interaction. Though we consider the undetermined relationship between drug-protein pair should not be labeled as non-interaction, we adopt the cross validation method in the BLM for the sake of comparison in the same condition. The comparison is performed in terms of AUC, area under precision-recall (AUPR), sensitivity, specificity and PPV as shown in Table Table2.2. Sensitivity, specificity and PPV are calculated when the top one percentile in the prediction score is chosen as a cutoff because high-confidence prediction results are more useful in practical applications. We observed that BLM method outperformed our NetLapRLS in AUC and AUPR scores, but the performances of our NetLapRLS are comparable with BLM in sensitivity, specificity and PPV.

Table 2
Results of BLM and NetLapRLS based on cross validation experiments 5 times

Semi-supervised learning method is superior to the traditional supervised learning method when labeled samples are small along with large unlabeled samples available. In this cross validation setup, the unknown interactions are labeled as non-interaction in the training data set. So our NetLapRLS did not get good results in AUC and AUPR scores compared with BLM because most of samples are labeled. However, NetLapRLS still gave good prediction results in sensitivity, specificity and PPV. This indicated that NetLapRLS can provide a list of drug-protein interaction candidates with high confidence.

Enzyme

Table Table33 shows the list of the top 5 predicted drug-protein pairs, with annotation given in the KEGG database [13]. Searching the latest version of KEGG drug database and Drugbank [14], we found that the fifth highest scored drug-protein pair (D00097 and hsa5743) in Table Table33 is annotated as an interaction. Figure Figure22 shows the predicted top 50 scoring drug-protein interaction network on the enzyme data using the all known interactions as the training data set.

Figure 2
Predicted enzyme interaction network. Diamonds and circles represent drugs and target proteins, respectively. Blue and red lines indicate known interactions and newly predicted interactions with 50 highest scores, respectively.
Table 3
Top 5 scoring predicted drug-protein interactions for the enzyme data set

Ion channel

Table Table44 shows the list of the top five predicted drug-protein pairs on the ion channel data set, with annotation given in the KEGG database [13]. In the latest version of KEGG drug database, the targets of drug D00477 (rank 2 in table table4)4) include SCN1A, SCN2A, SCN3A, SCN4A, SCN5A, SCN8A and SCN9A. The targets of drug D00552 are SCN10A, SCN1A, SCN2A, SCN3A, SCN4A, SCN5A, SCN8A and SCN9A. Thus, our predicted target of D00552 is confirmed (rank 3 in Table Table4).4). The targets of drugs D00477 and D00552 are very similar which can be explained by their common chemical structures in Figure Figure3.3. Based on the chemical structure similarity, we predict that SCN10A is also a target of drug D00477 (rank 2 in table table4),4), as the interaction between SCN10A and D00552 is known. Rank 5 in table table44 predicts GABAR2 is one of the targets of drug D00546. This prediction is reasonable because in Drugbank D00546 is annotated to interact with GABAR1 which is very similar with GABAR2 in sequence and function. Figure Figure44 shows the predicted top 50 scoring drug-protein interaction network on the ion channel data set using the all known interactions as training data set.

Figure 4
Predicted ion channel interaction network. Predicted ion channel interaction network. diamonds and circles represent drugs and target proteins, respectively. Bule and red lines indicate known interactions and newly predicted interactions with 50 highest ...
Table 4
Top 5 scoring predicted drug-protein interactions for the ion channel data set

GPCRs

Table Table55 shows the list of the top five predicted drug-protein pairs on GPCRs data set, with annotation given in KEGG database. Based on the most recent KEGG database, the predictions of rank 2 and 3 in Table Table55 are confirmed. Additionally, six predicted new targets (hsa146, hsa147, hsa150, hsa151, hsa152 and hsa155) of drug adrenaline (D00095) from the newly predicted interactions with 50 highest scores are also annotated as an interaction in the latest KEGG drug database. Ranks 4 and 5 in Table Table55 predict both D02345 and D00283 target protein DRD3. In Drugbank, D02345 and D00283 are annotated to interact with protein DRD1, DRD2, and DRD4. Because DRD3 is very similar with those proteins in function, our method predicts DRD3 is also the target of drugs D02345 and D00283. This result demonstrated our method employed the information from protein domain. Figure Figure55 shows the predicted top 50 scoring drug-protein interaction network on the GPCRs data set using the all known interactions as the training data set.

Figure 5
Predicted GPCRs interaction network. Predicted GPCRs interaction network. diamonds and circles represent drugs and target proteins, respectively. Blue and red lines indicate known interactions and newly predicted interactions with 50 highest scores, respectively. ...
Table 5
Top 5 scoring predicted drug-protein interactions for the GPCRs data set

Nuclear receptor

Table Table66 shows the list of the top 5 predicted drug-protein pairs on nuclear receptor data set, among which four predictions are about drug D00348. In Drugbank, drug D00348 is annotated to interact with protein (retinoic acid receptor, alpha). The two predicted targets with the highest scores (hsa5915 and hsa5916) of drug D00348 are both from retinoic acid receptor class. Those proteins are probably as the targets of the same protein due to their similarity in sequence and function. Figure Figure66 shows the predicted top 50 scoring drug-protein interaction network on the nuclear receptor data set with the all known interactions as the training data set.

Figure 6
Predicted nuclear receptor interaction network. Predicted nuclear receptor interaction network. diamonds and circles represent drugs and target proteins, respectively. Blue and red lines indicate known interactions and newly predicted interactions with ...
Table 6
Top 5 scoring predicted drug-protein interactions for the nuclear receptor data set

Conclusions

In this work, we presented a semi-supervised learning method NetLapRLS for drug-protein interaction prediction by integrating information from chemical space, genomic space and drug-protein interaction network space. Our method has no use of the negative samples and predicts the interaction of each drug-protein pair. The results we obtained when predicting human drug-target interaction networks involving enzymes, ion channels, GPCRs, and nuclear receptors demonstrated the superior performance of NetLapRLS. Furthermore, recently added drug-target interactions to the KEGG immediately allowed us to confirm some strongly-predicted drug-target interactions on the four data sets obtained using our method. This enhances the strength of our proposed method for realistic drug-target prediction application.

The ideal way to use semi-supervised learning for predicting compound-protein interactions is to incorporate information from different biological spaces by a multi-task kernel and is fed to classical semi-supervised learning. However, the implementation of such a large scale semi-supervised learning method will be computationaly costly. Our future work, will incorporate more sophisticated and biologically relevant information into the kernel similarity, such as side effect [15], to improve the prediction accuracy.

Methods

Semi-supervised learning (SSL) has been attracting much research attention in the machine learning community [16]. SSL provides better prediction accuracy by using unlabeled information. Here we employ a data-dependent manifold regularization framework which uses the geometry of the probability distribution [9]. One of the implementations of this framework is the Laplacian regularized least squares (LapRLS) which is simple and has comparable performance with Laplacian regularized support vector machine.

Consider the drug dataset D={d1,,dnd} and the target protein dataset ={p1,,pnp} where nd and np are the numbers of the drugs and proteins in the study respectively. An interaction pattern of drug di and target protein pj is represented by a binary label matrix Ynd×np. If drug di is known to interact with target protein pj, Yij = 1 otherwise Yij = 0. Given the 'gold standard' drug-target interactions, the goal is to infer their unknown interactions. Two classifiers will be trained using LapRLS on the chemical and genomic spaces separately, followed by a combination of the two classifiers. A supervised learning method is suitable in this case. However the known interactions from public databases are still extremely small compared to the whole drug-target interaction space. Another issue is that we only have the information of the interactions, but do not know which drug target pair has no interaction, i.e., no negative samples in the training process. Herein we first test a simple supervised weighted profile method. Then the standard LapRLS and drug-protein interaction network incorporated NetLapRLS are extended to predict the drug-protein interaction.

Materials

The data used here is downloaded from (http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/) [7]. Here below we provide a brief description.

• Chemical data

The chemical structure similarity between compounds are calculated by SIMCOMP [17] using chemical structures fetched from KEGG LIGAND database. SIMCOMP provides a global similarity score by the ratio between the size of common substructures and the size of the union structures of two compounds. Applying this operation to all compounds pairs, we constructed a similarity matrix denoted Ynd×np which represents the chemical space information.

• Genomic data

A normalized Smith-Waterman score is calculated to indicate the similarity between two amino acid sequences of target proteins which were obtained from the KEGG GENES database. All protein pairs similarities are computed to construct a similarity matrix denoted Spnp×np which represents the genomic space.

• Drug-protein interaction data

At the time of the paper [7] was written, Yamanishi et al. [7] found 445, 210, 223, and 54 drugs targeting 664 enzymes, 204 iron channels, 95 GPCRs, and 26 nuclear receptors, receptively, and the known interactions are 2926, 1476, 635 and 90.

Combining weighted profiles

The method of combining weighted profiles follows the idea that the label of the new sample is determined by its similarity with the training samples. For a drug di, its interaction f(di, pj) with a protein pj in P is predicted with the following formulation:

f(di,pj)=1Ndik=1ndsd(di,dk)Ykj
(1)

where sd(di, dk) is a chemical structure similarity score from Sd and Ndi is a normalization term defined as Nd=k=1ndsd(di,dk). Meanwhile, for a protein pj , its interaction f(pj , di) with a drug di can also be calculated in the genomic space by:

f(pj,di)=1Npjk=1npsp(pj,pk)Yik
(2)

where sp(pj , pk) is a genomic sequence similarity score from Sp and Npi is a normalization term defined by Npj=k=1npsp(pj,pk) Note that Equations (1) and (2) are estimating the interaction of the same drug-protein pair (di ~ pj) from different data sources. The two predictions should be combined to give the final prediction by

f¯(di,pj)=f(di,pj)+f(pj,di)2.
(3)

The drug-protein pairs (di, pj) in f¯(di, pj) with high scores are predicted to interact each other. The original weighted profile method is used in [7]. However their predictions in the two spaces are not fused. Figure Figure77 shows the method of combining weighted profiles provides better prediction than methods using the single space on the four data sets.

Figure 7
The ROC curves of combining weighted profile. The ROC curves of combining weighted profile, weighted profile from chemical and genomic spaces on GPCR data.

LapRLS and NetLapRLS for drug-protein interaction prediction

In LapRLS and NetLapRLS, the data-dependent regularization terms are normalized Laplacian operation on graphs. Herein two undirected graphs of drug domain and protein domain including both labeled and unlabeled samples are represented by Gd={Vd,d} and Gp={Vp,p}, where the set of nodes or vertices is Vd={di},Vp={pi} and the set of edges is x2130d = {edmn}, x2130p = {edmn} respectively. Each drug di or protein pj is treated as the node on the graph and the weight of edge edmn{epmn} is wdmn(wpmn).

Typically, the weight measures the similarity between two nodes. In our case, the drug domain similarity Wd = {wdmn} is obtained by combining the chemical similarity Sd and drug-target interaction network. The protein domain similarity Wp = {wpmn} is derived by combining the genomic similarity Sp and drug-protein interaction network spaces. The chemical similarity Sd and genomic similarity Sp have already been introduced in Section Materials.

Next we need to extract the information from the drug-protein interaction network space. The underlying assumption made here is that if two drugs share more target proteins, they are more similar. For example, in Figure Figure8,8, the blue line means the known drug-protein interaction while the red line represents the interaction to be predicted. So drug D2 shares 3 same proteins with drug D1 while drug D3 shares a common protein with drug D1. Drug D1 interacts with Protein P4. Based on the assumption here, we can infer that it is more probable that drug D2 interacts with protein P4 than drug D3 does. So another similarity matrix for drug domain from drug-protein interaction network Kdnd×nd can be established whose each entry is the number of proteins shared by drug di and dj. Similarly, we can also derive the network similarity matrix Kpnp×np whose each entry is the number of drugs shared by protein pj and pi. Though drug-protein interaction network was also used in [7], our method employs a different way to extract information from the network. The shortest path concept is used in [7] while we utilize the number of common nodes shared by two proteins(drugs) to indicate a new similarity measurement.

Figure 8
The example of drug-protein interaction network. The example of drug-protein interaction network.

Now the drug domain similarity Wd can be derived from the chemical similarity and drug-protein network similarity by linear combination Wd=γd1Sd+γd2Kdγd1+γd2. Similarly, the protein domain similarity Wp can be obtained by Wp=γp1Sp+γp2Kpγp1+γp2. Compared with the standard LapRLS, our NetLapRLS incorporates drug-protein network information into the prediction model. In the following paragraph, we just describe the method NetLapRLS from which the standard LapRLS can be deduced by setting γMd2 = γp2 = 0.

Given the similarity matrices of drug domain and protein domain, we first perform Laplacian operation on the two graphs which is required by our semi-supervised learning method. The node degree matrices Dd and Dp are two diagonal matrices with their (k, k)-element defined as Dd(k,k)=m=1ndwdk,m and Dp(k,k)=m=1npwpk,m. The Laplacian operation of the two graphs is defined as Δd = Dd - Wd and Δp = Dp - Wp respectively. The normalized graph Laplacians are Ld=Dd1/2ΔdDd1/2=Ind×ndDd1/2WdDd1/2andLp=Dp1/2ΔpDp1/2=Inp×npDp1/2WpDp1/2 respectively.

NetLapRLS defines a continuous classification function F that is estimated on the graph to minimize a cost function. The cost function typically enforces a trade-o between the smoothness of the function on the graph of both labeled and unlabeled data and the accuracy of the function at fitting the label information for the labeled nodes. Herein we extend NetLapRLS to the matrix form. The two continuous classification functions are defined by Fdnd×np and Fpnp×nd. Let's first address the prediction Fd on the drug domain. The cost function of NetLapRLS is defined as follows

Fd*=minFdJ(Fd)=||YFd||2+βdTrace(FdTLdFd)
(4)

where · is Frobenius norm and Trace is the trace of a matrix. Representer theorem [18] shows that the solution is a linear combination

Fd=Wdαd

Substituting this form into equation (4), we arrive at a convex differentiable objective function with respect to variable αdnd×np

αd*=argminαdnd×np{||YWdαd||2+βdTrace(αdTWdLdWdαd)}
(5)

The derivative of the objective function vanishes at the minimizer:

Wd(YWdαd)+βdWdLdKdαd=0
(6)

which leads to the following solution:

αd=(Wd+βdLdWd)1Y
(7)

Then we get the prediction from the drug domain in the following form:

Fd=Wd(Wd+βdLdWd)1Y
(8)

Similarly, we can also derive the prediction in the protein domain by

Fp=Wp(Wp+βpLpWp)1YT
(9)

In the end, the predictions from drug and protein domains are combined into

F=Fd+(Fp)T2
(10)

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

Zheng Xia and Ling-Yun Wu co-developed the method, implemented the code and drafted the manuscript. Xiaobo Zhou and Stephen T.C. Wong supervised this project and gave critical revision of the manuscript. Stephen Wong provided the financial support for the work from his bioinformatics and bioengineering program grant from The Methodist Hospital Research Institute (TMHRI), Houston, Texas. All authors read and approved the manuscript.

Acknowledgements

We thank the colleagues of the Bioinformatics and bioengineering programmatic core, TMHRI for their support and discussion, and Dr.Yamanishi et al. for making their data publicly available. Ling-Yun Wu is supported by National Natural Science Foundation of China under grant number 60970091.

This article has been published as part of BMC Systems Biology Volume 4 Supplement 2, 2010: Selected articles from the Third International Symposium on Optimization and Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/1752-0509/4?issue=S2

References

  • Yao L, Rzhetsky A. Quantitative systems-level determinants of human genes targeted by successful drugs. Genome Res. 2008;18(2):206–13. doi: 10.1101/gr.6888208. [PMC free article] [PubMed] [Cross Ref]
  • Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK. Relating protein pharmacology by ligand chemistry. Nat Biotechnol. 2007;25(2):197–206. doi: 10.1038/nbt1284. [PubMed] [Cross Ref]
  • Keiser MJ, Setola V, Irwin JJ, Laggner C, Abbas AI, Hufeisen SJ, Jensen NH, Kuijer MB, Matos RC, Tran TB, Whaley R, Glennon RA, Hert J, Thomas KLH, Edwards DD, Shoichet BK, Roth BL. Predicting new molecular targets for known drugs. Nature. 2009;462(7270):175–181. doi: 10.1038/nature08506. [PMC free article] [PubMed] [Cross Ref]
  • Shoichet BK, McGovern SL, Wei B, Irwin JJ. Lead discovery using molecular docking. Curr Opin Chem Biol. 2002;6(4):439–46. doi: 10.1016/S1367-5931(02)00339-3. [PubMed] [Cross Ref]
  • Ballesteros J, Palczewski K. G protein-coupled receptor drug discovery: implications from the crystal structure of rhodopsin. Curr Opin Drug Discov Devel. 2001;4(5):561–74. [PMC free article] [PubMed]
  • Jacob L, Vert JP. Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics. 2008;24(19):2149–56. doi: 10.1093/bioinformatics/btn409. [PMC free article] [PubMed] [Cross Ref]
  • Yamanishi Y, Araki M, Gutteridge A, Honda W, Kanehisa M. Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics. 2008;24(13):i232–40. doi: 10.1093/bioinformatics/btn162. [PMC free article] [PubMed] [Cross Ref]
  • Bleakley K, Yamanishi Y. Supervised prediction of drug-target interactions using bipartite local models. Bioinformatics. 2009;25(18):2397. doi: 10.1093/bioinformatics/btp433. [PMC free article] [PubMed] [Cross Ref]
  • Belkin M, Niyogi P, Sindhwani V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research. 2006;7:2399–2434.
  • Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics. 2005;21(20):3940–1. doi: 10.1093/bioinformatics/bti623. [PubMed] [Cross Ref]
  • Yildirim MA, Goh KI, Cusick ME, Barabsi AL, Vidal M. Drug-target network. Nat Biotechnol. 2007;25(10):1119–1126. doi: 10.1038/nbt1338. http://dx.doi.org/10.1038/nbt1338 [PubMed] [Cross Ref]
  • Yamanishi Y. Supervised bipartite graph inference. Proceedings of the Conference on Advances in Neural Information and Processing System. pp. 1433–1440.
  • Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 2006. pp. D354–7. [PMC free article] [PubMed] [Cross Ref]
  • Wishart D, Knox C, Guo A, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Research. 2006. p. D668. [PMC free article] [PubMed] [Cross Ref]
  • Campillos M, Kuhn M, Gavin AC, Jensen LJ, Bork P. Drug target identification using side-effect similarity. Science. 2008;321(5886):263–6. doi: 10.1126/science.1158140. [PubMed] [Cross Ref]
  • Chapelle O, Schölkopf B, Zien A. Semi-supervised learning. MIT press; 2006.
  • Hattori M, Okuno Y, Goto S, Kanehisa M. Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J Am Chem Soc. 2003;125(39):11853–65. doi: 10.1021/ja036030u. [PubMed] [Cross Ref]
  • Schölkopf B, Smola AJ. Learning with kernels. MIT press Cambridge, Mass; 2002.

Articles from BMC Systems Biology are provided here courtesy of BioMed Central

Formats:

Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...

Links

  • Gene (nucleotide)
    Gene (nucleotide)
    Records in Gene identified from shared sequence links
  • MedGen
    MedGen
    Related information in MedGen
  • Nucleotide
    Nucleotide
    Published Nucleotide sequences
  • PubMed
    PubMed
    PubMed citations for these articles

Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...