PLoS One. 2010; 5(6): e10972.
Published online Jun 4, 2010. doi:  10.1371/journal.pone.0010972
PMCID: PMC2881046

Analysis and Prediction of the Metabolic Stability of Proteins Based on Their Sequential Features, Subcellular Locations and Interaction Networks

Andreas Hofmann, Editor

Abstract

Metabolic stability is an important property of proteins that is related to their global flexibility, intramolecular fluctuations, various internal dynamic processes, and many biological functions. Determining a protein's metabolic stability would provide useful information for an in-depth understanding of the dynamic action mechanisms of proteins. Although several experimental methods have been developed to measure protein metabolic stability, they are time-consuming and expensive. Reported in this paper is a computational method characterized by (1) integrating various properties of proteins, such as biochemical and physicochemical properties, subcellular locations, network properties and protein complex property, (2) using the mRMR (Maximum Relevance & Minimum Redundancy) principle and the IFS (Incremental Feature Selection) procedure to optimize the prediction engine, and (3) being able to classify proteins into four types: “short”, “medium”, “long”, and “extra-long” half-life spans. Our analysis revealed that the following seven features played major roles in determining the stability of proteins: (1) KEGG enrichment scores of the protein and its neighbors in the network, (2) subcellular locations, (3) polarity, (4) amino acid composition, (5) hydrophobicity, (6) secondary structure propensity, and (7) the number of protein complexes in which the protein is involved. An intriguing correlation was observed between the predicted metabolic stability of some proteins and the real half-life of the drugs designed to target them. These findings might provide useful insights for designing protein-stability-relevant drugs. The computational method can also be used as a large-scale tool for annotating the metabolic stability of the avalanche of protein sequences generated in the post-genomic age.

Introduction

Proteins are inherently dynamic molecules of marginal stability. Many biological functions of proteins are realized through their internal motions [1], [2], [3], [4]. Physicochemical stability and flexibility are balanced against each other, and both are thought to be intimately correlated with intramolecular fluctuations and various other dynamic processes [5]. Protein flexibility facilitates adaptation and recognition [6] in diverse molecular events, such as switching between active and inactive states [7], allosteric transition [8], intercalation of drugs into DNA [9], cooperative effects [10], and assembly of microtubules [11]. It is also essential for an in-depth understanding of the M2 proton channel gating and inhibition mechanism [3], [12], [13], [14], the switch mechanism of human Rab5a [15], the inhibition mechanism of PTP1B [16], the metabolic mechanism [17], and the action mechanism of calmodulin [18], [19]. These properties present unique challenges to pharmaceutical scientists attempting to develop protein-stability-relevant drugs [20], [21], [22].

Traditional methods of measuring a protein's metabolic stability rely on either pulse-chase metabolic labeling or administration of protein synthesis inhibitors, followed by biochemical analysis of the abundance of the protein concerned at multiple time points during the chase period. Highly regulated proteins tend to be present in low amounts. Since even mass spectrometry can fail to detect low-abundance proteins, the study of protein metabolic stability remains far from complete, although it is of critical importance for drug development. Recently, a high-throughput systematic approach to the analysis of global metabolic stability was reported, in which a fluorescence-based system was used to monitor metabolic stability at the single-cell level [23]. Computational approaches, however, would be much more efficient, not only in providing timely information about the stability of query proteins but also in helping to analyze which factors play major roles in metabolic stability. This study was initiated in an attempt to develop a computational method for investigating the metabolic stability of proteins in terms of their biochemical and physicochemical properties or features. Our results suggest that KEGG enrichment scores, subcellular locations, polarity, amino acid composition, hydrophobicity, secondary structure propensity, and number of protein complexes play major roles in a protein's metabolic stability. Moreover, we predicted the metabolic stability of drug target proteins using the selected features and found an intriguing correlation between the predicted metabolic stability of some proteins and the real half-life of the drugs designed to target them.

Materials and Methods

Data set

As elucidated in a recent review [24], a valid benchmark dataset is indispensable for developing an effective statistical method for predicting protein attributes. Here, protein stability data were taken from Yen's work [23]. We downloaded ORFs from the hORFeome v5.1 library (http://horfdb.dfci.harvard.edu/) and translated them to protein sequences using transeq in EMBOSS [25]. Proteins shorter than 50 or longer than 2700 residues were excluded. In Yen's work, protein samples were divided into four groups according to their PSI (protein stability index): (1) short half-life (PSI<2), (2) medium half-life (2≤PSI<3), (3) long half-life (3≤PSI<4), and (4) extra-long half-life (PSI≥4). After this processing, our dataset consists of 223 short half-life proteins, 446 medium half-life proteins, 706 long half-life proteins and 496 extra-long half-life proteins. For the reader's convenience, these sequences (classified into the above four groups) are given in Dataset S1.
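The PSI thresholds above amount to a simple binning rule. The following minimal sketch restates it in Python; the function name `half_life_class` is a hypothetical choice made for illustration and is not taken from the original work.

```python
def half_life_class(psi: float) -> str:
    """Map a protein stability index (PSI) to one of the four half-life groups."""
    if psi < 2:
        return "short"
    if psi < 3:
        return "medium"
    if psi < 4:
        return "long"
    return "extra-long"

# Group sizes as reported in the text.
class_sizes = {"short": 223, "medium": 446, "long": 706, "extra-long": 496}
total_proteins = sum(class_sizes.values())  # size of the whole benchmark dataset
```

Summing the four reported group sizes gives the total number of proteins in the benchmark dataset.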

Biochemical and physicochemical description of proteins

In order to formulate protein samples of different sequence lengths with vectors of a uniform dimension, let us adopt the concept of pseudo amino acid composition (PseAAC) [24], [26], [27]. The concrete procedures are that the biochemical and physicochemical properties [28], [29], [30], [31] are singled out from a protein sequence according to the following seven aspects: (1) amino acid composition (AAC) [32], (2) secondary structure propensity, (3) hydrophobicity, (4) polarizability, (5) solvent accessibility, (6) normalized van der Waals volume, and (7) polarity [33].

Of the above seven types of properties, except for AAC (the occurrence frequencies of the 20 native amino acids for a given protein [34]) that is an extensive quantity reflecting the global or overall feature of a protein, all the other six types are associated with a single amino acid in a given protein sequence position and hence belong to a localized quantity.

Each of the six local types of properties can be classified into two or three categories. For secondary structure propensity, each amino acid is classified as helix, strand or coil, as predicted by SSpro [35]. For solvent accessibility, each amino acid is classified as buried or exposed to solvent, as predicted by ACCpro [36]. For the other four types of properties, i.e., hydrophobicity, polarizability, normalized van der Waals volume and polarity, each of the constituent amino acids is classified into three categories according to its value. In terms of hydrophobicity, there are three groups of amino acids: polar (R, K, E, D, Q, N), neutral (G, A, S, T, P, H, Y) and hydrophobic (C, V, L, I, M, F, W) [37]. In terms of polarizability, there are three groups of amino acids: 0–0.108 (G, A, S, D, T), 0.128–0.186 (C, P, N, V, E, Q, I, L) and 0.219–0.409 (K, M, H, F, R, Y, W) [38]. In terms of normalized van der Waals volume, there are three groups of amino acids: 0–2.78 (G, A, S, C, T, P, D), 2.95–4.0 (N, V, E, Q, I, L) and 4.43–8.08 (M, H, K, F, R, Y, W) [38]. In terms of polarity, there are three groups of amino acids: 4.9–6.2 (L, I, F, W, C, M, V, Y), 8.0–9.2 (P, A, T, G, S) and 10.4–13.0 (H, Q, R, K, N, E, D) [39].

Now, the problem is how to generate the corresponding global quantity by integrating the localized quantities over an entire protein sequence. To realize this, let us consider the hydrophobicity first. In this study, the hydrophobicity of an amino acid is classified as: P (polar), N (neutral), or H (hydrophobic). Thus, for a protein sequence, say, “MSDKPDMAEIEKFSKETIEQEKQAGESTQEKNPLPMLLPATDKSKLKKTE”, it can be coded as “HNPPNPHNPHPPHNPPNHPPPPPNNPNNPPPPNHNHHHNNNPPNPHPPNP”.

For the above coded sequence, the following three extensive quantities can be derived: $C$ (composition), $T$ (transition), and $D$ (distribution). $C$ refers to the global percent composition of each of the three groups (i.e., P, N, and H) in the coded sequence; $T$ to the percent frequencies with which the code letter changes to another along the entire length of the coded sequence; and $D$ to the distribution pattern of the code letters along the sequence, measuring the percentage of the sequence length within which the first, 25%, 50%, 75%, and 100% of each of the three code letters is located.

Take the above coded sequence of 50 letters as an example. It is composed of 10 Hs, 16 Ns and 24 Ps, as shown in Figure 1. Thus, we have the composition $C$ = (10/50 = 20%, 16/50 = 32%, 24/50 = 48%) for H, N and P, respectively. For the transition feature $T$, there are 31 transitions in total in the sequence, with 8 between H and N, 16 between N and P, and 7 between H and P, so that we have $T$ = (8/31 = 25.81%, 16/31 = 51.61%, 7/31 = 22.58%). As for the distribution $D$, the first, 25%, 50%, 75% and 100% of H are located at the 1st, 10th, 18th, 37th, and 46th positions in the coded sequence, respectively. Thus, the distribution $D$ for H is 1/50 = 2%, 10/50 = 20%, 18/50 = 36%, 37/50 = 74%, and 46/50 = 92%. Likewise, the distribution $D$ for N is 4%, 28%, 54%, 78%, and 98%; and that for P is 6%, 24%, 44%, 64%, and 100%. Accordingly, we have $D$ = (2%, 20%, 36%, 74%, 92%, 4%, 28%, 54%, 78%, 98%, 6%, 24%, 44%, 64%, 100%). Combining $C$, $T$ and $D$, we have a total of 21 elements.
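The worked example can be reproduced with a short script. The sketch below encodes the example sequence with the hydrophobicity groups given earlier and computes the composition, transition and distribution values. The function names `encode` and `ctd` are hypothetical, and rounding the 25%, 50% and 75% occurrence ranks upward is an assumption chosen because it matches the positions quoted in the example; the original work does not state its rounding rule.

```python
import math

# Hydrophobicity groups as listed in the text: P (polar), N (neutral), H (hydrophobic).
GROUPS = {"P": "RKEDQN", "N": "GASTPHY", "H": "CVLIMFW"}
AA_TO_CODE = {aa: code for code, aas in GROUPS.items() for aa in aas}

def encode(seq):
    """Replace every residue by the letter of its hydrophobicity group."""
    return "".join(AA_TO_CODE[aa] for aa in seq)

def ctd(coded):
    """Return (composition, transition, distribution) as fractions, in H, N, P order."""
    n = len(coded)
    comp = [coded.count(c) / n for c in "HNP"]
    # Adjacent positions whose letters differ, stored as unordered pairs.
    changes = [frozenset(p) for p in zip(coded, coded[1:]) if p[0] != p[1]]
    trans = [changes.count(frozenset(t)) / max(len(changes), 1)
             for t in ("HN", "NP", "HP")]
    dist = []
    for c in "HNP":
        pos = [i + 1 for i, ch in enumerate(coded) if ch == c]  # 1-based positions
        if not pos:
            dist += [0.0] * 5  # letter absent from the sequence
            continue
        # Assumption: fractional occurrence ranks are rounded up.
        ranks = [1] + [math.ceil(q * len(pos)) for q in (0.25, 0.5, 0.75, 1.0)]
        dist += [pos[r - 1] / n for r in ranks]
    return comp, trans, dist

seq = "MSDKPDMAEIEKFSKETIEQEKQAGESTQEKNPLPMLLPATDKSKLKKTE"
coded = encode(seq)
comp, trans, dist = ctd(coded)
features = comp + trans + dist  # 3 + 3 + 15 = 21 elements
```

The same routine could be reused for the other three-category properties by swapping the group table; the two-category solvent accessibility would need its own treatment.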

Figure 1
How to compute the 21 hydrophobic feature components from protein sequence.

For “secondary structure”, “polarizability”, “normalized van der Waals volume” and “polarity”, each is also classified into three categories and hence would also generate 21 elements in a way similar to that described above for “hydrophobicity”.

For the “solvent accessibility”, since it is classified into two categories, the combination of $C$, $T$ and $D$ for the sequence coded according to the “solvent accessibility” would only generate 7 elements rather than 21.

Now for the “AAC” we have 20 elements [34]; for the “solvent accessibility”, 7 elements; and for each of the other five types of protein properties, 21 elements. Combining all these extensive quantities, we have an augmented extensive quantity containing (5×21+20+7) = 132 elements, as detailed in Table 1. Some further elements are also included, as illustrated below.

Table 1
The 132 biochemical and physicochemical feature components of proteins.

Subcellular location description of proteins

The function of a protein is closely correlated with its subcellular location [40], [41]. In view of this, the prediction power would be improved by incorporating protein subcellular location information. Unfortunately, only a small number of proteins have subcellular locations annotated in UniProt [42]. To compensate for this, the subcellular locations of most proteins were predicted based on the sequence similarity evaluated by BLAST [43]. If the BLAST score between a query protein and a location-known protein was greater than 120, the location-known protein was considered similar to the query protein. The subcellular locations of the query protein were taken as the intersection of the subcellular locations of its similar location-known proteins. Since there were 22 subcellular locations, the subcellular location features of each protein can be represented by a 22-dimensional vector, namely $(l_1, l_2, \ldots, l_{22})$, where $l_i = 1$ if the protein is located at the $i$-th subcellular location site; otherwise, $l_i = 0$. It is instructive to point out that one can also use the web-server predictor Euk-mPLoc [44] to get the desired information for those proteins without a subcellular location annotated in the UniProt database. The updated website address for Euk-mPLoc can be found in the Cell-PLoc package [41] as well as in Table 3 of [45]. An advantage of Euk-mPLoc is that it not only covers up to 22 subcellular location sites but is also able to identify proteins with multiple location sites, which is particularly useful for drug development as elaborated recently by Smith [46].
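The location transfer rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: `LOCATIONS` is a hypothetical four-entry stand-in for the 22 sites used in the study, `hits` is assumed to be a list of (protein identifier, BLAST score) pairs already parsed from BLAST output, and `annotations` is an assumed lookup table of known locations.

```python
# Hypothetical subset of location names; the study uses 22 sites.
LOCATIONS = ["cytoplasm", "nucleus", "mitochondrion", "plasma membrane"]

def location_vector(hits, annotations, min_score=120.0):
    """Build a 0/1 location vector for a query protein.

    hits: list of (subject_id, blast_score) pairs for the query.
    annotations: dict mapping subject_id -> set of known locations.
    """
    # Keep only location-known proteins scoring above the threshold.
    similar = [annotations[s] for s, score in hits
               if score > min_score and s in annotations]
    # The query inherits the locations shared by all similar proteins.
    shared = set.intersection(*similar) if similar else set()
    return [1 if loc in shared else 0 for loc in LOCATIONS]

example_hits = [("a", 300.0), ("b", 150.0), ("c", 90.0)]
example_annotations = {
    "a": {"cytoplasm", "nucleus"},
    "b": {"nucleus"},
    "c": {"mitochondrion"},
}
example_vector = location_vector(example_hits, example_annotations)
```

In this toy case protein "c" falls below the score threshold, and only "nucleus" is shared by the two remaining hits.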

KEGG enrichment scores of proteins

The simplest and most direct method for predicting the function of a query protein based on a training dataset of function-known proteins is the immediate neighborhood method [47]. The information of the neighbor proteins is also an important environmental feature of the protein concerned. Here, neighbor proteins are those that interact with each other in the STRING network [48]. The KEGG enrichment score of the protein and its neighbors was defined as the −log10 of the p value generated by a hypergeometric test on a KEGG pathway. A larger enrichment score indicates stronger overrepresentation. There were 220 KEGG enrichment scores for each of the proteins investigated here.
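A minimal sketch of such an enrichment score is given below. It assumes the usual one-sided (upper-tail) hypergeometric test, which the text does not spell out; the function name and the parameter names `N`, `K`, `n` and `k` are hypothetical. Here `N` is the number of proteins in the background, `K` the number of background proteins in the pathway, `n` the size of the set formed by the protein and its neighbors, and `k` the number of those that belong to the pathway.

```python
from math import comb, log10

def enrichment_score(N, K, n, k):
    """Return -log10 of the hypergeometric upper-tail p-value P(X >= k)."""
    upper = min(K, n)
    # Number of ways to draw n proteins with at least k pathway members,
    # divided by the number of ways to draw any n proteins.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    p_value = tail / comb(N, n)
    return -log10(p_value)

# Toy example: all 5 drawn proteins fall in a 5-member pathway out of 10.
toy_score = enrichment_score(N=10, K=5, n=5, k=5)
```

Repeating this calculation for each of the 220 pathways would yield the 220 scores per protein mentioned above. The sketch does not guard against an overlap `k` larger than `min(K, n)`, for which the p-value would be zero.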

Number of protein complexes

If a protein can form a complex with other proteins, it is expected to be more stable and to have a longer half-life. Therefore, the number of such complexes that a protein can form is a feature relevant to its stability, and should be taken into account in prediction as well. We downloaded the protein complex dataset from CORUM [49], which is a comprehensive resource of mammalian protein complexes.

Feature space of proteins

As mentioned above, the 7 types of biochemical and physicochemical properties contribute 132 components to the description of a protein. In addition, its length is counted as one component, its occurrences in the 22 subcellular location sites as 22 components, its 220 KEGG enrichment scores as 220 components, and the number of protein complexes it forms as one component. The total number of components used in this study to represent a protein sample is thus (132+1+22+220+1) = 376. For the list of the 376 feature components, see Table S1.

Thus, the $i$-th protein sample $\mathbf{P}_i$ should be formulated as a vector in a 376-D (dimensional) space; i.e.,

$$\mathbf{P}_i = [p_{i,1}, p_{i,2}, \ldots, p_{i,j}, \ldots, p_{i,376}]^{T}$$
(1)

where $p_{i,j}$ is the $j$-th ($j = 1, 2, \ldots, 376$) component of the $i$-th protein sample $\mathbf{P}_i$ and can be derived by following the procedures as elaborated above.

Note that before performing prediction, each of the 376 components in Eq.1 should undergo the following standard conversion procedure:

$$p_{i,j} \Leftarrow \frac{p_{i,j} - \bar{p}_j}{\mathrm{SD}(p_j)} \qquad (i = 1, 2, \ldots, N;\ j = 1, 2, \ldots, 376)$$
(1a)

where $N$ is the total number of proteins in the training dataset, and $\bar{p}_j$ and $\mathrm{SD}(p_j)$ are the mean and standard deviation of the $j$-th component over the $N$ protein samples. The converted values obtained by Eq.1a will have a zero mean value over the $N$ protein samples, and will remain unchanged if going through the same conversion procedure again [24], [26].
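The conversion can be illustrated with a small column-wise standardization routine. This is a sketch under assumptions: the function name `standardize` is hypothetical, the population standard deviation is used because the text does not say which estimator was applied, and constant columns are left at zero to avoid division by zero.

```python
from statistics import mean, pstdev

def standardize(matrix):
    """Column-wise conversion: subtract the mean, divide by the standard deviation."""
    columns = list(zip(*matrix))
    mus = [mean(col) for col in columns]
    # Assumption: population SD; a constant column gets a divisor of 1.
    sds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in matrix]

raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]]  # 3 toy samples, 2 components
once = standardize(raw)
twice = standardize(once)  # applying the conversion again changes nothing
```

The toy data show both properties stated above: each converted column has zero mean, and a second pass returns the same values up to floating-point error.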

mRMR method

The “maximum relevance & minimum redundancy” (mRMR) method was originally developed by Peng et al. [50] to deal with the microarray data processing. In their method, each feature is ranked according to its relevance to the target and redundancy with other features. A “good” feature is defined as the one that has the best trade-off between maximizing the relevance to the target and minimizing the redundancy within the features. To quantify both the relevance and redundancy, the following mutual information (MI) is defined to estimate how one vector is related to another:

$$I(x, y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \, dx \, dy$$
(2)

where $x$, $y$ are two vectors, $p(x, y)$ is the joint probabilistic density, and $p(x)$ and $p(y)$ are the marginal probabilistic densities.
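For discretized data, the mutual information reduces to a sum over observed value pairs. The sketch below is an empirical estimate for two equally long lists of discrete values; the function name is hypothetical and the natural logarithm is an arbitrary choice of unit.

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    px, py = Counter(xs), Counter(ys)  # marginal counts
    # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with raw counts.
    return sum(c / n * log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

mi_identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])    # equals log 2
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])  # equals 0
```

A vector shares the most information with itself and none with a vector that is statistically independent of it, which is what makes this quantity usable for both relevance and redundancy.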

Suppose $\Omega$ denotes the entire space containing all the aforementioned 376 components, and $\Omega_s$ denotes the space containing $m$ components selected from $\Omega$. The space to be identified is denoted by $\Omega_t$, which contains $n$ components. The relevance $D$ of the feature $f$ in $\Omega_t$ with the target $c$ can be calculated by:

$$D = I(f, c)$$
(3)

And the redundancy $R$ of the feature $f$ in $\Omega_t$ with all the features in $\Omega_s$ can be calculated by:

$$R = \frac{1}{m} \sum_{f_i \in \Omega_s} I(f, f_i)$$
(4)

To obtain a feature $f_j$ in $\Omega_t$ with maximum relevance and minimum redundancy, Eqs.3 and 4 are combined with the mRMR function:

$$\max_{f_j \in \Omega_t} \left[ I(f_j, c) - \frac{1}{m} \sum_{f_i \in \Omega_s} I(f_j, f_i) \right] \qquad (j = 1, 2, \ldots, n)$$
(5)

For a feature set with $N = 376$ components, the feature evaluation will continue for 376 rounds. After these evaluations, a feature set $S$ can be obtained by the mRMR method as formulated below:

$$S = \{ f'_1, f'_2, \ldots, f'_h, \ldots, f'_N \}$$
(6)

where each feature in $S$ has a subscript index $h$ indicating the round at which the feature is selected. The better a feature is, the earlier it will satisfy Eq.5 and be selected, and the smaller its subscript index will be.
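The round-by-round selection can be sketched as a greedy loop: at each round, the remaining feature with the largest difference between relevance and mean redundancy is moved to the selected list. The code below is an illustration on discrete toy data, not the mRMR program used in this study; the names `mrmr_rank` and `mutual_information` and the toy feature names are hypothetical.

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information of two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def mrmr_rank(features, target):
    """Rank features (dict: name -> discrete values) by greedy mRMR selection."""
    remaining = list(features)
    selected = []
    relevance = {f: mutual_information(features[f], target) for f in remaining}
    while remaining:
        def score(f):
            if not selected:
                return relevance[f]  # first round: relevance only
            redundancy = sum(mutual_information(features[f], features[s])
                             for s in selected) / len(selected)
            return relevance[f] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

target = [0, 0, 0, 0, 1, 1, 1, 1]
toy_features = {
    "a":      [0, 0, 0, 1, 1, 1, 1, 1],  # strongly related to the target
    "a_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # duplicate of "a", fully redundant
    "d":      [0, 1, 0, 0, 1, 1, 0, 1],  # weaker, but adds new information
}
ranking = mrmr_rank(toy_features, target)
```

In the toy data the duplicate feature is ranked last even though its relevance equals that of the first pick, which is the trade-off the method is designed to make.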

Nearest Neighbor Algorithm

In our study, the Nearest Neighbor (NN) algorithm, or NNA, is used to classify a protein as either a labile or a stable one. NNA makes its decision by calculating the “distances” between a query protein and each of the proteins in the training dataset. A variety of distance scales can be used for this purpose, such as Euclidean distance [51], Hamming distance [52], and Mahalanobis distance [34]. In the current study, the distance between the query protein $\mathbf{P}$ and $\mathbf{P}_i$, the $i$-th protein in the training dataset, is defined by [53], [54], [55]:

$$D(\mathbf{P}, \mathbf{P}_i) = 1 - \frac{\mathbf{P} \cdot \mathbf{P}_i}{\|\mathbf{P}\| \, \|\mathbf{P}_i\|}$$
(7)

where $\mathbf{P}$ and $\mathbf{P}_i$ are the feature component vectors of the query protein and the $i$-th protein in the training dataset (cf. Eq.1); $\mathbf{P} \cdot \mathbf{P}_i$ is the inner product of $\mathbf{P}$ and $\mathbf{P}_i$; $\|\mathbf{P}\|$ and $\|\mathbf{P}_i\|$ represent the moduli of vectors $\mathbf{P}$ and $\mathbf{P}_i$. The smaller $D(\mathbf{P}, \mathbf{P}_i)$ is, the more similar $\mathbf{P}$ is to $\mathbf{P}_i$. According to the NN rule, given a training set $\{\mathbf{P}_1, \mathbf{P}_2, \ldots, \mathbf{P}_N\}$, the query protein $\mathbf{P}$ will be predicted as belonging to the same class as the protein $\mathbf{P}_k$ that is the closest to $\mathbf{P}$. In other words, if

$$D(\mathbf{P}, \mathbf{P}_k) = \min \{ D(\mathbf{P}, \mathbf{P}_1), D(\mathbf{P}, \mathbf{P}_2), \ldots, D(\mathbf{P}, \mathbf{P}_N) \}$$
(8)

where $k$ is the argument of $i$ that minimizes $D(\mathbf{P}, \mathbf{P}_i)$, and if $\mathbf{P}_k$ belongs to a given class, then the query protein $\mathbf{P}$ should also belong to the same class.
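A minimal nearest-neighbor sketch follows. It assumes that the distance of Eq.7 is one minus the cosine of the angle between the two feature vectors, which is what the description in terms of an inner product and vector moduli suggests. The function names are hypothetical, and zero vectors are not handled.

```python
from math import sqrt

def cosine_distance(p, q):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = sqrt(sum(a * a for a in p))
    norm_q = sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)

def nn_predict(query, train_vectors, train_labels):
    """Return the label of the training vector closest to the query."""
    k = min(range(len(train_vectors)),
            key=lambda i: cosine_distance(query, train_vectors[i]))
    return train_labels[k]

toy_train = [[1.0, 0.0], [0.0, 1.0]]
toy_labels = ["stable", "labile"]
toy_prediction = nn_predict([0.9, 0.1], toy_train, toy_labels)
```

Because the distance depends only on the angle between vectors, two proteins whose feature vectors differ by a positive scale factor are treated as identical.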

Jackknife Cross-Validation Method

In the literature, the independent dataset test, the subsampling or K-fold (such as 5-fold and 10-fold) test, and the jackknife test are the three cross-validation methods often used to examine the accuracy of a statistical predictor [52]. Of these three, however, the jackknife is considered the most objective, as elucidated in [41] and elaborated in [40]. Therefore, jackknife cross-validation has been increasingly adopted to examine the power of various predictors (see, e.g., [54], [56], [57], [58], [59], [60]) and will be used in this study as well. During jackknifing, each protein sample in the benchmark dataset is in turn singled out and tested using the rule parameters trained on the remaining protein samples. To describe the test process clearly, let us define

$$\begin{cases} \mathbb{S} = \mathbb{S}^{SM} \cup \mathbb{S}^{LE} \\ \mathbb{S}^{SM} = \mathbb{S}^{S} \cup \mathbb{S}^{M} \\ \mathbb{S}^{LE} = \mathbb{S}^{L} \cup \mathbb{S}^{E} \end{cases}$$
(9)

where $\mathbb{S}$ is the benchmark dataset used in this study (cf. Dataset S1), $\mathbb{S}^{SM}$ the sub-dataset containing only the “short” or “medium” half-life proteins, $\mathbb{S}^{LE}$ only the “long” and “extra-long” half-life proteins, $\mathbb{S}^{S}$ only the “short” half-life proteins, $\mathbb{S}^{M}$ only the “medium” half-life proteins, $\mathbb{S}^{L}$ only the “long” half-life proteins, $\mathbb{S}^{E}$ only the “extra-long” half-life proteins, and $\cup$ the union symbol in set theory. The jackknife success rates were examined according to the following equations:

$$\begin{cases} \Lambda = \dfrac{\hat{N}^{SM} + \hat{N}^{LE}}{N^{SM} + N^{LE}} \\[2ex] \Lambda^{SM} = \dfrac{\hat{N}^{S} + \hat{N}^{M}}{N^{S} + N^{M}} \\[2ex] \Lambda^{LE} = \dfrac{\hat{N}^{L} + \hat{N}^{E}}{N^{L} + N^{E}} \end{cases}$$
(10)

where $\Lambda$ is the overall success rate in identifying proteins in $\mathbb{S}$ as “short/medium” or “long/extra-long” type (see the 1st equation of Eq.9), $\hat{N}^{SM}$ the number of correct predictions for the “short/medium” type, $\hat{N}^{LE}$ the number of correct predictions for the “long/extra-long” type, $N^{SM}$ the total number of proteins in $\mathbb{S}^{SM}$, and $N^{LE}$ the total number of proteins in $\mathbb{S}^{LE}$; $\Lambda^{SM}$ the success rate in identifying proteins in $\mathbb{S}^{SM}$ as “short” or “medium” type (see the 2nd equation of Eq.9); $\Lambda^{LE}$ the success rate in identifying proteins in $\mathbb{S}^{LE}$ as “long” or “extra-long” type (see the 3rd equation of Eq.9). The quantities carrying the superscripts S, M, L and E are defined in the same way for the corresponding sub-datasets.
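The jackknife test with a nearest-neighbor predictor can be sketched as a leave-one-out loop: each sample is classified by its closest neighbor among all the other samples, and the fraction of correct calls is the success rate. The code below is an illustration on toy vectors; the function names are hypothetical, and the cosine-based distance is assumed from the description of Eq.7.

```python
from math import sqrt

def cosine_distance(p, q):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return 1.0 - dot / (sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q)))

def jackknife_success_rate(vectors, labels):
    """Leave-one-out success rate of a nearest-neighbor classifier."""
    correct = 0
    for i, query in enumerate(vectors):
        # Every sample except the one singled out serves as training data.
        rest = [j for j in range(len(vectors)) if j != i]
        k = min(rest, key=lambda j: cosine_distance(query, vectors[j]))
        correct += labels[k] == labels[i]
    return correct / len(vectors)

toy_vectors = [[1.0, 0.1], [1.0, 0.2], [0.1, 1.0], [0.2, 1.0]]
toy_labels = ["short/medium", "short/medium", "long/extra-long", "long/extra-long"]
toy_rate = jackknife_success_rate(toy_vectors, toy_labels)
```

Since a nearest-neighbor predictor has no parameters to fit, "training on the remaining samples" here simply means searching for the nearest neighbor among them.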

Feature Selection

Although the mRMR step could arrange the feature components according to some sort of ranks, it is not sufficient for us to determine which feature components should be selected to optimize the performance of our predictor. To solve the problem, the IFS (incremental feature selection) method is adopted as illustrated below.

Based on the ranked features obtained from the mRMR step, we can construct 376 feature component sets by adding one component at a time in an ascending order, with the i-th set given by

$$S_i = \{ f'_1, f'_2, \ldots, f'_i \} \qquad (1 \le i \le 376)$$
(11)

For each of these 376 feature component sets, an NNA predictor was constructed and its jackknife success rate derived. Finally, we obtained a curve, called the IFS curve, with the subscript index i in Eq.11 as its X-axis and the corresponding jackknife success rate as its Y-axis. The feature set, say $S_h$, would be deemed the optimal one if the IFS curve has a peak at $i = h$.
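The IFS procedure can be sketched as a loop over nested prefixes of the ranked feature list. In the sketch below, `evaluate` is a hypothetical callback standing in for "build an NNA predictor on this subset and return its jackknife success rate", and the toy success rates are invented purely to exercise the code.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Evaluate nested feature subsets and locate the peak of the IFS curve.

    ranked_features: feature names in mRMR order.
    evaluate: callable taking a list of features and returning a success rate.
    Returns (size of the best subset, list of success rates).
    """
    curve = [evaluate(ranked_features[:i])
             for i in range(1, len(ranked_features) + 1)]
    # On ties, the smallest subset reaching the peak is reported.
    best_size = max(range(len(curve)), key=curve.__getitem__) + 1
    return best_size, curve

# Invented success rates, indexed by subset size, for demonstration only.
toy_rates = {1: 0.60, 2: 0.70, 3: 0.65, 4: 0.60}
best_size, curve = incremental_feature_selection(
    ["f1", "f2", "f3", "f4"], lambda subset: toy_rates[len(subset)])
```

Passing the evaluation step in as a callback keeps the selection loop independent of the classifier and of the validation scheme.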

Predict metabolic stability of drug target proteins

We predicted the stability of 170 proteins targeted by 332 drugs with known half-life. The drug-target pairs and the half-lives of the drugs were downloaded from DrugBank [61]. Only drugs with well-defined target proteins and half-life were analyzed. The half-life spans of all the drugs investigated were uniformly converted to minutes. As formulated in Eqs.8 and 9, the test procedure is as follows. A query drug target protein was first identified as either “short/medium” half-life or “long/extra-long” half-life. If it turned out to be “short/medium” half-life, the predictor would automatically continue to classify it as “short” half-life or “medium” half-life; otherwise, it would classify it as “long” half-life or “extra-long” half-life. Finally, each of the drug target proteins investigated was assigned to one of the four classes: “short”, “medium”, “long”, or “extra-long” half-life.
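The two-stage procedure can be sketched as a small cascade of three classifiers. In the code below, the classifiers are passed in as plain callables, and the stub classifiers and the unit table are hypothetical placeholders; in the study itself each stage is an NNA predictor built on its own optimal feature set.

```python
# Hypothetical unit table for converting drug half-lives to minutes.
MINUTES_PER_UNIT = {"minute": 1, "hour": 60, "day": 1440}

def to_minutes(value, unit):
    """Convert a half-life given in the stated unit to minutes."""
    return value * MINUTES_PER_UNIT[unit]

def predict_half_life_class(features, coarse, fine_sm, fine_le):
    """Two-stage prediction: coarse group first, then the class within it."""
    if coarse(features) == "short/medium":
        return fine_sm(features)   # returns "short" or "medium"
    return fine_le(features)       # returns "long" or "extra-long"

# Stub classifiers used only to exercise the control flow.
stub_coarse = lambda x: "short/medium" if x[0] < 0.5 else "long/extra-long"
stub_sm = lambda x: "short" if x[1] < 0.5 else "medium"
stub_le = lambda x: "long" if x[1] < 0.5 else "extra-long"

example_class = predict_half_life_class([0.2, 0.9], stub_coarse, stub_sm, stub_le)
```

Each query therefore passes through exactly two classifiers, and an error at the first stage cannot be corrected at the second.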

Results

mRMR results

The mRMR program used in this study was downloaded from http://penglab.janelia.org/proj/mRMR/. We set its discretization parameter so that the value of each feature was categorized into one of three groups: (1) smaller than a lower threshold, (2) between the lower and upper thresholds, and (3) greater than the upper threshold. Both thresholds are defined in terms of the average value of the feature over all samples and its standard deviation. In addition to the mRMR list, which gives the index of each feature in ranked order, the program also outputs a table called the MaxRel list that contains the relevance of each feature to the target, as defined in Eq.3. In this study, only the mRMR list was used in the feature selection procedure.
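The three-group categorization described above can be illustrated with a short Python sketch. It assumes thresholds of the form mean − t·std and mean + t·std; the parameter name `t`, its default value, the use of the population standard deviation, and the state codes −1, 0, and 1 are all assumptions made for illustration and are not taken from the mRMR program itself.

```python
import statistics

def discretize(values, t=1.0):
    """Map each value of one feature to -1, 0, or 1.

    Assumed thresholds: mean - t*std (lower) and mean + t*std (upper),
    where mean and std are computed over all samples of this feature.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std: an assumption
    lower, upper = mean - t * std, mean + t * std
    return [-1 if v < lower else 1 if v > upper else 0 for v in values]
```

Calling `discretize` once per feature column would turn a real-valued feature matrix into the three-state form on which mutual information can be estimated by simple counting.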

IFS results

In the IFS procedure, we built 376 feature sets based on the ordered feature set S obtained in the mRMR step. Accordingly, 376 prediction models were constructed and tested as described above. Shown in Figure 2 are the IFS curves for (A) all the proteins in the benchmark dataset (cf. the 1st equation of Eq.9), (B) only the “short” and “medium” half-life proteins (cf. the 2nd equation of Eq.9), and (C) only the “long” and “extra-long” half-life proteins (cf. the 3rd equation of Eq.9). As shown in Figure 2 (A), the overall accuracy reached its peak of 72.8% when 62 feature components were used. These 62 top-ranked feature components therefore constitute the optimal feature set for the “short/medium”-“long/extra-long” classifier. The optimal feature set for the “short”-“medium” classifier contained 43 feature components, with a peak success rate of 69.8%, while the optimal feature set for the “long”-“extra-long” classifier contained 122 feature components, with a peak success rate of 67.8%. The optimal feature components were extracted according to their impact on the success rates in predicting the stability of proteins. The aforementioned 62, 43, and 122 optimal feature components are provided in Table S2 (A), (B), and (C), respectively.

Figure 2
The IFS curves of protein's metabolic stability predictions.

Analysis of optimal feature components

To investigate what kinds of features are critical for protein stability, we extracted the optimal feature components and counted how many of them belong to each kind of feature. Shown in Figure 3 are the numbers of each kind of feature in (A) the 62 feature components for the “short/medium”-“long/extra-long” classifier, (B) the 43 feature components for the “short”-“medium” classifier, and (C) the 122 feature components for the “long”-“extra-long” classifier. As we can see from Figure 3, the following seven kinds of features play the major roles in affecting protein stability: (1) KEGG enrichment scores, (2) subcellular locations, (3) polarity, (4) amino acid composition, (5) hydrophobicity, (6) secondary structure propensity, and (7) the number of protein complexes.

Figure 3
The numbers of each kind of features in optimal feature sets.
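The tally behind Figure 3 amounts to counting, for each optimal feature set, how many selected components belong to each kind of feature. A minimal sketch follows; the function name and the mapping from component index to feature kind are hypothetical, and in practice the mapping would come from the feature list in Table S1.

```python
from collections import Counter

def count_feature_kinds(selected, kind_of):
    """Count how many selected feature components fall into each kind.

    `selected` holds the indices of the optimal feature components;
    `kind_of` maps a component index to the name of its feature kind.
    """
    return Counter(kind_of[f] for f in selected)
```

The returned `Counter` can be sorted with `most_common()` to list the kinds of features from the most to the least represented.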

In a recent work, Yen et al. [23] discovered that protein stability was correlated with amino acid composition. Our results further confirm their finding. These authors also found that the short half-life group and the medium half-life group had a larger proportion of the unstable “cell cycle control” proteins, and that the long half-life group had a larger fraction of “mitosis” proteins consisting of actins, tubulins, septins, and so forth. Interestingly, our study indicates that the metabolic stability of a protein is associated with its subcellular location, such as whether it is located in the nucleus, the cytoplasm, the extracellular space, or the cell membrane, which is also quite consistent with their findings [23]. Meanwhile, it was found that the enrichment of degradation, metabolism and signaling pathways could help predict a protein's metabolic stability (see Table S2), which is quite sensible as well.

Proteins that are bound to ligands, or that are not prone to denaturation, are usually more stable. This logically requires them to have proper fold patterns or microenvironments. Membrane proteins are relatively more stable because their folding process involves binding to the bilayer interface, insertion of transmembrane helices into it (see, e.g., [3], [62], [63]), and helix-helix interactions in its presence [64], [65], [66]. Membrane protein fold topology may be categorized into two basic secondary structural motifs, namely α-helices and β-barrels [67]. Stability is a consequence of the low electrostatic potential energy of small substructures called knots and is opposed by the stress developed in contraction of the large substructures called matrices [68]. The features investigated in this study have provided useful insight into the energetics of the driving forces governing folding, assembly, insertion, and translocation of membrane proteins [69]. Knowledge of inter-residue interactions in protein structures is very useful for understanding the mechanism of protein folding and stability. Also, the secondary structure propensity of the amino acids in a protein, as well as their polarity and hydrophobicity, plays an important role in the inter-residue interactions, and hence in its fold pattern [70], folding rate [71], [72], and stability [73]. Furthermore, driven by the hydrophobic force, a protein can overcome the entropic barrier and fold from a random coiled state into some type of topological shape, with disulfide bonding, hydrogen bonding, ion-pairs, and van der Waals interactions defining the shape and keeping it from falling apart [74].

Finding a general solution for predicting the metabolic stability of proteins, even with a moderate success rate, is an extremely difficult and complicated problem. However, any progress in this regard would provide very useful insights for in-depth research in protein science and for developing new strategies for drug design.

The predicted metabolic stability of drug target proteins

It is interesting to predict the metabolic stability of drug target proteins and compare the results with the half-life spans of the corresponding drugs. Although many factors can affect the half-life of a drug, we found that the stability of its target protein is quite an important one. For demonstration, the predicted metabolic stability outcomes for some drug target proteins and the real half-life spans of their corresponding drugs are given in Table S3, from which we found some intriguing correlations. The half-life of drugs targeting proteins with predicted “short or medium half-life” (median of 420 minutes) was shorter than the half-life of drugs targeting proteins with predicted “long or extra-long half-life” (median of 709 minutes). The median half-lives of drugs targeting proteins with predicted “short half-life”, “medium half-life”, “long half-life” and “extra-long half-life” were 303, 510, 540 and 1080 minutes, respectively.
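The comparison above boils down to grouping drug half-lives by the predicted class of their target proteins and taking the median of each group. A minimal sketch, assuming the data are available as (predicted class, half-life in minutes) pairs; the function name and the input layout are hypothetical, and the real pairs are those listed in Table S3.

```python
from collections import defaultdict
from statistics import median

def median_half_life_by_class(pairs):
    """Median drug half-life (in minutes) for each predicted stability class.

    `pairs` is an iterable of (predicted_class, half_life_minutes) tuples,
    one per drug-target pair.
    """
    groups = defaultdict(list)
    for predicted_class, minutes in pairs:
        groups[predicted_class].append(minutes)
    return {cls: median(values) for cls, values in groups.items()}
```

The median is used here, as in the text, because drug half-lives span several orders of magnitude and a few very long-lived drugs would dominate a mean.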

For instance, Dinoprostone (DrugBank accession number DB00917) is a prescription drug used, as a vaginal suppository, to prepare the cervix for labour and to induce labour. The half-life of Dinoprostone is less than 5 minutes. The predicted stability results for its target proteins PTGER1 (UniProtKB/Swiss-Prot ID P34995) and PTGER2 (UniProtKB/Swiss-Prot ID P43116) were both “short” half-life. As another example, Clorazepate (DrugBank accession number DB00628) is used for treating anxiety. It also acts as a muscle relaxant and anticonvulsant. Its half-life is about 2 days (2,880 minutes), and the predicted stabilities of its target proteins BZRP (UniProtKB/Swiss-Prot ID P30536) and GABRA1 (UniProtKB/Swiss-Prot ID P14867) were “long” and “extra-long”, respectively. This is fully consistent with the notion that the more stable a protein is, the longer the half-life of the drug needed to target it effectively, and vice versa.

Discussion

We have developed a new method for predicting the metabolic stability of proteins by integrating their various biochemical and physicochemical features. The rigorous jackknife cross-validation test indicates that the predictor can achieve an overall success rate of 72.8%. With the feature selection approach based on the mRMR method and the IFS procedure, we found that the following seven kinds of features play the major roles in determining the stability of proteins: KEGG enrichment scores, subcellular locations, polarity, amino acid composition, hydrophobicity, secondary structure propensity, and the number of protein complexes. These findings might provide useful information for drug development. The method presented in this paper might also become a high-throughput tool for large-scale annotation of the metabolic stability of proteins.

Supporting Information

Dataset S1

The sequences of benchmark dataset.

(0.71 MB TXT)

Table S1

List of the 376 feature components.

(0.10 MB XLS)

Table S2

The optimal feature components.

(0.05 MB XLS)

Table S3

The predicted metabolic stability for drug target proteins.

(0.10 MB XLS)

Acknowledgments

The authors wish to thank the two Reviewers for their constructive comments, which are very helpful for strengthening the presentation of this study.

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: This work was supported by grants from the National High Technology Research and Development Program of China (2006AA02A301), the National Basic Research Program of China (No. 2007CB512202), the Knowledge Innovation Program of the Chinese Academy of Sciences (Grant No. KSCX1-YW-R-74) and Key Research Program (CAS) (KSCX2-YW-R-112). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Chou KC. Review: Low-frequency collective motion in biomacromolecules and its biological functions. Biophysical Chemistry. 1988;30:3–48.
2. Madkan A, Blank M, Elson E, Chou KC, Geddis MS, et al. Steps to the clinic with ELF EMF. Natural Science. 2009;1:157–165.
3. Schnell JR, Chou JJ. Structure and mechanism of the M2 proton channel of influenza A virus. Nature. 2008;451:591–595.
4. Martel P. Biophysical aspects of neutron scattering from vibrational modes of proteins. Prog Biophys Mol Biol. 1992;57:129–179.
5. Kamerzell TJ, Middaugh CR. The complex inter-relationships between protein flexibility and stability. J Pharm Sci. 2008;97:3494–3517.
6. Chou KC, Chen NY. The biological functions of low-frequency phonons. Scientia Sinica. 1977;20:447–457.
7. Chou KC. The biological functions of low-frequency phonons: 4. Resonance effects and allosteric transition. Biophysical Chemistry. 1984;20:61–71.
8. Chou KC. The biological functions of low-frequency phonons: 6. A possible dynamic mechanism of allosteric transition in antibody molecules. Biopolymers. 1987;26:285–295.
9. Chou KC, Mao B. Collective motion in DNA and its role in drug intercalation. Biopolymers. 1988;27:1795–1815.
10. Chou KC. Low-frequency resonance and cooperativity of hemoglobin. Trends in Biochemical Sciences. 1989;14:212.
11. Chou KC, Zhang CT, Maggiora GM. Solitary wave dynamics as a mechanism for explaining the internal motion during microtubule growth. Biopolymers. 1994;34:143–153.
12. Pielak RM, Schnell JR, Chou JJ. Mechanism of drug inhibition and drug resistance of influenza A M2 channel. Proceedings of the National Academy of Sciences of the United States of America. 2009;106:7379–7384.
13. Huang RB, Du QS, Wang CH, Chou KC. An in-depth analysis of the biological functional studies based on the NMR M2 channel structure of influenza A virus. Biochem Biophys Res Commun. 2008;377:1243–1247.
14. Du QS, Huang RB, Wang CH, Li XM, Chou KC. Energetic analysis of the two controversial drug binding sites of the M2 proton channel in influenza A virus. Journal of Theoretical Biology. 2009;259:159–164.
15. Wang JF, Chou KC. Insight into the molecular switch mechanism of human Rab5a from molecular dynamics simulations. Biochemical and Biophysical Research Communications. 2009;390:608–612.
16. Wang JF, Gong K, Wei DQ, Li YX, Chou KC. Molecular dynamics studies on the interactions of PTP1B with inhibitors: from the first phosphate binding site to the second one. Protein Engineering Design and Selection. 2009;22:349–355.
17. Wang JF, Yan JY, Wei DQ, Chou KC. Binding of CYP2C9 with diverse drugs and its implications for metabolic mechanism. Medicinal Chemistry. 2009;5:263–270.
18. Chou JJ, Li S, Klee CB, Bax A. Solution structure of Ca2+-calmodulin reveals flexible hand-like properties of its domains. Nature Structural Biology. 2001;8:990–997.
19. Li L, Wei DQ, Wang JF, Chou KC. Computational studies of the binding mechanism of calmodulin with chrysin. Biochem Biophys Res Commun. 2007;358:1102–1107.
20. Wei H, Wang CH, Du QS, Meng J, Chou KC. Investigation into adamantane-based M2 inhibitors with FB-QSAR. Medicinal Chemistry. 2009;5:305–317.
21. Gong K, Li L, Wang JF, Cheng F, Wei DQ, et al. Binding mechanism of H5N1 influenza virus neuraminidase with ligands and its implication for drug design. Medicinal Chemistry. 2009;5:242–249.
22. Wang JF, Zhang CC, Chou KC, Wei DQ. Review: Structure of cytochrome P450s and personalized drug. Current Medicinal Chemistry. 2009;16:232–244.
23. Yen HC, Xu Q, Chou DM, Zhao Z, Elledge SJ. Global protein stability profiling in mammalian cells. Science. 2008;322:918–923.
24. Chou KC. Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology. Current Proteomics. 2009;6:262–274.
25. Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000;16:276–277.
26. Chou KC. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Structure, Function, and Genetics. 2001;43:246–255. (Erratum: ibid. 2001;44:60.)
27. Chou KC. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics. 2005;21:10–19.
28. Dubchak I, Muchnik I, Mayor C, Dralyuk I, Kim SH. Recognition of a protein fold in the context of the SCOP classification. Proteins: Structure, Function, and Genetics. 1999;35:401–407.
29. Niu B, Jin Y, Lu L, Fen K, Gu L, et al. Prediction of interaction between small molecule and enzyme using AdaBoost. Mol Divers. 2009;13:313–320.
30. Xiao X, Chou KC. Digital coding of amino acids based on hydrophobic index. Protein & Peptide Letters. 2007;14:871–875.
31. Zhang TL, Ding YS, Chou KC. Prediction protein structural classes with pseudo amino acid composition: approximate entropy and hydrophobicity pattern. Journal of Theoretical Biology. 2008;250:186–193.
32. Chou KC, Zhang CT. Predicting protein folding types by distance functions that make allowances for amino acid interactions. Journal of Biological Chemistry. 1994;269:22014–22020.
33. Dubchak I, Muchnik I, Mayor C, Dralyuk I, Kim SH. Recognition of a protein fold in the context of the Structural Classification of Proteins (SCOP) classification. Proteins: Structure, Function, and Genetics. 1999;35:401–407.
34. Chou KC. A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space. Proteins: Structure, Function, and Genetics. 1995;21:319–344.
35. Pollastri G, Przybylski D, Rost B, Baldi P. Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins: Structure, Function, and Genetics. 2002;47:228–235.
36. Pollastri G, Baldi P, Fariselli P, Casadio R. Prediction of coordination number and relative solvent accessibility in proteins. Proteins: Structure, Function, and Genetics. 2002;47:142–153.
37. Chothia C, Finkelstein AV. The classification and origins of protein folding patterns. Annu Rev Biochem. 1990;59:1007–1039.
38. Fauchere JL, Charton M, Kier LB, Verloop A, Pliska V. Amino acid side chain parameters for correlation studies in biology and pharmacology. International Journal of Peptide and Protein Research. 1988;32:269–278.
39. Grantham R. Amino acid difference formula to help explain protein evolution. Science. 1974;185:862–864.
40. Chou KC, Shen HB. Review: Recent progresses in protein subcellular location prediction. Analytical Biochemistry. 2007;370:1–16.
41. Chou KC, Shen HB. Cell-PLoc: A package of web-servers for predicting subcellular localization of proteins in various organisms. Nature Protocols. 2008;3:153–162.
42. The UniProt Consortium. The Universal Protein Resource (UniProt) 2009. Nucl Acids Res. 2009;37:D169–174.
43. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410.
44. Chou KC, Shen HB. A new method for predicting the subcellular localization of eukaryotic proteins with both single and multiple sites: Euk-mPLoc 2.0. PLoS ONE. 2010;5:e9931.
45. Chou KC, Shen HB. Review: recent advances in developing web-servers for predicting protein attributes. Natural Science. 2009;2:63–92.
47. Sharan R, Ulitsky I, Shamir R. Network-based prediction of protein function. Mol Syst Biol. 2007;3:88.
48. Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, et al. STRING 8–a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Research. 2009;37:D412–416.
49. Ruepp A, Waegele B, Lechner M, Brauner B, Dunger-Kaltenbach I, et al. CORUM: the comprehensive resource of mammalian protein complexes–2009. Nucleic Acids Research. 2010;38:D497–501.
50. Peng H, Long F, Ding C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell. 2005;27:1226–1238.
51. Chou KC, Zhang CT. A correlation coefficient method to predicting protein structural classes from amino acid compositions. European Journal of Biochemistry. 1992;207:429–433.
52. Chou KC, Zhang CT. Review: Prediction of protein structural classes. Critical Reviews in Biochemistry and Molecular Biology. 1995;30:275–349.
53. Qian Z, Cai YD, Li Y. A novel computational method to predict transcription factor DNA binding preference. Biochem Biophys Res Commun. 2006;348:1034–1037.
54. Huang T, Cui W, Hu L, Feng K, Li YX, et al. Prediction of pharmacological and xenobiotic responses to drugs based on time course gene expression profiles. PLoS One. 2009;4:e8126.
55. Chou KC, Cai YD. A new hybrid approach to predict subcellular localization of proteins by incorporating gene ontology. Biochemical and Biophysical Research Communications. 2003;311:743–747.
56. Lin H. The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou's pseudo amino acid composition. Journal of Theoretical Biology. 2008;252:350–356.
57. Chen C, Chen L, Zou X, Cai P. Prediction of protein secondary structure content by using the concept of Chou's pseudo amino acid composition and support vector machine. Protein & Peptide Letters. 2009;16:27–31.
58. Ding H, Luo L, Lin H. Prediction of cell wall lytic enzymes using Chou's amphiphilic pseudo amino acid composition. Protein & Peptide Letters. 2009;16:351–355.
59. Li FM, Li QZ. Predicting protein subcellular location using Chou's pseudo amino acid composition and improved hybrid approach. Protein & Peptide Letters. 2008;15:612–616.
60. Lin H, Ding H, Guo FB, Zhang AY, Huang J. Predicting subcellular localization of mycobacterial proteins by using Chou's pseudo amino acid composition. Protein & Peptide Letters. 2008;15:739–744.
61. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Research. 2008;36:D901–906.
62. Wang J, Pielak RM, McClintock MA, Chou JJ. Solution structure and functional analysis of the influenza B proton channel. Nat Struct Mol Biol. 2009;16:1267–1271.
63. Oxenoid K, Chou JJ. The structure of phospholamban pentamer reveals a channel-like architecture in membranes. Proceedings of the National Academy of Sciences of the United States of America. 2005;102:10870–10875.
64. Cristian L, Lear JD, DeGrado WF. Determination of membrane protein stability via thermodynamic coupling of folding to thiol-disulfide interchange. Protein Sci. 2003;12:1732–1740.
65. White SH, Wimley WC. Membrane protein folding and stability: physical principles. Annu Rev Biophys Biomol Struct. 1999;28:319–365.
66. Chou KC, Carlacci L, Maggiora GM, Parodi LA, Schultz MW. An energy-based approach to packing the 7-helix bundle of bacteriorhodopsin. Protein Science. 1992;1:810–827.
67. Chou KC, Carlacci L. Energetic approach to the folding of alpha/beta barrels. Proteins: Structure, Function, and Genetics. 1991;9:280–295.
68. Lumry R. Protein substructures and folded stability. Biophys Chem. 2002;101–102:81–92.
69. Minetti CA, Remeta DP. Energetics of membrane protein folding and stability. Arch Biochem Biophys. 2006;453:32–53.
70. Shen HB, Chou KC. Predicting protein fold pattern with functional domain and sequential evolution information. Journal of Theoretical Biology. 2009;256:441–446.
71. Shen HB, Song JN, Chou KC. Prediction of protein folding rates from primary sequence by fusing multiple sequential features. Journal of Biomedical Science and Engineering (JBiSE). 2009;2:136–143.
72. Chou KC, Shen HB. FoldRate: A web-server for predicting protein folding rates from primary sequence. The Open Bioinformatics Journal. 2009;3:31–50.
73. Gromiha MM, Selvaraj S. Inter-residue interactions in protein folding and stability. Prog Biophys Mol Biol. 2004;86:235–277.
74. Fields PA. Review: Protein function at thermal extremes: balancing stability and flexibility. Comp Biochem Physiol A Mol Integr Physiol. 2001;129:417–431.

Articles from PLoS ONE are provided here courtesy of Public Library of Science