NIH Public Access Author Manuscript; accepted for publication in a peer-reviewed journal.
J Med Chem. Author manuscript; available in PMC Feb 12, 2010.
Published in final edited form as:
PMCID: PMC2715819

Additivity in the Analysis and Design of HIV Protease Inhibitors


We explore the applicability of an additive treatment of substituent effects to the analysis and design of HIV protease inhibitors. Affinity data for a set of inhibitors with a common chemical framework were analyzed to provide estimates of the free energy contribution of each chemical substituent. These estimates were then used to design new inhibitors, whose high affinities were confirmed by synthesis and experimental testing. Derivations of additive models by least-squares and ridge-regression methods were found to yield statistically similar results. The additivity approach was also compared with standard molecular descriptor-based QSAR; the latter was not found to provide superior predictions. Crystallographic studies of HIV protease-inhibitor complexes help explain the perhaps surprisingly high degree of substituent additivity in this system, and allow some of the additivity coefficients to be rationalized on a structural basis.

1 Introduction

The human immunodeficiency virus (HIV), the cause of AIDS, currently infects more than 30 million people around the world1 and approximately 2 million people died of AIDS in 2007 alone. Current treatments include inhibiting the reverse transcriptase and protease of HIV with small molecule drugs, but their effectiveness can be diminished by the occurrence of resistance mutations in the virus.2 New inhibitors of the HIV reverse transcriptase and protease are thus needed that will inhibit not only wild-type but also mutated forms of the virus's proteins.3,4

We have therefore sought to develop compounds which can inhibit both wild-type and mutated forms of the HIV protease.5,6,7,8 The design approach is based in part upon the hypothesis that inhibitors that bind within a consensus envelope of bound substrate peptides are more likely to retain affinity for clinically relevant mutants.9,10,11,12,13 Many of the HIV protease inhibitors synthesized and tested in the course of this effort possess a common chemical scaffold with three variable substituent positions (Figure 1). These compounds form an incomplete combinatorial library and the question arose as to whether any of the unsynthesized compounds within the full library should be expected to bind HIV protease with high affinity and might thus be worth synthesizing and testing.

Figure 1
Framework of HIV protease inhibitors studied in this work, showing the three points of substitution.

In approaching this problem, it is natural to ask which are the best substituents, and to make sure that all combinations of the best substituents have been tested. Such an approach implicitly relies upon an assumption of independence; i.e., that a substituent which appears in a high affinity compound will also tend to impart high affinity when combined with other substituents to form a different compound. The present study makes this assumption explicit through an implementation of the Free and Wilson additivity model,14 and evaluates the accuracy of the additivity model not only retrospectively but also prospectively, by using it to select additional compounds from within the virtual combinatorial library for synthesis and testing. The results bear on the reliability of the additivity approximation in the present system. In addition, methodological variations are assessed, and the additivity approach is furthermore compared with a traditional quantitative structure-activity relationship (QSAR) evaluation of the same data sets. The results of the additivity analysis are also considered in the light of existing and new crystallographic complexes of bound inhibitors from the present series.

2 Methods

2.1 Data Modeling

2.1.1 Additive model for affinity

The additive model assumes that the free energy contributions of the substituents of a given compound are independent and additive.14 Two formulations are considered. In the first, one compound is chosen to be the reference compound and its substituents are the reference substituents for each of the three positions of substitution. The additivity model is then given by:

$$ \mathrm{p}K_i(a,b,c) = \mathrm{p}K_i(1,1,1) + S_{1a} + S_{2b} + S_{3c} \qquad (1) $$

where pKi(a,b,c) is the predicted pKi of a compound in the series with substituents a, b and c at the R1, R2 and R3 positions, respectively; S1a approximates the change in pKi upon replacing the reference substituent at R1 (a=1) with substituent a, while leaving R2 and R3 unchanged; and S2b and S3c have analogous interpretations for the R2 and R3 positions, respectively. The values of S1a, S2b and S3c are obtained by least-squares fitting, as described below. In the second formulation, similar to that of Free and Wilson,14 the observed pKi values are mean-centered and scaled, i.e., linearly transformed to yield a mean of zero and a standard deviation of one. This procedure removes the need to specify a reference molecule or reference substituents.
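As a minimal sketch of how the reference-compound formulation is evaluated, the predicted pKi is simply the reference pKi plus one parameter per position. The dictionary layout and the function name `predict_pki` are illustrative assumptions, not the authors' C code. The two nonzero parameters are the Model 1 extremes quoted in Results (1.61 for R1 substituent 6 and -3.82 for R3 substituent 7), and the reference pKi of 10 corresponds to Ki = 0.1 nM for compound 1. The substituent combinations evaluated here are hypothetical.

```python
# Sketch of the reference-compound additive model (illustrative names only).

REFERENCE_PKI = 10.0  # reference compound 1, Ki = 0.1 nM

# Substituent parameters keyed by (position, substituent index).
# Reference substituents (index 1 at each position) are zero by construction.
PARAMS = {
    ("R1", 1): 0.0,
    ("R2", 1): 0.0,
    ("R3", 1): 0.0,
    ("R1", 6): 1.61,   # largest Model 1 parameter quoted in Results
    ("R3", 7): -3.82,  # smallest Model 1 parameter quoted in Results
}


def predict_pki(a, b, c, params=PARAMS, reference_pki=REFERENCE_PKI):
    """Additive estimate: pKi(a, b, c) = pKi(1, 1, 1) + S1a + S2b + S3c."""
    return reference_pki + params[("R1", a)] + params[("R2", b)] + params[("R3", c)]


# Swapping one substituent at a time shifts the reference pKi by its parameter.
print(predict_pki(6, 1, 1))  # about 11.61
print(predict_pki(1, 1, 7))  # about 6.18
```

Because each parameter enters independently, a compound that changes two positions at once is predicted by summing the two single-position shifts.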

Two methods of parameterizing the additive models of affinity were tested, ordinary least-squares regression and ridge regression. The next two subsections describe these procedures.

Fitting the additive model by ordinary least-squares regression

Ordinary least-squares regression was used to optimize the values of the substituent parameters based on the experimental pKi data. This method fits a vector β to the linear equation:

$$ y_{\mathrm{fit}} = X\beta \qquad (2) $$

such that the residual sum of squared deviations (RSS) is a minimum:

$$ \mathrm{RSS} = \sum_i \left( y_{\mathrm{obs},i} - y_{\mathrm{fit},i} \right)^2 \qquad (3) $$

The values in the column vector β correspond to all the substituent parameters S1a, S2b and S3c in Equation 1 other than the reference substituents; element i in the vector yobs equals the measured value of pKi(a, b, c) - pKi(1, 1, 1) for compound i, and element i in the vector yfit is the corresponding fitted value; and the rows of the design matrix X contain ones and zeroes, depending on whether or not a given molecule contains a given substituent.
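The construction of the 0/1 design matrix and the least-squares solve can be sketched as follows. This is an illustration under stated assumptions: the toy compounds and their offsets are made up, all names are hypothetical, and the normal equations are solved by Gaussian elimination rather than by the singular value decomposition the authors used, so the sketch only applies when X^T X is nonsingular.

```python
# Sketch: build the 0/1 design matrix and fit beta by least squares.
# Toy data and all names are illustrative assumptions.

def design_matrix(compounds, columns):
    """One row per compound; an entry is 1 if the compound has that substituent."""
    return [[1.0 if col in comp else 0.0 for col in columns] for comp in compounds]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[pivot] = m[pivot], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def least_squares(x, y):
    """Return beta minimizing RSS by solving (X^T X) beta = X^T y."""
    cols = range(len(x[0]))
    xtx = [[sum(r[i] * r[j] for r in x) for j in cols] for i in cols]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in cols]
    return solve(xtx, xty)


# Columns are the non-reference substituents; reference substituents get no column.
columns = [("R1", 2), ("R2", 2), ("R3", 2)]
compounds = [
    {("R1", 2), ("R2", 1), ("R3", 1)},
    {("R1", 1), ("R2", 2), ("R3", 1)},
    {("R1", 1), ("R2", 1), ("R3", 2)},
    {("R1", 2), ("R2", 2), ("R3", 1)},
    {("R1", 2), ("R2", 1), ("R3", 2)},
]
# y_obs = pKi(a, b, c) - pKi(1, 1, 1); made-up values that are exactly additive.
y_obs = [0.5, -1.0, -2.0, -0.5, -1.5]

X = design_matrix(compounds, columns)
beta = least_squares(X, y_obs)
print(dict(zip(columns, beta)))
```

With exactly additive toy data the fit recovers the generating parameters; with real, noisy data the residuals measure the departure from additivity.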

Least-squares fitting was performed via singular value decomposition15 with code implemented by the authors in C and employing an available singular value decomposition routine.16 Additional code was written to compute predicted values of pKi from Equation 1.

Four of the measured Ki values were reported experimentally as ≥10 μM, corresponding to a pKi≤5. For convenience, these inequalities were converted to ranges by setting the somewhat arbitrary limit that pKi≥2. The same least-squares fitting procedure then was carried out for all combinations of three possibilities for each of these measurements: use of the upper limit in the fit; use of the lower limit; and exclusion of the pKi value from the fit. The combination that yielded the lowest sum-squared error was chosen. Empirically, the fitted values never came close to the arbitrary lower limit of 2, so the precise value of this quantity is not an important parameter of the models. The same method was used to establish pKi ranges of 11-13 for tight-binding inhibitors, as detailed in Results.
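The correspondence between Ki ≥ 10 μM and pKi ≤ 5 follows from the definition pKi = -log10(Ki), with Ki in mol/L. The short sketch below shows the conversion and one way to represent a censored measurement as a range; the function names and the tuple representation are assumptions made for illustration.

```python
import math


def pki_from_ki(ki_molar):
    """Convert an inhibition constant in mol/L to pKi = -log10(Ki)."""
    return -math.log10(ki_molar)


def censored_range(ki_lower_bound_molar, floor=2.0):
    """Represent 'Ki >= bound' as the pKi range (floor, pKi of the bound).

    The floor of 2 is the somewhat arbitrary lower limit described in the text.
    """
    return (floor, pki_from_ki(ki_lower_bound_molar))


print(pki_from_ki(0.1e-9))    # reference compound, Ki = 0.1 nM
print(pki_from_ki(23.9e-9))   # Ki = 23.9 nM
print(censored_range(10e-6))  # Ki >= 10 uM
```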

Confidence limits at the 95% level for the fitted parameters and pKi predictions were obtained by the bootstrap sampling procedure.15,17 This procedure consists of random selection of molecules from the training set with replacement, such that the total number sampled is equal to the number of molecules in the training set. For each of 500 such bootstrap samples, a set of substituent parameters was obtained from fitting, and pKi predictions made where applicable. Confidence limits for each parameter and pKi prediction were obtained by determining the range containing the middle 95% of predictions. Note that some bootstrap samples lack data to enable fitting for some parameters and therefore do not yield parameter values needed for some of the pKi predictions, so there may be fewer than 500 parameters/predictions from which to calculate the relevant confidence interval.
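The percentile bootstrap described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the estimator here is a toy stand-in (the mean offset of compounds carrying one substituent) rather than the full least-squares fit, and all names and data are hypothetical. Resamples in which the quantity cannot be determined return `None` and are skipped, which is why fewer than 500 values may contribute to an interval.

```python
import random


def bootstrap_interval(data, estimator, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap interval for estimator(sample).

    Each resample draws len(data) items with replacement. An estimator
    result of None marks a resample that lacks the needed data.
    """
    rng = random.Random(seed)
    n = len(data)
    values = []
    for _ in range(n_boot):
        sample = [data[rng.randrange(n)] for _ in range(n)]
        value = estimator(sample)
        if value is not None:
            values.append(value)
    if not values:
        raise ValueError("no bootstrap sample determined the quantity")
    values.sort()
    tail = (1.0 - level) / 2.0
    last = len(values) - 1
    lo = values[int(round(tail * last))]
    hi = values[int(round((1.0 - tail) * last))]
    return lo, hi, len(values)


def mean_offset(sample):
    """Toy estimator: mean pKi offset of compounds that carry the substituent."""
    ys = [y for has_sub, y in sample if has_sub]
    return sum(ys) / len(ys) if ys else None


# (carries the substituent?, pKi offset from the reference); made-up values.
data = [(True, 0.4), (True, 0.7), (False, -1.0),
        (False, -0.8), (False, -1.2), (True, 0.5)]
lo, hi, n_used = bootstrap_interval(data, mean_offset)
print(lo, hi, n_used)
```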

The statistical significance of the fit of modeled (yfit) to measured (yobs) pKi values was assessed by calculating the appropriate F-statistic and its corresponding p-value. Molecules containing unique instances of their R1 substituents and their associated R1 substituent parameters were omitted from this analysis because their parameters could be trivially adjusted to give a perfect fit for that molecule's pKi value. The F-statistic was calculated as:

$$ F = \frac{(\mathrm{TSS} - \mathrm{RSS})/M}{\mathrm{RSS}/(N - M - 1)} \qquad (4) $$

where TSS is the total sum of squares, $\mathrm{TSS} = \sum_i (y_{\mathrm{obs},i} - \bar{y}_{\mathrm{obs}})^2$; RSS is the residual sum of squares (Eq 3); N is the number of observations (number of molecules included in the calculation of the F-statistic); and M is the number of parameters (number of substituent parameters included). The corresponding p-value was obtained from the look-up table at http://graphpad.com/quickcalcs/pvalue1.cfm with the associated numerator and denominator degrees of freedom M and N-M-1, respectively. The calculated p-values were less than 0.0001 in all cases, indicating high statistical significance.
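A small sketch of the F-statistic calculation, using the definitions of TSS, RSS, N and M given above with numerator and denominator degrees of freedom M and N-M-1. The function name and the toy numbers are illustrative assumptions. The p-value lookup is not reproduced here because the Python standard library has no F-distribution function.

```python
def f_statistic(y_obs, y_fit, n_params):
    """Return (F, numerator dof, denominator dof) for a fitted linear model."""
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    tss = sum((y - mean_obs) ** 2 for y in y_obs)               # total sum of squares
    rss = sum((yo - yf) ** 2 for yo, yf in zip(y_obs, y_fit))   # residual sum of squares
    dof_num = n_params
    dof_den = n - n_params - 1
    f_value = ((tss - rss) / dof_num) / (rss / dof_den)
    return f_value, dof_num, dof_den


# Made-up observed and fitted values for five molecules and one parameter.
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_fit = [1.1, 1.9, 3.2, 3.8, 5.0]
f_value, dof_num, dof_den = f_statistic(y_obs, y_fit, n_params=1)
print(f_value, dof_num, dof_den)
```

Here TSS = 10 and RSS = 0.10, so F = (9.9/1)/(0.10/3), which is about 297 with 1 and 3 degrees of freedom.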

The effect of changing the reference molecule was shown to be modest. The value of any fitted parameter rarely deviated by more than 0.25 pKi units from the median value of that parameter obtained from systematically changing the reference molecule to all of the training molecules in a given set. Similarly, the values of the pKi predictions for each test set molecule were always within 0.25 pKi units of the median predictions for that molecule (results not shown).

Fitting the additive model by ridge regression

Ridge regression18,19 was evaluated as an alternative to ordinary least squares for optimization of the additive substituent parameters. Ridge regression fits observed data to Equation 2 but operates on mean-centered and scaled data (see above), and supplements the RSS in the target function with an additional term that penalizes fits with large substituent parameters:

$$ \mathrm{RSS}_{RR} = \sum_i \left( y_{\mathrm{obs},i} - y_{\mathrm{fit},i} \right)^2 + \lambda \sum_j \beta_j^2 \qquad (5) $$

where λ is the ridge parameter. The value of RSSRR is minimized by the regression coefficients β that solve the following linear equation:

$$ \left( X^T X + \lambda I \right) \beta = X^T y_{\mathrm{obs}} \qquad (6) $$

The ridge parameter λ is a positive number that drives the coefficients in the β vector toward zero and drives all values of yfit toward the average value of the observed data yobs. As λ goes to zero, ridge regression reduces to linear least-squares regression with mean-centered and scaled data. Shrinking the coefficients by using an appropriate value of λ can reduce the sensitivity of the fitted model to noise in the input data and thereby improve its predictivity. Even a small value of λ may be sufficient to avoid problems that can arise if the matrix product XTX is ill-conditioned; i.e., if it is sensitive to numerical errors upon matrix inversion or similar operations. However, if λ is too large, then the shrinkage of the coefficients detracts from the predictivity of the model.
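The shrinkage behavior described above can be illustrated with a small sketch that solves (X^T X + λI)β = X^T y for several values of λ and reports the residual sum of squares alongside the sum of squared coefficients. The data are made up, the names are hypothetical, and only the response vector is mean-centered and scaled here; how the design matrix itself was preprocessed is not specified in the text, so leaving it as raw 0/1 indicators is an assumption.

```python
# Sketch of ridge regression on toy data (illustrative names and values).

def standardize(values):
    """Shift to mean zero and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[pivot] = m[pivot], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def ridge_fit(x, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y for beta."""
    cols = range(len(x[0]))
    a = [[sum(r[i] * r[j] for r in x) + (lam if i == j else 0.0) for j in cols]
         for i in cols]
    rhs = [sum(r[i] * yi for r, yi in zip(x, y)) for i in cols]
    return solve(a, rhs)


def rss(x, y, beta):
    """Residual sum of squares of the fit X beta against y."""
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(x, y))


# Rows are compounds, columns are substituent indicators (toy example).
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]]
y = standardize([9.1, 8.0, 7.2, 9.6, 8.9, 7.5])  # made-up pKi values

lambdas = [1e-12, 1.0, 3.0, 10.0, 100.0]
rss_values, norms = [], []
for lam in lambdas:
    beta = ridge_fit(X, y, lam)
    rss_values.append(rss(X, y, beta))
    norms.append(sum(b * b for b in beta))
    print(f"lambda={lam:g}  RSS={rss_values[-1]:.3f}  sum(beta^2)={norms[-1]:.3f}")
```

As λ grows, the sum of squared coefficients falls while the RSS rises; a useful λ is one where the first effect is sizable and the second is still small.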

The code to perform ridge regression calculations was written in C by the authors. Fitting was carried out with λ set to 10⁻¹², 1, 3, 10 and 100 in order to assess the trade-off between the increasing RSS and the decreasing size of the sum-of-squared coefficients in the β vector. The optimal value of λ will shrink the coefficients while producing only a small increase in RSS. Although methods exist to find λ through cross-validation, they were not used in this case because some of the molecules in the training set contain unique instances of some of the R1 substituents, and predictions for these molecules made within the context of cross-validation (in which these molecules would be omitted from the training) would be meaningless. The λ value of 10⁻¹² amounts to ordinary linear regression with mean-centered and scaled data. The present ridge regression code was not outfitted with the ability to handle data ranges, so compounds for which only inequality data are available were omitted. Also, so that the ridge regression results could be compared with the traditional QSAR results on an equal footing, compounds with substituents 5 or 6 at R2 were omitted. These substituents are not distinguished from each other by the global QSAR descriptors because they are stereoisomers of each other, and so we wished to exclude them from the QSAR analysis. This proved to remove most of the compounds containing R1 substituents 17, 18 and 19, and so the remaining four of these compounds were also removed. The resulting set of compounds used for the ridge regression and regular QSAR tests comprises compounds 1-55 and 71-106 from Table 1.

Table 1
Substituent indices, and observed and fitted pKi values of the molecules used in this study. The chemical structures of the substituents are shown in Table 2. The dashed line separates the compounds whose R1 substituents are the larger cyclic carbamate-containing ...

2.1.2 Modeling affinity by QSAR with global molecular descriptors

The additivity approximation was compared with traditional QSAR methods in which the molecules are represented by global molecular descriptors, which quantify aspects of the chemical structure as a whole. Parameters were fit by partial least squares (PLS)19,20 and ridge regression.18,19

Molecular descriptors were calculated with version 2.1 of the DRAGON program.21 Here, 511 descriptors were calculated from descriptor categories 1-6 (constitutional descriptors, topological descriptors, molecular walk counts, BCUT descriptors, Galvez topological charge indices and 2D autocorrelations). Principal components from each descriptor class were calculated with DRAGON, which generated 39 principal component-derived descriptors for the set of molecules. One set of calculations used all 39 transformed descriptors for both PLS and ridge regression. A second set of calculations used a genetic algorithm (GA) to select a subset of the 39 descriptors. The genetic algorithm used is very similar to that used by Hoffman et al.22 for selecting molecular descriptors for PLS regression. For both GA-PLS and GA-ridge regression methods, five GA runs were performed, each with 100 generations of 100 chromosomes apiece. The descriptors chosen were those encoded by the most fit chromosome.

For the GA-PLS method, the fitness was a function of the cross-validated Q² value; the number of compounds, n; and the number of PLS factors, c, with an additional term to penalize models using more than 6 descriptors:


Here m6 equals the larger of zero and the number of descriptors selected minus six. It was added to the fitness function because we have observed that GA-PLS calculations without such a term can find a highly fit descriptor set even when the descriptors are merely random numbers arbitrarily assigned to each compound. Penalizing models with a larger number of selected descriptors ameliorates this problem (unpublished results).

For ridge regression without GA, the value of λ that corresponded to the highest value of the leave-one-out statistic, Q², was found using the golden section search.15 For ridge regression with GA, the value of λ was chosen to optimize a rapidly computable approximation to Q², the generalized cross-validation statistic (GCV). The GCV approximates leave-one-out cross validation without actually performing repeated calculations leaving out each molecule in turn.23 To rapidly find the GCV-optimized value of λ, the following formula was iterated to convergence,24 starting from an initial guess of 0.1:


where n is the number of data points (pKi values); m is the number of variables (descriptors); $A = X^T X + \lambda I_m$; the projection matrix $P = I_n - X A^{-1} X^T$; and $I_x$ is an identity matrix with x rows and columns. The term $y_{\mathrm{obs}}^T P^2 y_{\mathrm{obs}}$ is the residual sum-of-squares.24 In the rare instance when this iterative procedure failed to converge, the value of λ was left at 0.1.

The resultant GA fitness function for ridge regression was the leave-one-out cross-validation statistic Q² minus the quantity m6² described earlier. Note that the GCV method was not used to calculate the leave-one-out cross-validation statistic Q² itself, but only used to find reasonable values for λ during the cross-validation.
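For the non-GA ridge regression, λ was chosen by a golden section search on Q². A generic golden section maximizer is sketched below as an illustration. The objective `toy_q2` is a made-up unimodal stand-in for the leave-one-out Q² as a function of log10(λ), and searching on a logarithmic scale over the interval [-3, 3] is an assumption of this sketch; the text does not state the search variable or bounds.

```python
import math


def golden_section_max(f, lo, hi, tol=1e-6):
    """Locate the maximizer of a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:
            # The maximum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # The maximum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0


def toy_q2(log_lam):
    """Made-up unimodal stand-in for Q2 as a function of log10(lambda)."""
    return 0.6 - 0.05 * (log_lam - 0.5) ** 2


best_log_lam = golden_section_max(toy_q2, -3.0, 3.0)
best_lam = 10.0 ** best_log_lam
print(best_log_lam, best_lam)
```

Each iteration reuses one of the two interior function values, so only one new evaluation of the (expensive) cross-validation statistic is needed per step.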

2.2 Compounds Studied as HIV-1 Protease Inhibitors

Table 1 lists all 106 compounds studied here, and Table 2 provides the chemical structures of their substituents. The R1 substituents are separated into those which contain a cyclic carbamate group and those which do not, and the inhibitors are similarly divided into those which contain a cyclic carbamate group at R1 (CC compounds) and those which do not (non-CC compounds). Some of the CC compounds achieve high affinities, but the CC substituents tend to be larger than the non-CC substituents and thus tend to protrude from the substrate envelope more than the non-CC substituents. Therefore, the substrate envelope hypothesis would suggest that inhibitors containing non-CC R1 substituents should better inhibit mutant forms of the HIV protease.9,10,11,12,13 The design methodologies for most of the training set compounds discussed in this work have been reported elsewhere,5,6,7 as have the chemical synthesis and inhibition assays.5,6,7,25,26,27 The synthesis of new compounds is described in the next section.

Table 2
The R1, R2 and R3 substituents of the compounds listed in Table 1.a

2.3 Experimental Methods

2.3.1 Chemistry

The general synthetic route applied for the preparation of the inhibitors is illustrated in Scheme 1. The Boc-protected intermediates (R)-(hydroxyethylamino)sulfonamides 110–112 were prepared according to the procedures described earlier.5 Briefly, ring opening of commercially available chiral epoxide, (1S,2S)-(1-oxiranyl-2-phenylethyl)carbamic acid tert-butyl ester 107 with selected R2 primary amines provided the amino alcohols 108 and 109. Reactions of selected R3 sulfonyl chlorides with 108 and/or 109 gave the sulfonamide intermediates, (R)-(hydroxyethylamino)sulfonamides 110–112. After removing the Boc protection, the free amine fragments were coupled with selected R1 carboxylic acids using two different coupling methods: the cyclic carbamate-based acid fragment was first converted to the corresponding acyl chloride and then reacted with amine to provide the target compounds 35 and 36 (Method A);5 the selected carboxylic acids were reacted with free amines using EDCI/HOBt in H2O-CH2Cl2 (1:1) mixture to afford the designed inhibitors 37, 41, 44, 49, 51 and 53 (Method B).25

Scheme 1
Reaction Scheme for the Synthesis of Designed Protease Inhibitors

2.3.2 HIV-1 protease inhibition assays

The HIV-1 protease inhibitory activities of all newly designed inhibitors were determined by a fluorescence resonance energy transfer (FRET) method.5,26 Protease substrate, (Arg-Glu(EDANS)-Ser-Gln-Asn-Tyr-Pro-Ile-Val-Gln-Lys(DABCYL)-Arg) was purchased from Molecular Probes. The energy transfer donor (EDANS) and acceptor (DABCYL) dyes are labeled at two ends of the peptide, respectively, to perform FRET. Fluorescence measurements were carried out on a fluorescence spectrophotometer (Photon Technology International) at 30 °C. Excitation and emission wavelengths were set at 340 nm and 490 nm, respectively. Each reaction was recorded for about 10 min. Wild-type HIV-1 protease (Q7K) was desalted through PD-10 columns (Amersham Biosciences). Sodium acetate (20 mM, pH 5) was used as elution buffer. Apparent protease concentrations were around 50 nM estimated by UV spectrophotometry at 280 nm. All inhibitors were dissolved in dimethylsulfoxide (DMSO) and diluted to appropriate concentrations. Protease (2 μL) and inhibitor (2 μL) or DMSO were mixed and incubated for 20–30 min at room temperature before initializing substrate cleavage reaction. For all experiments, 150 μL of 1 μM substrate were used in substrate buffer [0.1 M sodium acetate, 1 M sodium chloride, 1 mM ethylenediaminetetraacetic acid (EDTA), 1 mM dithiothreitol (DTT), 2% DMSO and 1 mg/mL bovine serum albumin (BSA) with an adjusted pH 4.7]. Inhibitor binding dissociation constant (Ki) values were obtained by nonlinear regression fitting (GraFit 5, Erithacus software) to the plot of initial velocity as a function of inhibitor concentrations based on the Morrison equation.27 The initial velocities were derived from the linear range of reaction curves.
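For orientation, the commonly quoted form of the Morrison equation for tight-binding inhibition gives the fractional velocity v/v0 as a function of total enzyme and inhibitor concentrations and an apparent Ki. The sketch below evaluates that form; it is not the GraFit fitting procedure, the text does not give the exact parameterization used, and the function name and example concentrations are hypothetical.

```python
import math


def morrison_fraction(e_total, i_total, ki_app):
    """Fractional velocity v/v0 from the Morrison tight-binding equation.

    All three arguments must share the same concentration units.
    ki_app is the apparent inhibition constant under the assay conditions.
    """
    s = e_total + i_total + ki_app
    bound = (s - math.sqrt(s * s - 4.0 * e_total * i_total)) / 2.0  # [EI]
    return 1.0 - bound / e_total


# Hypothetical titration: 50 nM enzyme, apparent Ki of 0.1 nM (values in nM).
for inhibitor in (0.0, 25.0, 50.0, 100.0):
    print(inhibitor, morrison_fraction(50.0, inhibitor, 0.1))
```

When the apparent Ki is far below the enzyme concentration, the curve approaches a straight titration of enzyme by inhibitor, which is why very tight binders are hard to resolve in such an assay.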

3 Results

Three rounds of modeling and two rounds of synthesis were carried out. The first additivity model (Model 1) was generated with the least-squares regression methodology using the pKi values of 61 molecules that had been synthesized to date. Based on this model, several further compounds were proposed, synthesized and tested. The second additivity model (Model 2) was generated after the pKi values of 39 more compounds, including some of the proposed compounds, became available. The second model was used to propose 7 more compounds, and these were designed, synthesized and tested. In addition, an evaluation of errors in the models led to an adjustment in the treatment of the experimental data for the highest affinity compounds. Finally, the third additivity model (Model 3) was constructed based upon the measured pKi values of the 61+39+6=106 compounds that had been studied experimentally. This third model provides the current best estimates of the affinity contributions of the various substituents.

Additional calculations were carried out to compare the various regression methods with each other and to compare the additivity model with traditional descriptor-based QSAR.

3.1 Additivity Models

3.1.1 First additivity model and cycle of inhibitor design

The first additivity model was constructed by using ordinary least-squares regression to fit 61 molecules' pKi values to Equation 1, with molecule 1 (Ki=0.1 nM)5 as the reference compound. Parameters for 19, 6 and 8 substituents at the R1, R2 and R3 positions, respectively, were fitted. A plot of the fitted vs. observed pKi values (Figure 2) shows that there is a good fit of the data to Equation 1. Some of the data points have zero residual because their corresponding molecules contain the only instances of their R1 substituents (Table 1), and so their corresponding R1 substituent parameters take on values that yield zero error.

Figure 2
Comparison with experiment of calculated pKi values for 61 (fitted) plus 4 (predicted) HIV protease inhibitors (Model 1). The filled square corresponds to the reference compound, the unfilled squares indicate compounds with only one example of a given ...

The fitted parameters listed in Table 3 approximately quantify the change in pKi from 10 (Ki = 0.1 nM) upon replacing a substituent in the reference compound by the listed substituent. These values range from 1.61 (R1 substituent 6) to -3.82 (R3 substituent 7). The bootstrap-derived 95% confidence limits' ranges vary from 0.78 to 3.22 with median 1.16, and indicate that comparison of the various substituent parameters must be approached with caution due to the large uncertainties in these values.

Table 3
Substituent parameters obtained from least-squares fitting of pKi values for 61 molecules (Model 1). The cyclic carbamate R1 substituents are those above the dashed line.

The parameters for all of the CC substituents (R1 substituents 1-6) are greater than or equal to zero. This observation reinforces the general observation of the potency of compounds that incorporate this moiety.5 The parameters from this least-squares fit indicate that R1 substituent 6 is the most potent of these moieties. In contrast, only five of the 13 non-CC R1 substituents have fitted parameters close to or greater than zero. Of these, three (7, 13 and 14) are found in only one molecule each, and the other two (9 and 10) are found in only two molecules.

The fitted parameters in Table 3 show that there is one clearly preferred moiety for the R2 position, substituent 1. This group's substituent parameter is more than one log unit greater than those of the other R2 substituents, indicating that replacement of R2 substituent 1 with any of the other R2 groups would be expected to reduce binding affinity by more than one order of magnitude.

The most potent contributor to binding affinity at the R3 position, substituent 1, again is the reference substituent for this position. Three other R3 substituents decrease binding by less than an order of magnitude relative to the corresponding reference substituent, i.e., their fitted substituent parameters are greater than -1. Two of these three moieties contain an oxygen atom at the 4-position of the phenyl ring, as does the reference substituent. The other is the 4-anilino group, which is also found in the potent HIV protease inhibitors Amprenavir28 and Darunavir.29

Compounds containing four of the five most potent non-CC R1 substituents (groups 7, 9, 10 and 13), plus the relatively potent reference R2 and R3 substituents, were proposed and their pKi values predicted from additivity considerations (Table 4). These compounds were subsequently synthesized and tested, and were found to have sub-nanomolar affinity (Table 4). This result contrasts with the corresponding training set molecules for which the most potent non-CC inhibitor is molecule 56 (CARB-AD378) which has a Ki value of 23.9 nM (pKi 7.62). This represents a successful application of the additivity method to design molecules with increased affinity. On the other hand, the measured binding constants of two of these compounds are more than an order of magnitude less than their corresponding predictions. Also, only two of the compounds' pKi values lie within their respective predictions' 95% confidence limit.

Table 4
Comparison of predicted and observed pKi values for four molecules not present in the 61 molecule training set (Model 1). The predictions, including the bootstrap estimates of the 95% confidence intervals, were made prior to synthesis and testing of the ...

3.1.2 Second additivity model and cycle of inhibitor design

In the course of this research, 36 additional molecules from a separate inhibitor design project (the MIT-2 library from reference 7) were synthesized and tested. Some of the substituent moieties in these molecules were not present in the 61-molecule training set used in the first additivity model, so another fit of parameters was performed. Additionally, the observed values of three of the molecules in Table 4 were included in this new training set. (Compound 37 had not yet been synthesized and was therefore not included at this stage.) The fit of this set of 100 molecules is shown in Figure 3 and the substituent parameters are presented in Table 5.

Figure 3
Comparison with experiment of calculated pKi values for 100 (fitted) plus 6 (predicted) HIV protease inhibitors (Model 2). The filled square corresponds to the reference compound, the unfilled squares indicate compounds with only one example of a given ...
Table 5
Substituent parameters obtained from least-squares fitting of pKi values for 100 molecules (Model 2). Cyclic carbamate R1 substituents are above the dashed line. Parameters (1) from Model 1 (Table 3) are included for comparison; (2) indicates data for ...

The parameter values for the R1 substituents 9, 10 and 13 each noticeably decrease from their respective values in the first additivity model, presumably reflecting the inclusion of pKi values lower than the values predicted from the first set of parameters. The additivity analysis also identified three of the new substituents as potent contributors to inhibitor binding: R1 substituent 22 (parameter value 0.23); R2 substituent 8 (parameter value 0.23) and R3 substituent 9 (parameter value 0.40). The substituents identified as contributing to high affinity were incorporated into a new set of additivity-designed compounds (Table 6). Some of these compounds contained the most potent non-CC substituents in the R1 position (substituents 7, 13 and 14).

Table 6
Comparison of predicted (Model 2) and observed pKi values for seven molecules not present in the second training set.

Others contained R1 substituent 6, the most potent of the CC substituents according to this second additivity model (Table 5). Comparison of the predicted pKi values with those obtained from experiment are shown in Table 6 and Figure 3. The pKi value of compound 37, which had not been synthesized at this point, also was predicted using the updated substituent parameter values.

The observed pKi values of the newly synthesized compounds are all less than predicted (Table 6), much as observed for the previous set of predictions (Table 4). The predictions for the non-CC compounds deviate from the observed values by -0.49 to -1.36. Two of these four compounds, 39 and 51, are more potent than nearly all of the non-CC training set inhibitors (Table 1). Interestingly, these two compounds are the best predicted of the new compounds and their observed pKi values are inside the 95% confidence intervals of the corresponding predictions. The observed pKi values of the three CC compounds are less than predicted by more than one unit; i.e., the observed Ki values are greater than predicted by more than an order of magnitude. This unexpected result is considered in more detail in the next section.

3.1.3 Analysis of first and second additivity models

Overestimation of affinities

As noted above, the pKi values predicted with the additivity model consistently exceeded the observed affinities, yielding inaccurate results in particular for compounds containing R1 substituent 6. Additional calculations were performed to address this issue.

The training set compounds with R1 substituent 6 are all of high affinity (Table 1) and this contributes to the high value of the fitted parameter for this substituent (Table 5). This in turn appears to lead to excessively high pKi predictions for the three designed compounds which contain this substituent (Table 6). It has been reported that the binding constants of inhibitors whose Ki values are below approximately 10 pM (pKi > 11) cannot be reliably measured using the standard fluorometric assay.30 To crudely account for this uncertainty in the measured pKi values, a new additivity model was constructed in which training set pKi values greater than 11 were replaced with the range 11-13. Note, however, that no new compounds were added to the dataset. New predictions were made for the compounds in Table 6; Table 7 shows the results. The predictions made with the new model are closer to the experimentally determined pKi values than those from the original model, and the improvements are pronounced for the three compounds that contain R1 substituent 6. Of these, the errors in two of the three predictions are within the range observed for the non-CC compounds. This analysis suggests that one reason for overprediction of affinities in the first and second additivity models is experimental uncertainty in the highest measured pKi values. Allowing for this uncertainty, as done here, improves the predictions.

Table 7
Comparison of original Model 2 predictions with new Model 2 predictions where training data account for the imprecision of high pKi data by using a range.

Even after allowing for these uncertainties, however, the predictions for two of the four non-CC compounds remained outside the calculated 95% confidence intervals. In order to investigate these persistent overestimates, we constructed two artificial training sets and corresponding test sets of molecules. The first training and test sets (labeled A) consist of the compounds with R1 substituents 1-5, with half of the compounds containing R1 substituent 3 placed in the test set (Table 8). The second training and test sets (labeled B) were generated by exchanging the first sets' training and test compounds that contain R1 substituent 3. The pKi values of the molecules in the first test set are all over-predicted, similar to what was observed for the additivity-designed molecules (Tables 4, 6 and 7), but the pKi values of three of the four molecules in the second test set were underpredicted. This change in the general direction of prediction was accompanied by a decrease in the fitted parameter value for the common R1 substituent 3. These results indicate that over-prediction of affinity is not a problem intrinsic to the additivity method, but varies with the compounds in the training set.

Table 8
Least-squares fitting and subsequent pKi prediction for two different permutations of training and test set molecules.

Ridge regression versus ordinary least squares regression

Least-squares fitting can over-fit the training set data and so the subsequent predictions can vary significantly with small changes in the data such as changes due to measurement error or differences in training set composition. The technique of ridge regression attenuates this sensitivity by reducing the magnitudes of the fitted parameters and decreases the risk of over-fitting the training data. Accordingly, this method was tested for the various data sets (compounds whose pKi values were predicted using Models 1 and 2, and the test sets from Models A and B) with different values of the ridge parameter, λ, in order to determine its effect on the quality of the predictions. Note that the lowest value used, λ = 10^-12, effectively reduces the method to ordinary least-squares regression with mean-centered and scaled data, and the predictions from this approach are similar to those from ordinary least-squares regression with a reference compound (Table 10).

Table 10
Comparison of pKi predictions made by least-squares regression and ridge regression approaches to fitting the additivity model. The test sets are the two sets of molecules for which predictions from Models 1 and 2 were made, and also test sets A and B. ...

Assessment of the sum of squared coefficients from the trained ridge regression models versus the training set sum-squared residuals suggests optimal values of the ridge parameter, λ, to be either 1 or 3, as these choices reduce the sum of squared coefficients while only slightly increasing the sum-squared residual for the training set. The test set RMS prediction error for λ = 1 is approximately the same as that of our original additivity method based on a reference molecule, and it is generally slightly smaller for λ = 3 (Table 10). As λ is increased to 10 and 100, however, the errors rise. These results suggest that using ridge regression instead of ordinary least-squares regression may lead to a slight improvement in the pKi predictions, given an appropriate choice of λ.
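The trade-off used to choose λ can be illustrated with a minimal ridge fit on indicator (one-hot) variables for the substituents. This is a sketch under stated assumptions, not the procedure used in this work: the six pKi values are invented toy numbers, the columns are mean-centered but not scaled, indicator columns are used only for non-reference substituents, and the helper names `solve`, `ridge_fit` and `evaluate` are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(x_rows, y, lam):
    """Ridge fit on mean-centered columns: solve (X'X + lam*I) w = X'y."""
    n, p = len(x_rows), len(x_rows[0])
    col_mean = [sum(r[j] for r in x_rows) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[r[j] - col_mean[j] for j in range(p)] for r in x_rows]
    yc = [v - y_mean for v in y]
    gram = [[sum(xc[i][j] * xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
             for k in range(p)] for j in range(p)]
    rhs = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    w = solve(gram, rhs)
    intercept = y_mean - sum(w[j] * col_mean[j] for j in range(p))
    return w, intercept


def evaluate(x_rows, y, lam):
    """Return (sum of squared coefficients, training sum-squared residual)."""
    w, b = ridge_fit(x_rows, y, lam)
    preds = [b + sum(wj * xj for wj, xj in zip(w, row)) for row in x_rows]
    ssr = sum((p - t) ** 2 for p, t in zip(preds, y))
    return sum(wj ** 2 for wj in w), ssr


# Toy library: columns are indicators for [R1 = 2, R1 = 3, R2 = 2];
# the reference compound (R1 = 1, R2 = 1) is the all-zero row.
X = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 1]]
Y = [9.0, 9.6, 8.1, 9.4, 10.1, 8.4]  # invented pKi values

scan = {lam: evaluate(X, Y, lam) for lam in (1e-12, 1.0, 3.0, 10.0, 100.0)}
for lam, (ssc, ssr) in scan.items():
    print(f"lambda={lam:g}  sum w^2={ssc:.3f}  sum resid^2={ssr:.3f}")
```

As λ grows, the sum of squared coefficients shrinks while the training residual grows; a value in the region where the first has dropped appreciably but the second has barely moved corresponds to the kind of choice described above.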

3.1.4 Third and final additivity model

A third additivity model was constructed based on the full set of 106 compounds and with all pKi values greater than 11 treated as ranges of 11-13 (see above). This is arguably the best model because it uses all the available data and includes the improved treatment of experimental uncertainty. As summarized in Table 11, the parameters for R1 substituents 6, 7 and 14 decrease relative to the second model because of the inclusion of pKi values below those predicted with the second additivity model. The parameter value for R1 substituent 13 falls less than the other three, presumably because the prediction for the compounds containing this substituent was more accurate than the predictions for the other six compounds (Table 6). The best of the non-CC R1 substituents are those numbered 7, 13 and 22; and the best of the CC R1 substituents are 3, 5 and 6.

Table 11
Third additivity model (Model 3), with substituent parameters obtained from least-squares fitting of pKi values from 107 molecules and with pKi values above 11 replaced by the range 11-13. Parameters which are underlined are for substituents not used ...

The parameter value of R2 substituent 8 falls from 0.23 to -0.01, relative to the second model, suggesting that this substituent's contribution to binding is approximately the same as that of R2 substituent 1. These two moieties are the most potent contributors to binding in the R2 position. The best of the R3 substituents are 1, 6 and 9. There are no major changes for R3 substituent parameters between the second and third additivity models.

3.2 Additivity Approach versus Descriptor-Based QSAR

As an alternative to the substituent additivity paradigm, QSAR calculations were performed using the ridge regression and partial least squares techniques, with the molecules represented via global molecular descriptors based on chemical structure. Two sets of partial least-squares and ridge regression calculations were performed: one pair of calculations using a genetic algorithm to find a small number of descriptor principal components (usually six) that maximize a cross-validation-based fitness function; and a second pair of calculations using all 39 descriptor principal components. While there are noticeable differences in these pKi predictions, relative to those of the additivity method, there does not seem to be a uniform improvement in prediction accuracy (Table 12). Thus, these traditional QSAR models do not appear to provide any advantage over the additivity approach for this system.

Table 12
Comparison of pKi predictions via additivity with predictions from whole molecule descriptor-based QSAR methods. The test sets are the same as in Table 10.

3.3 Crystallographic Correlations

3.3.1 Structural basis of additivity

Crystallographic data on a number of HIV protease ligand complexes (molecules 15, 28, 31, 32, 34, 38, 42, 50, 52, 56, 57, 75, 82, 86 and 94),5,7,8,unpublished enable the additivity approximation to be rationalized on a structural basis. Thus, as shown in Figure 4a, substituents at the three positions of the scaffold occupy separate regions of the protease and do not contact each other. Furthermore, the common chemical scaffold of the various inhibitors (Figure 1) remains in essentially the same pose for all of the inhibitors. The lack of substituent-substituent contacts and of substituent-induced shifts of the scaffold provides a structural rationale for the reliability of the additivity model. It is likely that additivity would have been less reliable if the phenyl group of the scaffold had represented a fourth variable substituent, because some of the R1 substituents, notably the cyclic carbamates, interact with the phenyl group (Figure 4a). Interestingly, the main-chain of Gly 48A in the protease adopts a somewhat different conformation for the CC versus the non-CC inhibitors, possibly to avoid a clash with the cyclic carbamate group (Figure 4B). If this shift in the conformation of the protease in response to the CC substituents had extended to other protease subsites, it might have generated large deviations from independent substituent additivity. In fact, however, the conformational shift is local to the R1 group, so it can be accounted for by the fitted affinity contributions of the R1 substituents. Thus, the substituent parameters can incorporate more effects than simply the direct interactions of ligand substituents with the protease.

Figure 4
Superimposed HIV protease inhibitors. A. Molecules 38 (magenta), 42 (brown), 56 (green-cyan), 28 (cyan), 32 (blue), 75 (cobalt blue), 82 (orange), 94 (pink), 15 (violet), 57 (green-yellow), 31 (off-white), 34 (gold), 86 (grey), 50 (lime green), 52 (salmon ...

3.3.2 Structural analysis of substituents and affinities

R1 substituent

No single chemical feature is common to the most potent R1 substituents, but a number of them contain a carbonyl group that forms a hydrogen bond to the main chain nitrogen of Asp 29A. These include the CC substituents (substituent parameters -0.02 – 0.70) whose other contacts are non-polar in nature. The non-CC substituents 7 and 22 (parameter values 0.30 and 0.37, respectively) also make hydrogen bonds to the main chain nitrogen of Asp 29A and the main chain oxygen of Gly 48A similar to those made by the bound substrate.9 The less efficacious substituent 20 (parameter value -0.46) lacks a group to make the second of these two hydrogen bonds although its atoms occupy similar positions to those of R1 substituents 7 and 22. The R1 substituent 17 makes hydrogen bonds to the same two protease atoms as do substituents 7 and 22, but with a different geometry, a consequence of its amide group being in the opposite orientation to those in R1 substituents 7 and 22 (Table 2). This suggests that the precise geometry of these hydrogen bonds may be important for binding.

R1 substituents 13, 14 and 21 (substituent parameter values 0.71, -0.15 and -0.12, respectively), which are all substituted phenyls, also tend to generate potent inhibitors. Crystal structures of representative bound ligands containing these substituents, compounds 50 (MIT-1-KK80), 52 (MIT-1-KK81) and 82 (MIT-2-AD93),7 show that they make mainly hydrophobic contacts with the protease, although the hydroxyl groups of the latter two also make hydrogen bonds with the protease. The substituent parameters of several less potent substituents can also be understood from an examination of these three crystal structures. The low affinity parameters of R1 substituents 12 and 15 (substituent parameters -0.74 and -1.09), which are chemically similar to 13, 14, and 21, can be rationalized in terms of expected steric clashes of substituents on their respective benzene rings with the protease. The low values of the substituent parameters of R1 substituents 8 and 18 (-0.88 and -2.57, respectively) can be rationalized by the lack of nearby hydrogen bond donor moieties from the protease in appropriate geometries (unpublished model building). The R1 substituent 9, which is nearly as potent as the larger R1 substituent 1, interacts with the protease through hydrophobic contacts.

R2 substituent

The most potent of the R2 substituents studied in this work, substituents 1, 7 and 8, contain no rings or heteroatoms. Examination of the crystal structures of relevant HIV protease-inhibitor complexes (compounds 15, 28, 31, 32, 34, 75, 82, 86, and 94) shows that the corresponding atoms of these substituents closely superimpose. In contrast, substituent 4, which is a cyclized version of R2 substituent 1, occupies a similar region of space as substituent 1, but has a slightly smaller volume. Replacement of substituent 1 with substituent 4 leads to a reduction in affinity of approximately one order of magnitude, suggesting that the precise shape complementarity of this group is important for inhibitor binding.

R3 substituent

The crystal structures of complexes with compounds 15, 28, 31, 32, 34, 75, 82, 86 and 94 show that the three R3 substituents with the largest parameter values, substituents 1, 6 and 9, each have an atom at the 4-position of a six-membered ring which can accept a hydrogen bond from Asp 30B. (This should also occur for substituent 5, which is not represented in any of the crystal structures.) The next best R3 substituent is the 4-anilino group (substituent 4) which is found in the HIV protease inhibitors Amprenavir28 and Darunavir.29 Interestingly, the 4-anilino group is associated with superior pharmacokinetics relative to substituents 1 and 6,29 and this advantage apparently outweighs a small sacrifice in affinity. This reminds us that binding affinity is not the sole criterion for selection of substituents in drug candidates.

4 Discussion

The approximation of substituent additivity is found to be informative and useful for a series of HIV protease inhibitors having a common chemical scaffold. Statistical analysis yielded significant least-squares fits of measured and modeled pKi values, and the fitted substituent parameters enabled the design of new inhibitors with sub-nanomolar binding affinities.

A major virtue of the additivity method is its obvious interpretability and its consequent utility for designing molecules that consist of new combinations of substituents already tested in a partial combinatorial set of inhibitors. The non-CC compounds that were designed from additivity considerations used R1 groups that were originally found in inhibitors with only moderate potency, yet these new compounds bind the protease target with better than 1 nM affinity, representing a successful application of the methodology. Application of additivity to design CC compounds with increased affinity was less successful, but it should be noted that two of the three designed compounds have affinities close to the limit of measurement accuracy, 10 pM (pKi = 11).
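The affinity thresholds quoted here follow directly from the definition pKi = -log10(Ki), with Ki in mol/L. The short sketch below shows the conversion, together with the free energy equivalent of one pKi unit, RT ln(10). The temperature of 298.15 K is an assumption chosen for illustration (the assay temperature is not stated in this section), and the function names are hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pki_from_ki(ki_molar):
    """pKi = -log10(Ki), with Ki expressed in mol/L."""
    return -math.log10(ki_molar)


def ki_from_pki(pki):
    """Inverse conversion, returning Ki in mol/L."""
    return 10.0 ** (-pki)


def free_energy_per_pki_unit(temperature_k=298.15):
    """Magnitude of the binding free energy (kcal/mol) per pKi unit: RT ln(10)."""
    return GAS_CONSTANT_KCAL * temperature_k * math.log(10.0)


print(pki_from_ki(10e-12))            # 10 pM, the stated assay limit
print(pki_from_ki(1e-9))              # 1 nM
print(round(free_energy_per_pki_unit(), 2))
```

At the assumed temperature, a substituent parameter of one pKi unit corresponds to roughly 1.4 kcal/mol of binding free energy.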

The prediction errors of the CC compounds were generally larger than those of the non-CC compounds. However, the magnitude of these errors decreased when the highest training set pKi values were replaced by ranges that more realistically expressed the uncertainty of these experimental data. This approach effectively capped the pKi values at 11 and reduced the predicted affinities to values closer to those observed experimentally. The prediction error of compound 36, with observed pKi = 9.63, was still relatively large, however. This compound differs only in its R2 substituent from that of the more potent compound 32 (pKi > 11);5 the R2 substituents are 8 and 1 respectively, which only differ by a methyl group. From additivity considerations, one would expect their pKi difference to be very small, as the value of the additivity parameter for the replacement of the R2 substituent 1 with substituent 8 is -0.01 (Table 11). The differing consequences of the methyl group for these two pairs of compounds point to a breakdown of additivity, or perhaps experimental uncertainties greater than supposed.

The generality of the present conclusions regarding additivity is necessarily limited by the number and accuracy of the available data. For example, some of the models described here were generated using a training set that includes only one or a few instances of a given substituent. The affinity contributions assigned to such substituents are therefore not optimally tested and, to the extent that the system is non-additive, the apparent accuracy of the additive model will depend upon which specific compounds appear in the training and test sets, as reported above for training and test sets A and B (Table 8). On the other hand, the tests reported here are strengthened by the fact that they involve blind predictions of the affinities of new compounds. It is also worth emphasizing that, although a more redundant data set would make the statistics more informative and reliable, it would not necessarily improve the additivity of the models, which derives not from the statistics, but from the physics of the protein-ligand system (see below). In fact, if the system were perfectly additive, then a training set with only M1 + M2 + M3 compounds, where Mi is the number of different substituents at site Ri, would allow us to predict the affinities of all M1×M2×M3 − (M1 + M2 + M3) other compounds in the combinatorial library with perfect accuracy. Finally, it is worth emphasizing that experimental uncertainty can lead even an additive system to appear nonadditive, and it is worth accounting explicitly for at least the larger known uncertainties when constructing and testing an additive model (as done above by treating pKi values above 11 as ranges; Table 7).
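The counting argument for a perfectly additive system can be checked on a toy library. The sketch below is illustrative only: the substituent contributions are invented, the library has M1 = 3, M2 = 2 and M3 = 2, and the parameters are read off a reference compound and its single-site variants, a training set no larger than the M1 + M2 + M3 compounds mentioned above. All names are hypothetical.

```python
from itertools import product

# Invented substituent contributions (pKi units); substituent 1 at each site
# is the reference, with a contribution of zero by construction.
TRUE_PARAMS = {
    (0, 1): 0.0, (0, 2): 0.6, (0, 3): -0.9,   # site R1
    (1, 1): 0.0, (1, 2): 0.4,                 # site R2
    (2, 1): 0.0, (2, 2): -0.3,                # site R3
}
REFERENCE_PKI = 9.0  # invented pKi of the reference compound (1, 1, 1)


def true_pki(compound):
    """pKi of a perfectly additive toy compound given as (r1, r2, r3)."""
    return REFERENCE_PKI + sum(TRUE_PARAMS[(site, sub)]
                               for site, sub in enumerate(compound))


def fit(training):
    """Recover parameters from the reference and its single-site variants."""
    ref = training[(1, 1, 1)]
    params = {}
    for compound, pki in training.items():
        changed = [(site, sub) for site, sub in enumerate(compound) if sub != 1]
        if len(changed) == 1:
            params[changed[0]] = pki - ref
    return ref, params


def predict(ref, params, compound):
    """Sum the reference pKi and the fitted contribution at each site."""
    return ref + sum(0.0 if sub == 1 else params[(site, sub)]
                     for site, sub in enumerate(compound))


library = list(product((1, 2, 3), (1, 2), (1, 2)))  # all 12 combinations
train_ids = [(1, 1, 1), (2, 1, 1), (3, 1, 1), (1, 2, 1), (1, 1, 2)]
training = {c: true_pki(c) for c in train_ids}
ref, params = fit(training)
untested = [c for c in library if c not in training]
worst = max(abs(predict(ref, params, c) - true_pki(c)) for c in untested)
print(len(training), len(untested), worst)
```

With real data the measured pKi values carry noise and possible nonadditivity, which is why a redundant training set and a least-squares or ridge fit are used rather than this direct subtraction.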

As just discussed, the accuracy of the pKi predictions can depend upon the molecules in the training set. Thus, interchanging four molecules between test and training sets with a total of 27+4=31 compounds led to noticeably different accuracies. In one case, the predicted pKis were consistently too high, and in the other the pKis were too low. In addition, the parameter of the R1 substituent common to the interchanged molecules was assigned values differing by 0.45 pKi units. These artificial training and test sets exhibit behavior similar to that of the previous sets of molecules, which displayed consistent over-prediction of pKi values and changes in parameter values upon changes to the training set. Thus, the model generated by a given set of pKi values should necessarily be seen as provisional and should be updated upon obtaining relevant new data. We sought to reduce the sensitivity of the predictions to the choice of training set data by using ridge regression instead of ordinary least squares. The apparently optimal value of the ridge parameter, λ = 3, the largest value tried that did not increase the sum-squared residuals by an excessive amount, generally led to only a minor improvement in prediction accuracy. This suggests that the use of ordinary least-squares regression is not a major reason for errors in the original predictions. Still, for three of the four test sets, increasing the value of λ did somewhat improve the pKi predictions (Table 10).

The accuracy of the additivity approach was furthermore compared to that of standard descriptor-based QSAR methods. The results were mixed, with each method yielding slightly better accuracy in some cases but not in others. However, the additivity method is arguably superior in the present application because it provides clear guidance on the contributions of individual substituents to affinity. Thus, it is trivial to use additivity information to propose new molecules for synthesis, albeit with the restriction that the substituents must have been included in at least one training set molecule. Furthermore, the additivity-based method can distinguish between chemically similar substituents, such as R1 substituents 14 and 15, whose contributions to binding affinity differ by approximately one order of magnitude. In contrast, a QSAR method which represents molecules by whole-molecule descriptors is less well suited to make such fine distinctions.

Why does the additivity approximation work so well for the present ligand-protein series, and will it be equally applicable to other systems? These questions may be addressed by considering some of the physical requirements that must be met in order for the simple additivity model to work, as follows:

  1. The substituents should not contact each other in either the bound or free state, or else changing R1, for example, would change the relative affinities of ligands with different R2 substituents. This requirement is most easily met by a large ligand with small, widely separated substituents.
  2. Changing one substituent should not shift the position of the other substituents in the binding site. For example, changing from a small R1 substituent to a bulky one could generate nonadditivity by shifting the entire ligand and thereby forcing R2 to move away from its original position. This requirement is perhaps most easily met if all the substituents are similar in size, the ligand has a flexible scaffold, and small changes in the conformation of the scaffold do not lead to significant changes in its interaction with the protein. (For example, the flexible part of the scaffold might be solvent-exposed.)
  3. Changing R1 must not cause a change in protein conformation that propagates to the interaction site of R2, because this, too, would alter the interaction of R2 with the protein. Meeting this requirement may be facilitated if all the substituents at a given site make similar interactions with the protein, and if the protein is very rigid or, perhaps, so soft that conformational shifts remain local.

There may also be nonadditive effects on conformational fluctuations, and hence the entropy, of the ligand and the protein, but it is not clear whether avoiding these interactions will place additional requirements on the system.

The protein-ligand series considered here does, arguably, go a long way to meeting these requirements, given the flexibility of the scaffold (Figure 1) and the fact that the substituents bind in discrete regions of the protein without contacting each other (Figure 4). In addition, the substituents at each site are fairly similar in size, and the largest variations in size occur at R1 and R3, which lie at the two entrances of the active site tunnel and therefore can accommodate substituents of varied size without steric clashes or protein reorganization. The available crystal structures bear out the expectation that changes at one ligand site produce minimal perturbation at the other sites. The additivity approximation may well be useful for other protein-ligand systems that meet the requirements laid out above.

5 Summary

The concept of substituent additivity proves to be a pleasingly simple and practical way to use existing binding affinity data for the design of new HIV protease inhibitors with high affinity and a good fit to the substrate envelope. The present study also has broader significance, because there are few other articles in which an additivity analysis was followed up and tested by synthesis of compounds containing the most favorable substituents (but see ref 31). The accuracy of initial predictions with the additive model motivated a careful look at this methodology, highlighting the provisional nature of the mathematical models obtained and the importance of accounting for measurement imprecision for high affinity compounds. Correlations with crystallographic data rationalize the observed additivity and the contributions of the various substituents to binding affinity. Analysis of the physical requirements for substituent independence provides guidance to the identification of other systems where additivity analysis is likely to be useful.

6 Experimental Section (Chemistry)


Proton nuclear magnetic resonance (1H NMR) and carbon nuclear magnetic resonance (13C NMR) spectra were recorded with a Varian Mercury 400 MHz NMR spectrometer operating at 400 MHz for 1H and 100 MHz for 13C NMR. Chemical shifts are reported in ppm (δ scale) relative to the solvent signal, and coupling constant (J) values are reported in Hertz. Data are represented as follows: chemical shift, multiplicity (s = singlet, d = doublet, t = triplet, q = quartet, m = multiplet, dd = doublet of doublets, br = broad), coupling constant in Hz, and integration. High resolution mass spectra (HRMS) were recorded on a Waters Q-TOF Premier mass spectrometer by direct infusion of solutions of each compound using electrospray ionization (ESI) in positive mode. All reactions were performed in oven-dried round bottomed or modified Schlenk flasks fitted with rubber septa under N2 atmosphere, unless otherwise noted. All final coupling reactions were carried out at 0.5 mmol scale unless stated otherwise. Air- and moisture-sensitive liquids and solutions were transferred via syringe or stainless steel cannula. Organic solutions were concentrated under reduced pressure by rotary evaporation at 35–40 °C. Flash and column chromatography was performed using silica gel (230–400 mesh, Merck KGA). Analytical thin-layer chromatography (TLC) was performed using silica gel (60 F-254) coated aluminum plates (Merck KGA) and spots were visualized by exposure to ultraviolet light (UV) and/or exposure to an acidic solution of p-anisaldehyde (anisaldehyde) followed by brief heating. Dichloromethane was dried over P2O5 and distilled, tetrahydrofuran (THF) was distilled from sodium/benzophenone, and anhydrous N,N-dimethylformamide was purchased from Aldrich and used as received. All other reagents and solvents were purchased from commercial sources and used as received.

Typical procedure for the coupling reactions (Method A)

(5S)-3-(3-Acetylphenyl)-N-[(1S,2R)-2-hydroxy-3-[[(4-methoxyphenyl)sulfonyl][(2S)-2-methylbutyl]amino]-1-(phenylmethyl)propyl]-2-oxo-oxazolidine-5-carboxamide (35)

Excess oxalyl chloride was added to solid (S)-3-(3-acetylphenyl)-2-oxo-oxazolidine-5-carboxylic acid (0.125 g, 0.5 mmol) and the resulting mixture was stirred at room temperature overnight. The oxalyl chloride was removed by distillation under reduced pressure and residue dried under high vacuum for 30 minutes. A solution of the resulting acid chloride in dry THF (5 mL) was used in the coupling reaction. To an ice-cooled mixture of the Boc deprotected amine (0.5 mmol) in dry THF (5 mL) was added Et3N (0.15 mL, 1.1 mmol) followed by the slow addition of the acid chloride solution. After 15 minutes the reaction mixture was warmed to room temperature and stirred until reaction was complete (monitored by TLC). Small amounts of water and ethyl acetate were added and layers were separated. The organic extract was washed with saturated aqueous NaCl solution, dried (Na2SO4), filtered and evaporated. The residue was purified by flash chromatography on silica gel using ethyl acetate-hexanes (3:1) mixture as eluent to provide the target compound (0.30 g, 92%) as white solid: 1H NMR (400 MHz, CDCl3) δ 7.89 (t, J = 2.0 Hz, 1H), 7.83 (m, 1H), 7.77 (m, 1H), 7.76–7.72 (m, 2H), 7.52 (t, J = 8.4 Hz, 1H), 7.13 (dd, J = 8.4, 1.6 Hz, 2H), 7.03–6.98 (m, 4H), 6.86 (dt, J = 8.4, 1.2 Hz, 1H), 6.75 (d, J = 10.0 Hz, 1H), 4.80 (dd, J = 9.6, 5.6 Hz, 1H), 4.25 (m, 1H), 4.08 (t, J = 9.6 Hz, 1H), 3.92 (m, 1H), 3.87 (s, 3H), 3.65 (d, J = 2.4 Hz, 1H), 3.41 (dd, J = 9.6, 6.0 Hz, 1H), 3.20 (dd, J = 15.6, 9.6 Hz, 1H), 3.12–3.04 (m, 2H), 2.98 (dd, J = 15.2, 2.8 Hz, 1H), 2.82 (dd, J = 13.2, 7.2 Hz, 1H), 2.76 (dd, J = 13.6, 10.4 Hz, 1H), 2.65 (s, 3H), 1.62 (m, 1H), 1.52 (m, 1H), 1.11 (m, 1H), 0.90–0.86 (m, 6H); 13C NMR (100 MHz, CDCl3) δ 197.64, 168.60, 163.37, 153.04, 138.12, 138.08, 137.42, 129.79 (2C), 129.76, 129.72, 129.60 (2C), 128.63 (2C), 126.73, 124.77, 123.04, 117.56, 114.65 (2C), 72.40, 69.91, 57.61, 55.89, 53.87, 53.39, 48.25, 35.66, 33.74, 26.98, 26.65, 17.16, 11.26. 
HRMS (ESI) m/z: Calcd for C34H42N3O8S [M + H]+ 652.2693; found, 652.2714.
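The calculated HRMS value quoted for compound 35 can be reproduced by summing monoisotopic atomic masses over the ion formula. The sketch below is a minimal illustration; the function names are hypothetical, the mass table covers only the elements occurring in this section, and no electron-mass correction is applied (whether the authors' software applied one is not stated, but the plain sum matches the tabulated value to four decimal places).

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "F": 18.99840316,
    "S": 31.97207117,
}


def formula_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C34H42N3O8S'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ppm_error(found, calculated):
    """Relative deviation of the measured m/z from the calculated one, in ppm."""
    return (found - calculated) / calculated * 1e6


calculated = formula_mass("C34H42N3O8S")   # [M + H]+ formula of compound 35
print(f"{calculated:.4f}")
print(f"{ppm_error(652.2714, 652.2693):.1f} ppm")
```

The same two functions can be applied to the other [M + H]+ formulas in this section to compare calculated and found values.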

(5S)-3-(3-Acetylphenyl)-N-[(1S,2R)-3-[(6-benzothiazolylsulfonyl)[(2S)-2-methylbutyl]amino]-2-hydroxy-1-(phenylmethyl)propyl]-2-oxo-oxazolidine-5-carboxamide (36)

Coupling method B; solvent for flash chromatography: EtOAc-hexanes (4:1); yield: 0.310 g, 91%; white solid: 1H NMR (400 MHz, CDCl3) δ 9.22 (s, 1H), 8.49 (d, J = 1.6 Hz, 1H), 8.27 (d, J = 8.4 Hz, 1H), 7.93–7.90 (m, 2H), 7.82–7.76 (m, 2H), 7.52 (t, J = 8.4 Hz, 1H), 7.13 (dd, J = 8.4, 1.6 Hz, 2H), 7.02 (t, J = 8.0 Hz, 2H), 6.89–6.82 (m, 2H), 4.80 (dd, J = 9.6, 5.6 Hz, 1H), 4.26 (m, 1H), 4.08 (t, J = 9.6 Hz, 1H), 3.98 (m, 1H), 3.63 (d, J = 3.6 Hz, 1H), 3.43 (dd, J = 9.2, 5.6 Hz, 1H), 3.25 (dd, J = 15.2, 9.2 Hz, 1H), 3.16–3.05 (m, 3H), 2.93 (dd, J = 13.2, 6.8 Hz, 1H), 2.78 (dd, J = 13.6, 10.8 Hz, 1H), 2.65 (s, 3H), 1.66 (m, 1H), 1.51 (m, 1H), 1.12 (m, 1H), 0.90–0.86 (m, 6H); 13C NMR (100 MHz, CDCl3) δ 197.66, 168.69, 158.32, 155.89, 153.07, 138.09, 138.07, 137.37, 135.63, 134.66, 129.73, 129.58 (2C), 128.67 (2C), 126.79, 125.09, 124.84, 124.72, 123.04, 122.60, 117.58, 72.46, 69.94, 57.62, 53.88, 53.52, 48.26, 35.70, 33.73, 26.98, 26.67, 17.15, 11.28. HRMS (ESI) m/z: Calcd for C34H39N4O7S2 [M + H]+ 679.2260; found, 679.2287.

Typical procedure for the coupling reactions (Method B)

(2S)-2-(Acetylamino)-N-[(1S,2R)-2-hydroxy-3-[[(4-methoxyphenyl)sulfonyl](2-methylpropyl)amino]-1-(phenylmethyl)propyl]-propanamide (37)

To a solution of the N-[(1S,2R)-2-Hydroxy-3-[[(4-methoxyphenyl)sulfonyl](2-methylpropyl)amino]-1-(phenylmethyl)propyl]-carbamic acid tert-butyl ester5 (0.254 g, 0.5 mmol) in CH2Cl2 (15 mL) was added TFA (5 mL) and the mixture was stirred at room temperature for 1 hour. Solvents were removed under reduced pressure and the residue was dissolved in CH2Cl2, washed with 10% aqueous NaHCO3 solution, dried (Na2SO4), filtered, and evaporated under reduced pressure to provide the free amine as white solid. To an ice-cooled solution of this amine in a mixture of H2O-CH2Cl2 (1:1) (12 mL) were added N-Ac-Ala-OH (0.079 g, 0.6 mmol) followed by HOBt (0.092 g, 0.6 mmol) and EDCI (0.115 g, 0.6 mmol) under N2 atmosphere. The reaction mixture was stirred at 0–4 °C until the reaction was complete (monitored by TLC). A small amount of CH2Cl2 was added and layers were separated. The organic extract was washed with saturated aqueous NaCl solution, dried (Na2SO4), filtered and evaporated under reduced pressure. The residue was purified by flash chromatography on silica gel using CHCl3-MeOH (19:1) mixture as eluent to provide the target compound (0.235 g, 90%) as white solid: 1H NMR (400 MHz, CDCl3) δ 7.74–7.70 (m, 2H), 7.27–7.16 (m, 5H), 6.99–6.95 (m, 2H), 6.65 (d, J = 8.8 Hz, 1H), 5.87 (d, J = 7.6 Hz, 1H), 4.32 (m, 1H), 4.15 (m, 1H), 4.07 (d, J = 3.6 Hz, 1H), 3.86 (s, 3H), 3.85 (m, 1H, overlapping signal), 3.14–3.02 (m, 3H), 2.92–2.84 (m, 3H), 1.88 (s, 3H), 1.84 (m, 1H), 1.19 (d, J = 6.8 Hz, 3H), 0.87 (d, J = 6.8 Hz, 3H, overlapping signal), 0.86 (d, J = 6.4 Hz, 3H, overlapping signal); 13C NMR (100 MHz, CDCl3) δ 172.82, 170.29, 163.24, 138.05, 130.19, 127.72 (2C), 129.58 (2C), 128.65 (2C), 126.71, 114.57 (2C), 72.79, 58.93, 55.86, 54.20, 53.55, 49.13, 35.44, 27.41, 23.38, 20.34, 20.18, 18.27. HRMS (ESI) m/z: Calcd for C26H38N3O6S [M + H]+ 520.2481; found, 520.2461.

(2S)-2-(Acetylamino)-N-[(1S,2R)-2-hydroxy-3-[[(4-methoxyphenyl)sulfonyl][(2S)-2-methylbutyl]amino]-1-(phenylmethyl)propyl]-propanamide (39)

Coupling method A; solvent for flash chromatography: EtOAc-hexanes (3:2); yield: 0.235 g, 88%; white solid: 1H NMR (400 MHz, CDCl3) δ 7.74–7.70 (m, 2H), 7.27–7.16 (m, 5H), 6.99–6.95 (m, 2H), 6.61 (d, J = 8.8 Hz, 1H), 5.86 (d, J = 7.6 Hz, 1H), 4.33 (m, 1H), 4.16 (m, 1H), 4.02 (d, J = 3.6 Hz, 1H), 3.86 (s, 3H), 3.83 (m, 1H), 3.12–3.0 (m, 3H), 2.97–2.88 (m, 2H), 2.82 (dd, J = 13.2, 7.6 Hz, 1H), 1.89 (s, 3H), 1.60 (m, 1H), 1.44 (m, 1H), 1.20 (d, J = 6.8 Hz, 3H), 1.05 (m, 1H), 0.86–0.82 (m, 6H); 13C NMR (100 MHz, CDCl3) δ 172.81, 170.27, 163.24, 138.0, 130.11, 129.73 (2C), 129.60 (2C), 128.65 (2C), 126.72, 114.57 (2C), 72.73, 57.58, 55.86, 54.11, 53.54, 49.13, 35.45, 33.63, 26.79, 23.37, 18.35, 17.12, 11.33. HRMS (ESI) m/z: Calcd for C27H40N3O6S [M + H]+ 534.2638; found, 534.2630.

N-[(1S,2R)-2-Hydroxy-3-[[(4-methoxyphenyl)sulfonyl](2-methylpropyl)amino]-1-(phenylmethyl)propyl]-3-methyl-4,4,4-trifluoro-2-butenamide (41)

Coupling method A; solvent for flash chromatography: EtOAc-hexanes (1:1); yield: 0.240 g, 88%; gummy solid: 1H NMR (400 MHz, CDCl3) δ 7.70–7.66 (m, 2H), 7.32–7.20 (m, 5H), 6.99–6.95 (m, 2H), 6.14 (m, 1H), 5.95 (d, J = 8.8 Hz, 1H), 4.26 (m, 1H), 3.99 (d, J = 2.8 Hz, 1H), 3.87 (s, 3H), 3.11 (dd, J = 15.2, 8.8 Hz, 1H), 3.03 (dd, J = 14.0, 5.6 Hz, 1H), 2.99–2.90 (m, 3H), 2.79 (dd, J = 13.6, 6.8 Hz, 1H), 2.04 (d, J = 1.6 Hz, 3H), 1.83 (m, 1H), 0.90 (d, J = 6.4 Hz, 3H), 0.87 (d, J = 6.4 Hz, 3H); 13C NMR (100 MHz, CDCl3) δ 164.74, 163.34, 138.64 (q, J = 30.0 Hz), 137.69, 129.90, 129.67 (2C), 129.52 (2C), 128.90 (2C), 126.95, 124.82 (q, J = 272.6 Hz), 123.50 (q, J = 5.2 Hz), 114.60 (2C), 72.74, 59.09, 55.87, 53.92, 53.85, 34.90, 27.55, 20.34, 20.13, 12.17. HRMS (ESI) m/z: Calcd for C26H34F3N2O5S [M + H]+ 543.2141; found, 543.2100.

N-[(1S,2R)-2-Hydroxy-3-[[(4-methoxyphenyl)sulfonyl](2-methylpropyl)amino]-1-(phenylmethyl)propyl]-4-oxo-2-pentenamide (44)

Coupling method A; solvent for flash chromatography: EtOAc-hexanes (3:2); yield: 0.225 g, 89%; white foamy solid: 1H NMR (400 MHz, CDCl3) δ 7.70–7.66 (m, 2H), 7.31–7.19 (m, 5H), 6.99–6.93 (m, 2H), 6.91 (d, J = 15.6 Hz, 1H), 6.59 (d, J = 15.2 Hz, 1H), 6.22 (d, J = 8.8 Hz, 1H), 4.28 (m, 1H), 4.06 (d, J = 2.8 Hz, 1H), 3.93 (m, 1H), 3.87 (s, 3H), 3.10 (dd, J = 15.2, 8.4 Hz, 1H), 3.03–2.96 (m, 3H), 2.91 (dd, J = 14.0, 8.4 Hz, 1H), 2.79 (dd, J = 13.6, 6.8 Hz, 1H), 2.31 (s, 3H), 1.82 (m, 1H), 0.88 (d, J = 6.8 Hz, 3H, overlapping signal), 0.86 (d, J = 6.8 Hz, 3H, overlapping signal); 13C NMR (100 MHz, CDCl3) δ 197.81, 164.51, 163.33, 137.61, 137.21, 133.64, 129.89, 129.67 (2C), 129.52 (2C), 128.91 (2C), 126.98, 114.62 (2C), 72.65, 59.07, 55.88, 54.46, 53.75, 34.84, 29.11, 27.52, 20.35, 20.14. HRMS (ESI) m/z: Calcd for C26H35N2O6S [M + H]+ 503.2216; found, 503.2221.

3-Fluoro-N-[(1S,2R)-2-hydroxy-3-[[(4-methoxyphenyl)sulfonyl](2-methylpropyl)amino]-1-(phenylmethyl)propyl]-2-methyl-benzamide (49)

Coupling method A; solvent for flash chromatography: EtOAc-hexanes (1:1); yield: 0.220 g, 81%; white foamy solid: 1H NMR (400 MHz, CDCl3) δ 7.71 (m, 2H), 7.33–7.21 (m, 5H), 7.10–7.0 (m, 2H), 6.98 (m, 2H), 6.78 (dd, J = 7.6, 1.2 Hz, 1H), 5.98 (d, J = 8.8 Hz, 1H), 4.38 (m, 1H), 3.99 (m, 1H), 3.87 (s, 3H), 3.21–3.12 (m, 3H), 3.01–2.93 (m, 2H), 2.86 (dd, J = 13.2, 6.8 Hz, 1H), 2.07 (d, J = 2.0 Hz, 3H), 1.89 (m, 1H), 0.93 (d, J = 6.8 Hz, 3H), 0.89 (d, J = 6.8 Hz, 3H); 13C NMR (100 MHz, CDCl3) δ 169.43 (d, J = 2.9 Hz), 163.33, 161.54 (d, J = 243.9 Hz), 138.40 (d, J = 4.4 Hz), 137.89, 129.99, 129.68 (2C), 129.57 (2C), 128.91 (2C), 127.24 (d, J = 8.1 Hz), 126.98, 123.73 (d, J = 18.3 Hz), 122.21 (d, J = 3.7 Hz), 117.0 (d, J = 22.7 Hz), 114.62 (2C), 73.19, 59.19, 55.88, 54.40, 53.94, 35.14, 27.58, 20.35, 20.16, 11.32 (d, J = 4.4 Hz). HRMS (ESI) m/z: Calcd for C29H36FN2O5S [M + H]+ 543.2329; found, 543.2319.

N-[(1S,2R)-3-[(6-Benzothiazolylsulfonyl)[(2S)-2-methylbutyl]amino]-2-hydroxy-1-(phenylmethyl)propyl]-3-fluoro-2-methyl-benzamide (51)

Coupling method A; solvent for flash chromatography: EtOAc-hexanes (3:1); yield: 0.240 g, 82%; white solid: 1H NMR (400 MHz, CDCl3) δ 9.21 (s, 1H), 8.44 (d, J = 1.6 Hz, 1H), 8.25 (d, J = 8.8 Hz, 1H), 7.88 (dd, J = 8.8, 1.6 Hz, 1H), 7.33–7.21 (m, 5H), 7.11–7.0 (m, 2H), 6.79 (dd, J = 8.8, 1.2 Hz, 1H), 6.03 (d, J = 8.4 Hz, 1H), 4.39 (m, 1H), 4.02 (m, 1H), 3.22 (m, 2H), 3.16 (dd, J = 14.0, 5.2 Hz, 1H), 3.10 (dd, J = 13.6, 7.6 Hz, 1H), 3.0 (dd, J = 14.0, 9.6 Hz, 1H), 2.94 (dd, J = 13.6, 8.0 Hz, 1H), 2.08 (d, J = 2.0 Hz, 3H), 1.68 (m, 1H), 1.49 (m, 1H), 1.09 (m, 1H), 0.88–0.82 (m, 6H); 13C NMR (100 MHz, CDCl3) δ 169.49 (d, J = 3.0 Hz), 161.55 (d, J = 243.9 Hz), 158.27, 155.83, 138.29 (d, J = 4.4 Hz), 137.75, 135.81, 134.63, 129.56 (2C), 128.97 (2C), 127.29 (d, J = 8.1 Hz), 127.09, 125.03, 124.68, 123.76 (d, J = 18.3 Hz), 122.47, 122.20 (d, J = 3.7 Hz), 117.0 (d, J = 22.8 Hz), 73.09, 57.67, 54.54, 53.76, 35.16, 33.77, 26.74, 17.12, 11.35 (d, J = 3.0 Hz), 11.33. HRMS (ESI) m/z: Calcd for C30H35FN3O4S2 [M + H]+ 584.2053; found, 584.2068.

3,4-Dihydroxy-N-[(1S,2R)-2-hydroxy-3-[[(4-methoxyphenyl)sulfonyl][(2S)-2-methylbutyl]amino]-1-(phenylmethyl)propyl]-benzamide (53)

Coupling method A; solvent for flash chromatography: EtOAc; yield: 0.160 g, 57%; white solid: 1H NMR (400 MHz, CDCl3 + 2 drops CD3OD) δ 7.63–7.59 (m, 2H), 7.25 (m, 5H), 7.18 (m, 1H), 7.14 (d, J = 2.0 Hz, 1H), 7.0 (dd, J = 8.8, 2.4 Hz, 1H), 6.91–6.87 (m, 2H), 6.80 (d, J = 8.0 Hz, 1H), 6.68 (d, J = 8.4 Hz, 1H), 4.28 (m, 1H), 3.92 (m, 1H), 3.82 (s, 3H), 3.16 (dd, J = 14.8, 4.0 Hz, 1H), 3.04 (d, J = 6.8 Hz, 1H), 2.96 (dd, J = 14.8, 7.6 Hz, 1H), 2.85 (d, J = 7.2 Hz, 1H), 2.16 (m, 2H), 1.60 (m, 1H), 1.36 (m, 1H), 1.01 (m, 1H), 0.83–0.78 (m, 6H); 13C NMR (100 MHz, CDCl3 + 2 drops CD3OD) δ 168.42, 168.34, 163.26, 148.64, 144.40, 138.03, 129.64 (4C), 128.80 (2C), 126.81, 125.70, 119.52, 114.78, 114.54 (2C), 72.95, 57.64, 55.83, 54.45, 53.66, 35.32, 33.61, 26.92, 17.05, 11.28. HRMS (ESI) m/z: Calcd for C29H37N2O7S [M + H]+ 557.2321; found, 557.2341.
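The HRMS entries above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original experimental procedure: it sums standard monoisotopic atomic masses over each reported [M + H]+ formula and expresses the calculated-versus-found difference in ppm. The function names (`ion_mass`, `ppm_error`) and the `REPORTED` table are illustrative choices of ours. The reported "Calcd" values appear to equal the plain sum of neutral-atom masses for the protonated formula, without subtracting the electron mass (about 0.0005 u); this is an inference from the numbers, not something the text states.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes (standard reference values).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307400,
    "O": 15.99491462,
    "F": 18.99840316,
    "S": 31.97207117,
}

def ion_mass(formula: str) -> float:
    """Sum neutral-atom monoisotopic masses over a simple formula such as 'C26H35N2O6S'.

    Assumption: no electron-mass correction is applied, which is what the
    reported 'Calcd' values appear to use.
    """
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total

def ppm_error(calcd: float, found: float) -> float:
    """Signed deviation of the found m/z from the calculated m/z, in ppm."""
    return (found - calcd) / calcd * 1e6

# Compound number -> ([M + H]+ formula, reported calcd m/z, reported found m/z),
# transcribed from the characterization data above.
REPORTED = {
    44: ("C26H35N2O6S", 503.2216, 503.2221),
    49: ("C29H36FN2O5S", 543.2329, 543.2319),
    51: ("C30H35FN3O4S2", 584.2053, 584.2068),
    53: ("C29H37N2O7S", 557.2321, 557.2341),
}

for number, (formula, calcd, found) in REPORTED.items():
    recomputed = ion_mass(formula)
    print(f"{number}: recomputed {recomputed:.4f}, reported calcd {calcd:.4f}, "
          f"found deviates by {ppm_error(calcd, found):+.1f} ppm")
```

A check of this kind only confirms that each formula and its calculated mass agree with one another; it says nothing about whether the formula matches the compound name.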

Table 9
Comparison of fitted parameters obtained from the two training sets of Table 8.


Acknowledgments

We thank Drs. Madhavi Kolli for sharing crystal structures prior to publication and Sune Peterson for pointing out that our additivity analysis “reinvented the wheel” first devised by Free and Wilson. A generous gift of chiral epoxides from Kaneka USA is gratefully acknowledged. This publication was made possible by Grant Nos. GM066524 and GM061300 from the National Institute of General Medical Sciences (NIGMS) of the NIH. The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of the NIGMS.

Nonstandard Abbreviations

total sum of squares
residual sum of squares
genetic algorithm
partial least squares
generalized cross-validation
cyclic carbamate
confidence interval
ridge regression
genetic algorithm/ridge regression
genetic algorithm/partial least squares


References

1. Joint United Nations Programme on HIV/AIDS (UNAIDS) and World Health Organization (WHO) AIDS epidemic update. Dec, 2007. http://www.unaids.org.
2. Richman DD. HIV chemotherapy. Nature. 2001;410:995–1001.
3. Pauwels R. New non-nucleoside reverse transcriptase inhibitors (NNRTIs) in development for the treatment of HIV infections. Curr Opin Pharmacol. 2004;4:437–446.
4. Turner SR. HIV protease inhibitors - The next generation. Curr Med Chem - Anti-Infective Agents. 2002;1:141–162.
5. Ali A, Reddy GSKK, Cao H, Anjum SG, Nalam MNL, Schiffer CA, Rana TM. Discovery of HIV-1 protease inhibitors with picomolar affinities incorporating N-aryl-oxazolidinone-5-carboxamides as novel P2 ligands. J Med Chem. 2006;49:7342–7356.
6. Reddy GSKK, Ali A, Nalam MNL, Anjum SG, Cao H, Nathans RS, Schiffer CA, Rana TM. Design and synthesis of HIV-1 protease inhibitors incorporating oxazolidinones as P2/P2′ ligands in pseudosymmetric dipeptide isosteres. J Med Chem. 2007;50:4316–4328.
7. Altman MD, Ali A, Reddy GSKK, Nalam MN, Ghafoor S, Cao H, Chellappan S, Kairys V, Fernandes MX, Gilson MK, Schiffer CA, Rana TM, Tidor B. HIV-1 Protease inhibitors from inverse design in the substrate envelope exhibit subnanomolar binding to drug-resistant variants. J Am Chem Soc. 2008;130:6099–6113.
8. Chellappan S, Reddy GSKK, Ali A, Nalam MNL, Anjum SG, Cao H, Kairys V, Fernandes MX, Altman MD, Tidor B, Rana TM, Schiffer CA, Gilson MK. Design of mutation-resistant HIV protease inhibitors with the substrate envelope hypothesis. Chem Biol Drug Des. 2007;69:298–313.
9. Prabu-Jeyabalan M, Nalivaika E, Schiffer CA. Substrate shape determines specificity of recognition for HIV-1 protease: analysis of crystal structures of six substrate complexes. Structure. 2002;10:369–381.
10. King NM, Prabu-Jeyabalan M, Nalivaika EA, Schiffer CA. Combating susceptibility to drug resistance: lessons from HIV-1 protease. Chem Biol. 2004;11:1333–1338.
11. King NM, Prabu-Jeyabalan M, Nalivaika EA, Wigerinck P, de Bethune MP, Schiffer CA. Structural and thermodynamic basis for the binding of TMC114, a next-generation human immunodeficiency virus type 1 protease inhibitor. J Virol. 2004;78:12012–12021.
12. Prabu-Jeyabalan M, King NM, Nalivaika EA, Heilek-Snyder G, Cammack N, Schiffer CA. Substrate envelope and drug resistance: crystal structure of RO1 in complex with wild-type human immunodeficiency virus type 1 protease. Antimicrob Agents Chemother. 2006;50:1518–1521.
13. Chellappan S, Kairys V, Fernandes MX, Schiffer C, Gilson MK. Evaluation of the substrate envelope hypothesis for inhibitors of HIV-1 protease. Proteins. 2007;68:561–567.
14. Free SM, Jr, Wilson JW. A Mathematical Contribution to Structure-Activity Studies. J Med Chem. 1964;7:395–399.
15. Press WH, Flannery BP, Teukolsky SA, Vetterling WT. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press; 1992.
17. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Chapman & Hall/CRC; 1994.
18. Hoerl AE, Kennard RW. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics. 1970;12:55–67.
19. Frank IE, Friedman JH. A Statistical View of Some Chemometrics Regression Tools. Technometrics. 1993;35:109–135.
20. Geladi P, Kowalski BR. Anal Chim Acta. 1986;186:1–17.
21. Todeschini R, Consonni V, Pavan M. DRAGON 2.1. Milano Chemometrics and QSAR Research Group; Milan, Italy: 2002.
22. Hoffman B, Cho SJ, Zheng W, Wyrick S, Nichols DE, Mailman RB, Tropsha A. Quantitative structure-activity relationship modeling of dopamine D(1) antagonists using comparative molecular field analysis, genetic algorithms-partial least-squares, and K nearest neighbor methods. J Med Chem. 1999;42:3217–3226.
23. Golub GH, Heath M, Wahba G. Generalised cross-validation as a method for choosing a good ridge parameter. Technometrics. 1979;21:215–223.
24. Orr MJL. Introduction to Radial Basis Function Networks. http://anc.ed.ac.uk/rbf/rbf.html.
25. Ho GJ, Emerson KM, Mathre DJ, Shuman RF, Grabowski EJJ. Carbodiimide-mediated amide formation in a two-phase system. A high-yield and low-racemization procedure for peptide synthesis. J Org Chem. 1995;60:3569–3570.
26. Matayoshi ED, Wang GT, Krafft GA, Erickson J. Novel fluorogenic substrates for assaying retroviral proteases by resonance energy transfer. Science. 1990;247:954–958.
27. Greco WR, Hakala MT. Evaluation of methods for estimating the dissociation constant of tight binding enzyme inhibitors. J Biol Chem. 1979;254:12104–12109.
28. Kim EE, Baker CT, Dwyer MD, Murcko MA, Rao BG, Tung RD, Navia MA. Crystal structure of HIV-1 protease in complex with VX-478, a potent and orally bioavailable inhibitor of the enzyme. J Am Chem Soc. 1995;117:1181–1182.
29. Surleraux DL, Tahri A, Verschueren WG, Pille GM, de Kock HA, Jonckers TH, Peeters A, De Meyer S, Azijn H, Pauwels R, de Bethune MP, King NM, Prabu-Jeyabalan M, Schiffer CA, Wigerinck PB. Discovery and selection of TMC114, a next generation HIV-1 protease inhibitor. J Med Chem. 2005;48:1813–1822.
30. Miller JF, Andrews CW, Brieger M, Furfine ES, Hale MR, Hanlon MH, Hazen RJ, Kaldor I, McLean EW, Reynolds D, Sammond DM, Spaltenstein A, Tung R, Turner EM, Xu RX, Sherrill RG. Ultra-potent P1 modified arylsulfonamide HIV protease inhibitors: the discovery of GW0385. Bioorg Med Chem Lett. 2006;16:1788–1794.
31. Dirlam JP, Czuba LJ, Dominy BW, James RB, Pezzullo RM, Presslitz JE, Windisch WW. Synthesis and antibacterial activity of 1-hydroxy-1-methyl-1,3-dihydrofuro[3,4-b]quinoxaline 4,9-dioxide and related compounds. J Med Chem. 1979;22:1118–1121.