Cosolvent and Dynamic Effects in Binding Pocket Search by Docking Simulations

The lack of conformational sampling in virtual screening projects can lead to inefficient results because many of the potential drugs may not be able to bind to the target protein during the static docking simulations. Here, we performed ensemble docking for around 2000 United States Food and Drug Administration (FDA)-approved drugs with the RNA-dependent RNA polymerase (RdRp) protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) as a target. The representative protein structures were generated by clustering classical molecular dynamics trajectories, which were evolved using three solvent scenarios, namely, pure water, benzene/water and phenol/water mixtures. The introduction of dynamic effects in the theoretical model showed improvement in docking results in terms of the number of strong binders and binding sites in the protein. Some of the discovered pockets were found only for the cosolvent simulations, where the nonpolar probes induced local conformational changes in the protein that lead to the opening of transient pockets. In addition, the selection of the ligands based on a combination of the binding free energy and binding free energy gap between the best two poses for each ligand provided more suitable binders than the selection of ligands based solely on one of the criteria. The application of cosolvent molecular dynamics to enhance the sampling of the configurational space is expected to improve the efficacy of virtual screening campaigns of future drug discovery projects.

explicitly during the clustering, the remaining frames are simply added to the cluster with the cluster representative most similar to them. The number of frames utilised during the clustering was 11000, 11400 and 10400 for the water, phenol/water and benzene/water trajectories. To measure the quality of the clustering, four metrics are utilised. Two of these, namely the Davies-Bouldin index (DBI) and the pseudo-F statistic (pSF), aim to compare the intra-and intercluster variances, with a small intracluster and large intercluster variance indicating good quality clustering. 2 Since both of these scores are heavily influenced by the number of obtained clusters, the comparison of their absolute values between different MD trajectories has limited meaning. Instead, the trends arising in these metrics through the systematic variation of the clustering parameters can be interpreted to optimise these parameters. At this point it is useful to mention that low values of DBI and high values of pSF are desirable. The other two descriptors utilised to describe the quality of the clustering are the number of noise frames (frames not included in any cluster), and the number of clusters defined by the algorithm. The number of noise frames should clearly be kept low to avoid missing any important conformations only because it is visited very rarely and is therefore considered an outlier by the algorithm. The noise frames are automatically determined by the algorithm based on the supplied ε and k parameters and are defined as those frames which do not belong to any cluster and are not cluster centers themselves. Finally, while a high number of clusters is desirable, as it can result in a wider variety of protein conformations, the computational limitations of performing explicit docking calculations to each representative conformation with thousands of ligands should be kept in mind.
On Figure S1, the descriptors of the water solvated trajectory clustering can be seen for the case when only the alpha carbons are considered during the RMSD distance calculations.
As it can be seen, considering ε values larger than 1.2Å leads to a single obtained cluster ( Figure S1D), for which the clustering descriptor metrics cannot provide a meaningful value.
Since a single cluster is clearly not ideal, these large ε values do not need to be considered during the search for the optimal parameters. Focusing instead on parameter k, the most 2 significant differences between the different values for this parameter can be discovered on the pSF bahaviour ( Figure S1B). Here, the curves with k= 4 or 6, reaching their peak at ε=1.1Å, are clearly superior to the other two. Contrary, the variation of k has much more limited effects on DBI, as shown in Figure S1A. In fact, all DBI curves are more or less constant if ε is smaller than or equal to 1.1Å, at which point the DBI values drop rapidly and become zero at 1.2Å. Considering the behaviour of the pSF and DBI descriptors, values of ε=1.1Å and k=4 are promising candidates to be the optimal choice. Further advantages of this choice can be seen by looking at Figure S1C   solvated trajectory are plotted on Figure S2. Contrary to the water trajectory, in this case the number of clusters does not decrease to a single one at higher ε values, but instead is saturated at three, see Figure S2D. As a consequence, the DBI and pSF values do not vanish for these values of ε ( Figure S2A,B). Instead, a sudden shift can be observed between ε=1.0 A and 1.1Å for both metrics, while for values higher or lower than these, the curves are more or less constant. The facts that this shift is occurring near ε = 1.1Å, and that this value is already in the more favorable interval for both metric curves (small values of DBI and high values of pSF), highlight the attractiveness of choosing 1.1Å as the value of the ε parameter.
The pSF value at ε = 1.1Å of the curve associated with k = 4 is again one of the best along with k = 5. Figure S2C