Edinburgh Research Explorer A Collective Variable for the Rapid Exploration of Protein Druggability

,


Introduction
The development of a new medicine is a long and expensive process subjected to high attrition rates. 1 Over the last decades, around 60% of drug discovery projects failed to identify viable leads able to modulate adequately the activity of a protein target. 2Analyses of the sequenced human genome indicate that less than 50% of disease-involved genes code for druggable proteins. 3,4A protein target found to be nondruggable late in the drug discovery process is a significant waste of time and expense in the pharmaceutical industry.Accordingly, an early assessment of druggability offers the opportunity to focus efforts on tractable targets, thereby reducing the rate of failure. 5The concept of druggability is ambiguous because it has been used in many different fields to describe, in a different context, the properties of genes, proteins and ligands.In the context of structure-based drug design, protein druggability is often related to the ability of a therapeutic target to bind a drug-like small molecule, leaving aside many important facets of the drug discovery and development process such as selectivity, toxicology or pharmacokinetics. 3Since druggability is closely linked to the notion of binding site in this specific context, the terms "bindability" or "ligandability" have also been proposed as alternatives. 6,7is report focuses on the use of computation for structure-based evaluation of protein druggability.][10][11] As interest in druggability developed in the last fifteen years, more recent efforts have focused on correlating directly structural descriptors to druggability.An early effort was contributed by Hadjuk and coworkers. 12NMR-based fragment screening was used to develop a mathematical model for druggability measurements whereby structural descriptors were correlated to NMR hit-rates.The methodology is based on the assumption that a druggable cavity tends to bind more fragments than a nondruggable pocket.A second approach, called MAP POD , was published by Cheng et al. shortly after. 13The authors proposed a scoring function to assess the maximal affinity between a small molecule and a binding site based on physicochemical and geometric features.This study also introduced a new category of proteins that are neither 'druggable' nor 'nondruggable', but are instead "difficult" to target with small molecules.The suggestion was that this category of proteins should be targeted with highly polar molecules administrated as pro-drugs.These early contributions have paved the way for a similar class of computational methods that aim to detect and evaluate potential binding sites at protein surfaces.For instance, the public dataset compiled for MAP POD was used to parameterize Dscore, a druggability function coupled with the pocket detector SiteMap. 14,15Dscore is a simple linear combination of three descriptors reflecting the volume, enclosure and hydrophobicity of the binding site.Schmidtke et al.   have recently developed a fast methodology based on a new publically accessible dataset. 16The approach features a logistic regression analysis to extract local and global hydrophobic descriptors of a protein pocket.Another recent structure-based approach published in the field is the Drugpred method. 17e Drugpred scoring function is based on a large freely accessible non-redundant protein dataset and was derived using partial least-squares projection.Drugpred appears to be less sensitive to small binding site structural modifications that do not dramatically affect pocket properties. 17e above described methods were designed to assess the druggability of a crystallographic protein structure.However, it is well known that sometimes a few local structural rearrangements around a protein binding site can profoundly influence the binding affinity of a small molecule to its target. 18,191][22] One of the first method based on classical molecular dynamics simulations was published by Seco et al. 20 In this grid-based approach, an explicit restrained MD simulation of a protein is performed in the presence of a given concentration of isopropyl alcohol.The binding propensities of the probe at the protein surface are then back-computed to evaluate spatially resolved binding free energies.A similar protocol was recently applied on different systems using several diverse probes. 22The authors showed that probe molecules could induce both local and global structural rearrangements of the protein, leading to increases in target druggability.Nevertheless a frequent concern with these techniques is that the observed conformational changes reflect denaturation of the protein due to high probe concentrations.Thus judicious use of positional restraints is required to limit the occurrence of undesirable conformational changes.Also, probe diffusion necessary to compute binding propensities in buried cavities can be very slow with standard MD approaches.
To overcome the limitations of current MD based druggability prediction methods, this report introduces the JEDI algorithm (''Just Exploring Druggability at protein Interfaces'').JEDI has been designed to evaluate protein druggability "on-the-fly" during MD simulations without using organic probes or protein restraints.The druggability function relies on a set of geometric parameters describing the volume, the enclosure and the hydrophobicity of a binding site.The JEDI scoring function is fast, continuous and differentiable.Accordingly, the JEDI druggability descriptor can be used to construct artificial druggability potentials that are designed to bias sampling of protein binding site conformations similar to a training set with the aid of a druggability force applied during a MD simulation.JEDI has been implemented in the software PLUMED 1.3 to enable biased molecular dynamics simulations with a variety of free-energy calculations techniques and diverse popular MD engines. 23The methodology was parameterized using the freely accessible Druggable Cavity Directory (DCD) dataset. 16The sensitivity of the method to binding site conformational changes was tested with a compiled dataset of cryptic binding sites.Detailed druggability assessments have also been performed on the fly during unbiased and biased MD simulations of a test protein, VHL.This demonstrated the potential for JEDI analyses to detect cryptic druggable binding sites in proteins and to deliver conformations suitable as input for subsequent docking calculations.

Methods
Datasets.Protein structures were taken from the Non Redundant Druggability Dataset (NRDD) in the Druggable Cavity Directory compiled by Schmidtke et al. 16 A set of 63 unique proteins has been used to parameterize the JEDI scoring function.Each protein has been assigned by the authors of the original study an experimental druggability value from 1 to 10 (from less druggable to more druggable) according to its capability to bind a drug-like compound.The dataset can be further divided into three categories: non-druggable (DCD score 1 to 4), difficult (DCD score 5 to 7) and druggable (DCD score 8 to 10).
In order to benchmark JEDI against an existing methodology, druggability calculations were performed on the energy-minimized structures of the training dataset using the program fpocket. 16,24A detailed list of the dataset is given in the supporting information, including druggability scores obtained with both approaches (SI Table S1).A validation dataset, called the hidden pocket dataset, has also been compiled.Each protein in this dataset is represented by two different structures that exhibit conformational variability in the binding site region that correlates with variations in the binding affinities of known ligands.
Protocol overview.JEDI is a grid-based approach.The methodology includes three major steps (Figure 1A).First, a region of interest where the druggability evaluation will be conducted must be defined.This area can be located anywhere in the protein structure in principle, but in this report efforts are focused on evaluating the druggability of regions known to contain a binding site.Thus spatial regions to analyze were defined from the position of known ligands.A large 3D cubic grid with 1.5 Å spacing between grid points is initially positioned around the region of interest.Next, only grid points within 6 Å of one ligand atom were retained.All protein heavy atoms within 3 Å of a grid point are then selected for druggability calculations and this set of atoms is referred as the 'binding site region'.This setup is then followed by either a single point calculation or MD simulations with druggability evaluated at regular intervals in unbiased simulations, or at each time-step for MD simulations biased with the JEDI potential.Every druggability assessment requires that the 'activity' of all grid points is evaluated, with grid points classified as inactive, partially active or fully active according to their geometric position in the binding site.Then, volume and hydrophobicity descriptors that depend on grid point activities and local geometric arrangements of protein atoms are computed in order produce a conformation-dependent protein druggability score.
To avoid errors in the druggability predictions due to diffusion of the protein over the course of an MD simulation, the Cartesian coordinates of the grid points are re-evaluated prior to each druggability assessment.Firstly, the distance vector between the center of mass of the protein atoms in the binding site region in the conformation at the n-th step of the MD simulation (r com,t=n ) and the initial protein conformation (r com,t=0 ) is evaluated.Then, the rotation matrix that best fits the protein backbone atoms of the entire protein onto their coordinates at t = 0 is computed using the Kabsch algorithm. 25nally, the resulting translation vector and rotation matrix are used to transform the grid point Cartesian coordinates at t = 0 into grid point Cartesian coordinates at t = n.
Scoring Function.The JEDI druggability score is calculated as a linear combination of two partial-least squared derived descriptors reflecting the volume, and the hydrophobicity (eq 1).
where V druglike , V a and H a represent respectively the drug-like volume descriptor, the pocket volume descriptor and the pocket hydrophobicity.and γare constants of the model derived by multiple linear regressions against a training set.All the descriptors presented below are based on spline functions such that the JEDI potential is continuous and at least twice differentiable.Two forms of spline functions have been used operating on variables v and k (Figure 1C).The first one turns "off" with v starting at k at v min , reaching 0 at v min +Δ (eq 2).
The active volume descriptor V of the binding site is given by equation 4: where N is total number of grid points, V g is the volume of space covered by a grid point.To capture the shape of the pocket, each grid point is assigned an activity score a i between 0 and 1 (inactive to active), according to its geometric position inside the binding pocket (eq 5).
The first term of eq 5 gradually turns off grid points according their distances from the region of interest.This term is optional, but is useful to ensure that fluctuations in druggability scores are not unduly influenced by conformational changes that are remote from the protein region of interest.The minimum distance BS i between a grid point i and the M atomic coordinates defining the binding site region is calculated as: With =50.0Å and   =   −   , where r gi and r pj are respectively the position vectors of grid point i and protein atom j belonging to the binding site region.The second term in equation 5 causes grid points that overlap with protein atoms to be gradually inactivated (Figure 1B).The minimum distance mind i between grid points and protein atoms is calculated with an equation similar to eq 6.The third term in equation 5 gradually inactivates solvent exposed grid points (Figure 1B).The active volume V is then converted in a pocket volume descriptor V a using equation 8.
=    (8)  where V max is the maximum active volume descriptor.This constant was set to be equal to the maximum active volume V calculated for protein binding sites in the ''druggable'' category of the DCD dataset.
Accordingly, a cavity presenting the characteristics of a typical small-molecule binding site will have a typical V a value in the interval [0.0,1.0].In order to penalize overly large or overly small cavities that are not suitable for drug-like small molecules, the descriptor V druglike is also computed with eq 9.
=    (1.0,   , Δ  )   (1.0,   , Δ  ) where V min is equal to 0 Å 3 by default.Analysis of pockets from the DCD dataset suggested a ΔV min value of 36 Å 3 .For simplicity, the same value was used for ΔV max .The effect is that cavities that differ substantially in active volume from those present in the training set will have a low value of V druglike .In turn this will assign a low JEDI score to cavities that differ markedly from the training set.This parameter may be easily tuned if cavities for ligands that differ substantially from those present in the DCD training set are desired.
The active grid hydrophobicity function captures the average hydrophobicity of the active grid points and is given by eq 10: where the hydrophobicity score H i of the grid point i is calculated as where apolar i and polar i are function of the number of apolar (C and S) and polar (O and N) protein atoms within the distance r hydro defined by equations 12a and 12b: where M apolar and M polar are the total number of apolar and polar protein atoms in the binding site region.

JEDI derivatives.
Because the JEDI potential is based on functions that are continuous and differentiable, the gradient with respect to the Cartesian coordinates    ,    ,    of the j protein atoms in the binding site region can be calculated using the following equation: where M is the number of protein atoms in the binding site region, Similar expressions can be derived for the partial derivatives with respect to    and    .A detailed derivation of all partial derivatives in equation 14 is given in the supplementary inforamation.

JEDI optimization.
The parameters of the JEDI model were optimized using the python module PyEvolve. 26After investigation, only the CC mind , ΔE, ΔCC2 and r hydro variables were selected for optimization using a range of physically plausible values (SI Table S2).An elitist genetic algorithm was then iterated for 50 generations on a population of 40 individuals.The fitness function was defined to maximize the r 2 of JEDI score vs DCD score values after a Partial Least Squares regression.Uncertainties in the JEDI score parameters were determined with 100 iterations of bootstrapping using a split of 0.7/0.3 for the training and validation sets.
MD simulations.Proteins, ligands and cofactors were prepared using the python script Protein Preparation Wizard developed by Schrödinger and available in Maestro. 27First, missing hydrogen atoms were added to the structure to assign the appropriate bond number and formal charge.Then, proteins were manually verified to avoid incomplete side chains and steric clashes.Molecular dynamics simulations have been performed using GROMACS 4.5.5 combined with PLUMED 1.3. 23,28imulations were carried out in implicit solvent using the Generalized Born model and the Onufriev-Bashford-Case method to calculate the Born radii with a cutoff of 20 Å. 29,30 An energy minimization was performed using the steepest descent algorithm to reach the convergence parameter of 300 kJ.mol - 1 .nm - of maximum force change.Then, production runs of 50 ns were performed using a time step of 2.0 fs.Systems were maintained at a constant temperature of 310 K using a stochastic Berendsen thermostat with a coupling constant of 1.0 ps. 31 The force field Amber99sb-ILDN was used for the proteins and the GAFF force field has been used for ligands and cofactors. 32,33The GAFF parameters for the ligands and the cofactors were obtained by using the software acpype, in combination with the antechamber utility from the AMBER12 software package. 34,35A new atom type was created to represent grid points.To avoid interactions between the protein atoms and the grid points, the Lennard-Jones parameters  and  and the atomic partial charges of grid points were equal to zero.All grid points are frozen in space during energy minimization and molecular dynamics time-steps.
Umbrella sampling simulations.Several umbrella sampling calculations were performed using the following biasing potential: where s(r) is the JEDI score of protein binding site conformation r,  is the force constant of the biasing potential, and s 0 is a target value for JEDI score . 36Several biased MD simulations were performed by varying  and s 0 for different systems.The resulting trajectories were clustered to identify the most likely conformations associated with a given set of (, s 0 ) values.The single linkage clustering approach as implemented in GROMACS was used to identify the most representative conformations of each resulting trajectory.The RMSD cluster cutoff was set to 1 Å.RMSD calculations were performed using the coordinates of heavy atoms constituting the binding site region, excluding atoms that can form symmetry equivalent conformations (e.g.Valine C γ atoms).Finally, cluster homogeneity was manually checked.

Docking calculations.
Several representative protein structures were extracted from the trajectories to perform docking calculations.The Maestro software was used to prepare input files for both receptors and ligands. 27Protonation states of binding site Histidine residues were chosen to be consistent with those from the MD simulations (in particular, His110 and His115 were protonated on the -nitrogen atom).Docking calculations were performed with the software Autodock Vina and the Autodock/Vina plugin for pymol. 37,38For each complex, the same docking grid was used, and up to twenty poses were generated.Different protocols featuring a fully rigid receptor or allowing side-chain flexibility of selected residues were used.

Choice of descriptors.
The druggability score of the JEDI methodology is based on a linear combination of structural descriptors characterizing the volume and the hydrophobicity of a cavity.The choice of those collective variables were influenced by the literature. 6,12,13,16,17,39A rule-based method published by Perola et al.
suggested five suitable descriptors: volume, depth, enclosure, percentage of charged residues and hydrophobicity.These descriptors summarize a general consensus fairly well. 40After investigation, only two descriptors have been retained: the active volume and the hydrophobicity.An early version of JEDI was also including a descriptor capturing the degree of buriedness of the binding site.The buriedness, as described by Volkamer et al., was captured as the ratio between the number of hull grid points in contact with the protein surface and the total number of hull grid points. 39Here hull grid points are defined as in Volkamer et al. and correspond to the outer layer of active grid points that define the shape of a binding site.After preliminary investigations, this descriptor was not found to contribute significantly to the druggability prediction.This is likely because the current definition of the active volume descriptor is penalizing solvent-exposed grid points and thus already accounts for buriedness.Consequently, shallow solvent exposed cavities have a lower active volume descriptor than buried enclosed closed cavities.
The results depicted in Figure 2 demonstrate that higher JEDI score values do correlate with a larger binding site active volume V and a larger hydrophobicity descriptor H a .
Since the publication of the first large scale classification of protein binding sites by An et al, 41 numerous studies have been conducted in the field of pocket detection and analysis to improve understanding of the physicochemical properties that underlie protein-ligand interactions. 6,16,17,39The average volume of a druggable binding site was evaluated around 600 Å 3 , 41 with maximum values around 900-1200 Å 3 . 39,40These estimates are in line with those computed with JEDI; the average volume of a binding site represented by the total number of active and partially active (a i >0) grid points was found to be 496  202 Å 3 , with a maximum value of 1019 Å 3 .The results shown in Figure 2A depict the distribution of active volume (V) values for different categories of protein binding sites.As the active volume is the sum of the grid point volumes weighted by their activity, it is in general much smaller than the volume of the binding site.An average value for the whole dataset is V = 125  60 Å 3 .
The JEDI hydrophobicity descriptor shares similarities with the descriptor used by by Eyrisch et al. 42,43 In accordance with previous literature studies, druggable binding sites tend to have higher average hydrophobicity values (H a = 0.72 ± 0.03) than non-druggable binding sites (H a = 0.60 ± 0.04).
This descriptor was found to be the most significant contribution to the JEDI score values with a weight β almost five times larger than the α volume coefficient (Table 1).This observation is in a good agreement with the literature, where the apolar character of a cavity is usually the most important structural descriptor for druggability assessment. 13,39Indeed, a single hydrophobicity descriptor has been shown in some instances to be sufficient to distinguish druggable proteins from nondruggable proteins. 16,24uggability scoring of diverse protein structures.
The JEDI parameters were first optimized using multiple linear regressions and the elitist selection variant of the genetic algorithm methodology implemented in the python module PyEvolve. 26DI druggability scores obtained at the end of the process are shown in Figure 3A.For comparison, fpocket was used to calculate the druggability score of each protein in the training dataset (Figure 3B).
The results suggest that JEDI predictions are slightly more accurate than those obtained using fpocket with a r 2 of 0.63 ± 0.11 and 0.52 ± 0.13 respectively.Closer inspection of Figure 3A shows that JEDI discriminates fairly well sites categorized as ''undruggable'' from those classified as ''druggable'', but proteins in the ''difficult'' category show a large scatter in JEDI score values.Clearly, the exact ''experimental'' DCD druggability score assigned to a given protein can be debated, and this must be kept in mind when calibrating computational methods against this dataset.Additional tests were conducted by positioning the grid on buried or solvent exposed regions of the protein Malate Dehydrogenase (PDB 1BMD), where no apparent pockets were observed.The resulting JEDI score values where invariably lower than 1.5.
Detailed structural analyses of accurate and inaccurate druggability predictions for representatives druggable and non-druggable protein binding sites is useful to characterize the strengths and weaknesses of the present approach.Four representative structures were chosen for this purpose (Figure 4), and JEDI descriptor values for these structures are shown in table 4. Figure 4A represents the binding site of a malate dehydrogenase in complex with the coenzyme NAD (PDB 1BMD).This enzyme has been classified as nondruggable due to the difficulty of finding a drug-like compound able to compete with NAD for access to the binding site.The binding affinity of several known nucleotide inhibitors have been previously determined by enzymatic assays. 44The best competitive inhibitor is the cyclic nucleotide cAMP, presenting a K i value 560 nM.If this protein is clearly evaluated as nondruggable by fpocket (score = 0.11), it remains challenging for other methodologies such as the NMR-based approach developed by Hadjuk and coworkers, which predicts the cavity as having an intermediate druggability. 12This is in line with the observed JEDI score value for this system (5.1).The relatively high JEDI score is largely due to the relative large active volume V of the binding site (157 Å 3 ), which is in the range of V values typical for druggable sites (Table 3, first row).Thus, that malate dehydrogenase is not considered druggable in practice may be more a reflection of the difficulty for a drug-like molecule to compete with NAD at a ca.300 M expected intracellular concentration in mammalian cells, 45 rather than the occurrence of an unusually polar or shallow binding site.An example of a correct nondruggable prediction is depicted in figure 4B for the binding site of Inositol Polyphosphate (IP) phosphatase. 46In addition to a small active volume due to a poor degree of enclosure, this small pocket presents a very low hydrophobicity score (Table 3, second row).This is mainly because of a Calcium ion in the binding site.A correctly predicted druggable cavity is shown in figure 4C.This mostly apolar well-enclosed pocket corresponds to the binding site of the S810L mutant mineralocorticoid receptor interacting with spironolactone (Table 3, third row). 47This inhibitor has shown IC 50 values in the range of 1.6 -60 nM in a cell-based luciferase reporter assay. 48Lastly, figure 4D depicts a druggable binding site that is incorrectly predicted to be 'difficult' to target.In addition to a high polarity caused by the presence of a zinc ion buried in the pocket, the binding site of carbonic anhydrase II is particularly small. 49Most successful carbonic anhydrase inhibitors exploit direct interactions with the buried zinc ion.The present version of JEDI does not account for potentially favorable metal-ligand interactions and this explains the discrepancy between the JEDI score and DCD score values (Table 3, fourth row).

Sensitivity to minor structural variations, and computational cost.
A potential concern at the outset of the project was that JEDI score values would be unduly Next the computational cost of the JEDI calculations was assessed.An important consideration is that the calculations should not slow down too much molecular dynamics simulations.Benchmarks are shown in Table 4.If JEDI is used to monitor druggability values on the fly during an MD simulation, then it isn't necessary to evaluate druggability at every time-step, as snapshots between successive times-steps are highly correlated.With druggability evaluation every 1 ps the time incurred is negligible, unless the MD simulation is parallelized across multiple processors.Likewise, single-point druggability estimates of a protein structure are far faster than alternative methodologies that take seconds to minutes. 14,16,50The implementation of MD simulation protocols biased with JEDI requires a druggability calculation at each time-step.In this case the performance loss is approximately a factor of 1.4 to 2.7, depending on the number of processors used to speed-up the evaluation of the non-bonded energies.Evidently, further gains in efficiency could be achieved by parallelizing key subroutines in the JEDI code.Alternatively, multiple time-step algorithms schemes for collective variables as proposed recently by Ferrarotti et al. could be used to decrease the computational cost further. 51The relative efficiency is also influenced by the choice of an implicit solvent model for this study, which dramatically speeds up the evaluation of non-bonded energies.Overall, the performance was deemed acceptable, given scope for future improvements.
Application to a hidden pockets dataset.
Validation of the methodology was pursued by analysis of a set of six proteins known to adopt distinct binding site conformations in the presence of different ligands (Figure 6).In each instance, two conformations for each protein were selected for druggability assessments.Protein structures were aligned and a grid defined from the largest ligand was used to compute a JEDI score value for both conformations.In all instances the ligand atoms were ignored for druggability calculations.The results of this analysis are shown in Table 5. Human phenylethanolamine N-methyltransferase (hPNMT) is an enzyme involved in the synthesis of epinephrine from norepinephrine using the cofactor S-adenosyl-Lmethionine (SAM) to methylate the primary amine of noradrenaline.Two different hPNMT inhibitors, 1 and 2, have been reported to inhibit the enzyme with K i values of 0.28 μM and 0.063 μM respectively (radiochemical assay). 52It has been shown that these two ligands bind to different conformations of the hPNMT binding site (Figure 6A).Both compounds engage in significant hydrophobic interactions, but the larger ligand (2) positions a p-chlorophenyl group in a cavity that is hidden in the hPNMT/1 complex.Formation of the enlarged cavity in hPNMT/2 necessitates the rearrangement of the side-chain Lys57, as well as a small displacement of helix 3.The JEDI calculations were able to capture a favorable increase in druggability of ca.0.8 units for the protein binding site conformation seen in hPNMT/2 in comparison with hPNMT/1.The change in druggability is due to a favorable increase in both V and H a (Table 5, first row).
The von Hippel-Lindau protein (pVHL) forms a complex with the proteins CUL2, Elongin B and C, and Rbx1.This complex is involved in the ubiquitination of the transcription factor hypoxiainducible factor (HIF-1), leading to proteasome-mediated degradation of HIF-1. 53Small molecules 3 and 4 have been reported to inhibit interactions between pVHL and HIF-1 with K d values of 86.1 μM and 27.7 μM respectively (fluorescence polarization assay). 19The ligands occupy the same binding site, but a different orientation of Arg107 is observed, giving rise to a slightly more enlarged cavity in VHL/4 (Figure 6B).This translates into a slightly higher JEDI score value for VHL/4 over VHL/3.This is because repositioning of Arg107 increased the value of H  in VHL/4.However this is partially offset by a decrease in V a .This is because the displacement of Arg107 exposes more grid points to the solvent, and as a consequence, grid points previously fully active become partially active (Table 5, second row).
Serine/threonine-protein kinase or polo-like kinase 1 (PLK-1) is an enzyme involved in the regulation of cell division.,The PLK-1 inhibitor 5 binds with an IC 50 = 730 nM (fluorescence polarization assay) to the ATP binding site, and also to a subpocket that has been called the adaptive pocket, whereas the inhibitor 6 shows an IC 50 of 530 nM (kinase enzymatic assay) and binds to the native purine-pocket of the active site (Figure 6C). 54,55However the larger active volume observed in the PLK-1/5 bound conformation is mainly due to active grid points around the methylpiperazine moiety of 5.These grid points are inactive in the PLK-1/6 complex because they are too solvent exposed.The adaptive pocket seen in PLK-1/6 is predicted to be less druggable than the native pocket seen in PLK-1/5 by ca.0.8 units (Table 5, third row).
Prostate specific membrane antigen (PSMA) is a glycoprotein overexpressed as a homodimer in many forms of prostate cancer.Compound 7 is an example of a first generation of PSMA inhibitors that binds the very polar binding site of PSMA with a K i of 11 nM (fluorescence-based NAALADase assay). 56More recently, compounds belonging to the class of antibody recruiting small molecules targeting prostate cancer (ARM-P) have been reported, and compound 8 binds PSMA with a K i of 0.02 nM (enzymatic assay). 57,58A crystallographic structure of the PSMA/8 complex revealed that 8 binds to an open PSMA conformation that was not observed in the PSMA/7 complex.The large difference in binding affinities between 7 and 8 appears to be well reproduced by a large difference in JEDI score values (Table 5, fourth row).However in this instance the active volume V is much larger than for a typical small molecule binding site and as a consequence the druggability score is strongly penalized by the score remains small because the DNP pocket is relatively small.Thus the PSMA binding site is a good illustration of challenging conditions encountered when performing JEDI analysis of binding sites for ligand that depart from typical rule-of-five compliant small molecules.
HIV-RT is an enzyme playing a crucial role in the replication of the HIV virus.0][61][62][63] Druggability predictions were compared for the NNRTI-binding pocket of the apo structure of HIV-1 RT and in complex with 9 (Figure 6E).This compound belongs to the second generation of NNRTIs and inhibits wild type HIV-RT with an IC 50 of 2.1 nM (antiviral assay). 64,65The binding site of the HIV-RT/9 complex was found to be one of the most druggable pocket analysed in this work.It is noteworthy that the NNRTI cavity is actually partially formed in the apo protein, and has an active volume of V = 192 A 3 The holo structure features an enlarged binding site and side chains rearrangements that increase the hydrophobicity H a (Table 5, fifth row).
Interleukin-2 (IL-2) is a cytokine playing a crucial role in the regulation of white blood cells of the immune system.The small molecule 10 binds to a pocket only partially present in the apo structure.
An additional cavity is present in the holo complex and it forms by displacement of two residues, Phe42 and Glu62 (Figure 6F). 66A similar pocket volume descriptor is observed for both apo and holo forms of IL-2.However this time, a higher druggability score was predicted in absence of ligand, because the hydrophobicity H a is lower in the IL-2/10 complex (Table 5, sixth row).This occurred because the motion of Phe42 and Glu62 promotes hydrogen bonding with Glu62 and Lys43, activating grid points close to polar atoms, thus decreasing hydrophobicity.
Overall, the methodology is clearly able to correlate fluctuations in druggability score with noteworthy binding site conformational changes that have the potential to impact structure-based ligand design activities.In five cases out of six, the conformation with the highest JEDI score corresponds to the conformation that binds the most tightly bound ligand.Careful interpretation of the results is needed when considering unusual protein-ligand complexes, such as PSMA/8.Quantitative correlation with binding affinities is not expected since the ligands differ.Further, druggability is not exclusively linked to binding affinity.PSMA is an example of a binding site for which ligands with very low K i values are known (7 and 8), but the low predicted druggability score is adequate since most of the binding affinity is achieved by means of strongly polar ligand moieties positioned in the active site.These in turn translate into inauspicious drug-like properties, such as low cell permeability. 56,58 the fly evaluation of druggability during MD simulations.
Further tests were conducted with MD simulations of VHL.Druggability values were collected every ps over the course of a 50 ns simulation of apo VHL or VHL/3.The results are shown in Figure 7.
The binding site druggability remained stable throughout the VHL/3 simulation, with an average JEDI score of 7.80.6 which is consistent with the expected value from previous analyses (Table 5, row 2).Clustering analysis reveals only one major binding site conformation (76% of the trafectory), that is  5, second row).Figure 8A (upper panel) indicates that the target druggability value is rapidly achieved in all instances.As expected fluctuations from the target value decrease with increased  values.The trajectory obtained using  = 2 000 kJ.mol -1 .nm - was subjected to further clustering.The most populated clusters (51% of the overall trajectory) are very similar to the VHL/3 structure, with RMSD values always inferior to 2.0 Å.In the unbiased MD simulation of apo VHL, only 14% of the computed conformation exhibited an RMSD to the VHL/3 conformation that was smaller than 2.0 Å.Some clusters still contain conformations with Tyr98 pointing inside the binding site, but the occurrence is greatly decreased.His110 was also found to be much less flexible.It is apparent that the ligand binding site is almost fully formed in the most populated cluster of the biased apo VHL simulation (Figure 8A, bottom part).
The umbrella sampling simulations of VHL/3 were performed to encourage the binding site to adopt more druggable conformations.A reference value s 0 = 9 was selected based on the JEDI score of VHL/4. Figure 8B upper part shows that higher  values are needed to achieve the desired s 0 value.This indicates that the conformations with high JEDI score values do not form spontaneously.The increase in JEDI score values that is achieved correlates largely with the position of Arg107.This amino acid initially closes the binding site, but with the present bias, it shifts rapidly to a solvent exposed position, thus causing an enlargement of the binding site.This motion was rarely observed in unbiased MD simulations.
Next, more significant structural rearrangements were sought by performing umbrella sampling simulations of apo VHL with s 0 = 3.0.Results obtained with  = 2000 kJ.mol -1 .nm - are shown in Figure 9. Requesting such a low target druggability value forces VHL to largely collapse the binding site.Here the collapse is even more pronounced than observed in the unbiased apo VHL simulations, with the binding pockets of the isoxazole and pyrrolidine moieties completely masked.Consequently, the pocket volume descriptor V a decreases, and the active volume V becomes sufficiently low such that the V druglike term penalizes the JEDI score values.The hydrophobicity descriptor H  is stable during the biased simulation, with an average value slightly lower than observed in the unbiased apo VHL simulation.
The closure of the binding site has totally or partially inactivated numerous grid points that were previously in a buried cavity, leaving only a few active grid points at the protein surface and near polar groups.An illustration of the most populated cluster (73% of the trajectory) is depicted in figure 9B.
Umbrella sampling simulations of apo VHL were also performed by setting s 0 = 10 and  = 2000 kJ.mol -1 .nm - to encourage the exploration of conformations with high druggability.The results are presented in Figure 10A.As observed previously, the simulation is rapidly sampling conformations in the requested range of JEDI score .As expected, V a and H a are almost always higher than in the previously described simulations.However, larger fluctuations are observed in both descriptors throughout the biased simulation.An increase in hydrophobicity H a is always offset by a decrease of the active volume descriptor V a and vice versa.Clustering analysis of the trajectory here reveals at least two significant distinct clusters (populations 18% and 8% respectively).The second cluster (Figure 10C) corresponds to a low V a / H a binding site conformation that is significantly different from the VHL/3 structure.
The pyrrolidine pocket has collapsed and side-chains rearranged to expose hydrophobic groups to the surface.The first cluster (Figure 10B) corresponds to a conformation comparable to the VHL/3 holo structure.Additionally, Arg107 has adopted a solvent exposed position that contributes favorably to the JEDI score as demonstrated previously (Table 5, second row).A significant difference that was not observed in previous simulations is the rearrangement of Arg69 in the left-hand side part of the binding site.This conformational rearrangement leads to a more extended cavity with high druggability scores.
The flexibility of the left hand side pocket, has been recently discussed in the literature in the context of crystallographic structure analyses of multiple VHL ligand complexes, 67 and Galdeano et al. have suggested that additional interactions between ligands and this part of the binding site may facilitate the development of improved VHL ligands.

Docking ligands into JEDI computed conformations.
Several docking calculations were carried out to evaluate the utility of the conformational ensembles computed from the umbrella sampling simulations.Figure 11A depicts results obtained using the computed apo VHL conformation closest to the average conformation of the most populated cluster taken from an umbrella sampling simulation with s 0 = 10.0 and  = 2000 kJ.mol -1 .nm - (Figure 10B, top).Ligand 3 was found to adopt a pose that bears a substantial similarity with the crystallographic position of the ligand (RMSD of 3.6 Å, VINA binding energy of -5.6 kcal.mol - ).This is however not the top-scored pose which had a VINA binding energy of -6.2 kcal.mol - .Qualitatively the discrepancy with the crystallographic binding mode is mostly due to a shift of the isoxazole ring of 3 that is involved instead in stacking interactions with Tyr112.Closer inspection of the computed complex indicates that this binding mode is preferred because the computed ''left-hand side'' VHL pocket that would normally host the isoxazole ring is too shallow.However, small fluctuations in pocket depth are apparent in snapshots that are present in the same cluster, and it is possible to manually select a snapshot with a lefthand-side pocket that more closely resembles the crystallographic structure (RMSD of the binding-site sidechains depicted in Figure 11A to the crystallographic conformation is 1.3 Å, see Figure S2 for details).Repeating docking calculations on this conformation (Figure 11B) yields indeed a well scored pose (VINA binding energy -6.4 kcal.mol - ) that reproduces fairly well the crystallographic position of the ligand (RMSD of 2.1 Å) though this is again not the top-scoring pose which had a VINA binding energy of -7 kcal.mol - .As a control, the same docking protocol was also applied to the computed apo VHL conformation closest to the average conformation of the most populated cluster from an unbiased classical MD simulation (Figure 11C).As expected, the lowest-RMSD pose was significantly different from the crystallographic binding mode of 3 (RMSD of 5.4 Å, VINA binding energy -6.1 kcal.mol - ).
The docking calculations were repeated allowing side-chain flexibility of Tyr98, Ile109 but no improvements were observed.This is likely because significant conformational changes involving both side-chain and backbone atoms rearrangements are necessary to form the ligand binding site from the apo protein conformations sampled from the unbiased MD simulation.Conversely, little improvements was seen in the RMSD of the ligands docked into the JEDI computed conformations with the aid of a flexible side-chain docking protocol, presumably because the binding site is already largely formed.

Conclusions
A novel approach to assess protein binding site druggability has been developed.The fast, continuous and differentiable JEDI druggability estimator has been implemented in PLUMED and has been used as a collective variable in order to compute protein druggability at every integration step of a MD simulation. 23While the use of MD to sample protein conformations limits throughput, it offers the advantage of sampling more accurate protein conformations than those that may be generated by alternative molecular modelling approaches.As discussed elsewhere, high accuracy is crucial if computed protein conformations are to be subject to follow-up virtual screens. 68The methodology is able to distinguish nondruggable, difficult and druggable pockets (r 2 = 0.6±0.1),and is relatively insensitive to insignificant structural rearrangements in a binding site.
Some limits in the estimator were exposed, for instance neglect of potential metal-ligand interactions.
This could be remedied with additional structural descriptors.In addition, the present scoring function is only calibrated for detecting cavities that bind drug-like small molecules.JEDI was tested additionally on a dataset of hidden pockets for structurally diverse protein targets.The results show a good ability for the approach to detect conformational changes that influence the druggability of a protein binding site.
With the present version of the method, care must be taken when performing this analysis on binding sites for ligands that depart from typical rule-of-five compliant small molecules.
The main novelty of the approach lies in its potential to bias MD simulations with a JEDI force that will encourage a protein region to adopt conformations that match desired druggability scores.The results obtained through several umbrella sampling simulations of VHL indicate that JEDI enables the rapid sampling of 'holo-like' protein binding site conformations that are rarely seen in unbiased apo MD simulations.For structure-based drug design purposes this would be useful to identify tractable conformations in drug targets that may be otherwise considered undruggable from crystallographic analysis.An advantage of the approach over induced-fit docking/MD refinement protocols is that druggable cavities can be identified for targets that lack known ligands. 69JEDI also enables biased simulations of protein-ligand complexes.For structure-based drug design purposes, this would be useful to identify enlarged cavities that could accommodate a larger analog of an existing ligand.
Further work will focus on replacing the GBSA implicit solvent model with explicit solvent models, and this is expected to improve the accuracy of the computed conformations. 70Additional work is also desirable to identify the most efficient and accurate docking protocols to use in combination with JEDI computed conformations.Clustering of the biased simulations in VHL has identified in many instances several structurally distinct conformations that match a given target druggability value.That druggability is a degenerate collective variable was expected, and an exciting direction for this work is to couple the JEDI calculations with other collective variables to resolve distinct conformational states.
This will facilitate the evaluation of the free energy of these hidden conformational states with respect to the native state conformation.This parameter is likely to be important for practical applications.
Presumably the feasibility of targeting productively with a ligand a putative cryptic binding site hinges on an acceptable stability relative to the native state. 68Several enhanced sampling methodologies could in principle be suitable to this end, 71 and progress towards this objective will be reported in due course.Expressions for the derivatives of the JEDI potential.This material is available free of charge via the Internet at http://pubs.acs.org.Proteins discussed in the text, Figure 4 and Table 3 are represented with green crosses.nondruggable binding site that is predicted to have a low druggability score.It is here in complex with inositol(1,4)-bisphosphate (PDB 1I9Z).C) Mineralocorticoid receptor is a druggable target that is predicted to have a high druggability score.It is here in complex with spironolactone (PDB 2OAX).D) Carbonic anhydrase II is a druggable target that is predicted to have a low druggability score.It is here in complex with ethoxzolamide (PDB 3CAJ).The protein surface has been colored according to polar (blue) and apolar (orange) atoms.The 3D ligand conformations are represented in red licorice.Green dots symbolize grid points, and grid points with activity values a i > 0 are depicted with small spheres.

Supporting Information
Calcium and zinc ions are respectively represented as grey and pink Van der Waals spheres.Pictures were prepared using the software VMD. 72       cluster from an unbiased MD simulation.Results obtained using Vina. 37The crystallographic pose is in red sticks.All other symbols and representations are as in Figure 7.
sensitive to minor structural variations that are typically observed when crystal structures of the same protein are solved and refined independently.A major motivation for the development of JEDI was to observe variability in JEDI score between different structures of the same protein, only when conformational changes relevant for drug design are observed (e.g. a side-chain flip).This feature requires a subtle balance, on the one hand the methodology should not be too sensitive to very minor structural changes, but on the other hand it should be sufficiently sensitive to capture a fluctuation in druggability if the rearrangement is significant.The strategy here adopted was to evaluate the sensitivity of the JEDI score values for comparable conformations of the same protein interacting with different ligands.The structural similarity was quantified by means of RMSD calculations on the backbone and C β atoms of the binding site atoms of each protein.Selected proteins for which RMSD values of the different structures were less than 0.5 Å were retained for further analysis.Additionally, visualization of the binding sites confirmed that there was no noticeable difference in binding site conformation between the different selected structures.Figure5Ashows the distribution of JEDI score values obtained by this analysis for a representative protein taken from the 'nondruggable', 'difficult' and 'druggable' categories of the DCD dataset.Although small fluctuations in JEDI score are observed in the case of the difficult and the druggable binding site, the results suggest nevertheless a good reproducibility and robustness to insignificant structural changes.By contrast the fpocket methodology sometimes exhibits substantial variations in druggability that complicates interpretation of the scores (Figure5B).As an additional test of sensitivity, the dependence of the JEDI score values on the initial placement of the grid was assessed by evaluating the druggability of the same protein after translations of grid point coordinates by up to ±0.5 Å in the x, y, and z directions in Cartesian space.The druggability predictions were found to be quite insensitive to such translations, with fluctuations in the JEDI score values in the range of 0.1.

V
druglike .This indicates that the predictions for PSMA should be treated with care as the binding site differs substantially from those present in the training set.Compound 8 is unusual because it is made of a long flexible linker connecting a moiety positioned in the buried PSMA active site (Figure6Dblue square), and another moiety positioned in the arene binding site at the protein surface (Figure6Dblack square).The JEDI analysis was therefore repeated by splitting the initial grid in two regions to predict the druggability of each pocket independently.A first grid was placed around the active site, and a second was located around the DNP pocket.A low score was observed for the active site in both instances (JEDI score = 2.3 and 2.6 respectively), because of a very high polarity due to the presence of by several ions and polar and charged amino acids in the active site.The DNP pocket in PSMA/8 does score slightly higher (JEDI score = 3.1) than the same region in the PSMA/7 complex (JEDI score = 2.3) but depicted in Figure 7C (right panel).By contrast, the apo simulation shows an average druggability score of 5.70.8.Numerous structurally different binding site conformations are sampled.In the present MD simulations, the apo binding pocket is quickly obstructed by the rearrangement of Tyr98 and His110, inducing a drop of druggability.This could reflect inaccuracies in the protein force field used for the present study.Dozens of clusters were identified and the most populated (JEDI score ca.6.3) is present in 67% of the simulation (Figure 7C, left panel).This partially closed conformation is mainly stabilized by hydrogen bonds between the phenolic OH group of Tyr98 and the protein backbone.His110 is very flexible throughout the simulation.Surprisingly, significant side-chain rearrangements that partially block the binding site do not affect dramatically the JEDI score values.This occurs here because the shift in position for Tyr98 has created a new hydrophobic sub-pocket that contributes favorably to the JEDI score .However this sub-pocket is now occluded by Tyr98 and disconnected from the rest of the binding site.Further, the rest of the VHL binding site is still partially present, including the central pyrrolidine binding pocket.Binding site conformations that correspond to extreme druggability fluctuations seen in the apo simulation are depicted in figure 7D.In general, the apo conformations that present high JEDI score values were found to be structurally similar to the VHL/3 conformation seen in the crystal structure.Biasing MD simulations with the JEDI potential.Umbrella sampling simulations were performed for apo VHL and VHL/3 using equation 15 and by varying force constant values for  and target JEDI score values s 0 .No reweighting of the biased simulations was performed, thus all results presented below correspond to equilibrium properties of the biased Hamiltonians.The results are depicted in Figure 8. Apo simulations were biased to achieve a JEDI score of 8, in expectation with the values previously observed for ligand bound complexes (Table

Figure 1 .
Figure 1.Overview of the JEDI protocol.A) The region of space for druggability assessment is

Figure 2 .
Figure 2. Boxplots of values of the (A) active volume V descriptor and (B) hydrophobicity descriptor

Figure 3 .
Figure 3.The correlation of computed druggability scores with DCD database druggability scores.A)

Figure 4 .
Figure 4.The relationship between JEDI druggability scores, binding site descriptors and ligand

Figure 5 .
Figure 5.The sensitivity of druggability scores to small structural differences.The boxplots illustrate

Figure 6 .
Figure 6.Conformational variability in the hidden pocket dataset.(A) hPNMT in complex with 1 or 2,

Figure 7 .
Figure 7. Druggability fluctuations during an MD simulations of apo VHL.Instantaneous values (thin

Figure 9 .
Figure 9. Druggability fluctuations during a biased simulation of apo VHL with s 0 = 3,  = 2000 kJ.mol -1 .nm - .(A) Instantaneous values and running averages of JEDI score , V a and H a .(B)

Figure 10 .
Figure 10.Druggability fluctuations in apo VHL umbrella sampling simulation with s 0 = 10 and  = 2000 kJ.mol -1 .nm - .(A) Running averages and instantaneous values of of JEDI score , V a and H a , (B) The

Figure 11 .
Figure 11.Ligand docking in JEDI computed VHL conformations.(A) Pose of 3 (green sticks) ,  ℎ , Δ ℎ ) with respect to the Cartesian coordinates of protein atom j.The derivatives of the JEDI potential with respect to the grid Cartesian coordinates do not need to be calculated as the grid is frozen during MD time-steps.By application of the product rule:

Table of computed
JEDI score , fpocket score and DCD score for every protein in the DCD training set.

Table 1 .
List of variables used to compute JEDI score .

Table 2 .
List of constants used to compute JEDI score .

Table 3 .
JEDI descriptor values for the structures depicted in Figure5

Table 4 .
JEDI performance in ns/day for VHL (2278 atoms) and hPNMT (4057 atoms).The results were obtained using a cut-off of 20 Å for the neighbor list, and 100 ps simulations on an Intel Xeon E3-1270 v3 (3.5GHz) processor.