1 Molecular Docking
Ugur Sezerman, Sabanci University
2 What is docking?
Docking is finding the binding geometry of two interacting molecules with known structures.
The two molecules ("receptor" and "ligand") can be:
- two proteins
- a protein and a drug
- a nucleic acid and a drug
Two types of docking:
- local docking: the binding site in the receptor is known, and docking refers to finding the position of the ligand in that binding site
- global docking: the binding site is unknown. The search for the binding site and the search for the position of the ligand in it can then be performed sequentially or simultaneously
3 What Are Docking & Scoring?
- Docking: placing a ligand (small molecule) into the binding site of a receptor in the orientation and conformation that give optimal interactions with the receptor.
- Scoring: evaluating the ligand-receptor interactions in a way that discriminates the experimentally observed binding mode from others and estimates the binding affinity.
[Figure: receptor + ligand -> docking -> complex -> scoring, compared with the X-ray structure & ∆G]
4 Why Do We Do Docking?
- Drug discovery costs are too high: ~$800 million, 8-14 years, ~10,000 compounds (DiMasi et al. 2003; Dickson & Gagnon 2004)
- Drugs interact with their receptors in a highly specific and complementary manner.
- Docking is the core of target-based, structure-based drug design (SBDD) for lead generation and optimization.
- A lead is a compound that shows biological activity, is novel, and has the potential to be structurally modified for improved bioactivity, selectivity, and druggability.
5 Docking Applications
- Determine the lowest free energy structures for the receptor-ligand complex
- Search databases and rank hits for lead generation
- Calculate the differential binding of a ligand to two different macromolecular receptors
- Study the geometry of a particular complex
- Propose modifications of a lead molecule to optimize potency or other properties
- De novo design for lead generation
- Library design
6 Key Aspects of Docking
- Scoring functions: What are they? Which scoring functions are feasible?
- Search methods: How do they work? Which search method should I use?
- Which program should I use?
7 Docking Challenge
Both molecules are flexible and may alter each other's structure as they interact:
- hundreds to thousands of degrees of freedom
- the total number of possible conformations is astronomical
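A rough count shows why "astronomical" is not an exaggeration. The sketch below assumes a rigid receptor and a uniform grid: 20 positions per translational axis, 36 angles (10° steps) per rotational axis, and 12 angles (30° steps) per rotatable ligand bond. These step sizes are illustrative assumptions, not values taken from any docking program.

```python
def pose_count(n_rot_bonds: int, trans_steps: int = 20, rot_steps: int = 36,
               torsion_steps: int = 12) -> int:
    """Number of poses in a uniform grid over 3 + 3 + n degrees of freedom.

    The step counts are illustrative assumptions; the receptor is treated
    as rigid (its m side-chain torsions are not sampled at all).
    """
    translations = trans_steps ** 3          # x, y, z of the ligand centre
    rotations = rot_steps ** 3               # three orientation angles
    torsions = torsion_steps ** n_rot_bonds  # one angle per rotatable bond
    return translations * rotations * torsions


if __name__ == "__main__":
    for n in (0, 5, 10):
        print(f"{n:2d} rotatable bonds: {pose_count(n):.1e} poses")
```

This prints about 3.7e+08, 9.3e+13 and 2.3e+19 poses. Even at one microsecond per pose, the 10-bond case would take roughly 7 × 10^5 years, which is why practical programs rely on the scoring and search shortcuts described in the following slides.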
8 Formulation of the Docking Problem
- A scoring function that can discriminate the correct (experimentally observed) complex structure from incorrect ones
- A search algorithm that finds the complex structure with the best score according to that scoring function
9 Formulation of the Docking Problem: Factors Affecting ∆G0
Intramolecular forces (covalent):
• Bond lengths
• Bond angles
• Dihedral angles
Intermolecular forces (noncovalent):
• Electrostatics
• Dipolar interactions
• Hydrogen bonding
• Hydrophobicity
• Van der Waals
10 Types of Docking Problems
- Bound docking: the goal is to reproduce a known complex
- Unbound docking: the complex structure is not known
Protein-small molecule docking:
- Rigid receptor, rigid ligand
- Rigid receptor, flexible ligand
- Flexible receptor, flexible ligand
12 Docking strategies require:
- a protein representation
- a search method
- final refinement and scoring
13 1. Protein Structure
A 3-D structure of the target protein at atomic resolution must be available:
- crystal and solution structures (PDB)
- homology models
- pseudoreceptor models
Ideally, the resolution of crystal structures should be below 2.5 Å. Even small changes in structure can drastically alter the outcome.
14 Receptor Structures & Binding Site Descriptions
PDB (Protein Data Bank), containing proteins or enzymes:
- X-ray crystal: >60,000 structures, ~10% at ≤ 1.5 Å, ~80% between Å
- NMR: ensemble accuracy of Å in the backbone region, 1.5 Å in average side-chain position (Billeter 1992; Clore et al. 1993)
- (and high-quality homology models built from highly similar sequences)
Limitations of experimental structures (Davis et al. 2003):
- locations of hydrogen atoms, water molecules, and metal ions
- identities and locations of some heavy atoms (e.g., ~1/6 of the N/O of Asn & Gln and the N/C of His are incorrectly assigned in the PDB; up to 0.5 Å uncertainty in position)
- conformational flexibility of proteins
Binding site descriptions: atomic coordinates, surface, volume, points & distances, bond vectors, grids, and various properties such as electrostatic potential, hydrophobic moment, polar/nonpolar character, atom types, etc.
[Figure: DOCK]
15 Drug, Chemical & Structural Space
- Drug-like: MDDR (MDL Drug Data Report) >147,000 entries; CMC (Comprehensive Medicinal Chemistry) >8,600 entries
- Non-drug-like: ACD (Available Chemicals Directory) ~3 million entries
- Literature and databases: Beilstein (>8 million compounds), CAS & SciFinder
- CSD (Cambridge Structural Database): ~3 million X-ray crystal structures for >264,000 different compounds and >128,00 organic structures
Available compounds:
- available without exclusivity: various vendors (& ACD)
- available with limited exclusivity: Maybridge, Array, ChemDiv, WuXi Pharma, ChemExplorer, etc.
- corporate databases: a few million compounds in large pharma companies
16 3D Structural Information & Ligand Descriptions
- 2D->3D software: CORINA, OMEGA, CONCORD, MM2/3, WIZARD, COBRA (reviewed by Robertson et al. 2001)
- CSD: <0.1 Å accuracy for small molecules, but the conformation may not be the bound conformation in the receptor
- PDB: ~6000 ligand-bound protein structure entries
- Atoms are associated with inter-atom distances, physical and chemical properties, types, charges, pharmacophores, etc.
- Flexibility: conformational ensembles, fragment-based approaches
17 Scoring Functions
A fast and simplified estimation of binding energies: scores <-> ∆G_binding
[Figure: configurations of the complex are scored and compared with the X-ray structure]
18 3. Scoring Functions: Factors Affecting ∆G0
Intramolecular forces (covalent):
• Bond lengths
• Bond angles
• Dihedral angles
Intermolecular forces:
• Electrostatics
• Dipolar interactions
• Hydrogen bonding
• Hydrophobicity
• Van der Waals
19 Types of Scoring Functions
- Force-field based: nonbonded interaction terms as the score, sometimes combined with solvation terms
- Empirical: multivariate regression to fit the coefficients of physically motivated structural functions, using a training set of ligand-receptor complexes with measured binding affinities
- Knowledge-based: statistical atom-pair potentials derived from structural databases as the score
- Other: scores and/or filters based on chemical properties, pharmacophores, contacts, shape complementarity
- Consensus scoring approaches
21 Force-Field-Based Scoring Functions (e.g., AMBER FF in DOCK)
Advantages:
- FF terms are well studied and have some physical basis
- Transferable, and fast when used on a pre-computed grid
Disadvantages:
- Capture only part of the relevant energies, i.e., potential energies, sometimes supplemented by solvation or entropy terms
- Electrostatics are often overestimated, leading to systematic problems in ranking complexes
22 Molecular Mechanics Force Fields
Usually quantify the sum of two energies:
- the receptor-ligand interaction energy
- the internal ligand energy (such as steric strain induced by binding)
Interactions between ligand and receptor are most often described by van der Waals and electrostatic energy terms.
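A minimal sketch of the interaction-energy part of such a force field, assuming simple point atoms: it sums a Lennard-Jones 12-6 term and a Coulomb term over all ligand-receptor atom pairs. The `Atom` record, the combining rules, and the default distance-dependent dielectric D = 4r (one of the options listed under FF Scoring: Implementations) are illustrative assumptions rather than AMBER or CHARMM parameters, and the internal ligand energy is left out.

```python
import math
from dataclasses import dataclass

COULOMB_K = 332.06  # approximate conversion of e^2/Å to kcal/mol


@dataclass
class Atom:
    x: float
    y: float
    z: float
    charge: float   # partial charge (e)
    eps: float      # Lennard-Jones well depth (kcal/mol)
    rmin: float     # distance of the LJ minimum for a like pair (Å)


def interaction_energy(ligand, receptor, dielectric="distance", d_const=4.0):
    """Receptor-ligand interaction energy: Lennard-Jones 12-6 + Coulomb over all pairs.

    dielectric="distance" uses D = d_const * r; anything else uses a constant D = d_const.
    """
    total = 0.0
    for a in ligand:
        for b in receptor:
            r = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
            eps = math.sqrt(a.eps * b.eps)          # geometric-mean combining rule
            rmin = 0.5 * (a.rmin + b.rmin)          # arithmetic-mean combining rule
            s6 = (rmin / r) ** 6
            e_vdw = eps * (s6 * s6 - 2.0 * s6)      # equals -eps at r = rmin
            d = d_const * r if dielectric == "distance" else d_const
            e_elec = COULOMB_K * a.charge * b.charge / (d * r)
            total += e_vdw + e_elec
    return total
```

Switching `dielectric` shows the screening effect discussed on the disadvantages slide: for two like charges 2 Å apart, D = 4r gives an energy eight times smaller than D = 1, one crude way of damping the overestimated in vacuo electrostatics.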
23 Molecular Mechanics Force Fields: CHARMM [Brooks83]
24 Molecular Mechanics Force Fields: AMBER [Cornell95]
25 FF Scoring: Implementations
- AMBER FF: DOCK, FLOG, AutoDock
- CHARMm FF: CDOCK, MC approach (Caflisch et al. 1997)
- Potential grid: the receptor structure is kept rigid during docking. The grid-based score is interpolated from only the eight surrounding grid points, giving a ~100-fold speed-up. Examples: DOCK, CDOCK, and many other docking programs.
- Softened VDW: a soft-core VDW potential is needed for kinetic accessibility of the binding site (Vieth et al. 1998). FLOG: 6-9 Lennard-Jones function; GOLD: 4-8 VDW + H-bond terms, and intraligand energy.
- Solvent effect on electrostatics: often approximated by rescaling the in vacuo Coulomb interactions by 1/D, where D = 1-80 or D = n·r (n = 1-4, r = distance).
- Solvation and entropy terms: solvation terms are decomposed into nonpolar and electrostatic contributions (e.g., DOCK).
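The grid trick can be sketched as trilinear interpolation: the receptor's potential for one probe type is precomputed on a regular grid, and at docking time each ligand atom's energy is read off from the eight corners of the grid cell it falls in. The grid layout assumed here (`grid[i][j][k]` holds the value at `origin + spacing * (i, j, k)`) is for illustration only and is not DOCK's file format.

```python
import math


def trilinear(grid, origin, spacing, point):
    """Interpolate a precomputed receptor potential at an arbitrary point.

    grid[i][j][k] is assumed to hold the probe energy at
    origin + spacing * (i, j, k); the grid needs >= 2 points per axis.
    """
    shape = (len(grid), len(grid[0]), len(grid[0][0]))
    idx, frac = [], []
    for axis in range(3):
        f = (point[axis] - origin[axis]) / spacing
        if not 0.0 <= f <= shape[axis] - 1:
            raise ValueError("point lies outside the grid")
        i = min(int(math.floor(f)), shape[axis] - 2)  # lower corner of the cell
        idx.append(i)
        frac.append(f - i)                             # position inside the cell, 0..1
    value = 0.0
    # Weighted sum over the 8 corners of the enclosing cell.
    for di in (0, 1):
        wx = frac[0] if di else 1.0 - frac[0]
        for dj in (0, 1):
            wy = frac[1] if dj else 1.0 - frac[1]
            for dk in (0, 1):
                wz = frac[2] if dk else 1.0 - frac[2]
                value += wx * wy * wz * grid[idx[0] + di][idx[1] + dj][idx[2] + dk]
    return value
```

Because each lookup reads eight numbers no matter how many receptor atoms contributed to the grid, the per-atom cost no longer scales with receptor size; the price, as the slide notes, is that the receptor must stay rigid.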
26 Empirical Scoring Functions: LUDI & FlexX (Boehm 1994)
Goal: reproduce experimental binding energies, with the global minimum corresponding to the X-ray crystal structure
Advantages: fast, direct estimation of binding affinity
Disadvantages:
- Only a few complexes have both accurate structures and known binding energies
- Binding affinities measured in different labs are inconsistent
- Heavy dependence on the placement of hydrogen atoms
- Transferability depends heavily on the training set
- No effective penalty term for bad structures
27 Empirical Scoring: Implementations
Implementations mostly differ in the training set and the number of parameters used:
- Cerius2/Insight2000: LUDI, ChemScore, PLP, LigScore
- SYBYL: FlexX, F-Score
- Hammerhead: 17 parameters for hydrophobic and polar complementarity, entropy, and solvation; s_LOO = 1.0 log K for 34 complexes
- VALIDATE: 8 parameters for VDW and Coulomb interactions, surface complementarity, lipophilicity, conformational entropy and enthalpy, and lipophilic and hydrophilic complementarity between receptor and ligand surfaces
- PRO_LEADS: 5 coefficients for lipophilic, metal-binding, and H-bond terms and a flexibility penalty term; s_LOO = 2 kcal/mol for 82 complexes
- SCORE (Tao & Lai, 2001); ChemScore (GOLD)
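To show what fitting coefficients by multivariate regression means in practice, here is a plain least-squares fit of ∆G = w0 + Σ wk·xk together with a leave-one-out error in the spirit of the s_LOO values quoted above. The three descriptors (H-bond count, lipophilic contact area, rotatable bonds) are hypothetical choices for illustration; real functions such as ChemScore or PRO_LEADS use their own terms and training sets.

```python
import math


def fit_linear(features, targets):
    """Ordinary least squares: weights w minimising sum_i (w0 + w . x_i - y_i)^2."""
    rows = [[1.0] + list(x) for x in features]        # prepend intercept column
    m = len(rows[0])
    # Normal equations (X^T X) w = X^T y
    a = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution
    w = [0.0] * m
    for r in range(m - 1, -1, -1):
        w[r] = (b[r] - sum(a[r][c] * w[c] for c in range(r + 1, m))) / a[r][r]
    return w


def predict(w, x):
    """Predicted binding free energy for descriptor vector x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def loo_rmse(features, targets):
    """Leave-one-out RMS error: refit without each complex, then predict it."""
    errs = []
    for i in range(len(features)):
        w = fit_linear(features[:i] + features[i + 1:], targets[:i] + targets[i + 1:])
        errs.append(predict(w, features[i]) - targets[i])
    return math.sqrt(sum(e * e for e in errs) / len(errs))
```

On noise-free synthetic data generated from known weights, `fit_linear` recovers those weights and the leave-one-out error is essentially zero; with real, lab-to-lab inconsistent affinities it is the leave-one-out error that exposes the training-set dependence listed among the disadvantages.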
28 Knowledge-Based Potentials of Mean Force (PMF) Scoring Functions
Assumptions:
- An observed crystallographic complex represents the optimum placement of the ligand atoms relative to the receptor atoms.
- The Boltzmann hypothesis converts the frequency of finding ligand atom A at a distance r from receptor atom B into an effective interaction energy between A and B as a function of r.
Advantages:
- Similar to empirical scoring, but more general (far more distance data than binding energy data are available)
Disadvantages:
- The Boltzmann hypothesis originates from the statistics of a spatially uniform liquid, whereas a receptor-ligand complex is a two-component, non-uniform medium.
- PMFs are typically pairwise, while the probability of finding atoms A and B at distance r is not pairwise: it also depends on the surrounding atoms.
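The inverse-Boltzmann step can be sketched for a single atom-pair type: histogram the observed A-B distances, convert counts to number densities per spherical shell, and turn the ratio to a reference density into w(r) = -kT ln(ρ(r)/ρ_ref). The uniform reference density and the floor for empty bins are assumptions of this sketch (the uniform reference is exactly the "spatially uniform liquid" criticised above); published PMFs such as Muegge's or DrugScore define their reference states and volume corrections differently.

```python
import math

KT = 0.593  # kcal/mol at about 298 K


def pmf_from_distances(distances, r_max=8.0, bin_width=0.5, floor=1e-3):
    """Inverse-Boltzmann potential w(r) = -kT ln(rho(r) / rho_ref) for one atom-pair type.

    Assumptions: rho_ref is a uniform density over the sphere of radius r_max,
    and empty bins are floored at `floor` times that reference to avoid ln(0).
    """
    n_bins = int(round(r_max / bin_width))
    r_max = n_bins * bin_width
    counts = [0] * n_bins
    for d in distances:
        if 0.0 <= d < r_max:
            counts[min(int(d // bin_width), n_bins - 1)] += 1
    rho_ref = sum(counts) / (4.0 / 3.0 * math.pi * r_max ** 3)
    if rho_ref == 0.0:
        raise ValueError("no distances within r_max")
    w = []
    for i, c in enumerate(counts):
        r_lo, r_hi = i * bin_width, (i + 1) * bin_width
        shell = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)   # shell volume
        ratio = max((c / shell) / rho_ref, floor)
        w.append(-KT * math.log(ratio))
    return w
```

Distances that occur more often than the reference predicts give negative (attractive) w(r); rarely seen distances give positive values, and never-seen ones hit the floor, which acts as a crude repulsive wall.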
29 PMF: Implementations
- Verkhivker et al. (1995): 12 atom pairs, 30 complexes (HIV-1 and simian immunodeficiency virus). Tested on 7 other HIV-1 protease complexes.
- Wallqvist et al. (1995): 38 complexes, 21 atom types (10 C, 5 O, 5 N, 1 S). Tested on 8 complexes (sd = 1.5 kcal/mol) and 20 complexes (rmsd = 1.0 Å).
- Muegge et al. (1999): 697 complexes, 16 atom types from the receptor and 34 from the ligand, 282 statistically significant PMF interactions. Tested on 77 diverse compounds: sd = 1.8 log Ki. The PMF was combined with a VDW term to account for short-range interactions in DOCK4 docking.
- DrugScore (Gohlke et al. 2000), FlexX, BLEEP
30 Two Kinds of Search
Systematic:
✽ Exhaustive
✽ Deterministic
✽ Outcome depends on the granularity of sampling
✽ Feasible only for low-dimensional problems
✽ e.g., DOT (6D)
Stochastic:
✽ Random
✽ Outcome varies
✽ The search must be repeated to improve the chances of success
✽ Feasible for bigger problems
✽ e.g., AutoDock
31 Searching Algorithms
- Systematic search
- Molecular dynamics
- Monte Carlo simulations
- Simulated annealing
- Genetic algorithms
- Lamarckian genetic algorithm
- Incremental construction
32 Systematic Search
Uniform sampling of the search space:
- relative position (3)
- relative orientation (3)
- rotatable bonds in the ligand (n)
- rotatable bonds in the protein (m)
FRED [Yang04]
33 Systematic Search
Uniform sampling of the search space:
• Exhaustive, deterministic
• Quality dependent on the granularity of sampling
• Feasible only for low-dimensional problems
Example: search all rotations
FRED [Yang04]
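A minimal systematic search over ligand torsions makes the granularity point concrete. The energy below is a hypothetical toy with three-fold torsional minima at 60°, 180° and 300° plus a weak coupling between neighbouring torsions; only the search itself (every grid combination evaluated, lowest kept) is the technique being illustrated.

```python
import itertools
import math


def toy_torsion_energy(angles_deg):
    """Hypothetical ligand energy: a 3-fold term per torsion plus a coupling term (kcal/mol)."""
    e = sum(1.5 * (1.0 + math.cos(math.radians(3.0 * a))) for a in angles_deg)
    e += sum(0.5 * math.cos(math.radians(a - b)) for a, b in zip(angles_deg, angles_deg[1:]))
    return e


def systematic_search(n_torsions, step_deg, energy):
    """Evaluate every combination of torsion angles on a uniform grid and keep the lowest."""
    grid = [k * step_deg for k in range(int(360 // step_deg))]
    best_e, best_angles = math.inf, None
    for combo in itertools.product(grid, repeat=n_torsions):
        e = energy(combo)
        if e < best_e:
            best_e, best_angles = e, combo
    return best_e, best_angles, len(grid) ** n_torsions
```

With 60° steps (36 combinations for two torsions) the grid contains the minima and the search returns -0.25; with 45° steps (64 combinations) it misses them and returns a higher energy despite evaluating more points. The search is exhaustive and deterministic, yet its answer depends on where the grid points fall.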
34 Molecular Mechanics
Energy minimization:
• Start from a random or specific state (position, orientation, conformation)
• Move in the direction indicated by the derivatives of the energy function
• Stop on reaching a local minimum
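Energy minimization in its simplest form is steepest descent. The sketch uses central-difference derivatives so any energy function can be plugged in; the one-dimensional `double_well` is a hypothetical stand-in for a pose energy, and the fixed step size is an assumption (practical minimizers use line searches or conjugate gradients).

```python
import math


def numerical_gradient(f, x, h=1e-6):
    """Central-difference gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g


def minimize(f, x0, step=0.01, tol=1e-6, max_iter=100_000):
    """Steepest descent: move against the gradient until it (almost) vanishes."""
    x = list(x0)
    for _ in range(max_iter):
        g = numerical_gradient(f, x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break                                  # local minimum reached
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, f(x)


def double_well(v):
    """Hypothetical 1-D energy with two minima of different depth."""
    return (v[0] ** 2 - 1.0) ** 2 + 0.3 * v[0]
```

Starting at -2 and at +2 ends in different minima (near -1.04 and +0.96), which is exactly the "stop on reaching a local minimum" behaviour: the result depends on the starting state, and only the first run finds the deeper well.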
35 Monte Carlo Simulations
- Tries to dock the ligand inside the receptor site through many random positions and rotations.
- In ICM and MCDOCK, this method is used to make random moves of the ligand inside the receptor binding site.
- After each random move, a force-field-based energy minimization is applied.
- To avoid trapping in local minima, Monte Carlo is combined with other search methods, such as simulated annealing, genetic algorithms, and the Lamarckian GA.
36 Simulated Annealing
A global optimization technique based on the Monte Carlo method:
• Start from a random or specific state (position, orientation, conformation)
• Make random state changes, accepting uphill moves with a probability dictated by the "temperature"
• Reduce the temperature after each move
• Stop once the temperature gets very small
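A compact simulated-annealing loop with the Metropolis acceptance rule. As a common variant of the schedule on the slide, it lowers the temperature geometrically after a batch of moves rather than after every single move; the step size, schedule parameters, and the toy `double_well` energy are illustrative assumptions.

```python
import math
import random


def simulated_annealing(energy, x0, step=0.5, t0=2.0, t_min=1e-3, cooling=0.95,
                        moves_per_t=50, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule; returns the best state seen."""
    rng = random.Random(seed)
    x, e = list(x0), energy(x0)
    best_x, best_e = list(x), e
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            trial = [xi + rng.uniform(-step, step) for xi in x]   # random state change
            e_trial = energy(trial)
            de = e_trial - e
            # Downhill moves are always accepted; uphill ones with probability exp(-dE/T).
            if de <= 0 or rng.random() < math.exp(-de / t):
                x, e = trial, e_trial
                if e < best_e:
                    best_x, best_e = list(x), e
        t *= cooling                                              # reduce the temperature
    return best_x, best_e


def double_well(v):
    """Hypothetical 1-D energy: shallow minimum near x = +0.96, deeper one near x = -1.04."""
    return (v[0] ** 2 - 1.0) ** 2 + 0.3 * v[0]
```

Started in the shallower well at x = 2, the high-temperature phase lets the walker cross the barrier, so it is not confined to the well it starts in; as the temperature falls, uphill moves become rare and the walker settles. A fixed seed makes runs reproducible.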
37 Genetic Algorithm (GA)
Genetic search of parameter space:
• Start with a random population of states
• Perform random crossovers and mutations to make children
• Select the children with the highest scores to populate the next generation
• Repeat for a number of iterations
GOLD [Jones95], AutoDock [Morris98]
38 Lamarckian Genetic Algorithm
A local search finds lower fitness-function (energy) values for individuals, and these improved solutions are then mapped back onto their respective genotypes.
• Each new child is allowed to create a new generation
• Genetic algorithm plus Solis and Wets local search
• Better performance than either simulated annealing or the genetic algorithm alone
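A sketch of a Lamarckian GA: a plain GA (truncation selection, uniform crossover, Gaussian mutation, elitism) in which some children undergo a local search and the improved coordinates are written back into their genes, which is what makes it Lamarckian. The local search is a simplified Solis-Wets-style random search (adaptive step size, no bias term); the Rastrigin function stands in for a docking energy, and population size, rates, and bounds are assumptions, not AutoDock's defaults.

```python
import math
import random


def rastrigin(x):
    """Hypothetical rugged 'energy' with many local minima; global minimum 0 at the origin."""
    return 10.0 * len(x) + sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x)


def local_search(f, x, rng, rho=0.5, iters=30):
    """Simplified Solis-Wets-style random local search (adaptive step size, no bias term)."""
    best, e_best = list(x), f(x)
    succ = fail = 0
    for _ in range(iters):
        dev = [rng.gauss(0.0, rho) for _ in best]
        for sign in (1.0, -1.0):                     # try x + dev, then x - dev
            trial = [b + sign * d for b, d in zip(best, dev)]
            e = f(trial)
            if e < e_best:
                best, e_best = trial, e
                succ, fail = succ + 1, 0
                break
        else:
            succ, fail = 0, fail + 1
        if succ >= 4:                                # expand after repeated successes
            rho, succ = rho * 2.0, 0
        elif fail >= 4:                              # contract after repeated failures
            rho, fail = rho * 0.5, 0
    return best, e_best


def lamarckian_ga(f, dim, pop_size=30, generations=40, bounds=(-3.0, 3.0),
                  mut_rate=0.1, ls_rate=0.2, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=f)                  # lower energy = fitter
        parents = ranked[: pop_size // 2]            # truncation selection
        children = [list(ranked[0])]                 # elitism: keep the best unchanged
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
            child = [g + rng.gauss(0.0, 0.5) if rng.random() < mut_rate else g for g in child]
            if rng.random() < ls_rate:
                # Lamarckian step: the locally optimised phenotype is written back into the genes.
                child, _ = local_search(f, child, rng)
            children.append(child)
        pop = children
    best = min(pop, key=f)
    return best, f(best)
```

In a docking setting the genes would be the ligand's translation, orientation, and torsions; here a two-dimensional vector keeps the example small. Elitism guarantees the best energy never gets worse from one generation to the next.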
39 Incremental Extension
Used in DOCK, FlexX, FLOG, and Surflex
Greedy fragment-based construction:
• Partition ligand into fragments
40 Incremental Extension
Greedy fragment-based construction:
• Partition ligand into fragments
• Place base fragment (e.g., with geometric hashing)
41 Incremental Extension
Greedy fragment-based construction:
• Partition ligand into fragments
• Place base fragment (e.g., with geometric hashing)
• Incrementally extend ligand by attaching fragments
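Greedy incremental construction can be reduced to a 2D toy: the base fragment is placed, then each further fragment is attached at a fixed bond length, trying a few turn angles and keeping the best-scoring clash-free one. The Gaussian `site_score`, the 1.5 Å bond, the turn angles, and the 1.0 Å clash cutoff are all hypothetical; real programs such as FlexX use chemistry-aware torsion sets and full scoring functions.

```python
import math


def site_score(p, sites, width=1.0):
    """Hypothetical pocket score: more negative close to favourable interaction sites."""
    return -sum(math.exp(-(math.dist(p, s) / width) ** 2) for s in sites)


def incremental_build(base_pos, base_heading_deg, n_fragments, sites,
                      bond=1.5, turns=(-60.0, 0.0, 60.0), clash=1.0):
    """Greedy growth: place the base, then attach each fragment at its best-scoring turn."""
    placed = [tuple(base_pos)]
    heading = base_heading_deg
    total = site_score(placed[0], sites)
    for _ in range(n_fragments):
        x, y = placed[-1]
        best = None
        for t in turns:
            h = heading + t
            cand = (x + bond * math.cos(math.radians(h)), y + bond * math.sin(math.radians(h)))
            if any(math.dist(cand, q) < clash for q in placed[:-1]):
                continue                              # reject self-clashing extensions
            s = site_score(cand, sites)
            if best is None or s < best[0]:
                best = (s, cand, h)
        if best is None:                              # no clash-free option: stop growing
            break
        total += best[0]
        placed.append(best[1])
        heading = best[2]
    return placed, total
```

Being purely greedy, the sketch never revisits an earlier choice, so a poor early placement cannot be repaired later; this is the trade-off that makes incremental construction fast.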
42 Descriptor Matching Methods: DOCK
Distance-compatibility graph in DOCK (Ewing and Kuntz 1997): distances between sphere centers and distances between ligand heavy atoms
43 Descriptor Matching Methods
- Distance-compatibility graph in DOCK (Ewing and Kuntz 1997): distances between sphere centers and distances between ligand heavy atoms
- Interaction site matching in LUDI (Boehm 1992): HBA<->HBD, HYP<->HYP
- Pose clustering and triplet matching in FlexX (Rarey et al. 1996): HBA<->HBD, HYP<->HYP
- Shape matching in FRED (OpenEye)
- Vector matching in CAVEAT (Lauri and Bartlett 1994)
- Steric-effect matching in CLIX (Lawrence and Davis 1992)
- Shape-chemical complementarity in SANDOCK (Burkhard et al. 1998)
- Surface complementarity in LIGIN (Sobolev et al. 1996)
- H-bond matching in ADAM (Mizutani et al. 1994)
44 Fragment-Based Methods
- Used for flexibility and/or de novo design
- Identification and placement of the base/anchor fragment are very important
- Energy optimization (during or after docking) is important
Examples:
- Incremental construction in FlexX, with triplet matching and pose clustering to maximize the number of favorable interactions
- Growing and/or joining in LUDI from pre-built fragment and linker libraries, maximizing H-bond and hydrophobic interactions
- Anchor-based fragment joining in DOCK
45 Molecular Simulation: MD & MC
Two major components:
- the description of the degrees of freedom
- the energy evaluation
The local movement of the atoms is performed:
- according to the forces present at each step in MD (molecular dynamics)
- randomly in MC (Monte Carlo)
Usually time-consuming:
- the search proceeds from a starting orientation to low-energy configurations
- several simulations with different starting orientations must be performed to get a statistically significant result
A grid can be used for the energy calculation. In MD, larger steps or multiple starting poses are often used for speed and sampling coverage: Di Nola et al. 1994; Mangoni et al. 1999; Pak & Wang 2000; CDOCKER by Wu et al.
46 MC-Based Docking
A random move that changes the energy by ∆E is accepted with the Metropolis probability P = min(1, exp(-∆E / k_B T)), where T is reduced according to a so-called cooling schedule; a grid can be used for the energy calculation.
- An advantage of the MC technique over gradient-based methods (e.g., MD) is that it can use a simple energy function that does not require derivative information, and it is able to step over energy barriers.
- AutoDock (Goodsell & Olson 1990), MCDOCK (Liu & Wang 1999), PRODOCK (Trosset & Scheraga 1999), ICM (Abagyan et al. 1994).
- Simulated annealing is used in DockVision (Hart & Read 1992) and Affinity (Accelrys Inc., San Diego, CA).
- Energy minimization is used in QXP (McMartin & Bohacek 1997).
47 Genetic Algorithm Docking
- A fitness function decides which individuals (configurations) survive and produce offspring for the next iteration of optimization.
- Degrees of freedom are encoded into genes or binary strings.
- The collection of genes (chromosome) is assigned a fitness based on a scoring function.
There are three genetic operators:
- the mutation operator randomly changes the value of a gene;
- crossover exchanges a set of genes from one parent chromosome to another;
- migration moves individuals from one sub-population to another.
A GA requires the generation of an initial population, whereas conventional MC and MD require a single starting structure in their standard implementations.
GOLD (Jones et al. 1997); AutoDock 3.0 (Morris et al. 1998); DIVALI (Clark & Ajay 1995).
48 DOCK (Kuntz, UCSF)
[Flowchart]
- Receptor structure (X-ray crystal, NMR, homology) -> molecular surface of the binding site -> spheres describing the shape of the binding site and favorable locations of potential ligand atoms
- Ligands: 3D structure, atomic charges, potentials, labeling; filters
- Matching ligand heavy atoms to sphere centers generates thousands of binding orientations
- Scoring orientations: 1. energy scoring (VDW and electrostatic); 2. contact scoring (shape complementarity); 3. chemical scoring; 4. solvation terms
Outputs:
- Binding mode analysis for lead optimization: binding orientations and scores for each ligand
- Virtual screening for MTS/HTS and library design: ligands in the order of their best scores
49 FlexX (Tripos/SYBYL)
Fragment-based, descriptor matching, empirical scoring (Rarey et al. 1996)
Procedure:
- Select a small set of base fragments suitable for placement, using a simple scoring function.
- Place the base fragments with the pose clustering algorithm: rigid, triplet matching of H-bond & hydrophobic interactions, Bohm's scoring function.
- Build up the remainder of the ligand incrementally from other fragments.
Ligand conformations:
- MIMUMBA model with CSD-derived low-energy torsional angles for each rotatable bond, and ring conformations from CORINA.
- Multiple conformations for each fragment in the ligand-building steps.
Other work: explicit waters are placed into the binding site during docking using pre-computed water positions (Rarey et al. 1999); receptor flexibility using discrete alternative protein conformations (Claussen et al. 2001; Claussen & Hindle 2003).
50 GOLD
GA method, H-bond matching, FF scoring (Jones et al. 1997)
A configuration is represented by two bit strings:
- the conformations of the ligand and the protein, defined by torsions;
- a mapping between H-bond partners in the protein and the ligand.
For fitness evaluation, a 3D structure is created from the chromosome representation; the H-bond atoms are then superimposed onto H-bond site points in the receptor site.
Fitness (scoring) function: H-bond energy, the ligand internal energy, and the protein-ligand van der Waals energy.
Rotational flexibility for selected receptor hydrogens, along with full ligand flexibility.
Highlights:
- Validation test set: 100 complexes, 66 with rmsd < 2 Å.
- Structure generation is biased towards intermolecular H-bonds.
- Hydrophobic fitting points were added (GOLD 1.2, CCDC, Cambridge, UK, 2001).
51 LUDI: Matching Polar and Hydrophobic Groups
1. Calculate protein and ligand interaction sites (H-bond or hydrophobic), defined by centers and surfaces, from:
- non-bonded contact distributions based on a search through the CSD,
- a set of geometric rules,
- the output of the program GRID (Goodford 1985), which calculates binding energies for a given probe with a receptor molecule.
2. Fit fragments onto the interaction sites using:
- distances between interaction sites on the receptor,
- an RMSD superposition algorithm,
- a hashing scheme to access and match surface triangles onto a triangle query of a ligand interaction center,
- a list-merging algorithm that creates all triangles from lists of fitting triangle edges for two of the three query triangle edges.
3. Join/grow fragments using the fragment databases and the same fitting algorithm.
52 GLIDE (www.schrodinger.com)
- Funnel: site point search -> diameter test -> subset test -> greedy score -> refinement -> grid-based energy optimization -> GlideScore.
- Approximates a complete systematic search of the conformational, orientational, and positional space of the docked ligand.
- Hierarchical filters, including a rough scoring function that recognizes hydrophobic and polar contacts, dramatically narrow the search space.
- Torsionally flexible energy optimization on an OPLS-AA nonbonded potential grid for a few hundred surviving candidate poses.
- The very best candidates are further refined via MC sampling of pose conformations.
- Scoring: a modified ChemScore (Eldridge et al. 1997) that combines empirical and force-field-based terms.
- Validation: 282 complexes, new ligand conformation; top-ranked pose: 50% < 1 Å, ~33% > 2 Å.
53 Matrix of Accuracy & Success
Drug <- Quality; Novel lead <- Active
- Reproduce the binding mode (X-ray crystal structures)
- Predict binding affinity (free energies)
- Rank a diverse set of compounds (by binding affinity)
- Enhance the hit rate for database mining
- Reduce false positives (N_selected - N_hits) and false negatives (N_all_hits - N_hits)
- Fast enough for iterative SBDD
[Figure: experimental vs. predicted values]
54 Accuracy of Docking
Reality boundary:
- Experimental errors: kcal/mol (18-53%), with MSR (maximum significant ratio) as much as 3-fold (0.65 kcal/mol)
- Free energy calculation accuracy: ~1 kcal/mol (5.4-fold), starting from an accurate geometric model & full sampling
- Entropy and solvation estimation needs a sufficiently long simulation run with an accurate force field, an ensemble of explicit water molecules, and full sampling
Current:
- Reproducing the X-ray structure with rmsd < 2 Å: 50-90% achievable
- Binding affinity: 1.5-2 log units ( fold, kcal/mol)
- Correlation between scores and affinities: r^2 < 0.3
- Enthalpy ranking with minimization: ±5 kcal/mol
- Hit-rate enhancement: 2-50-fold, with hit rates of 1-20% (and a high false-negative rate if 1-5% of all compounds are selected)
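The fold/kcal figures on this slide follow from ∆∆G = RT ln(fold). This small converter, assuming T = 298.15 K, reproduces them: 3-fold ≈ 0.65 kcal/mol, 1 kcal/mol ≈ 5.4-fold, and one log unit ≈ 1.36 kcal/mol, so an error of 1.5-2 log units corresponds to roughly 2.0-2.7 kcal/mol (about 32- to 100-fold).

```python
import math

R_KCAL = 1.98720e-3  # gas constant, kcal/(mol*K)


def fold_to_kcal(fold, temp_k=298.15):
    """Free-energy difference corresponding to a fold change in K: RT ln(fold)."""
    return R_KCAL * temp_k * math.log(fold)


def kcal_to_fold(ddg, temp_k=298.15):
    """Fold change in K corresponding to a free-energy difference: exp(ddG / RT)."""
    return math.exp(ddg / (R_KCAL * temp_k))


def log_units_to_kcal(n_log10, temp_k=298.15):
    """Free-energy difference for an error of n log10 units in Kd or Ki."""
    return fold_to_kcal(10.0 ** n_log10, temp_k)
```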
55 Background & Motivation
- Docking = the process of starting with a set of coordinates for two distinct molecules and generating a model of the bound complex.
- Numerous methods for protein-protein docking exist today.
- The Fourier correlation approach (Ritchie and Kemp, 2000) enabled the generation of billions of possible docked conformations, evaluated with defined scoring functions.
- Problem: many false positives (good surface complementarity) that are far from the native complex.
- Motivation: methods are needed to filter and rank the docked conformations so that near-native complexes can be identified.
ClusPro: an automated, fast rigid-body docking and discrimination algorithm that:
1) rapidly filters docked conformations
2) ranks the conformations by clustering computed pairwise RMSD values
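56 Input and Method Outline
[Flowchart]
- CAPRI receptor-ligand pairs -> free-energy filtering -> 2,000 conformations with low desolvation or electrostatic energies -> discrimination via clustering
- Benchmark: 2,000 docked conformations for each of 48 receptor-ligand pairs -> discrimination via clustering
- Discrimination via clustering -> top 10 clusters (centers) -> comparison with the native structure (RMSD)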
57 Part I: Free-Energy Filtering
- Goal: identify docked conformations with good surface complementarity by selecting those with the lowest desolvation and electrostatic energies.
- Surface complementarity is an important criterion because proteins tend to bury large surface areas upon complex formation.
- Electrostatic and desolvation potentials (capturing the free energy of association) are used independently, since different binding mechanisms are governed by different ratios of electrostatic to desolvation contributions.
- The 500 structures with the lowest desolvation free energy are retained.
- The 1500 structures with the lowest electrostatic energy are retained.
- Electrostatics are more sensitive to small coordinate perturbations, i.e., noisy, so desolvation and electrostatics cannot be combined into a single score.
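The filtering step as a sketch: rank the poses separately by desolvation and by electrostatic energy and keep the union of the two top lists (500 and 1500 by default, as on the slide). The tuple layout of `poses` is an assumption, and how ClusPro handles poses that appear in both lists is not stated here; this sketch simply keeps such poses once, so fewer than 2000 may survive.

```python
import heapq


def free_energy_filter(poses, n_desolv=500, n_elec=1500):
    """Keep the poses that are best by desolvation OR by electrostatics.

    poses: iterable of (pose_id, desolvation_energy, electrostatic_energy).
    The two energies are ranked separately, never summed.
    """
    poses = list(poses)
    best_desolv = heapq.nsmallest(n_desolv, poses, key=lambda p: p[1])
    best_elec = heapq.nsmallest(n_elec, poses, key=lambda p: p[2])
    # Union of the two selections; a pose chosen by both is kept once.
    return sorted({p[0] for p in best_desolv} | {p[0] for p in best_elec})
```

Ranking the two energies separately, rather than adding them, keeps the noisy electrostatic term from swamping the desolvation ranking, which is the reason the slide gives for not combining them.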
58 Part II: Clustering Based on Pairwise RMSD
- From the free-energy landscapes of partially solvated receptor-ligand complexes, the native binding site is expected to be a local minimum with the greatest width.
- In other words, the most probable conformation is expected to be surrounded by many other low-energy conformations.
- Goal: use a hierarchical clustering method to select and rank the docked conformations with the most "neighbors" within a defined cluster radius (in terms of C-alpha RMSD).
Procedure:
1) Define the fixed molecule (receptor) and the flexible molecule (ligand).
2) Define the set of relevant ligand residues as those within 10 Å of any receptor atom.
3) For each docked conformation X, calculate its pairwise ligand RMSD with the 1999 other conformations. Pairwise ligand RMSD = deviation between the coordinates of X's defined set of ligand residues and the corresponding coordinates in another conformation.
4) Cluster the 2000 docked conformations using the 2000 × 2000 matrix of RMSD values and a cluster radius of 9 Å RMSD from the center.
5) Pick the largest cluster -> rank its center -> remove the conformations in this cluster from the matrix.
6) Pick the next largest cluster -> rank its center -> remove its conformations from the matrix; keep iterating until the matrix is empty.
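The clustering procedure fits in a few lines. Each pose is assumed to be given as the coordinates of the selected ligand atoms (e.g., C-alpha atoms of the interface residues; that selection is assumed to be done beforehand). Because the receptor is fixed, RMSD is computed without superposition. The function repeatedly takes the pose with the most neighbours within the radius as a cluster center and removes its cluster. This is an illustrative reimplementation, not ClusPro's code; for 2000 poses the pure-Python matrix of about 4 million RMSDs is slow but workable.

```python
import math


def rmsd(a, b):
    """RMSD between two equally ordered coordinate lists, without superposition.

    No fitting is done because the receptor is held fixed, so all ligand
    poses already share the receptor's frame.
    """
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def cluster_poses(poses, radius=9.0, max_clusters=10):
    """Greedy neighbour-count clustering on a pairwise RMSD matrix.

    Returns (center_index, cluster_size) pairs, largest cluster first.
    """
    n = len(poses)
    d = [[rmsd(poses[i], poses[j]) for j in range(n)] for i in range(n)]
    remaining = set(range(n))
    centers = []
    while remaining and len(centers) < max_clusters:
        # The center is the pose with the most remaining neighbours within `radius`.
        center = max(sorted(remaining),
                     key=lambda i: sum(1 for j in remaining if d[i][j] <= radius))
        members = {j for j in remaining if d[center][j] <= radius}
        centers.append((center, len(members)))
        remaining -= members                      # remove the cluster from the matrix
    return centers
```

The default `max_clusters=10` mirrors the top-10 output on the method outline; the cluster centers, in order, are the ranked predictions.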
59 Results
Result I:
- The discrimination step was tested on a benchmark set of 48 interacting protein pairs (2000 docked conformations each).
- For 31 of the 48 protein pairs, the top 10 predictions include at least one near-native complex (average RMSD of 5 Å from the native structure).
Result II:
- The method was tested in the CAPRI (Critical Assessment of PRedicted Interactions) experiment, generating predictions for 9 target complexes.
- Round 3 (automated server): the ClusPro prediction ranked #3 for Target 8.
60 ClusPro Web Server
- User input: PDB files of the 2 protein structures whose complex formation the user would like to analyze.
- Output: the 10 (default) top-ranked predictions of docked conformations, i.e., those predicted to be closest to the native structure.
- First, the 2 proteins are docked using 2 established FFT-based docking programs (DOT and ZDOCK).
- Then, filtering and discrimination are performed.
- The server allows customization of parameters:
  - clustering radius (a smaller radius may be more suitable for smaller proteins)
  - relative number of desolvation and electrostatic best hits used during filtering
  - number of predictions to generate (1-30)
61 Protein Drug Discovery
- Although small-molecule drugs are the more prevalent therapeutics in current drug discovery, protein drugs are a rapidly growing area in pharmaceuticals.
- Protein therapeutics can be much more costly (in R&D and synthesis) than small-molecule therapeutics, but they can deliver biological mechanisms that are not possible with small-molecule therapeutics.
- Multiple blockbuster protein drugs are currently on the market.
- Conservative estimate: there are between 3,000 and 10,000 possible drug targets.
- Many of these new targets offer great opportunities for the development of protein drugs.
- In 2002, drug companies sold nearly $33 billion in protein drugs. Rising at an average annual growth rate (AAGR) of 12.2%, this market is expected to reach $71 billion in 2008.
Examples of popular classes of drug targets:
1) G-protein-coupled receptors: compounds are screened for their ability to inhibit (antagonist) or stimulate (agonist) the receptor
2) Protein kinases: compounds are screened for their ability to inhibit the kinase
62 Application to Protein Drug Discovery
- Ideal drug: high specificity and high affinity for the target protein.
- To evaluate the affinity of a potential drug for its target, you must first predict what the binding interface looks like and the relative positions of the potential drug and the target.
- ClusPro is the first integrated automated server that incorporates both docking and discrimination steps for the structural prediction of protein-protein complexes.
- Using ClusPro, one can generate many relative orientations/conformations of the 2 proteins -> filter using desolvation + electrostatic potentials -> discriminate via clustering -> find the best fit (closest to the native structure from X-ray crystallography) between the 2 proteins.
- Top-ranked ClusPro predictions -> further manual refinement and discrimination using existing biochemical constraints and analysis to eliminate false positives -> test the binding affinity of promising protein pairs in vitro -> lead compounds used as starting points for drug development/optimization.
- ClusPro can be used to screen databases of existing, recombinant, or de novo proteins for their interaction with a protein target of interest.
- ClusPro can be used to predict either:
  - how a protein drug may bind (and either inhibit or stimulate) a receptor, or
  - how 2 proteins bind, and, based on the structural details of the interaction, design/screen for a drug that can inhibit that interaction.
63 2.1 Rigid Docking
- Protein and ligand are kept fixed; the search is for the relative orientation of the two molecules with the lowest energy.
- The fastest way to perform an initial screening of a small-molecule database -> virtual-screening initiative
64 Rotamer Libraries
Rigid docking of many conformations:
• Precompute all low-energy conformations
• Dock each precomputed conformation as a rigid body
Glide [Friesner04]
65 Rigid Docking Methods
All rigid-body docking methods have in common that superposition of point sets is a fundamental sub-problem that must be solved efficiently:
- geometric hashing
- pose clustering
- clique detection
66 Geometric Hashing
- Originates from computer vision, for recognizing partially occluded objects in camera scenes.
- Given a picture of a scene and a set of object models, both represented by points in 2D space, the goal is to recognize some of the models in the scene.
- Objects with certain geometric features can be accessed very quickly through a geometric hashing table.
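A triangle-based variant of geometric hashing can be sketched as follows: every triangle of receptor feature points is stored under a key made of its binned, sorted side lengths, which do not change under rotation or translation, and each ligand triangle is then looked up in constant time. Classical geometric hashing instead stores point coordinates expressed in reference frames built from point pairs or triples; the 0.5 Å bin size here is an assumption, and points whose side lengths sit near a bin edge can be missed.

```python
import itertools
import math
from collections import defaultdict


def triangle_key(p, q, r, bin_size=0.5):
    """Rotation- and translation-invariant key: the three side lengths, sorted and binned."""
    sides = sorted((math.dist(p, q), math.dist(q, r), math.dist(p, r)))
    return tuple(int(round(s / bin_size)) for s in sides)


def build_table(points, bin_size=0.5):
    """Preprocessing: hash every triangle of receptor feature points."""
    table = defaultdict(list)
    for tri in itertools.combinations(range(len(points)), 3):
        table[triangle_key(*(points[i] for i in tri), bin_size=bin_size)].append(tri)
    return table


def query(table, ligand_points, bin_size=0.5):
    """Recognition: look up each ligand triangle; every hit is a candidate superposition."""
    hits = []
    for tri in itertools.combinations(range(len(ligand_points)), 3):
        key = triangle_key(*(ligand_points[i] for i in tri), bin_size=bin_size)
        hits.extend((tri, rec_tri) for rec_tri in table.get(key, []))
    return hits
```

The receptor table is built once and reused for every ligand, which is where the speed comes from. Each hit is only a candidate: the vertex correspondence still has to be resolved and the resulting superposition scored.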
68 Pose Clustering
- Originally developed to detect objects in 2D scenes with unknown camera location.
- For each triangle of the receptor, compute the transformation to each matching ligand triangle.
- Cluster the transformations.
- If a cluster grows large, a location with a high number of matching features has been found.
69 E.g., the FlexX Method
- The base fragment (the ligand core) is automatically selected and placed into the active site using a pattern recognition technique called pose clustering.
- Next, the remainder of the ligand is built up incrementally from other fragments.
70 Clique Detection
- Nodes are matches between protein and ligand features.
- Edges connect distance-compatible pairs of nodes.
- In a clique, every pair of nodes is connected.
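Clique-based matching can be sketched generically: build the compatibility graph whose nodes pair one ligand atom with one receptor sphere center and whose edges join node pairs with matching internal distances (within a tolerance, assumed here to be 0.5 Å), then find a maximum clique with the Bron-Kerbosch algorithm with pivoting. This is a textbook illustration, not DOCK's matching code.

```python
import itertools
import math


def compatibility_graph(ligand_atoms, sphere_centers, tol=0.5):
    """Nodes pair one ligand atom with one sphere; edges join distance-compatible pairs."""
    nodes = [(i, j) for i in range(len(ligand_atoms)) for j in range(len(sphere_centers))]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in itertools.combinations(nodes, 2):
        if i == k or j == l:
            continue                                 # an atom or sphere may be used only once
        d_lig = math.dist(ligand_atoms[i], ligand_atoms[k])
        d_rec = math.dist(sphere_centers[j], sphere_centers[l])
        if abs(d_lig - d_rec) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def maximum_clique(adj):
    """Bron-Kerbosch with pivoting; returns one largest clique as a sorted list of nodes."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:                          # r is a maximal clique
            if len(r) > len(best):
                best = sorted(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best
```

The clique gives atom-to-sphere correspondences; a least-squares superposition of those atoms onto the sphere centers would then yield the transformation that places the ligand. Maximum-clique search is exponential in the worst case, which is why real implementations prune the graph heavily.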
71 E.g., DOCK 6
- The rigid-body orienting code is written as a direct implementation of the isomorphous subgraph matching method of Crippen and Kuhl.
- Conceptually, the algorithm matches the centers of the ligand heavy atoms to the centers of the receptor site spheres.
72 DOCK 6
The algorithm follows these steps:
1) Generate nodes
2) Label as a match if the atom and sphere edges are equivalent
3) Extend the match by adding more nodes
4) Exhaustively generate the set of non-degenerate matches
5) Use the matches to create transformation matrices that move the entire molecule
node = pairing of one heavy atom and one sphere center
edge length = Euclidean distance between atom or sphere centers
Once an orientation has been generated, the interaction between the ligand and the receptor can be energetically optimized (the ligand is allowed to be flexible during optimization).
73 2.2 Rigid Receptor, Flexible Ligand
Multiple steps in the receptor-ligand interaction:
• Approach
• Desolvation of the ligand and the binding site of the protein
• Penetration into the protein cavity
• Change of the ligand orientation
• Adoption of the correct "active" conformation
• Establishment of new H-bonds, electrostatic and hydrophobic contacts
Free energy function:
74 Challenges
- Predicting the energetics of protein-ligand binding
- Searching the space of possible poses and conformations:
  - Relative position (3 degrees of freedom)
  - Relative orientation (3 degrees of freedom)
  - Rotatable bonds in the ligand (n degrees of freedom)
  - Rotatable bonds in the protein (m degrees of freedom)
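The degree-of-freedom count above, 6 + n + m, grows quickly. The sketch below assumes, purely for illustration, that every degree of freedom is sampled at the same number of discrete values (10 here), which shows why a naive systematic search becomes infeasible.

```python
def search_space(n_ligand_torsions, n_protein_torsions, samples_per_dof=10):
    """Degrees of freedom and naive grid size (uniform sampling per DOF is an assumption)."""
    dof = 3 + 3 + n_ligand_torsions + n_protein_torsions  # position + orientation + n + m
    return dof, samples_per_dof ** dof

for n, m in [(0, 0), (5, 0), (5, 4)]:
    dof, poses = search_space(n, m)
    print(f"n={n}, m={m}: {dof} DOF, {poses:.1e} grid points")
```

Each extra rotatable bond multiplies the naive search size by the number of samples per torsion, which is why docking programs rely on heuristic or stochastic search.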
75 2.3. Flexible Receptor, Flexible Ligand
Protein flexibility can be introduced through Monte Carlo or molecular dynamics. The protein can be divided into rigid and flexible parts, so that only the flexible receptor-site atoms are free to move; the procedure is still very slow. Leach* developed a docking algorithm that sequentially fixes the degrees of freedom of the protein side-chain atoms. Broughton** reported the use of conformational samples from short protein MD simulation runs.
*Leach AR. Ligand docking to proteins with discrete side-chain flexibility. J Mol Biol 1994;235:345–356.
**Broughton HB. A method for including protein flexibility in protein–ligand docking: improving tools for database mining and virtual screening. J Mol Graph Model 2000;18:247–257.
76 AutoDock 4
AMBER force-field-based energy grid; flexible ligands; rigid protein represented as a grid. A genetic algorithm (GA) serves as the global optimizer, combined with energy minimization as the local search method.
The fitness function includes:
- a Lennard-Jones 12-6 dispersion/repulsion term
- a directional hydrogen-bond term
- a Coulombic electrostatic potential
- a term proportional to the number of sp3 bonds in the ligand, representing the unfavorable entropy of ligand binding
- a desolvation term
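The "GA plus local search" combination can be sketched as a Lamarckian GA: a fraction of each generation is refined by local search, and the improved genes are written back into the population. This is an illustrative sketch with a hypothetical quadratic energy standing in for the grid-based AutoDock energy; the local search is a simple random-perturbation hill climber, loosely in the spirit of the Solis-Wets method used in AutoDock, and all parameters are assumptions.

```python
import numpy as np

# hypothetical optimum: 3 position genes, 3 orientation genes, 1 torsion gene
TARGET = np.array([2.0, -1.0, 0.5, 0.3, -0.2, 0.1, 1.2])

def toy_energy(genes):
    """Hypothetical stand-in for a docking energy; minimum 0 at TARGET."""
    return float(np.sum((genes - TARGET) ** 2))

def local_search(genes, energy, rng, steps=30, step=0.1):
    """Random-perturbation hill climbing (accept only improvements)."""
    best, e_best = genes.copy(), energy(genes)
    for _ in range(steps):
        trial = best + rng.normal(scale=step, size=best.shape)
        e = energy(trial)
        if e < e_best:
            best, e_best = trial, e
    return best, e_best

def tournament(pop, fit, rng):
    i, j = rng.integers(len(pop), size=2)
    return pop[i] if fit[i] < fit[j] else pop[j]

def lamarckian_ga(energy, n_genes, pop_size=40, generations=100, ls_fraction=0.1, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-3, 3, size=(pop_size, n_genes))
    fit = np.array([energy(g) for g in pop])
    for _ in range(generations):
        children = [pop[np.argmin(fit)].copy()]              # elitism: keep the best individual
        while len(children) < pop_size:
            a, b = tournament(pop, fit, rng), tournament(pop, fit, rng)
            child = np.where(rng.random(n_genes) < 0.5, a, b)  # uniform crossover
            mutate = rng.random(n_genes) < 0.1                 # mutate ~10% of genes
            children.append(child + mutate * rng.normal(scale=0.3, size=n_genes))
        pop = np.array(children)
        fit = np.array([energy(g) for g in pop])
        # Lamarckian step: locally optimized genes replace the originals in the population
        n_ls = max(1, int(ls_fraction * pop_size))
        for k in rng.choice(pop_size, size=n_ls, replace=False):
            pop[k], fit[k] = local_search(pop[k], energy, rng)
    best = int(np.argmin(fit))
    return pop[best], fit[best]

best_genes, best_e = lamarckian_ga(toy_energy, n_genes=len(TARGET))
print(best_e)
```

Writing the local-search result back into the genotype is what distinguishes the Lamarckian variant from a GA that only uses local search to evaluate fitness.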
77 Comparison of Two Recent Versions: AutoDock 4 vs. AutoDock Vina
AutoDock 4:
- Scoring function based on the AMBER force field
- The force field includes electrostatic interactions, hydrogen bonds, and desolvation energy
- "Torsion tree" for ligand flexibility
- Protein flexibility via side-chain rotations
- Too many torsions are problematic
AutoDock Vina:
- Faster than AutoDock 4
- More accurate than AutoDock 4
- More user-friendly than AutoDock 4: grid maps and clusters are calculated automatically
78 Our Case: Triacylglyceride Docking into Lipase
- Lipase: Geobacillus thermocatenulatus lipase (BTL2), apo-enzyme in the open conformation
- Crystal structure solved in 2009 at 2.2 Å resolution (Carrasco-López C et al., J Biol Chem. 2009, PMID: )
- Two Triton X-100 molecules found in the crystal allow identification of putative binding pockets for the acyl chains (sn-1, sn-2, sn-3) of the triglyceride
- Ligands: tributyrin (4 carbons per chain) and tricaprylin (8 carbons per chain)
81 Work-Flow of Docking Study
Separate the bound molecules from the active-site cleft (apo-enzyme, ligand) -> define flexible/rigid bonds -> dock with AutoDock 4.2 and Vina -> assess the docking outcomes (poses, scores) -> select the best binding modes
82 Preparation of Input Structures: Protein (BTL2)
Open-lid conformation displaying the catalytic residues for ligand binding; locating the search space (grid box) for triglyceride binding.
83Preparation of Input Structures: Ligand (tricaprylin)
84Preparation of Input Structures: Ligand (tributyrin)
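85 Results and Evaluation of Poses: Tricaprylin (8C)
The predicted binding affinity is given in kcal/mol. Two RMSD variants are reported: rmsd/ub matches each atom in one conformation with itself in the other conformation, ignoring any symmetry, whereas rmsd/lb(c1, c2) = max(rmsd'(c1, c2), rmsd'(c2, c1)), where rmsd' matches each atom in one conformation with the closest atom of the same element type in the other conformation.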
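The two RMSD definitions can be written out in a few lines of NumPy. This is a sketch of the definitions only, not Vina's implementation (Vina, for example, restricts the calculation to movable heavy atoms). The toy example swaps two chemically equivalent oxygens: rmsd/ub penalizes the swap, while rmsd/lb does not.

```python
import numpy as np

def rmsd_ub(c1, c2):
    """Each atom matched with itself (same index), ignoring symmetry."""
    return float(np.sqrt(np.mean(np.sum((c1 - c2) ** 2, axis=1))))

def rmsd_prime(c1, c2, elements):
    """Each atom of c1 matched with the closest atom of the same element in c2."""
    sq = []
    for i, el in enumerate(elements):
        same = [j for j, e in enumerate(elements) if e == el]
        sq.append(np.min(np.sum((c2[same] - c1[i]) ** 2, axis=1)))
    return float(np.sqrt(np.mean(sq)))

def rmsd_lb(c1, c2, elements):
    return max(rmsd_prime(c1, c2, elements), rmsd_prime(c2, c1, elements))

elements = ["C", "O", "O"]
c1 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
c2 = np.array([[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])  # the two O atoms swapped
print(rmsd_ub(c1, c2), rmsd_lb(c1, c2, elements))
```

Because rmsd/lb can only under-match, it is a lower bound and rmsd/ub an upper bound on the symmetry-corrected RMSD.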
86 Results and Evaluation of Poses: Tricaprylin (8C)
[Figure labels: S114_OH, TCPN_O, Mode_1, F17]
87 Results and Evaluation of Poses: Tricaprylin (8C)
[Figure labels: S114_OH, TCPN_O_2_1, F17]
88 Results and Evaluation of Poses: Tricaprylin (8C)
[Figure labels: S114_OH, TCPN_O_7, F17]
89 Results and Evaluation of Poses: Tricaprylin (8C)
[Figure labels: S114_OH, TCPN_O_8, F17; S114_OH, TCPN_O_7, F17]
90 Results and Evaluation of Poses: Tributyrin (4C)
[Figure labels: _OH, TBTN_O_3, F17_1]
91 Results and Evaluation of Poses: Tributyrin (4C)
[Figure labels: _OH, TBTN_O_6, F17_1]
92 Results and Evaluation of Poses: Tributyrin (4C)
[Figure labels: _OH, TBTN_O_7, F17_1]
94 3.1. Force-field-based scoring functions
The parameters of the Lennard–Jones potential vary depending on the desired "hardness" of the potential.
- D-Score: higher exponents (12–6 Lennard–Jones potential) give an increasingly repulsive potential that is less forgiving of close contacts between receptor and ligand atoms.
- G-Score: lower exponents (8–4 Lennard–Jones potential) make the potential softer.
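The hardness difference can be checked numerically with a generalized m–n Lennard–Jones form that has well depth ε at the optimal distance r0. This parameterization is an assumption chosen for comparability; the exact functional forms used by D-Score and G-Score are not reproduced here. Both curves share the same minimum, but at a close contact of 0.8·r0 the 12–6 term is far more repulsive.

```python
def lj_nm(r, eps, r0, m, n):
    """Generalized m-n Lennard-Jones term; minimum value -eps at r = r0 (requires m > n)."""
    x = r0 / r
    return eps / (m - n) * (n * x ** m - m * x ** n)

for m, n, label in [(12, 6, "12-6 (harder)"), (8, 4, "8-4 (softer)")]:
    at_min = lj_nm(1.0, 1.0, 1.0, m, n)
    close = lj_nm(0.8, 1.0, 1.0, m, n)
    print(label, round(at_min, 2), round(close, 2))
# 12-6 (harder) -1.0 6.92
# 8-4 (softer) -1.0 1.08
```

For m = 12, n = 6 the expression reduces to the familiar ε[(r0/r)^12 − 2(r0/r)^6].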
96 3.2. Empirical methods
Goals: reproduce experimental binding energies, with the global minimum of the scoring function corresponding to the X-ray crystal structure.
Advantages: fast and direct estimation of binding affinity.
Disadvantages:
- Only a few complexes have both accurate structures and known binding energies
- Binding affinities measured in different labs are discrepant
- Heavy dependence on the placement of hydrogen atoms
- Transferability depends heavily on the training set
- No effective penalty term for bad structures
100 3.2. Empirical methods
AutoDock Vina combines the advantages of empirical methods and knowledge-based potentials. It can be roughly two orders of magnitude faster than AutoDock 4.
101 3.3. Knowledge-based methods
Designed to reproduce experimental structures rather than binding energies. Protein-ligand complexes are modelled using relatively simple atomic interaction-pair potentials.
Advantages: similar to empirical methods, but more general (far more distance data than binding-energy data are available).
Disadvantages:
- The Boltzmann hypothesis originates from the statistics of a spatially uniform liquid, whereas a receptor-ligand complex is a two-component, non-uniform medium.
- PMFs are typically pairwise, whereas the probability of finding atoms A and B at a distance r is not pairwise: it also depends on the surrounding atoms.
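102 3.3. Knowledge-based methods
Potential of mean force (PMF) score:
$$\mathrm{PMF\ score} = \sum_{ij,\; r<r_{\mathrm{cut}}} A_{ij}(r), \qquad A_{ij}(r) = -k_B T \ln\!\left[ f^{\,j}_{\mathrm{Vol\_corr}}(r)\, \frac{\rho^{ij}_{\mathrm{seg}}(r)}{\rho^{ij}_{\mathrm{bulk}}} \right]$$
where $k_B$ is the Boltzmann constant, $\rho^{ij}_{\mathrm{seg}}(r)/\rho^{ij}_{\mathrm{bulk}}$ is the radial distribution function for a protein atom of type $i$ and a ligand atom of type $j$, and $f^{\,j}_{\mathrm{Vol\_corr}}(r)$ is the ligand volume correction factor.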
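The inverse-Boltzmann step behind A_ij(r) can be sketched from distance histograms. Assumptions: the radial distribution function is estimated by normalizing an observed pair-distance histogram against a reference histogram, empty bins are floored to avoid log(0), the volume correction defaults to 1, and all counts are hypothetical. This is not the original PMF parameterization.

```python
import numpy as np

R_KCAL = 0.0019872  # Boltzmann constant per mole (gas constant R), kcal/(mol*K)

def pmf_table(observed_counts, reference_counts, T=300.0, vol_corr=None, floor=1e-6):
    """A_ij(r) per distance bin: -kB*T*ln[f_vol(r) * rho_seg(r)/rho_bulk]."""
    obs = np.asarray(observed_counts, float)
    ref = np.asarray(reference_counts, float)
    g = np.maximum(obs / obs.sum(), floor) / np.maximum(ref / ref.sum(), floor)
    f = np.ones_like(g) if vol_corr is None else np.asarray(vol_corr, float)
    return -R_KCAL * T * np.log(f * g)

def pmf_score(distances, table, bin_edges, r_cut):
    """Sum A_ij(r) over pairs of one atom-type combination (i, j) within the cutoff."""
    d = np.asarray(distances, float)
    d = d[(d >= bin_edges[0]) & (d < r_cut)]
    bins = np.digitize(d, bin_edges) - 1
    return float(table[bins].sum())

bin_edges = np.arange(2.0, 8.5, 0.5)               # 12 bins of 0.5 Å between 2 and 8 Å
observed = [0, 1, 3, 9, 6, 4, 4, 4, 4, 4, 4, 4]    # hypothetical counts for one type pair
reference = [1] * 12                               # hypothetical uniform reference
A = pmf_table(observed, reference)
print(np.round(A, 2))
print(pmf_score([3.7, 5.2, 9.0], A, bin_edges, r_cut=8.0))
```

Distances that are over-represented relative to the reference become favorable (negative) terms, and unobserved distances become strongly unfavorable.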
104 Multiple Method Approach
- Systematic search -> conformations -> rigid DOCK -> minimization -> MD/SA (Wang et al. 1999)
- Filters -> initial poses -> finer docking -> final scoring (FRED, GLIDE, DOCK)
- Similarity-guided MD simulated annealing to improve accuracy (Wu & Vieth 2004)
- Shape similarity and clustering to speed up conformational search in docking (Makino & Kuntz 1998)
- Better input or constraints for the existing docking engines
105 Computing Scoring Functions
Point-based calculation:
• Sum the terms computed at the positions of the ligand atoms (this is slow)
106 Computing Scoring Functions
Grid-based calculation:
• Precompute a "force field" grid for each term of the scoring function, for each conformation of the protein (usually only one)
• Sample the force fields at the positions of the ligand atoms
-> Accelerates calculation of the scoring function by about 100x [Huey & Morris]
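Point-based and grid-based evaluation can be compared side by side. In this sketch the per-pair term is a hypothetical clipped 12–6 function, the grid spacing and box are arbitrary choices, and the grid is sampled by trilinear interpolation (a common choice, assumed here). At grid nodes the two methods agree exactly; between nodes the grid gives an approximation whose cost no longer depends on the number of protein atoms.

```python
import numpy as np

def pair_energy(r):
    """Hypothetical 12-6 term (eps = 0.2, r0 = 4.0 Å); r clipped at 1 Å to avoid overflow."""
    x = 4.0 / np.maximum(r, 1.0)
    return 0.2 * (x ** 12 - 2 * x ** 6)

def point_based(lig, prot):
    """Sum the pair term over all ligand-protein atom pairs (cost ~ N_lig * N_prot)."""
    d = np.linalg.norm(lig[:, None, :] - prot[None, :, :], axis=2)
    return float(pair_energy(d).sum())

def precompute_grid(prot, origin, spacing, shape):
    """Evaluate the protein's contribution once at every grid node."""
    nodes = origin + spacing * np.indices(shape).reshape(3, -1).T
    d = np.linalg.norm(nodes[:, None, :] - prot[None, :, :], axis=2)
    return pair_energy(d).sum(axis=1).reshape(shape)

def grid_based(grid, origin, spacing, lig):
    """Trilinear interpolation at each ligand atom (cost ~ 8 * N_lig); atoms must lie inside the grid."""
    f = (lig - origin) / spacing
    i0 = np.floor(f).astype(int)
    w = f - i0
    e = np.zeros(len(lig))
    for corner in np.ndindex(2, 2, 2):
        c = np.array(corner)
        weight = np.prod(np.where(c == 1, w, 1 - w), axis=1)
        ix = i0 + c
        e += weight * grid[ix[:, 0], ix[:, 1], ix[:, 2]]
    return float(e.sum())

rng = np.random.default_rng(0)
prot = rng.uniform(0, 3, size=(30, 3))  # toy "protein" atoms
origin, spacing, shape = np.array([-4.0, -4.0, -4.0]), 0.5, (23, 23, 23)
grid = precompute_grid(prot, origin, spacing, shape)  # done once per protein conformation
lig = np.array([[5.1, 1.3, 1.7], [5.6, 2.2, 0.9]])
print(point_based(lig, prot), grid_based(grid, origin, spacing, lig))
```

The grid is built once and then reused for every pose, which is where the speed-up comes from; a finer spacing trades memory for interpolation accuracy.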
107 Consensus Scoring
Typically evaluates the ranking of binding modes by several different scoring functions and favors those that rank consistently high in several of them. This reduces the false-positive rate.
Examples:
- SYBYL CScore (Tripos): FlexX, PMF, DOCK energy, GOLD score
- C2 (Accelrys): LigScore2, PLP, PMF, Ludi, Jain
- FRED (OpenEye): ChemScore, PB-SA, ChemGauss, PLP, ScreenScore
- DOCK: AMBER FF, PMF, contact scores, ChemScore
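One simple consensus scheme is rank-by-vote: each scoring function votes for the poses it places in its top fraction, and poses with the most votes are favored. The sketch below assumes that lower scores are better for every function and uses hypothetical score values; commercial consensus tools may combine scores differently.

```python
import numpy as np

def consensus_votes(scores, top_fraction=0.4):
    """scores: dict of name -> scores for the same poses (lower = better). Returns votes per pose."""
    n = len(next(iter(scores.values())))
    k = max(1, int(round(top_fraction * n)))
    votes = np.zeros(n, dtype=int)
    for s in scores.values():
        votes[np.argsort(s)[:k]] += 1  # one vote for each pose in this function's top k
    return votes

scores = {  # hypothetical values for 5 poses
    "f1": [-9.1, -7.5, -8.8, -6.0, -8.0],
    "f2": [-50.0, -62.0, -58.0, -40.0, -45.0],
    "f3": [1.2, 0.4, 0.3, 2.0, 0.5],
}
votes = consensus_votes(scores)
print(votes, int(np.argmax(votes)))
```

Pose 2 is never the single best under f1 or f2, yet it is the only pose ranked near the top by all three functions, which is exactly the kind of pose consensus scoring favors.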
109 Docking Software: Important Factors
- Sensitivity to, and transferability of, the parameters, including the starting conformation
- Adaptability to additional scoring functions, pre- and/or post-docking processing, and filters
- Ability to iteratively refine the docking parameters/protocol based on new results
- Design, components, and results of validation studies
- Speed, user interface and control, I/O, structural file formats
- User learning curve, customer support, and cost
- Code availability and upgrade possibilities
110 Docking Software
Docking: DOCK 6.0 (Ewing & Kuntz 1997); AutoDock 4.0 (Morris et al. 1998); GOLD (Jones et al. 1997); FlexX (Rarey et al. 1996); GLIDE (Friesner et al. 2004); ADAM (Mizutani et al. 1994); CDOCKER (Wu et al. 2003); CombiDOCK (Sun et al. 1998); DIVALI (Clark & Ajay 1995); DockVision (Hart & Read 1992); FLOG (Miller et al. 1994); GEMDOCK (Yang & Chen 2004); Hammerhead (Welch et al. 1996); LIBDOCK (Diller & Merz 2001); MCDOCK (Liu & Wang 1999); SDOCKER (Wu et al. 2004)
De novo design tools: LUDI (Boehm 1992); BUILDER (Roe & Kuntz 1995); SMOG (DeWitte et al. 1997); CONCEPTS (Pearlman & Murcko 1996); DLD/MCSS (Stultz & Karplus 2000); Genstar (Rotstein & Murcko 1993); Group-Build (Rotstein & Murcko 1993); Grow (Moon & Howe 1991); HOOK (Eisen et al. 1994); Legend (Nishibata & Itai 1993); MCDNLG (Gehlhaar et al. 1995); SPROUT (Gillet et al. 1993)
111 FRED (OpenEye, www.eyesopen.com)
Systematic, non-stochastic docking; multiple active-site comparisons; multiple simultaneous scoring functions and hit lists; RMS clustering of hit lists.
Algorithm:
1. Exhaustive docking
(a) Enumerate all possible poses of the ligand around the active site by rigidly rotating and translating each conformer within the site.
(b) Filter the resulting pose ensemble by rejecting poses that do not fit within the larger of the two volumes specified by the receptor file's shape potential grid and a contour level.
2. Systematic solid-body optimization by Shapegauss, PLP, Chemgauss2, Chemgauss3, CGO, CGT, Chemscore, OEChemscore or Screenscore
3. Rank poses via the Consensus Structure method and discard all but the top-ranked poses
112 DOCK 6.4
- Generates many possible orientations/conformations of a putative ligand within a user-selected region of a receptor structure
- Orientations may be scored using several schemes designed to measure steric and/or chemical complementarity of the receptor-ligand complex
- Uses: evaluate likely orientations of a single ligand or rank molecules from a database; search databases for DNA-binding compounds; examine possible binding orientations of protein-protein and protein-DNA complexes; design combinatorial libraries
113 GOLD
GA method, H-bond matching, force-field scoring (Jones et al. 1997).
A configuration is represented by two bit strings:
- the conformation of the ligand and the protein, defined by the torsions;
- a mapping between H-bond partners in the protein and the ligand.
For fitness evaluation, a 3D structure is created from the chromosome representation, and the H-bond atoms are superimposed onto H-bond site points in the receptor site.
Fitness (scoring) function: H-bond energy, ligand internal energy, and protein-ligand van der Waals energy.
Highlights:
- Full ligand flexibility
- Partial protein flexibility, including side-chain and backbone flexibility for up to ten user-defined residues
- A choice of GoldScore, ChemScore, Astex Statistical Potential (ASP) or Piecewise Linear Potential (PLP) scoring functions
- GOLD's genetic algorithm parameters are optimized for virtual screening applications
114 Hammerhead
Focuses on screening large databases of small molecules: the algorithm is fast enough to allow screening of a library of roughly small organic compounds in a few days.
- Empirical scoring function
- Starts with an automatic pocket finder
- Breaks ligands into fragments and aligns each of these onto the protein
- At each stage of the fragment alignment, gradient-descent pose optimization improves the conformation and alignment of the growing ligand by:
  - relaxing van der Waals surface interpenetrations
  - improving hydrogen-bond and hydrophobic surface-contact geometries
115 LUDI: Matching polar and hydrophobic groups
1. Calculate protein and ligand interaction sites (H-bond or hydrophobic), defined by centers and surfaces, from:
- non-bonded contact distributions based on a search through the CSD,
- a set of geometric rules,
- the output of the program GRID (Goodford 1985), which calculates binding energies for a given probe with a receptor molecule.
2. Fit fragments onto the interaction sites using:
- distances between interaction sites on the receptor,
- an RMSD superposition algorithm,
- a hashing scheme to access and match surface triangles onto a triangle query of a ligand interaction center; a list-merging algorithm creates all triangles from lists of fitting triangle edges for two of the three query triangle edges.
3. Join/grow fragments using the fragment databases and the same fitting algorithm.
116 GLIDE (www.schrodinger.com)
Funnel: site-point search -> diameter test -> subset test -> greedy score -> refinement -> grid-based energy optimization -> GlideScore.
- Approximates a complete systematic search of the conformational, orientational, and positional space of the docked ligand.
- Hierarchical filters, including a rough scoring function that recognizes hydrophobic and polar contacts, dramatically narrow the search space.
- Torsionally flexible energy optimization on an OPLS-AA nonbonded potential grid is applied to the few hundred surviving candidate poses.
- The very best candidates are further refined via Monte Carlo (MC) sampling of the pose conformation.
- Scoring: a modified ChemScore (Eldridge et al. 1997) that combines empirical and force-field-based terms.
- Validation on 282 complexes (ligands started from new conformations): for the top-ranked pose, 50% were within 1 Å of the crystal structure and ~33% deviated by more than 2 Å.
118 GRAMM v1.03
Protein-protein and protein-ligand docking.
- Exhaustive 6-dimensional search through the relative translations and rotations of the molecules.
- Empirical approach to smoothing the intermolecular energy function.
- The quality of the prediction depends on the accuracy of the input structures.
119 CDOCKER & SDOCKER
1. Randomly generate ligand seeds in the binding site
2. Run high-temperature MD using a modified version of CHARMM
3. Locate minima from all of the MD simulations
4. Fully minimize
5. Cluster on position and geometry
6. Rank by energy (interaction + ligand conformation)
SDOCKER: uses the X-ray structure of a complex as a template to guide docking.
Wu et al. 2003; Wu et al
120 Docking Webservers
Assessment of CAPRI Predictions 2009
121 ClusPro Webserver
Fast rigid-body docking; ligand-protein and protein-protein docking.
Uses FFT-based docking programs (DOT and ZDOCK), then:
1) rapidly filters the docked conformations;
2) ranks the conformations by clustering computed pairwise RMSD values.
Desolvation and electrostatic energies are calculated.
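The clustering step can be sketched as greedy neighbor counting: the pose with the most neighbors within an RMSD radius becomes a cluster center, its members are removed, and the process repeats. Assumptions: poses are already expressed in the receptor frame (so no superposition), the radius is an arbitrary value suited to the toy data, and the names are hypothetical rather than ClusPro's code.

```python
import numpy as np

def pairwise_rmsd(poses):
    """poses: (n_poses, n_atoms, 3), all in the receptor frame, so no superposition is done."""
    diff = poses[:, None, :, :] - poses[None, :, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))

def greedy_rmsd_clusters(poses, radius):
    d = pairwise_rmsd(poses)
    remaining = list(range(len(poses)))
    clusters = []
    while remaining:
        # the pose with the most neighbors within `radius` becomes the next cluster center
        counts = [(sum(d[i, j] <= radius for j in remaining), i) for i in remaining]
        _, center = max(counts)
        members = [j for j in remaining if d[center, j] <= radius]
        clusters.append((center, members))
        remaining = [j for j in remaining if j not in members]
    return clusters

rng = np.random.default_rng(0)
base = rng.uniform(0, 10, size=(5, 3))
poses = np.array([base + rng.normal(scale=0.1, size=base.shape) for _ in range(3)]
                 + [base + 10 + rng.normal(scale=0.1, size=base.shape) for _ in range(2)]
                 + [base + 30])
clusters = greedy_rmsd_clusters(poses, radius=1.0)
print([(center, len(members)) for center, members in clusters])
```

Ranking by cluster size rather than by the raw energy of single poses favors broad, well-populated minima, which is the rationale behind clustering-based ranking.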
122 HADDOCK
Driven by experimental knowledge (e.g., from mutagenesis, mass spectrometry or a variety of NMR experiments).
Protein-protein docking server; also supports nucleic acids.
Algorithm:
1. Rigid-body energy minimization
2. Semi-flexible refinement in torsion-angle space
3. Final refinement in explicit solvent
The HADDOCK score: van der Waals, electrostatic, desolvation and restraint-violation energies, together with buried surface area.
123 GRAMM-X Protein-Protein Docking Server
- Uses FFT for the global search of the best rigid-body conformations.
- Uses a smoothed Lennard-Jones potential on a fine grid.
- Can smooth the protein surface to account for possible conformational change.
- The intermolecular energy landscape is smoothed by increasing the potential range and lowering the value of the repulsion part.
Softened Lennard-Jones potential function:
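The FFT trick used by these servers scores all translations of the ligand at once for a fixed orientation, in the style introduced by Katchalski-Katzir and co-workers: the correlation C(t) = Σ R(x + t)·L(x) over a periodic grid equals the inverse FFT of FFT(R)·conj(FFT(L)). This sketch uses random grids and a known cyclic shift so the result can be checked; real methods encode surface and interior layers with different values, and rotations are scanned in an outer loop not shown here.

```python
import numpy as np

def fft_translation_scan(receptor_grid, ligand_grid):
    """C(t) = sum_x R(x + t) * L(x) for every cyclic shift t, computed with FFTs."""
    C = np.fft.ifftn(np.fft.fftn(receptor_grid) * np.conj(np.fft.fftn(ligand_grid))).real
    best = np.unravel_index(np.argmax(C), C.shape)
    return C, tuple(int(i) for i in best)

rng = np.random.default_rng(0)
ligand = rng.random((16, 16, 16))                      # toy ligand grid
receptor = np.roll(ligand, shift=(3, 5, 7), axis=(0, 1, 2))  # "receptor" = shifted ligand
C, best_shift = fft_translation_scan(receptor, ligand)
print(best_shift)
```

For an N×N×N grid this costs O(N³ log N) per orientation instead of O(N⁶) for a direct scan over all translations.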
124 PatchDock and SymmDock Servers
Based on a rigid-body geometric hashing algorithm. Aim: transformations that yield good molecular shape complementarity.
- The algorithm divides the Connolly dot surface representation of the molecules into concave, convex and flat patches.
- Complementary patches are then matched to generate candidate transformations.
- Each candidate transformation is further evaluated by a scoring function that considers both geometric fit and atomic desolvation energy.
125 PatchDock detects transformations with high shape complementarity
SymmDock Server
SymmDock restricts its search to symmetric cyclic transformations of a given order n.
126 FireDock Server
Refinement of solutions produced by fast rigid-body docking algorithms; protein-protein docking.
127 RosettaDock Protein-Protein Docking Server
A computationally intensive approach that incorporates model flexibility: a multi-start, multi-scale Monte Carlo-based algorithm. It starts from 1000 independent structures, and the server returns pictures, coordinate files and detailed scoring information for the 10 top-scoring models.
Low-resolution phase:
- Random rigid-body perturbations
- Scoring: residue-residue contacts and bumps; knowledge-based terms for residue environment and residue-residue pair propensities; and, for antibody-antigen targets, a score favoring interactions with antibody complementarity-determining regions
High-resolution (all-atom, including hydrogens) phase:
- Smaller rigid-body perturbations, side-chain optimization via rotamer packing and continuous minimization, and explicit gradient-based minimization of the rigid-body displacement
- Scoring: the energy is dominated by van der Waals energies, orientation-dependent hydrogen bonding, implicit Gaussian solvation, side-chain rotamer probabilities, and a low-weighted electrostatics term
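The core of a Monte Carlo docking stage is the Metropolis criterion applied to random rigid-body perturbations. The sketch below treats a pose as a hypothetical 6-vector (3 translations plus 3 rotation parameters) and uses a toy quadratic score; RosettaDock's actual moves, score terms, minimization and multi-scale protocol are not reproduced.

```python
import numpy as np

TARGET = np.array([1.0, -2.0, 0.5, 0.2, 0.0, -0.3])  # hypothetical optimal rigid-body pose

def toy_score(x):
    """Hypothetical smooth score with its minimum (0) at TARGET."""
    return float(np.sum((np.asarray(x) - TARGET) ** 2))

def metropolis_mc(score, x0, n_steps=3000, step=0.3, kT=0.3, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    e = score(x)
    best_x, best_e = x.copy(), e
    for _ in range(n_steps):
        trial = x + rng.normal(scale=step, size=x.shape)  # random rigid-body perturbation
        e_trial = score(trial)
        # Metropolis criterion: always accept downhill; accept uphill with probability exp(-dE/kT)
        if e_trial <= e or rng.random() < np.exp(-(e_trial - e) / kT):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x.copy(), e
    return best_x, best_e

x0 = TARGET + 3.0
best_x, best_e = metropolis_mc(toy_score, x0)
print(round(best_e, 3))
```

Occasionally accepting uphill moves lets the search escape shallow local minima; a lower kT makes the search greedier, which loosely parallels the shift from large, coarse perturbations to smaller, refined ones.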
129 HexServer
In order to address the main limitations of the Cartesian FFT approaches, we developed the 'Hex' spherical polar Fourier (SPF) approach, which uses rotational correlations (10) and reduces execution times to a matter of minutes.
131 Bold entries in the first column correspond to programmes that can be run on a web server.
(a) Refined with SMOOTHDOCK. (b) Uses DOT or ZDOCK as search methods. (c) Refined with RDOCK.
132 Virtual Screening
Drug discovery costs are too high: ~$800 million, 8–14 years, ~10,000 compounds (DiMasi et al. 2003; Dickson & Gagnon 2004).
Drugs interact with their receptors in a highly specific and complementary manner.
Docking is the core of target-based, structure-based drug design (SBDD) for lead generation and optimization.
A lead is a compound that:
- shows biological activity,
- is novel, and
- has the potential to be structurally modified for improved bioactivity, selectivity, and druggability.
133 Drug, Chemical & Structural Space
- Drug-like: MDDR (MDL Drug Data Report), >147,000 entries; CMC (Comprehensive Medicinal Chemistry), >8,600 entries
- Non-drug-like: ACD (Available Chemicals Directory), ~3 million entries
- Literature and databases: Beilstein (>8 million compounds), CAS & SciFinder
- CSD (Cambridge Structural Database): ~3 million X-ray crystal structures for >264,000 different compounds and >128,00 organic structures
- Available compounds:
  - Available without exclusivity: various vendors (& ACD)
  - Available with limited exclusivity: Maybridge, Array, ChemDiv, WuXi Pharma, ChemExplorer, etc.
  - Corporate databases: a few million compounds in large pharma companies
135 Docking to Nucleic Acid Targets
RNA and DNA are potential drug targets, e.g., ribosomal RNA structures (Agalarov et al. 2000; Ban et al. 2000; Filikov et al. 2000; Nissen et al. 2000; Wimberly et al. 2000). These targets have highly charged environments and well-defined binding pockets.
- DOCK identified compounds that selectively bind to RNA duplexes or DNA quadruplexes (Chen et al. 1996; Chen et al. 1997).
- The parts of the DOCK suite that calculate electrostatics, including solvation, partial charges, and the scoring function, were recently optimized for RNA targets (Downing et al. 2003; Kang et al. 2004).
- MC minimization and an empirical scoring function accounting for solvation, isomerization free energy, and changes in conformational entropy were used to rank compounds (Hermann & Westhof 1999).