Development and Validation of a Genetic Algorithm for Flexible Docking Gareth Jones, Peter Willett, Robert C. Glen, Andrew R. Leach and Robin Taylor J.


1 Development and Validation of a Genetic Algorithm for Flexible Docking
Gareth Jones, Peter Willett, Robert C. Glen, Andrew R. Leach and Robin Taylor
J. Mol. Biol., 1997, 267

2 Contents (Bioinformatics Seminar 2005, Matthias Dietzen)
- Introduction
- Docking
- Genetic Algorithm
- Development of GOLD
- Validation of GOLD
- Conclusions
- Discussion

3 Introduction

4 Introduction
- Nowadays, computer-aided design of therapeutic molecules is the method of choice
- Screening virtual libraries for novel chemical entities and predicting their binding modes for a given receptor would save both time and money
- → Satisfies "fail fast, fail cheap"

5 Docking
- Definition
- Problems

6 Docking: Definition
"Docking tries to find the energetically most feasible three-dimensional arrangement of two molecules in close contact with each other."
Use of docking:
- Target validation
- Lead discovery
- Lead optimization

7 Docking: Problems
Three levels of complexity:
- Rigid (comparatively simple)
- Semi-flexible (hard)
- Flexible (intractable)
Accounting for the flexibility of the ligand and/or the receptor causes a combinatorial explosion, which forces the development of highly sophisticated algorithms.
One of these: the Genetic Algorithm

8 Genetic Algorithm
- Definition
- Model
- Algorithm

9 Genetic Algorithm: Definition
"A Genetic Algorithm evolves the population of possible solutions through genetic operators to a final population, optimizing a predefined fitness function."
Underlying principle: Darwin's Theory of Evolution
- Population growth is limited by the food available
- Individuals using this food more efficiently will produce more offspring
- Less adapted individuals are displaced
- → "Survival of the fittest"

10 Genetic Algorithm: Model
A Genetic Algorithm provides:
- Population(s) of individuals competing against each other
- Each individual represented as a set of chromosomes encoding the individual's features
- Genetic operators modelling processes of evolution
- A fitness function ranking the individuals of one generation

11 Genetic Algorithm: Algorithm
1. Select and initialize the set of genetic operators
2. Randomly create an initial population and rank it by fitness
3. Select parents according to their ranking
4. Breed children using the genetic operators
5. Evaluate the children's fitness
6. Replace the least fit members of the population
7. Go to 3 until termination or convergence
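The seven steps above can be sketched in a few lines of Python. This is only a toy illustration, not GOLD itself: the bit-string individuals, the `run_ga` name, the 50/50 choice between crossover and mutation, the simple rank weights and the fixed operation budget are all assumptions made for the example.

```python
import random

def run_ga(fitness, n_bits=16, pop_size=20, n_ops=500, seed=0):
    """Toy steady-state GA on bit strings (hypothetical helper, not GOLD)."""
    rng = random.Random(seed)

    # Step 1: the genetic operators used in this sketch.
    def crossover(a, b):
        cut = rng.randrange(1, n_bits)      # one-point crossover
        return a[:cut] + b[cut:]

    def mutate(a):
        i = rng.randrange(n_bits)           # flip one random bit
        return a[:i] + [1 - a[i]] + a[i + 1:]

    # Step 2: random initial population, ranked best first.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    pop.sort(key=fitness, reverse=True)

    for _ in range(n_ops):                  # Step 7: fixed budget as termination
        # Step 3: parents are drawn with a probability that grows with rank.
        weights = [pop_size - rank for rank in range(pop_size)]
        p1, p2 = rng.choices(pop, weights=weights, k=2)
        # Step 4: breed one child with a randomly chosen operator.
        child = crossover(p1, p2) if rng.random() < 0.5 else mutate(p1)
        # Steps 5 and 6: the child replaces the least fit member,
        # then the population is re-ranked by fitness.
        pop[-1] = child
        pop.sort(key=fitness, reverse=True)
    return pop[0]

# Usage: maximise the number of 1-bits ("OneMax").
best = run_ga(fitness=sum)
print(best, sum(best))
```

Because only the last-ranked member is ever overwritten, the best individual is never lost, so the best fitness cannot decrease from one operation to the next.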

12 Development of GOLD
- Chromosomes
- Fitness Function
- Genetic Operators

13 Development of GOLD: Chromosomes
- 2 binary strings for the conformation information of both ligand and protein
  - 1 byte for each bond's rotation angle
- 2 integer strings for the mapping of hydrogen bonds
  - Acceptor (ligand) -> Donor (receptor)
  - Donor (ligand) -> Acceptor (receptor)
- Use of least squares fitting to form as many hydrogen bonds as possible
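As a rough illustration of "1 byte for each bond's rotation angle", the sketch below decodes a byte string into torsion angles. The linear mapping of 0..255 onto [-180°, 180°) and the function names are assumptions for the example; the slide does not state how GOLD maps byte values to angles.

```python
def decode_torsions(chromosome):
    """Map each byte (0..255) to an angle in [-180, 180) degrees.

    Assumed linear encoding with a step of 360/256 = 1.40625 degrees.
    """
    step = 360.0 / 256
    return [-180.0 + value * step for value in chromosome]

def flip_bit(chromosome, bond, bit):
    """Mutation by bit flipping: toggle one bit of one bond's byte."""
    out = bytearray(chromosome)
    out[bond] ^= 1 << bit
    return bytes(out)

# Usage: three rotatable bonds.
chrom = bytes([0, 128, 255])
print(decode_torsions(chrom))                  # [-180.0, 0.0, 178.59375]
# Flipping the lowest bit moves the angle by one step,
# flipping the highest bit moves it by 180 degrees.
print(decode_torsions(flip_bit(chrom, 0, 0)))  # first angle: -178.59375
print(decode_torsions(flip_bit(chrom, 0, 7)))  # first angle: 0.0
```

Under this assumed encoding, a single bit flip can change an angle by anything from 1.4° to 180°, depending on which bit is hit.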

14 Development of GOLD: Fitness Function
Three energy terms:
- H_Bond_Energy: sum of the energies of all hydrogen bonds in the complex
- Complex_Energy: steric energy of interaction between ligand and receptor
- Internal_Energy: the ligand's steric and torsional energy, based on molecular mechanics
Final fitness score: -(H_Bond_Energy + Internal_Energy + Complex_Energy)

15 Development of GOLD: Fitness Function - H_Bond_Energy
Energy of one hydrogen bond: E_pair x distance_weight x angle_weight
- E_pair: hydrogen-bond energy between a donor and an acceptor
- distance_weight, angle_weight: account for the geometrical arrangement of donor hydrogen, acceptor and any lone pairs

16 Development of GOLD: Fitness Function - H_Bond_Energy, E_pair
- Uses model fragments for donor (d) and acceptor (a)
- Accounts for the displacement of water (w)
- Initially, donor and acceptor are in solution, but when they form a hydrogen bond, water is stripped off
- → E_pair = (E_da + E_ww) - (E_dw + E_aw)
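The way these terms combine can be written out as a short Python sketch. The function names and all numeric values are made up for illustration; only the three formulas come from the slides.

```python
def pair_energy(e_da, e_ww, e_dw, e_aw):
    """E_pair = (E_da + E_ww) - (E_dw + E_aw): the bound state
    (donor-acceptor plus water-water) minus the solvated state."""
    return (e_da + e_ww) - (e_dw + e_aw)

def h_bond_energy(bonds):
    """Sum of E_pair x distance_weight x angle_weight over all hydrogen bonds.

    `bonds` is a list of (e_pair, distance_weight, angle_weight) tuples.
    """
    return sum(e_pair * dist_w * angle_w for e_pair, dist_w, angle_w in bonds)

def fitness_score(h_bond, internal, complex_):
    """Final fitness: -(H_Bond_Energy + Internal_Energy + Complex_Energy)."""
    return -(h_bond + internal + complex_)

# Usage with invented energies (arbitrary units).
e_pair = pair_energy(e_da=-5.0, e_ww=-4.0, e_dw=-3.0, e_aw=-3.5)   # -2.5
h_bond = h_bond_energy([(e_pair, 1.0, 0.5), (-2.0, 0.5, 1.0)])     # -2.25
print(fitness_score(h_bond, internal=1.0, complex_=-3.0))          # 4.25
```

Because the score is the negated energy sum, a lower total energy gives a higher fitness.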

17 Development of GOLD: Genetic operators
- Island model: isolated subpopulations instead of one large population
  - Does not increase effectiveness, but improves efficiency
  - Five subpopulations, each with 100 individuals
- Use of four genetic operators:
  - Crossover
  - Mutation
  - Migration
  - Selection

18 Development of GOLD: Genetic operators
- Crossover: the child inherits the parents' features by crossover of chromosomes
- Mutation: changes a single individual's chromosome randomly (bit flipping)
- Migration: copies an individual from one island to a neighbouring one
- Selection: relative probability of choosing the fittest individual as a parent
  - Pressure: 1.1
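A minimal sketch of selection and migration follows. It assumes linear ranking, a common scheme in which the fittest individual is selected `pressure` times as often as an average one, and a ring of islands in which a migrant overwrites the last-ranked member of the next island. Both choices, and all function names, are assumptions for illustration rather than details confirmed by the slides.

```python
import random

def linear_rank_weights(n, pressure=1.1):
    """Selection weights for ranks 0 (fittest) to n-1 (least fit).

    The weights average 1.0; the fittest gets `pressure`,
    the least fit gets 2 - pressure.
    """
    if n == 1:
        return [1.0]
    return [pressure - 2.0 * (pressure - 1.0) * rank / (n - 1) for rank in range(n)]

def select_parent(ranked_island, rng, pressure=1.1):
    """Draw one parent from an island that is sorted best first."""
    weights = linear_rank_weights(len(ranked_island), pressure)
    return rng.choices(ranked_island, weights=weights, k=1)[0]

def migrate(islands, rng):
    """Copy a random individual to the neighbouring island (assumed ring)."""
    src = rng.randrange(len(islands))
    dst = (src + 1) % len(islands)
    islands[dst][-1] = list(rng.choice(islands[src]))  # overwrite last-ranked
    return src, dst

# Usage: five islands of 100 individuals, as on the slide.
rng = random.Random(42)
islands = [[[i, i, i] for _ in range(100)] for i in range(5)]
weights = linear_rank_weights(100)
print(weights[0], weights[-1])   # approximately 1.1 and 0.9
print(migrate(islands, rng))     # (source island, destination island)
```

With a pressure of only 1.1 the weights are nearly flat, so even poorly ranked individuals are chosen as parents fairly often.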

19 Validation of GOLD
- Data set
- Classification
- Results

20 Validation of GOLD: Data set
- Data set of 100 protein-ligand complexes of pharmacological interest from the PDB
- High variance of the test set:
  - Heavy atoms between 6 and 55
  - Rotatable bonds between 0 and 30
  - Many functionally different protein types
  - Metalloenzymes
- Hand-curated with respect to charges, protonation and tautomeric states

21 Validation of GOLD: Classification
- 20 GA runs per complex, to ensure that the best solution is found
- Four subjective categories:
  - Good: binding mode, hydrogen bonds, close contacts and metal coordination correct
  - Close: result acceptable, but with some displacement of ligand groups from the experimental result
  - Errors: partially correct, but with significant errors
  - Wrong: completely incorrect
- These categories are preferred to the rmsd, because a small rmsd may mask errors

22 Validation of GOLD: Classification
[Figure: left, a docking classified as "good"; right, a docking classified as "errors"]

23 Validation of GOLD: Results
- Prediction: 71/100 in the categories good and close
- Complexes predicted after 2, 5 and 10 runs:

  GA runs   Correctly predicted
  2         49/71
  5         63/71
  10        65/71

24 Validation of GOLD: Results
Ligand composition: [figure not included in the transcript]

25 Validation of GOLD: Results
Problems in resolution: [figure not included in the transcript]

26 Validation of GOLD: Results
Summary:
- → 71% prediction accuracy
- → In general, GOLD does not require 20 runs
- → Fails for ligands with many heavy atoms or torsions, due to complexity
- → Fails for ligands with few hydrogen bonds, due to the fitness score
- → Prediction rate of 77% for resolution ≤ 2.5 Å

27 Conclusions

28 Conclusions
Genetic Algorithms in general:
- Random initialization (non-deterministic)
- Convergence to global minimum
- Solutions are suboptimal
- Need for a local minimizer
GOLD:
- Bit vector mutation leads to solutions far from the original individual
- Problems docking large, flexible, hydrophobic ligands

29 Thank you for your attention!

30 Discussion

31 Validation of GOLD: Results
Ligand composition (good+close / errors+wrong):

        Heavy atoms   Torsions & free corners   % H-bonding
  Max   52 / 55       28 / 40                   66.7 / 53.9
  Avg   20.4 / 24.3   7.9 / 11.4                31.9 / 25.1
  Min   6 / 9         0 / 0                     8.8 / 4.8

32 Development: The Fitness Function – H_Bond_Energy, distance_wt
$$
\text{distance\_wt} =
\begin{cases}
1, & d \le 0.25\,\text{Å} \\
\dfrac{d_{max} - d}{d_{max} - 0.25\,\text{Å}}, & d \in [0.25\,\text{Å},\, d_{max}] \\
0, & d \ge d_{max}
\end{cases}
$$
- d_max varies linearly from 4.0 Å (when the GA starts) to 1.5 Å (after 75,000 genetic operations)
- → allows long-range interactions at the beginning, but only close contacts at the end
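The distance weight and the shrinking cutoff can be sketched directly from the definition above. The function names are hypothetical, and clamping the schedule after 75,000 operations is an assumption.

```python
def d_max_at(n_ops, start=4.0, end=1.5, total_ops=75_000):
    """Cutoff d_max in Å, interpolated linearly over the run.

    Assumption: the value stays at `end` once `total_ops` is reached.
    """
    frac = min(max(n_ops / total_ops, 0.0), 1.0)
    return start + (end - start) * frac

def distance_wt(d, d_max, d_full=0.25):
    """1 up to d_full, 0 from d_max on, linear ramp in between (all in Å)."""
    if d <= d_full:
        return 1.0
    if d >= d_max:
        return 0.0
    return (d_max - d) / (d_max - d_full)

# Usage: the same 2.125 Å separation is weighted differently over a run.
for ops in (0, 37_500, 75_000):
    cutoff = d_max_at(ops)
    print(ops, cutoff, distance_wt(2.125, cutoff))
# 0      4.0   0.5
# 37500  2.75  0.25
# 75000  1.5   0.0
```

The example shows the intended effect: a contact that still contributes half weight at the start of the run contributes nothing by the end.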

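33 Development: The Fitness Function – H_Bond_Energy, angle_wt
- Acceptors without lone-pair directional preference: angle_wt = 1
- For acceptors with directionality in the plane of the lone pairs:
$$
\text{angle\_wt} =
\begin{cases}
1, & \theta < 20^\circ \\
\left[\dfrac{60^\circ - \theta}{60^\circ - 20^\circ}\right]^2, & \theta \in [20^\circ, 60^\circ] \\
0, & \theta > 60^\circ
\end{cases}
$$
- For acceptors with directionality along the lone pairs:
$$
\text{angle\_wt} =
\begin{cases}
1, & \theta > 160^\circ \\
\left[\dfrac{160^\circ - \theta}{160^\circ - 60^\circ}\right]^2, & \theta \in [60^\circ, 160^\circ] \\
0, & \theta < 60^\circ
\end{cases}
$$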

34 Development: The Fitness Function – Complex_Energy
$$
\text{Complex\_Energy} = \sum_{\text{atoms } i} \sum_{\text{atoms } j} E_{ij},
\qquad
E_{ij} = \frac{A}{d_{ij}^{8}} - \frac{B}{d_{ij}^{4}} \quad \text{(8-4 potential)}
$$
- Smoother than the standard Lennard-Jones 12-6 potential
- A, B chosen to reproduce the minimum of the 12-6 potential
- Adjustments for hydrogen bonds:
  - E_ij = 0 for the interaction of donor hydrogen and acceptor
  - The distance between donor and acceptor is scaled by 1.43
  - → effectively reduces the vdW radii to about 70% (1/1.43 ≈ 0.70)
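One way to read "A, B chosen to reproduce the minimum of the 12-6 potential" is that both potentials share the same minimum position r0 and well depth k. Under that assumption, setting dE/dd = 0 at d = r0 gives B = 2A / r0^4, and requiring E(r0) = -k gives A = k r0^8 and B = 2 k r0^4. The sketch below checks this numerically; r0, k and the function names are illustrative assumptions, not values from the slides.

```python
def params_8_4(r0, k):
    """A and B so that A/d**8 - B/d**4 has its minimum -k at d = r0."""
    return k * r0**8, 2.0 * k * r0**4

def energy_8_4(d, a, b):
    return a / d**8 - b / d**4

def energy_12_6(d, r0, k):
    """Lennard-Jones 12-6 with the same minimum position and depth."""
    return k * ((r0 / d) ** 12 - 2.0 * (r0 / d) ** 6)

# Usage: compare both potentials around a shared minimum (r0 = 2.0, k = 1.0).
a, b = params_8_4(r0=2.0, k=1.0)        # A = 256.0, B = 32.0
for d in (1.6, 1.8, 2.0, 2.2, 2.4):
    print(d, round(energy_8_4(d, a, b), 3), round(energy_12_6(d, 2.0, 1.0), 3))
```

Both curves pass through -1.0 at d = 2.0, but the 8-4 curve rises less steeply at short distances, which is what the slide means by "smoother".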

35 Development: The Fitness Function – Complex_Energy
- Let $-k_{ij}$ be the minimum energy of interaction between two atoms i and j
- For $E_{ij} > \text{scale} \times k_{ij}$: $E_{ij} = 1.5 \times \text{scale} \times k_{ij}$
- scale varies logarithmically from 1.0 (when the GA starts) to 120.0 (after 75,000 genetic operations)
- → Encourages the formation of close contacts early in a GA run, while avoiding steric clashes at the end

36 Development: The Fitness Function – Internal_Energy
- Steric energy (for each pair of atoms i, j):
$$
E_{ij} = \frac{C}{d_{ij}^{12}} - \frac{D}{d_{ij}^{6}}
$$
  with C and D chosen such that $E_{ij}$ is minimal for $d_{ij} = r_i + r_j$
- Torsional energy (for four consecutively bonded atoms i, j, k, l):
$$
E_{ijkl} = \tfrac{1}{2}\, V_{ijkl} \left[ 1 + \frac{\eta_{ijkl}}{|\eta_{ijkl}|} \cos\!\left( |\eta_{ijkl}| \, \omega_{ijkl} \right) \right]
$$
  with
  - ω: torsional angle
  - η: periodicity (predefined)
  - V: barrier to rotation (predefined)
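Both internal-energy terms are easy to evaluate numerically. In the sketch below the well depth `depth` is an extra assumption (the slide only fixes the position of the minimum), which gives C = depth × r0^12 and D = 2 × depth × r0^6 with r0 = r_i + r_j. Function names and numeric values are illustrative.

```python
import math

def params_12_6(r_i, r_j, depth):
    """C and D so that C/d**12 - D/d**6 is minimal (-depth) at d = r_i + r_j."""
    r0 = r_i + r_j
    return depth * r0**12, 2.0 * depth * r0**6

def steric_energy(d, c, d_coef):
    return c / d**12 - d_coef / d**6

def torsion_energy(omega_deg, barrier, periodicity):
    """E = 1/2 V [1 + sign(eta) cos(|eta| omega)], omega given in degrees."""
    sign = 1.0 if periodicity > 0 else -1.0
    angle = abs(periodicity) * math.radians(omega_deg)
    return 0.5 * barrier * (1.0 + sign * math.cos(angle))

# Usage: a threefold torsion with a barrier of 2.0 (arbitrary units).
for omega in (0, 30, 60, 120, 180):
    print(omega, round(torsion_energy(omega, barrier=2.0, periodicity=3), 6))
```

The sign of the periodicity shifts the phase: with η = 3 the energy is maximal at ω = 0°, with η = -3 it is minimal there.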

