A Multiobjective Approach to Combinatorial Library Design Val Gillet University of Sheffield, UK.

1 A Multiobjective Approach to Combinatorial Library Design Val Gillet University of Sheffield, UK

2 Outline
- SELECT
  - GA-based program for combinatorial library design
  - Combinatorial subset selection in product-space
  - Multiobjective optimisation via weighted-sum fitness function
  - Limitations of a weighted-sum approach
- MoSELECT
  - Multiobjective optimisation via MOGA

3 Library Design is a Multiobjective Optimisation Problem
- Early HTS results disappointing
  - Low hit rates
  - Hits too lipophilic; too flexible; high molecular weights…
- Diverse libraries
  - Distance-based/cell-based diversity
  - Bioavailability; cost; ease of synthesis…
- Focused/targeted libraries
  - Similarity to known active; predicted active by QSAR model; fit to receptor site
  - Bioavailability; cost…

4 Product-Based Library Design
- A two-component combinatorial library can be represented by a 2D array
- A combinatorial subset can be defined by intersecting rows and columns of the array
- Exploring all combinatorial subsets is equivalent to testing all permutations of the rows and columns of the array
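
A minimal sketch of the row-and-column intersection described on this slide. The reactant labels, the 4 × 5 library size and the function names `combinatorial_subset` and `count_subsets` are illustrative assumptions, not part of SELECT.

```python
from itertools import product
from math import comb


def combinatorial_subset(library, rows, cols):
    """Return the products at the intersection of the chosen rows and columns."""
    return [library[r][c] for r, c in product(rows, cols)]


def count_subsets(pool_sizes, subset_sizes):
    """Number of distinct combinatorial subsets: one binomial coefficient per pool."""
    total = 1
    for n, k in zip(pool_sizes, subset_sizes):
        total *= comb(n, k)
    return total


# Hypothetical 4 x 5 virtual library: entry [i][j] is the product of
# reactant i from pool R1 and reactant j from pool R2.
library = [[f"A{i}-B{j}" for j in range(5)] for i in range(4)]

# A 2 x 3 combinatorial subset: 2 reactants from R1, 3 from R2.
subset = combinatorial_subset(library, rows=[0, 2], cols=[1, 3, 4])

# Size of the search space for this toy library.
n_toy = count_subsets((4, 5), (2, 3))
```

Calling `count_subsets((100, 100), (30, 30))` gives the number of 30 × 30 subsets of a 100 × 100 library, which is far too large to enumerate and is one reason a stochastic search such as a GA is used.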

5 Selecting Combinatorial Subsets Using a GA
[Figure: example chromosome for a 6 × 4 subset, an integer string with one partition for R1 and one for R2]
- Chromosome encoding
  - each chromosome represents a combinatorial subset as an integer string
  - one partition for each reactant pool
  - the size of a partition equals the number of reactants required from the corresponding pool
- Crossover, mutation and roulette wheel parent selection are used to evolve new potential solutions
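
The encoding can be sketched as follows, assuming each partition holds distinct reactant indices into its pool. The function names and the particular mutation operator (swap one reactant for an unused one) are assumptions made for illustration; the slides do not specify how SELECT implements its operators.

```python
import random


def random_chromosome(pool_sizes, subset_sizes, rng):
    """One partition per reactant pool; each partition lists distinct reactant indices."""
    return [rng.sample(range(n), k) for n, k in zip(pool_sizes, subset_sizes)]


def mutate(chromosome, pool_sizes, rng):
    """Replace one reactant in a randomly chosen partition with an unused one."""
    child = [list(partition) for partition in chromosome]
    p = rng.randrange(len(child))
    unused = sorted(set(range(pool_sizes[p])) - set(child[p]))
    if unused:  # nothing to swap in if the partition already uses the whole pool
        child[p][rng.randrange(len(child[p]))] = rng.choice(unused)
    return child


rng = random.Random(0)
pool_sizes = (100, 100)   # hypothetical pool sizes for R1 and R2
subset_sizes = (6, 4)     # a 6 x 4 subset, as in the slide's example

chrom = random_chromosome(pool_sizes, subset_sizes, rng)
mutant = mutate(chrom, pool_sizes, rng)
```

Keeping the partitions separate means any operator that stays within one partition always yields a valid combinatorial subset of the required size.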

6 Multiobjective Optimisation in SELECT
- Weighted-sum fitness function
  - enumerate the combinatorial library represented by a chromosome
  - calculate descriptors for molecules in the library
- Objectives are scaled and user-defined weights are applied

7 Multiobjective Optimisation in SELECT cont.
- Diversity indices
  - distance-based (e.g. sum of pairwise dissimilarities computed on Daylight fingerprints)
  - cell-based
- Physical property terms
  - minimise the difference between the distribution in the library and some reference distribution, e.g. “drug-like” profile derived from WDI
- Cost: £
  - minimise the cost of the library
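
A weighted-sum fitness of this kind might look like the sketch below. It assumes every objective has been expressed so that lower is better and that the user supplies (min, max) bounds for scaling; the objective names, values, bounds and weights are all hypothetical.

```python
def weighted_sum_fitness(objectives, weights, bounds):
    """Scale each objective onto [0, 1] using its bounds, then sum with user weights."""
    total = 0.0
    for name, value in objectives.items():
        low, high = bounds[name]
        scaled = (value - low) / (high - low)
        total += weights[name] * scaled
    return total


# Hypothetical scores for one candidate library (all to be minimised).
objectives = {"one_minus_diversity": 0.5, "mw_profile_difference": 0.25, "cost": 2000.0}
bounds = {"one_minus_diversity": (0.0, 1.0),
          "mw_profile_difference": (0.0, 1.0),
          "cost": (0.0, 10000.0)}
weights = {"one_minus_diversity": 1.0, "mw_profile_difference": 1.0, "cost": 0.5}

fitness = weighted_sum_fitness(objectives, weights, bounds)
```

Changing `weights` changes which single library the GA converges on, which is why each choice of weights yields only one point on the trade-off surface.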

8 Library Enumeration in SELECT
- Virtual library is enumerated upfront
  - ADEPT (A Daylight Enumeration and Profiling Tool)
    - Identify potential reactants
    - Filter out unwanted ones
    - Enumerate virtual library: Reaction Toolkit (Reaction transforms; MTZ language)
- Descriptors are calculated upfront
- Combinatorial subset accessed via fast lookup

9 Example: Amide Library
[Figure: molecular weight profiles (percentage of compounds against molecular weight) for WDI, the reactant-based selection and the product-based selection]
- 10K virtual library: 100 amines × 100 carboxylic acids
- 30 × 30 amide subsets
- Reactant-based selection: diversity (Diversity 0.564)
- Product-based selection: diversity & molecular weight profile (Diversity 0.573)
- WDI: World Drug Index

10 Limitations of a Weighted-Sum Fitness Function
- Definition of fitness function difficult, especially for different types of objectives
  - e.g. molecular weight profile and cost
- Setting of weights is non-intuitive
- Can result in regions of search space being obscured, especially when objectives are in competition
- Difficult to monitor progress since >1 objective to follow simultaneously
- A single solution is found

11 Varying Weights in SELECT
- Objectives are in competition, resulting in trade-offs
- A family of alternative solutions exists, all of which are equivalent

12 Multiobjective Optimisation
- Evolutionary algorithms, e.g., GAs
  - operate with a population of individuals
  - well suited to search for multiple solutions in parallel
  - readily adapted to deal with multiobjective optimisation
- MOGA: MultiObjective Genetic Algorithm
  - Fonseca & Fleming. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 28(1), 1998, 26-37.

13 MOGA
- Multiple objectives are handled independently without summation and without weights
- A hyper-surface is mapped out in the search space
  - represents a continuum of solutions where all solutions are seen as equivalent
  - represents compromises or trade-offs between the various objectives
  - solutions are called non-dominated, or Pareto solutions
- A family of non-dominated solutions is sought rather than a single solution

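14 Dominance & Pareto Ranking
- A non-dominated individual is one where an improvement in one objective results in a deterioration in one or more of the other objectives when compared with the other individuals in the population (equivalently, no other individual is at least as good in every objective and better in at least one)
- Pareto ranking: an individual’s rank corresponds to the number of individuals in the current population by which it is dominated
[Figure: population plotted on axes f1 and f2, each individual labelled with its Pareto rank (0 for the non-dominated individuals; 1, 2 and 4 for the dominated ones); two individuals are marked A and B]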
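
The ranking rule on this slide can be sketched in a few lines, assuming every objective is to be minimised. The function names and the five example points are illustrative and are not the points from the slide's figure.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(population):
    """Rank of an individual = number of individuals in the population that dominate it."""
    return [sum(dominates(other, ind) for other in population) for ind in population]


# Hypothetical (f1, f2) objective values, both to be minimised.
points = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
ranks = pareto_ranks(points)
```

Here the first three points trade off f1 against f2 and receive rank 0, (3, 4) is dominated only by (2, 3), and (5, 5) is dominated by all four others.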

15 [Flowcharts: SELECT and MoSELECT*]
SELECT: single solution
- Initialise population
- Select parents
- Apply genetic operators
- Calculate objectives: a, b, c...
- Apply fitness function f = w_1 a + w_2 b + w_3 c + ...
- Rank based on fitness
- Test for convergence
MoSELECT*: family of solutions
- Initialise population
- Apply genetic operators
- Calculate objectives: a, b, c...
- Calculate dominance: a, b, c
- Rank using Pareto ranking: based on dominance
- Test for convergence
- Select parents
* Patent applied for
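
To make the MoSELECT-style loop concrete, here is a toy sketch of a Pareto-ranked GA. It is not the MoSELECT implementation: the individuals are single real numbers rather than integer-string chromosomes, the two objectives are stand-in functions, and the rank-to-weight mapping, averaging crossover and Gaussian mutation are all assumptions chosen for brevity.

```python
import random


def dominates(a, b):
    """Minimisation: a is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(scores):
    """Rank = number of individuals by which each individual is dominated."""
    return [sum(dominates(o, s) for o in scores) for s in scores]


def objectives(x):
    """Two toy competing objectives (stand-ins for, e.g., cost and dissimilarity)."""
    return (x ** 2, (x - 2.0) ** 2)


def moga(pop_size=20, iterations=50, seed=0):
    rng = random.Random(seed)
    pop = [rng.uniform(-5.0, 5.0) for _ in range(pop_size)]      # initialise population
    for _ in range(iterations):
        ranks = pareto_ranks([objectives(x) for x in pop])       # objectives, dominance, ranking
        weights = [1.0 / (1 + r) for r in ranks]                 # low rank = larger roulette slice
        parents = rng.choices(pop, weights=weights, k=2 * pop_size)
        pop = [(parents[2 * i] + parents[2 * i + 1]) / 2 + rng.gauss(0.0, 0.1)
               for i in range(pop_size)]                         # crossover plus mutation
    ranks = pareto_ranks([objectives(x) for x in pop])
    return [x for x, r in zip(pop, ranks) if r == 0]             # the non-dominated family


front = moga()
```

The point of the sketch is the return value: a single run yields a set of mutually non-dominated individuals rather than one best individual, and no weights on the objectives are needed.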

16 MoSELECT: Search Progress
[Figure: four panels showing the population after 0, 100, 1000 and 5000 iterations]

17 Family of Solutions
[Figure: solutions after 5000 iterations; axes ΔMW and Diversity]
- Each run of MoSELECT results in a family of solutions
- Finding the same coverage of solutions using SELECT would require multiple runs using various combinations of weights
- One run of MoSELECT takes the same CPU time as one run of SELECT

18 Focused Library: Aminothiazoles
- α-bromoketones & thioureas extracted from ACD
- ADEPT used to
  - filter reactants (MW < 300; RB < 8)
  - enumerate virtual library => 12850 products (74 α-bromoketones & 170 thioureas)
- MoSELECT used to design 15 × 30 subsets optimised on
  - Similarity to a target compound (Daylight fingerprints)
  - Cost ($/g)

19 MoSELECT Solutions: 1
[Figure: populations at 0 iterations and at 5000 iterations]

20 MoSELECT Solutions: 2
- Running MoSELECT with niching
[Figure: population at 5000 iterations]

21 Moving to > 2 Objectives: Parallel Graph Representation
- Each objective is scaled using the Max and Min values achieved when the objective is optimised independently
[Figure: solutions after 5000 iterations; axes ΔMW and Diversity]
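
The scaling step can be sketched as a per-objective min-max normalisation, so that every objective shares one 0 to 1 axis in the parallel graph. The function name, the bounds and the example solutions are hypothetical; in the talk the bounds come from optimising each objective on its own.

```python
def scale_objectives(solutions, bounds):
    """Map each objective value onto [0, 1] using that objective's (min, max) bounds."""
    return [[(value - low) / (high - low) for value, (low, high) in zip(solution, bounds)]
            for solution in solutions]


# Hypothetical (min, max) per objective, as if found by single-objective runs.
bounds = [(0.0, 4.0), (10.0, 20.0)]

# Hypothetical raw objective values for two solutions.
solutions = [(1.0, 15.0), (4.0, 10.0)]

scaled = scale_objectives(solutions, bounds)
```

Each row of `scaled` is then drawn as one line across the parallel vertical axes, one axis per objective.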

22 Focused Library: Amides
- 100 × 100 virtual library
- MoSELECT used to design 10 × 10 subsets
- Objectives
  - Similarity to a target: sum of similarities using Daylight fingerprints
  - Predicted bioavailability: each compound rated from 1 to 4; sum of ratings
  - Hydrogen bond profile
  - Rotatable bond profile

23 MoSELECT Solutions
- Population size 50
- Iterations 5000
- Niching 30%
- Number of solutions = 11
- CPU 53 s (R12K 360 MHz)

24 Conclusions
- Advantages of MoSELECT
  - a family of equivalent solutions is obtained in a single run, with each solution representing one combinatorial library
  - this is achieved at vastly reduced computational cost compared to performing multiple runs of SELECT
  - no need to determine weights for objectives
  - optimisation of different types of objectives is readily achieved
  - visualisation of the search progress allows trade-offs between objectives to be observed
  - the user can make an informed choice on which solution(s) to explore

25 Acknowledgements
- Illy Khatib, Peter Willett; Information Studies, University of Sheffield
- Peter Fleming; Automatic Control and Systems Engineering, University of Sheffield
- Darren Green, Andrew Leach; GlaxoSmithKline, UK
- Funding by GlaxoSmithKline, UK
- John Bradshaw; Daylight
- Daylight for software support

