Presentation on theme: "A Multiobjective Approach to Combinatorial Library Design Val Gillet University of Sheffield, UK."— Presentation transcript:
A Multiobjective Approach to Combinatorial Library Design Val Gillet University of Sheffield, UK
Outline SELECT GA based program for combinatorial library design Combinatorial subset selection in product-space Multiobjective optimisation via weighted-sum fitness function Limitations of a weighted-sum approach MoSELECT Multiobjective optimisation via MOGA
Library Design is a Multiobjective Optimisation Problem Early HTS results disappointing Low hit rates Hits too lipophilic; too flexible; high molecular weights… Diverse libraries Distance-based/cell-based diversity Bioavailability; cost; ease of synthesis… Focused/targeted libraries Similarity to known active; predicted active by QSAR model; fit to receptor site Bioavailability; cost…
Product-Based Library Design A two-component combinatorial library can be represented by a 2D array A combinatorial subset can be defined by intersecting rows and columns of the array Exploring all combinatorial subsets is equivalent to testing all permutations of the rows and columns of the array
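The array view above can be illustrated with a minimal sketch. It assumes a hypothetical 4 × 5 virtual library in which each product is identified by its (R1, R2) reactant indices; the names `virtual_library` and `combinatorial_subset` are illustrative and are not taken from SELECT.

```python
from itertools import product
from math import comb

# Hypothetical two-component virtual library: 4 R1 reactants x 5 R2 reactants.
# Each product is identified by its (R1 index, R2 index) pair.
N_R1, N_R2 = 4, 5
virtual_library = [[(i, j) for j in range(N_R2)] for i in range(N_R1)]

def combinatorial_subset(library, rows, cols):
    """Return the products at the intersection of the chosen rows and columns."""
    return [library[i][j] for i, j in product(rows, cols)]

# A 2 x 3 combinatorial subset: rows 0 and 2, columns 1, 3 and 4.
subset = combinatorial_subset(virtual_library, rows=[0, 2], cols=[1, 3, 4])
print(len(subset))  # 2 x 3 = 6 products

# Number of distinct 2 x 3 combinatorial subsets of the 4 x 5 array.
n_subsets = comb(N_R1, 2) * comb(N_R2, 3)
print(n_subsets)  # 6 * 10 = 60
```

Even for this toy array there are 60 possible 2 × 3 subsets; the count grows combinatorially with library size, which is why a search heuristic such as a GA is used instead of exhaustive enumeration.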
Selecting Combinatorial Subsets Using a GA Chromosome encoding each chromosome represents a combinatorial subset as an integer string one partition for each reactant pool the size of a partition equals the no. of reactants required from the corresponding pool Crossover, mutation and roulette wheel parent selection are used to evolve new potential solutions [Figure: example chromosome, an integer string with one partition for R1 and one for R2]
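The chromosome encoding described above might be sketched as follows. This is an illustration only: the pool sizes, subset sizes and the single-point mutation operator are assumptions, and `random_chromosome` and `mutate` are hypothetical names, not SELECT functions.

```python
import random

def random_chromosome(pool_sizes, subset_sizes, rng):
    """One partition per reactant pool; each partition holds distinct reactant indices."""
    return [rng.sample(range(n), k) for n, k in zip(pool_sizes, subset_sizes)]

def mutate(chromosome, pool_sizes, rng):
    """Replace one reactant in one partition with a reactant not yet in that partition."""
    child = [list(part) for part in chromosome]
    p = rng.randrange(len(child))
    unused = [r for r in range(pool_sizes[p]) if r not in child[p]]
    if unused and child[p]:
        child[p][rng.randrange(len(child[p]))] = rng.choice(unused)
    return child

rng = random.Random(0)
pool_sizes = (30, 20)    # hypothetical numbers of available R1 and R2 reactants
subset_sizes = (6, 4)    # reactants required from each pool
parent = random_chromosome(pool_sizes, subset_sizes, rng)
child = mutate(parent, pool_sizes, rng)
print(parent)
print(child)
```

Keeping each partition free of duplicates means every chromosome decodes to a valid combinatorial subset, so the fitness function never has to handle an ill-formed library.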
Multiobjective Optimisation in SELECT Weighted-sum fitness function enumerate the combinatorial library represented by a chromosome calculate descriptors for molecules in the library Objectives are scaled and user defined weights are applied
Multiobjective Optimisation in SELECT cont. Diversity indices distance-based (e.g. sum of pairwise dissimilarities and Daylight fingerprints) cell-based Physical property terms minimise the difference between the distribution in the library and some reference distribution, e.g. “drug-like” profile derived from WDI Cost: £ minimise the cost of the library
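A weighted-sum fitness function of the kind described above can be sketched as below. The objective values, the weights and the convention that every term is minimised are all assumptions made for illustration; the slides do not give SELECT's actual scaling or weights.

```python
def weighted_sum_fitness(objectives, weights):
    """Combine already-scaled objective values into one score (lower is better here)."""
    return sum(weights[name] * objectives[name] for name in weights)

# Hypothetical scaled objective values for two candidate libraries.
# Both terms are written so that smaller is better:
# "diversity_loss" = 1 - diversity, "cost" = scaled library cost.
library_a = {"diversity_loss": 0.1, "cost": 0.8}   # diverse but expensive
library_b = {"diversity_loss": 0.4, "cost": 0.3}   # less diverse, cheaper

favour_diversity = {"diversity_loss": 1.0, "cost": 0.2}
favour_cost = {"diversity_loss": 0.2, "cost": 1.0}

for label, weights in [("favour diversity", favour_diversity),
                       ("favour cost", favour_cost)]:
    fa = weighted_sum_fitness(library_a, weights)
    fb = weighted_sum_fitness(library_b, weights)
    print(label, "-> best:", "A" if fa < fb else "B")
```

With these made-up numbers the preferred library switches from A to B when the weights change, so a single run only ever reports the solution favoured by one particular weight setting.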
Library Enumeration in SELECT Virtual library is enumerated upfront ADEPT (A Daylight Enumeration and Profiling Tool) Identify potential reactants Filter out unwanted ones Enumerate virtual library Reaction Toolkit (Reaction transforms; MTZ language) Descriptors are calculated upfront Combinatorial subset accessed via fast lookup
Limitations of a Weighted-Sum Fitness Function Definition of fitness function difficult especially for different types of objectives e.g. molecular weight profile and cost Setting of weights is non-intuitive Can result in regions of search space being obscured especially when objectives are in competition Difficult to monitor progress since >1 objective to follow simultaneously A single solution is found
Varying Weights in SELECT Objectives are in competition resulting in trade-offs A family of alternative solutions exist that are all equivalent
Multiobjective Optimisation Evolutionary algorithms, e.g., GAs operate with a population of individuals well suited to search for multiple solutions in parallel readily adapted to deal with multiobjective optimisation MOGA: MultiObjective Genetic Algorithm Fonseca & Fleming. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 28(1), 1998, 26-37.
MOGA Multiple objectives are handled independently without summation and without weights A hyper-surface is mapped out in the search space represents a continuum of solutions where all solutions are seen as equivalent represents compromises or trade-offs between the various objectives solutions are called non-dominated, or Pareto solutions. A family of non-dominated solutions is sought rather than a single solution
Dominance & Pareto Ranking A non-dominated individual is one where an improvement in one objective results in a deterioration in one or more of the other objectives when compared with the other individuals in the population Pareto ranking: an individual’s rank corresponds to the number of individuals in the current population by which it is dominated [Figure: individuals plotted against objectives f1 and f2 and labelled with their Pareto ranks (0, 1, 2, 4); points A and B marked]
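The ranking rule stated above (rank = number of individuals that dominate a given individual) can be sketched in a few lines. The sketch assumes two objectives that are both minimised and uses made-up objective values; `dominates` and `pareto_ranks` are illustrative names.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    Assumes all objectives are to be minimised.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_ranks(population):
    """Rank of each individual = number of individuals that dominate it."""
    return [sum(dominates(other, ind) for other in population)
            for ind in population]

# Hypothetical (f1, f2) values for five individuals.
population = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
ranks = pareto_ranks(population)
print(ranks)  # [0, 0, 0, 1, 4]
```

The three rank-0 individuals form the non-dominated set; (3, 4) is dominated only by (2, 3), and (5, 5) is dominated by all four others.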
SELECT (single solution): Initialise population; select parents; apply genetic operators; calculate objectives a, b, c, ...; apply fitness function f = w_1 a + w_2 b + w_3 c + ...; rank based on fitness; test for convergence
MoSELECT* (family of solutions): Initialise population; apply genetic operators; calculate objectives a, b, c, ...; calculate dominance over a, b, c, ...; rank using Pareto ranking (based on dominance); test for convergence; select parents
* Patent applied for
Family of Solutions Each run of MoSELECT results in a family of solutions Finding the same coverage of solutions using SELECT would require multiple runs using various combinations of weights One run of MoSELECT takes the same cpu time as one run of SELECT [Figure: plot of the MW and Diversity objectives for the family of solutions after 5000 iterations]
Focused Library: Aminothiazoles α-bromoketones & thioureas extracted from ACD ADEPT used to filter reactants (MW < 300; RB < 8) enumerate virtual library => 12850 products (74 α-bromoketones & 170 thioureas) MoSELECT used to design 15×30 subsets optimised on Similarity to a target compound (Daylight fingerprints) Cost ($/g)
MoSELECT Solutions: 2 5000 iterations Running MoSELECT with niching
Moving to > 2 Objectives: Parallel Graph Representation Each objective is scaled using the Max and Min values achieved when the objective is optimised independently [Figure: plot of the MW and Diversity objectives after 5000 iterations]
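The scaling step described above can be sketched as follows, assuming each objective's Min and Max have already been obtained from independent single-objective runs. The bounds and solution values are invented for illustration, and `scale_objectives` is a hypothetical helper name.

```python
def scale_objectives(solutions, bounds):
    """Scale each objective to [0, 1] using its (min, max) from independent optimisation."""
    return [
        [(value - lo) / (hi - lo) for value, (lo, hi) in zip(solution, bounds)]
        for solution in solutions
    ]

# Hypothetical (min, max) per objective, from optimising each one on its own.
bounds = [(0.0, 2.0), (10.0, 50.0)]
# Hypothetical raw objective values for two solutions.
solutions = [(0.5, 30.0), (2.0, 10.0)]

scaled = scale_objectives(solutions, bounds)
print(scaled)  # [[0.25, 0.5], [1.0, 0.0]]
```

Each scaled solution can then be drawn as one polyline across parallel vertical axes, one axis per objective, so that any number of objectives shares a common 0 to 1 range.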
Focused Library: Amides 100 × 100 virtual library MoSELECT used to design 10 × 10 subsets Objectives Similarity to a target Sum of similarities using Daylight fps Predicted bioavailability Each compound rated from 1 to 4 Sum of ratings Hydrogen bond profile Rotatable bond profile
MoSELECT Solutions Population size 50 Iteration 5000 Niching 30% Number of solutions = 11 CPU 53s (R12K 360 MHz)
Conclusions Advantages of MoSELECT a family of equivalent solutions is obtained in a single run with each solution representing one combinatorial library this is achieved at vastly reduced computational cost compared to performing multiple runs of SELECT no need to determine weights for objectives optimisation of different types of objectives is readily achieved visualisation of the search progress allows trade-offs between objectives to be observed the user can make an informed choice on which solution(s) to explore
Acknowledgements Illy Khatib, Peter Willett; Information Studies, University of Sheffield Peter Fleming; Automatic Control and Systems Engineering, University of Sheffield Darren Green, Andrew Leach; GlaxoSmithKline, UK Funding by GlaxoSmithKline, UK John Bradshaw; Daylight Daylight for software support