De Novo design tools for the generation of synthetically accessible ligands
Peter Johnson, Krisztina Boda, Shane Weaver, Aniko Valko, Vilmos Valko

Receptor Structure Based Drug Design
Objective: to suggest potential leads that
- bind strongly to a given protein because of shape and electrostatic complementarity
- are easy to synthesise
Approaches:
- Docking methods (preferably flexible docking) identify new lead structures by rapidly screening a database of 3-D structures of known compounds
- De novo design methods (such as SPROUT) construct a diverse set of entirely novel potential leads from scratch

SPROUT Components
- Detects potential binding pockets of the protein structure
- Identifies favourable interaction sites (H-bonding, hydrophobic, covalent, metal, user defined)
- Docks structures to target interaction sites
- Generates 3D molecular structures of novel ligands by linking the docked starting fragments together in an incremental construction scheme
- Scores, sorts and clusters the solutions

Problem with Large Answer Sets
De novo design programs such as SPROUT can suggest large sets of entirely novel potential leads. Powerful heuristics are necessary to evaluate (and reduce) these often large answer sets:
- Binding affinity score: eliminate candidates with poor estimated binding affinity
- Synthetic feasibility: eliminate candidates with complex molecular structures

For de novo design, prediction of synthetic accessibility is equally important. Hypothetical ligands, including those predicted to bind very strongly, have no practical value unless they can be readily synthesised.
Our Attempts to Provide Solutions:
- CAESA (estimates synthetic accessibility)
- Complexity Analysis (estimates structural complexity and drug-likeness)
- SynSPROUT (avoids the problem by building constraints into the structure generation process)

CAESA: Computer Assisted Estimation of Synthetic Accessibility (Glenn Myatt, Jon Baber)

Goals of CAESA Project
- There is a clear need for an automated method of ranking hypothetical compounds according to perceived ease of synthesis
- Good synthetic chemists can do this job themselves for a small number of compounds, but are unwilling to do it for hundreds or thousands of compounds
- CAESA attempts to do the same job but never gets bored!

Estimation of Synthetic Accessibility: Criteria Used by CAESA
CAESA scores the synthetic accessibility of structures using two main criteria:
a) An estimate of structural complexity:
- stereocentres
- complex topological features (fusions etc.)
- functional group complexity
b) Availability of good starting materials:
- rapid retrosynthetic analysis
- database of commercially available materials
- reaction rule base (editable)

CAESA Components

Automatic Selection of Starting Materials: Starting Materials and Synthetic Accessibility
- The availability of suitable starting materials is a very important factor: good starting materials can dramatically reduce the difficulty of synthesising a compound.
- When good starting materials cover part of the target molecule, the analysis of synthetic difficulty or complexity can be directed to just those portions of the target that cannot be made from available starting materials.
- Finding good starting materials through retrosynthetic analysis also yields possible synthetic routes as a byproduct.

Traditional Retrosynthetic Analysis

Bidirectional Search for Synthetic Routes

Example of Starting Material Selection

Summary of CAESA Features
- CAESA carries out a retrosynthetic analysis which terminates when a starting material from a database (such as ACD) is found
- Found starting materials are scored according to the length and difficulty of the reaction sequence and their coverage of the target compound
- All chemistry rules and transformations are described in editable text knowledge bases that chemists can easily modify
- The quality of the analysis depends on the chemistry included in the knowledge bases and the comprehensiveness of the starting material libraries
- However, CAESA is relatively slow, and faster methods are needed for pruning large data sets
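The core loop of such a retrosynthetic analysis can be pictured as a breadth-first search that stops expanding a branch as soon as a precursor is found in the starting-material database. The sketch below is a toy illustration under stated assumptions, not CAESA's implementation: molecules are plain strings, `RULES` and `STARTING_MATERIALS` are hypothetical stand-ins for the editable reaction knowledge base and a catalogue such as ACD, the search runs only backwards from the target (a simplification of the bidirectional search), and each starting material is scored only by route length, ignoring reaction difficulty and coverage.

```python
from collections import deque

# Hypothetical rule base: each target maps to alternative disconnections,
# each disconnection being the list of precursors it yields.
RULES = {
    "amide_target": [["acid_A", "amine_B"], ["acyl_chloride_A", "amine_B"]],
    "acid_A": [["ester_A"]],
    "acyl_chloride_A": [["acid_A"]],
}

# Hypothetical starting-material catalogue (standing in for e.g. ACD).
STARTING_MATERIALS = {"ester_A", "amine_B"}


def find_starting_materials(target, max_depth=3):
    """Breadth-first retrosynthesis.

    Returns {starting material: number of retrosynthetic steps}; because the
    search is breadth-first, each value is the shortest route found.
    """
    found = {}
    queue = deque([(target, 0)])
    seen = {target}
    while queue:
        mol, depth = queue.popleft()
        if mol in STARTING_MATERIALS:
            # Stop expanding this branch: a purchasable precursor was reached.
            found[mol] = depth
            continue
        if depth == max_depth:
            continue
        for precursors in RULES.get(mol, []):
            for p in precursors:
                if p not in seen:
                    seen.add(p)
                    queue.append((p, depth + 1))
    return found


print(find_starting_materials("amide_target"))  # {'amine_B': 1, 'ester_A': 2}
```

A breadth-first order is a natural choice here because shorter routes are found first, which matches the idea of preferring short reaction sequences.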

Alternative Approach: Complexity Analysis
Based on the statistical distribution of various substitution patterns found in databases of existing drugs and available starting materials.
Reference: Krisztina Boda and A. Peter Johnson, "Molecular Complexity Analysis of de Novo Designed Ligands", J. Med. Chem., 2006 (ASAP web release date: 26-Jan-2006).

Complexity analysis based on the statistical distribution of various substitution patterns
Assumption:
- If a molecular structure contains ring and chain substitution patterns which are common in existing drugs, then the structure is likely to be "drug-like" as well as readily synthesisable.
- If it contains ring and chain substitution patterns which are common in available starting materials, then the structure is likely to be readily synthesisable.

Building the Complexity Database
Each input structure is processed in two ways:
- Enumerate chain patterns (1-centred, 2-centred, 3-centred and 4-centred) → database of chains
- Enumerate ring and ring substitution patterns → database of rings/ring substitutions
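The chain half of this database-building step can be sketched as "enumerate patterns, then count them". For illustration only, the sketch assumes that an n-centred chain pattern is a path of n bonded heavy atoms, named by its element sequence read in whichever direction sorts first; the real database uses richer atom types, ring patterns and hierarchies, none of which are modelled. `simple_paths`, `chain_patterns` and `build_database` are hypothetical names.

```python
from collections import Counter


def simple_paths(adjacency, n):
    """Yield each simple path of n bonded atoms once (not also in reverse)."""
    def extend(path):
        if len(path) == n:
            yield tuple(path)
            return
        for nbr in adjacency[path[-1]]:
            if nbr not in path:
                yield from extend(path + [nbr])

    for start in adjacency:
        for path in extend([start]):
            # A path and its reverse describe the same chain; keep one of them.
            if n == 1 or path[0] < path[-1]:
                yield path


def chain_patterns(elements, bonds, max_centres=4):
    """Count 1- to max_centres-atom chain patterns, named by element sequence."""
    adjacency = {atom: [] for atom in elements}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    counts = Counter()
    for n in range(1, max_centres + 1):
        for path in simple_paths(adjacency, n):
            seq = tuple(elements[atom] for atom in path)
            name = "-".join(min(seq, seq[::-1]))  # direction-independent name
            counts[(n, name)] += 1
    return counts


def build_database(structures):
    """Sum pattern counts over a collection of (elements, bonds) structures."""
    database = Counter()
    for elements, bonds in structures:
        database.update(chain_patterns(elements, bonds))
    return database


# Ethanol heavy atoms: C0-C1-O2 (hydrogens omitted).
ethanol = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
db = build_database([ethanol, ethanol])
print(db[(1, "C")], db[(2, "C-O")], db[(3, "C-C-O")])  # 4 2 2
```

Running the same enumeration over a drug database and a starting-material catalogue would give the two sets of frequencies the complexity score is later compared against.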

Atom Substitution Hierarchy
Ring (and chain) substitutions are organised in hierarchies. The hierarchy stores:
- atom type sequence
- number of occurrences
- binding properties
[Figure: example substitution hierarchy for one topology, with occurrence counts at each node; total occurrences of the topology: 11,801]

Ligand Complexity Analysis
1. Enumerate ring and chain patterns.
2. Generate a canonical name for each atom pattern.
3. Match each canonical name against the hierarchy roots of the database (hierarchies + frequencies of occurrence).
4. Retrieve the frequency of occurrence and calculate the score.
5. Rank structures by complexity score.
Speed of complexity analysis: ~1000-1200 structures/minute on a Linux PC (3 GHz).
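Steps 2 to 4 hinge on canonical names: the same atom pattern must receive the same name regardless of how the atoms happen to be numbered in the input, otherwise the database lookup fails. A minimal sketch, assuming a hypothetical naming scheme in which a pattern is a centre atom type plus the sorted types of its substituents; the database contents are invented for the example.

```python
def canonical_name(centre, neighbours):
    """Order-independent name for a centre atom and its substituents.

    Hypothetical naming scheme: sorting the neighbour types means the same
    pattern gets the same name however the atoms were numbered.
    """
    return centre + "(" + ",".join(sorted(neighbours)) + ")"


def pattern_frequencies(patterns, database):
    """Steps 2-4: name each (centre, neighbours) pattern and look up its count."""
    names = [canonical_name(centre, nbrs) for centre, nbrs in patterns]
    return {name: database.get(name, 0) for name in names}  # unseen -> 0


# Hypothetical database of pattern counts.
database = {"C(C,N,O)": 120, "N(C,C)": 45}
ligand = [("C", ["O", "N", "C"]), ("N", ["C", "C"]), ("S", ["F", "C"])]
print(pattern_frequencies(ligand, database))
# {'C(C,N,O)': 120, 'N(C,C)': 45, 'S(C,F)': 0}
```

Because lookup is a dictionary access per pattern, the cost per structure stays small, which is consistent with the throughput quoted above.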

Calculation of Complexity Score
Concept: penalise atom patterns which are infrequent in, or absent from, the complexity database. In SPROUT, the complexity analysis is followed by ranking the putative ligands according to their complexity score. Penalty values can be altered to tailor the system to different applications. The penalty values used in the examples presented here are 25, 20, 15 and 10 for 1-, 2-, 3- and 4-centred chain patterns, and 40 and 30 for rings and ring substitutions, respectively.
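The scoring and ranking steps can be sketched with the penalty values quoted above plus the 2.0-per-stereocentre penalty used in the comparison with CAESA. How frequency maps to penalty is not spelled out here, so the sketch assumes the simplest rule: a pattern receives its full penalty if it occurs fewer than `min_count` times in the database. The pattern-kind keys, database counts and ligands are all hypothetical.

```python
# Penalty values quoted in the text, keyed by pattern kind.
PENALTIES = {
    "chain1": 25, "chain2": 20, "chain3": 15, "chain4": 10,
    "ring": 40, "ring_subst": 30,
}
STEREO_PENALTY = 2.0  # per stereocentre, as in the comparison with CAESA


def complexity_score(patterns, database, n_stereo=0, min_count=1):
    """Sum penalties for (kind, name) patterns seen fewer than min_count times.

    Applying the full penalty below a count threshold is an assumed rule.
    """
    score = STEREO_PENALTY * n_stereo
    for kind, name in patterns:
        if database.get(name, 0) < min_count:
            score += PENALTIES[kind]
    return score


def rank_by_complexity(ligands, database):
    """Sort (id, patterns, n_stereo) tuples from least to most complex."""
    return [lig_id for lig_id, _, _ in sorted(
        ligands, key=lambda lig: complexity_score(lig[1], database, lig[2]))]


database = {"C-C": 500, "C-N": 300, "benzene": 1000}  # hypothetical counts
ligands = [
    ("B", [("chain2", "C-N"), ("chain3", "C-S-P"), ("ring", "azirine")], 1),
    ("A", [("chain2", "C-C"), ("ring", "benzene")], 0),
]
print(rank_by_complexity(ligands, database))  # ['A', 'B']
```

Ligand B scores 2.0 + 15 + 40 = 57.0 (one stereocentre, one unseen 3-centred chain, one unseen ring), while ligand A scores 0.0, so A is ranked as the more readily synthesisable candidate.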

Validation Experiment: Comparison with CAESA
Both methods were used to estimate synthetic accessibility for the same set of 50 top-selling drugs.

CAESA vs. Complexity Analysis
Elapsed time: CAESA 703 s; Complexity Analysis 8 s.
Complexity scores are calculated using the complexity database derived from available starting materials (SMs), plus a penalty of 2.0 for each stereocentre identified in the structures.

Complexity Analysis vs. CAESA
- More suitable for prioritising thousands of structures within a reasonable time frame
- Provides an acceptable compromise between the speed of the analysis and the accuracy of the calculated scores
- Because this approach is based on characteristics of existing, readily available compounds, simple but novel structural features may be wrongly identified as complex

Yet Another Alternative Approach
Build synthetic feasibility into the structure generation process.

SynSPROUT Approach
Ease of synthesis is a key factor in drug development, so SynSPROUT builds synthetic constraints into the structure generation process: virtual synthesis in the receptor cavity.
- Classic SPROUT joins fragments from a fragment library by new bonds, ring fusion or spiro junctions.
- SynSPROUT instead joins a pool of readily available starting materials using reliable, high-yielding reactions from a synthetic knowledge base, giving readily synthesisable putative ligand structures.
- Built-in or user-defined reactions include amide formation, ether formation, ester formation, amine alkylation, reductive amination, etc.
[Figure: SynSPROUT scheme]

Current Status  Promising structures with estimated high binding affinity  SynSPROUT provides the equivalent to screening a large number of combinatorial libraries  Potential for suggesting starting points for new combinatorial libraries  Combination of a large starting material library with a large reaction knowledgebase causes a combinatorial problem – even with parallel processing  Restricting either size of library or number of synthetic reactions gives acceptable run times

De Novo Structure Generation vs. Lead Optimization
- De novo structure generation. Aim: to generate diverse putative ligands from scratch. No structural information from any existing bound ligand is utilised.
- Lead optimization. Aim: to suggest better ligands structurally similar to the bound one. The structure of a good bound ligand provides a starting point (core).

Variations on the SynSPROUT Theme: SPROUT LeadOpt
Two modes for structure-based lead optimisation:
- Core Extension: extends a core structure (derived from the lead) by virtual synthetic chemistry
- Monomer Replacement: replaces monomers which have been identified by retrosynthetic analysis of a lead compound

Core Extension  Import the modified bound ligand (core) + identify substitution points (functional groups)  Generate core + monomer product by performing virtual synthetic reaction(s) at selected functional groups  Estimate binding affinity for products

Core Extension Scheme
General scheme: the core structure (with its detected functional groups) and a monomer library (multiple low-energy conformers + detected functional groups) are combined using the list of reactions between functional groups in the synthetic knowledge base. Each synthetic reaction is simulated in the 3D context of the receptor site, and all possible core + monomer combinations are generated.
[Figure: core combined with monomers R11 to R33]

Automatic Monomer Library Generation
SDF file of 3D monomers → atom & ring perception (perception knowledge base: aromaticity, normalisation, hybridisation, H-bonding properties) → detection of functional groups (joining points) using the synthetic rules in the synthetic knowledge base → monomer library (multiple low-energy conformers + detected functional groups).

Synthetic Knowledge Base
Each joining rule specifies the steps of formation: hybridisation changes, bond type, bond length and dihedral penalty/angle. Example rule, Amide Formation:

    Carboxylic Acid: CHEMICAL-LABEL C[SPCENTRE=2](=O)-O[HS=1]
    Primary Amine:   CHEMICAL-LABEL C-N[HS=2];[CONNECTION=1]

    IF Carboxylic Acid INTER Primary Amine
    THEN delete-atom 3
         change-hybridization 5 to SP2
         form-bond - between 1 and 5
         DIHEDRAL-ATOMS 2 1 5 4
         DIHEDRAL 0 0
         BOND-LENGTH 1.35
    END-THEN

[Figure: atom numbering 1 to 5 for the carboxylic acid and primary amine]
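To make the rule's steps concrete, the sketch below mirrors the amide-formation entry as plain data and applies its topological steps to a labelled toy graph. The atom numbering (1 = carboxyl C, 2 = carbonyl O, 3 = hydroxyl O, 4 = amine C, 5 = N) is our reading of the two CHEMICAL-LABEL patterns; `JoiningRule` and `apply_rule` are hypothetical names, hydrogens are implicit, and geometry (bond length, dihedral) is only returned as constraints rather than applied in 3D.

```python
from dataclasses import dataclass


@dataclass
class JoiningRule:
    """Plain-data mirror of one knowledge-base rule (field names are ours)."""
    name: str
    delete_atoms: list           # atom labels removed by the reaction
    hybridization_changes: dict  # atom label -> new hybridization
    new_bond: tuple              # labels of the two atoms to join
    bond_length: float           # in angstroms
    dihedral_atoms: tuple        # four labels defining the dihedral
    dihedral: float              # target dihedral angle in degrees


AMIDE_FORMATION = JoiningRule(
    name="Amide Formation",
    delete_atoms=[3],
    hybridization_changes={5: "SP2"},
    new_bond=(1, 5),
    bond_length=1.35,
    dihedral_atoms=(2, 1, 5, 4),
    dihedral=0.0,
)


def apply_rule(rule, atoms, bonds):
    """Apply a rule's topological steps to labelled atoms and bonds.

    atoms: {label: {"element": str, "hyb": str}}; bonds: set of frozensets.
    Geometry is returned as constraints for a later 3D placement step.
    """
    atoms = {label: dict(props) for label, props in atoms.items()}
    bonds = set(bonds)
    for label in rule.delete_atoms:
        del atoms[label]
        bonds = {b for b in bonds if label not in b}
    for label, hyb in rule.hybridization_changes.items():
        atoms[label]["hyb"] = hyb
    bonds.add(frozenset(rule.new_bond))
    constraints = {
        "bond_length": (rule.new_bond, rule.bond_length),
        "dihedral": (rule.dihedral_atoms, rule.dihedral),
    }
    return atoms, bonds, constraints


# Acid C1(=O2)-O3H and amine C4-N5H2, labelled as in the rule (H implicit).
atoms = {
    1: {"element": "C", "hyb": "SP2"}, 2: {"element": "O", "hyb": "SP2"},
    3: {"element": "O", "hyb": "SP3"}, 4: {"element": "C", "hyb": "SP3"},
    5: {"element": "N", "hyb": "SP3"},
}
bonds = {frozenset({1, 2}), frozenset({1, 3}), frozenset({4, 5})}
new_atoms, new_bonds, constraints = apply_rule(AMIDE_FORMATION, atoms, bonds)
print(sorted(new_atoms), new_atoms[5]["hyb"])  # [1, 2, 4, 5] SP2
```

Keeping each rule as data rather than code is in the spirit of the editable text knowledge base: adding a reaction means adding an entry, not changing `apply_rule`.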

Importing the Core Structure (from a MOL/PDB file in the Elephant module)
- When importing from a PDB file, a PDB → MOL converter is invoked.
- Functional groups are automatically detected when the core structure is imported into the system.
- Hydrogen donor/acceptor or spherical target sites anchor the imported core inside the receptor cavity. This partially restricts the displacement of the core during lead optimisation but allows slight movements to avoid boundary violations.

Product Generation I
Step I: generate products by mimicking synthetic reactions between the core and monomers (e.g. sulphonamide formation at R1 and amide formation at R2).

Product Generation II
Step II: ligand flexibility is handled by generating multiple low-energy conformers.
- Primary monomer conformers are generated by (a) CORINA + ROTATE and (b) sampling discrete dihedral angles around the newly formed bonds.
- Secondary conformers are generated by twisting about the rotatable bonds of the low-energy monomer conformers. User-defined parameters: maximum deviation, sampling of dihedral angles, maximum penalty.
- The resulting core + R1 + R2 conformers are placed by rigid-body docking.
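The secondary-conformer step can be sketched as a grid search over dihedral offsets around each low-energy value, controlled by the three user parameters named above. This is a minimal sketch under assumptions: the penalty for a conformer is taken to be the summed absolute deviation from the starting dihedrals (how SPROUT LeadOpt weights deviations is not stated), and the default parameter values are arbitrary.

```python
from itertools import product


def dihedral_offsets(max_deviation, step):
    """Offsets from -max_deviation to +max_deviation degrees, every step."""
    n = int(max_deviation // step)
    return [k * step for k in range(-n, n + 1)]


def secondary_conformers(base_dihedrals, max_deviation=30.0, step=15.0,
                         max_penalty=45.0):
    """Twist each rotatable bond around its low-energy dihedral value.

    Penalty = summed absolute deviation (an assumption for illustration);
    combinations whose penalty exceeds max_penalty are discarded.
    """
    offsets = dihedral_offsets(max_deviation, step)
    conformers = []
    for combo in product(offsets, repeat=len(base_dihedrals)):
        penalty = sum(abs(d) for d in combo)
        if penalty <= max_penalty:
            angles = tuple((a + d) % 360 for a, d in zip(base_dihedrals, combo))
            conformers.append((angles, penalty))
    return conformers


confs = secondary_conformers([60.0, 180.0])
print(len(confs))  # 21 of the 5 x 5 = 25 combinations survive
```

The product over offsets grows exponentially with the number of rotatable bonds, which is one reason a maximum-penalty cut-off (and, in the case study, a limit on rotatable bonds per monomer) keeps the conformer count manageable.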

Product Generation III
Step III: docking, and rejection of conformers with high internal energy or boundary violations.

Multiple Extension Points: Combinatorial Problem
- Clients-master-slaves architecture
- Mixed SGI/Linux cluster network (TCP/IP socket communication)
- Clients 1, 2, 3, ... connect to a master, which distributes work to slaves 1, 2, 3, ... (Linux and SGI machines)
- Each slave performs the optimisation of a different core + monomer (R1, R2, R3) combination
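The division of labour described above, where a master enumerates core + monomer combinations and each slave optimises a different one, can be sketched on a single machine. The sketch swaps in Python's `concurrent.futures.ThreadPoolExecutor` for the socket-connected SGI/Linux slaves, and `optimise_combination` is a hypothetical placeholder whose dummy score is derived from the monomer names only so that the example is deterministic; a real worker would build, dock and score the product.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def optimise_combination(job):
    """Placeholder for one slave's work on a core + R1 + R2 combination.

    The score is a dummy value (minus the total name length), standing in
    for a real docking score so the example is deterministic.
    """
    core, r1, r2 = job
    score = -float(len(r1) + len(r2))
    return (core, r1, r2, score)


def run_master(core, r1_library, r2_library, n_workers=4):
    """Master: enumerate every core + R1 + R2 combination and farm them out."""
    jobs = [(core, r1, r2) for r1, r2 in product(r1_library, r2_library)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(optimise_combination, jobs))
    # Best (most negative) scores first.
    return sorted(results, key=lambda r: r[3])


results = run_master("CORE", ["NH2-a", "NH2-bb"], ["COOH-x", "CHO-yyyy"])
print(results[0])  # ('CORE', 'NH2-bb', 'CHO-yyyy', -14.0)
```

Since each combination is scored independently, the work is embarrassingly parallel; for CPU-bound scoring in pure Python, `ProcessPoolExecutor` (with a module-level worker function) would be the closer analogue of separate slave processors.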

Case Study (CDK2): PDB 1KE8
[Figure: core structure with extension points R1 and R2]

Case Study (CDK2): Monomer Reagent Library Generation
Maybridge & Aldrich 2D structures (~140,000) → applied filters → 1171 2D structures → CORINA + ROTATE → 4557 3D conformers (monomer library).
Each monomer must contain at least one of the following functional groups: carboxylic acid, primary amine, primary alkyl halide, carbonyl.
Applied filters:
- 8 ≤ number of heavy atoms ≤ 16
- number of acceptor atoms ≤ 5
- number of donor atoms ≤ 3
- number of rotatable bonds ≤ 2
- maximum chain length ≤ 3
- allowed atom types: H, B, C, N, O, F, S, Cl, Br
- number of rings ≤ 3
- stereocentres ≤ 1
- no 3-, 4-, 7-, 8- or 9-membered rings
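The filter list translates directly into a single predicate over per-monomer descriptors. The sketch below assumes the descriptors (heavy-atom count, donors, acceptors, ring sizes and so on) have already been computed by some cheminformatics toolkit; the `Monomer` fields, the group names and the example descriptor values are hypothetical, while the thresholds are the ones listed above.

```python
from dataclasses import dataclass

REQUIRED_GROUPS = {"carboxylic acid", "primary amine",
                   "primary alkyl halide", "carbonyl"}
ALLOWED_ELEMENTS = {"H", "B", "C", "N", "O", "F", "S", "Cl", "Br"}
FORBIDDEN_RING_SIZES = {3, 4, 7, 8, 9}


@dataclass
class Monomer:
    """Descriptors assumed to be precomputed elsewhere."""
    name: str
    heavy_atoms: int
    acceptors: int
    donors: int
    rotatable_bonds: int
    max_chain_length: int
    stereo_centres: int
    elements: frozenset
    ring_sizes: tuple          # one entry per ring
    functional_groups: frozenset


def passes_filters(m):
    """Apply the CDK2 case-study monomer filters."""
    return (
        bool(m.functional_groups & REQUIRED_GROUPS)
        and 8 <= m.heavy_atoms <= 16
        and m.acceptors <= 5
        and m.donors <= 3
        and m.rotatable_bonds <= 2
        and m.max_chain_length <= 3
        and m.elements <= ALLOWED_ELEMENTS
        and len(m.ring_sizes) <= 3
        and m.stereo_centres <= 1
        and FORBIDDEN_RING_SIZES.isdisjoint(m.ring_sizes)
    )


# Hypothetical descriptor values, for illustration only.
ok = Monomer("amine-1", 9, 1, 1, 2, 2, 0, frozenset({"C", "N", "H"}), (6,),
             frozenset({"primary amine"}))
bad = Monomer("iodide-1", 9, 0, 0, 1, 2, 0, frozenset({"C", "I", "H"}), (6,),
              frozenset({"primary alkyl halide"}))
print(passes_filters(ok), passes_filters(bad))  # True False
```

The iodide fails only because iodine is not among the allowed atom types; every other criterion is satisfied.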

Case Study (CDK2): Reactions at the Core
- R1: primary amine monomers react with the core's sulphonyl chloride in sulphonamide formation
- R2: the core's primary amine reacts with carboxylic acids in an amide reaction, primary alkyl halides in amine alkylation, and carbonyls in reductive amination and imine formation

Case Study (CDK2): Results
- R1 monomer library: 523 primary amines
- R2 monomer library: 293 carboxylic acids, 93 primary alkyl halides, 393 carbonyls
- R1 + core + R2 combinations: 432,345
- Screened 81.23%, failed 4.87%, accepted 13.90% (54,123)
- Elapsed time ~5 hours (with 100 slave processors)

Case Study (CDK2): Generated Products
[Figure: top generated products, with scores -7.95, -7.82, -7.75, -7.60, -7.47, -7.56, -7.45, -7.07]

Monomer Replacement
- Many lead compounds are composed of readily available starting materials (monomers) linked by reliable, high-yielding reactions
- Retrosynthetic analysis can be used to identify the monomers
- Structurally related analogues can then be generated by exhaustive monomer replacement
- There are considerable efficiency gains if the monomer library is arranged in a hierarchy based on substructural relationships

Hierarchy Construction
[Figure: amide monomers arranged by substructure and superstructure relationships; monomers with no overlap are not linked]

Amide Hierarchy Usage

Monomer Replacement
Retrosynthetic analysis identifies the monomers of the lead compound: do they exist in the starting materials hierarchy?

Case Study: Optimisation of SPROUT-Designed Inhibitors of P. falciparum Dihydro-orotate Dehydrogenase Using Monomer Replacement
- Initial lead compound: MD-155 (SPROUT score -7.88)
- Retrosynthetic analysis finds amide formation and Ullmann/Suzuki reactions for monomer formation
- Monomer library: aryl halides and p-haloanilines; 1923 2D structures, 26,916 conformations

High-Scoring Monomer Replacement Results
Monomer replacement gave 840 new structures (including multiple conformers of the same structure), with scores from -7.50 to -9.30.

Experimental Results for Some Ligands Suggested by SPROUT LeadOpt Monomer Replacement
- Starting point MD-155: PfDHODH Ki 3.0 µM; HsDHODH Ki 11.0 nM
- MD-204: PfDHODH Ki 733 nM; HsDHODH Ki 21.0 nM (4-fold enhancement in Ki for PfDHODH)
- MD-213: PfDHODH Ki 478 nM; HsDHODH Ki 21.7 nM (6-fold enhancement in Ki for PfDHODH)
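The quoted fold enhancements are ratios of the starting Ki to the new Ki, both expressed in the same unit. They come out at roughly 4 and 6 when the MD-155 PfDHODH Ki is read as 3.0 µM (3,000 nM); a 3.0 mM value would instead imply enhancements in the thousands. A quick check, with `fold_enhancement` as a hypothetical helper name:

```python
def fold_enhancement(ki_start_nM, ki_new_nM):
    """How many times tighter the new ligand binds (ratio of Ki values)."""
    return ki_start_nM / ki_new_nM


KI_MD155_NM = 3000.0  # MD-155 PfDHODH Ki of 3.0 uM, expressed in nM
for name, ki in [("MD-204", 733.0), ("MD-213", 478.0)]:
    print(name, round(fold_enhancement(KI_MD155_NM, ki), 1))
# MD-204 4.1
# MD-213 6.3
```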

Conclusions  Scoring functions for assessment of binding affinity of the hypothetical compounds produced by de novo design are far from perfect  Hence only readily synthesisable putative ligands will undergo experimental evaluation by medicinal chemists  Assessment of synthetic feasibility is a tractable problem

Acknowledgements  Matt Davies, Phil Bone and Timo Heikkala for experimental work  Molecular Networks GmbH for providing CORINA & ROTATE  MDL for providing MDDR, one of the databases used in the complexity analysis project  for sponsoring the lead optimization project
