A new paradigm for virtual screening A Research Council’s Basic Technology Research Programme
Background Cross-research-council endeavour –administered by EPSRC. Funding for research to create a new technology. Change the way we do science. Underpin the future industrial base.
Atom based modelling QSAR & QSPR Almost all modelling techniques are based on atomistic descriptions of molecules Although these techniques have been successful over several decades, they have disadvantages –poor scaling characteristics –lack of a solid physical justification, e.g. scoring functions –interpretation difficult due to abstract nature of many descriptors –tendency to produce high dimensional models
What is the true dimensionality of chemical space? This has been investigated as follows: –1. Choose 26 descriptors that appear again and again in our QSPR models –2. Calculate them for the entire Maybridge database –3. Calculate the principal components (factors) –4. Ask: what is the dimensionality of physical property space, and what are the descriptors?
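The procedure on this slide can be sketched in a few lines of NumPy. This is a minimal illustration only: the descriptor table below is synthetic (the real 26 descriptors and the Maybridge data are not reproduced here), and the 95 % cumulative-variance cut-off and the function names are assumptions, not part of the original study.

```python
import numpy as np

def principal_components(X):
    """Explained-variance ratios and principal axes for a descriptor table.

    X has one row per molecule and one column per descriptor.
    Assumes no descriptor column is constant (its std would be zero).
    """
    # Standardise each descriptor so that units do not dominate the analysis.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # The right singular vectors of Z are the principal axes.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    var = s**2 / (Z.shape[0] - 1)
    return var / var.sum(), Vt

def effective_dimension(ratios, threshold=0.95):
    """Smallest number of components whose cumulative variance reaches threshold."""
    return int(np.searchsorted(np.cumsum(ratios), threshold) + 1)

# Synthetic stand-in for a 26-descriptor table: 3 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 26))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 26))

ratios, axes = principal_components(X)
n_dim = effective_dimension(ratios)
```

On real data, the loadings in `axes` indicate which original descriptors contribute to each factor, which is one way to approach the question in step 4.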
Improved molecular modelling? Can we define a more parsimonious and explicit description of molecules than has so far been achieved using atomistic models? –leading to better prediction AND a clearer understanding of the properties of molecules and how they arise
A non-atom based approach We are developing an alternative approach in which molecules are described by their surfaces Benzodiazepine analogues
A non-atom based approach The approach is based on calculation of a set of local properties at or near the molecular surface: –the local molecular electrostatic potential (MEP) –the local ionisation energy (LIE, IE_L) –the local electron affinity (LEA, EA_L) –the local polarisability (LP, α_L)
Calculation of the surface properties Molecules defined as isodensity surfaces –using semi-empirical AM1 electron density –can also be defined using a shrink-wrap or a marching cube algorithm Fitted to a spherical harmonic expansion –the shape of the shrink-wrapped surface, or –the four local properties MEP, LIE, LEA & LP
Describing surface shape: spherical harmonic expansion The accuracy of the surface description is a function of the order N of the expansion. The greater N, the larger the computational penalty.
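For reference, a truncated spherical harmonic expansion of order N can be written in standard notation as below. The slide does not give the formula, so the symbols here are assumptions: r(θ, φ) stands for the radial distance of the surface (or the value of a local property) in direction (θ, φ), and c_l^m for the fitted coefficients.

```latex
r(\theta,\phi) \;\approx\; \sum_{l=0}^{N} \sum_{m=-l}^{l} c_{l}^{m}\, Y_{l}^{m}(\theta,\phi)
```

Raising N adds higher-frequency terms, which is the trade-off between accuracy and cost that the slide describes.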
Advantages of this approach This gives a completely analytical description of the molecule’s shape & the 4 local properties – intermolecular binding properties & chemical reactivity Spherical harmonics can be truncated at low orders for fast QSAR scans (HTS), fast superposition of molecules & rapid calculation of similarity indices –for ligands (MW < 750), N = 6-8 –for peptides & proteins (MW > 5,000), N = 25-30
Putative resolutions for in silico screening For ligands N=6 For receptors N=25
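To give a sense of scale at these resolutions, the sketch below counts the terms in an expansion that includes every degree l from 0 to N with all 2l + 1 orders m. That convention is an assumption; the slides do not state which terms are retained.

```python
def n_coefficients(order):
    """Number of (l, m) terms with 0 <= l <= order and -l <= m <= l."""
    return sum(2 * l + 1 for l in range(order + 1))

ligand_terms = n_coefficients(6)     # order quoted above for ligands
receptor_terms = n_coefficients(25)  # order quoted above for receptors
```

Under that convention the count depends only on N, not on the size of the molecule.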
Application to QSAR & QSPR Several classes of QSAR/QSPR descriptors can be derived from the local properties, including: –the spherical harmonic coefficients (for constant order N, the number of coefficients is independent of the number of atoms in a molecule) –the critical points for each surface property (maxima, minima & saddle points) –the distribution of field intensities at the molecular surface (four fields with local intensities varying between molecules; sample using grid points?) –the surface integrals for each field
Public domain datasets Small: Consensus Set of 74 Drug Molecules (diverse); QSAR set (31 CoMFA steroids). Medium: WDI subset (2,400 compounds); Harvard ChemBank dataset (2,000 compounds). Large: WDI (50,000); Maybridge (50,000).
An example grid of surface points A grid is placed on this molecular surface in order to reduce the number of surface points from 4038 to 55
Gradient flows & molecular surface property graphs Characterize the behaviour of a property f : S → ℝ on a molecular surface S in terms of a directed graph G on S derived from the gradient vector field ẋ = grad f(x). The molecular surface property graph G is defined by –Vertices(G) = fixed points of grad f = critical points of f –Edges(G) = stable and unstable manifolds of the saddle points
Allopurinol RGB Surfaces LIE encoded on Red channel LEA encoded on Green Channel LP or MEP encoded on Blue Channel
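One simple way to realise such an encoding is to rescale each property to the 0 to 255 range and stack the three results as colour channels. The sketch below assumes min-max scaling per property; the slides do not say how the values were mapped to intensities, and `rgb_encode` and the sample values are purely illustrative.

```python
import numpy as np

def rgb_encode(lie, lea, third):
    """Map three per-point surface properties to 8-bit RGB colours.

    lie -> red, lea -> green, third (LP or MEP) -> blue.
    Each property is min-max scaled independently (an assumed convention).
    """
    def scale(values):
        v = np.asarray(values, dtype=float)
        span = v.max() - v.min()
        if span == 0.0:
            # A constant field carries no contrast; map it to zero intensity.
            return np.zeros_like(v)
        return (v - v.min()) / span

    channels = np.stack([scale(lie), scale(lea), scale(third)], axis=1)
    return np.round(255 * channels).astype(np.uint8)

# Placeholder values for three surface points (not real allopurinol data).
colours = rgb_encode([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [5.0, 5.0, 5.0])
```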
Critical points of allopurinol: 8 maxima, 7 minima, 13 saddles. No. of maxima − no. of saddles + no. of minima = Euler characteristic χ(S) = 2
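The count can be used as a consistency check on a critical-point search: for a smooth property on a closed, sphere-like surface, the alternating sum must equal 2. A minimal sketch of that check, using the numbers quoted for allopurinol (the function name is illustrative):

```python
def euler_characteristic(n_maxima, n_saddles, n_minima):
    """Alternating sum of critical points: maxima - saddles + minima."""
    return n_maxima - n_saddles + n_minima

chi = euler_characteristic(n_maxima=8, n_saddles=13, n_minima=7)
is_sphere_like = (chi == 2)
```

A value other than 2 would suggest either a missed or spurious critical point, or a surface that is not topologically a sphere.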
Distribution based descriptors 34 descriptors were measured, including: –maximum field intensity –minimum field intensity –mean field intensity –range of field intensities –variance of field intensities. The principal components of the descriptors were calculated to provide a set of orthogonal descriptors derived from the local properties at the molecular surface.
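The five statistics named on this slide are straightforward to compute for one field sampled at surface points. The sketch below covers only those five, not the full set of 34; the use of the sample variance (ddof=1) is an assumption, and the input values are placeholders.

```python
import numpy as np

def distribution_descriptors(field):
    """Summary statistics of one local property sampled over the surface."""
    v = np.asarray(field, dtype=float)
    return {
        "max": float(v.max()),
        "min": float(v.min()),
        "mean": float(v.mean()),
        "range": float(v.max() - v.min()),
        "variance": float(v.var(ddof=1)),  # sample variance (assumed convention)
    }

# Placeholder field values at four surface points.
desc = distribution_descriptors([1.0, 2.0, 3.0, 4.0])
```

Repeating this for each of the four local properties gives one row of descriptors per molecule, which can then be passed to a principal component analysis.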
QSPR & QSAR models Models derived from local properties. Drug likeness: –SOMs trained on WDI (drugs) & Maybridge (general) –Parameters from the principal components of the local property descriptors –Medium-sized datasets superimposed on SOMs. Surface integral model for solvation energy: –RMS error ~ 0.75 kcal/mol
Physical-Property Mapping Maybridge used as the "chemistry" dataset. Use the top six principal components to train a 100 × 100 Kohonen net (unsupervised training). 2,105 compounds selected from the World Drug Index as real drugs used as the "drug" dataset.
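The mapping workflow can be sketched with a small self-organising map written in NumPy. This is an illustration under stated assumptions, not the network used in the study: the data are random stand-ins with six features (in place of six principal components), the grid is 10 × 10 rather than 100 × 100 to keep the example fast, and the exponential decay schedules and all function names are hypothetical choices.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Grid index (row, col) of the node whose weight vector is closest to x."""
    dist = ((weights - x) ** 2).sum(axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)

def train_som(data, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Unsupervised training of a rows x cols Kohonen map on data (n, features)."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(rows, cols, data.shape[1]))
    # Grid coordinates of every node, used for neighbourhood distances.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        decay = np.exp(-3.0 * t / n_iter)      # shrinks learning rate and radius
        lr, sigma = lr0 * decay, sigma0 * decay
        x = data[rng.integers(len(data))]      # one randomly chosen sample
        bmu = np.array(best_matching_unit(weights, x))
        d2 = ((grid - bmu) ** 2).sum(axis=-1)  # squared grid distance to the BMU
        h = np.exp(-d2 / (2.0 * sigma**2))     # Gaussian neighbourhood
        weights += lr * h[..., None] * (x - weights)
    return weights

# Random stand-ins for the "chemistry" and "drug" datasets (six features each).
rng = np.random.default_rng(1)
chemistry = rng.normal(size=(300, 6))
drugs = rng.normal(loc=0.5, size=(50, 6))

som = train_som(chemistry, rows=10, cols=10)

# Superimpose the drug set: count how many drugs land on each node.
hits = np.zeros((10, 10), dtype=int)
for x in drugs:
    hits[best_matching_unit(som, x)] += 1
```

Nodes with many hits mark regions of the trained map that are populated by the drug set.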
Physical Property Map [Figure: Kohonen net trained on the "chemistry" dataset, with the "drugs" dataset mapped onto it]
Surface-integral free energies –Critical for scoring functions, which otherwise use the force-field intermolecular energies –Provide an attractive alternative to descriptor-plus-interpolation QSPR models –Solvation, lattice energies?, vapour pressures, partition coefficients?, solubilities? ...
Surface-integral models P = target property; A_i = area of triangle i; ntri = number of triangles
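The formula itself did not survive in this transcript. The symbol definitions are consistent with a sum over surface triangles of the form P ≈ Σ f_i · A_i for i = 1 … ntri, where f_i is some function of the local properties on triangle i. The sketch below assumes that form; `surface_integral`, `local_values` and the test mesh are illustrative, and the actual function f used in the models is not given in the source.

```python
import numpy as np

def triangle_areas(vertices, triangles):
    """Area A_i of each triangle; triangles holds vertex indices, shape (ntri, 3)."""
    a, b, c = (vertices[triangles[:, k]] for k in range(3))
    # Half the length of the cross product of two edge vectors.
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)

def surface_integral(vertices, triangles, local_values):
    """Assumed model form: P ~ sum_i f_i * A_i, with f_i = local_values[i]."""
    areas = triangle_areas(vertices, triangles)
    return float(np.dot(np.asarray(local_values, dtype=float), areas))

# Toy mesh: a unit square in the z = 0 plane split into two triangles.
vertices = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
triangles = np.array([[0, 1, 2], [0, 2, 3]])

total_area = surface_integral(vertices, triangles, [1.0, 1.0])
weighted = surface_integral(vertices, triangles, [2.0, 4.0])
```

With f_i = 1 on every triangle the sum reduces to the total surface area, which is a convenient sanity check on a mesh.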
Free energies & enthalpies of hydration, free energies of solvation for n-octanol & chloroform
Pattern matching on molecular surfaces Can we recognise similar surfaces? Can we recognise similar surface fragments? Can we identify the most similar surface to our target? How do we compare field descriptors on the molecular surface?
Surface comparison Two different approaches: 1. Using spherical harmonic molecular surfaces [Ritchie and Kemp, J. Comp. Chem. 20(4) 383-395, 1999; University of Aberdeen]. 2. Partial molecular alignment via local structure analysis [Robinson, Lyne and Richards, J. Chem. Inf. Comput. Sci. 40(2) 503-512, 2000; University of Oxford].
Voting pairs provide possible local alignments Try all possible voting pairs to produce a large number of alignments. The choice of voting pairs can have a critical effect on the quality of the surface alignment.
Pattern matching of surface properties: RMSD = 0.75 A B
ParaSurf v1.0 Surfaces: –Isodensity surfaces –Shrink wrap –Marching cube –Surfaces fit to spherical harmonics. Properties: –MEP, LIE, LEA and LP –Encoded at points on the surface –Encoded as spherical harmonic expansions
GRID Computing ParaSurf compiled on: SGI IRIX, Windows, Linux (SUSE), IBM AIX. Future platforms: SUN Solaris. GRID enabling at Portsmouth, Southampton and Oxford.
Summary Molecular surfaces (Aberdeen); QM properties on surface (Erlangen); Compound screening; Pattern matching on surfaces (Southampton/Oxford); Critical features (Portsmouth); Data reduction and QSAR (Portsmouth); Spherical harmonic representation (Aberdeen)
Conclusions Properties can be calculated at the surface of molecules. These properties can be RGB encoded. The properties are local. Descriptor sets derived from these properties can be used for robust QSPR & QSAR models. The algorithms will soon be available commercially for use in virtual high throughput screening.
ParaSurf – in silico Screening Technology Basic Technology Funding for October 2003 to September 2004 –Proof of concept studies –Consortia building and networking. Academic partners: –University of Portsmouth –University of Erlangen –University of Southampton –University of Aberdeen –University of Oxford