Published by Kerrie Taylor. Modified over 9 years ago.
1
http://www.simbiosys.ca
eHiTS Score
The next stage in scoring function evolution: a new statistically derived empirical scoring function
Darryl Reid, Zsolt Zsoldos, Bashir S. Sadjad, Aniko Simon
2
Overview
● eHiTS_Score: a new scoring function that takes advantage of the temperature factors in PDB files to better capture the interaction geometries between ligands and receptors.
● An "empirical" function is fitted to represent the statistical interaction data and trained using experimentally derived binding affinities.
● This novel scoring function has the additional benefit of family training, based on automatic clustering of the input receptor structures.
● Very good correlation to known binding affinities on a large and diverse test set of 884 PDB structures.
3
eHiTS Algorithm
● Ligands are divided into rigid fragments and flexible connecting chains.
● Rigid Dock: each fragment is docked INDEPENDENTLY everywhere in the receptor.
● Pose Match: a fast graph matching algorithm finds all matching solutions to reconstruct the original molecule.
● Local Energy Optimization: the structure is optimized within the receptor.
● Ranking: structures are ranked by the scoring function.
[Figure: Reconnected Ligand Pose]
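The fragment decomposition in the first step can be illustrated with a small graph routine. This is a minimal sketch, not the eHiTS implementation: it assumes the ligand is given as a bond list in which rotatable bonds are already flagged, and treats each rigid fragment as a connected component of the graph that remains after cutting those bonds. `Bond` and `rigid_fragments` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bond:
    """Hypothetical bond record: two atom indices plus a rotatable flag."""
    i: int
    j: int
    rotatable: bool


def rigid_fragments(n_atoms: int, bonds: list[Bond]) -> list[list[int]]:
    """Return groups of atoms joined only by non-rotatable bonds.

    Cutting every rotatable bond splits the ligand graph; each connected
    component of what remains is treated as one rigid unit.
    """
    adjacency: dict[int, list[int]] = {a: [] for a in range(n_atoms)}
    for bond in bonds:
        if not bond.rotatable:  # keep only rigid connections
            adjacency[bond.i].append(bond.j)
            adjacency[bond.j].append(bond.i)

    seen: set[int] = set()
    fragments: list[list[int]] = []
    for start in range(n_atoms):
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            atom = stack.pop()
            component.append(atom)
            for nb in adjacency[atom]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        fragments.append(sorted(component))
    return fragments


# Toy ligand: a 3-atom ring (0-1-2) joined by a rotatable bond to atom 3,
# which is joined by another rotatable bond to a 2-atom group (4-5).
bonds = [
    Bond(0, 1, False), Bond(1, 2, False), Bond(2, 0, False),
    Bond(2, 3, True), Bond(3, 4, True), Bond(4, 5, False),
]
print(rigid_fragments(6, bonds))  # [[0, 1, 2], [3], [4, 5]]
```

In this toy version an atom of a flexible chain (atom 3) ends up as a component of its own; a real implementation would distinguish such chain atoms from genuine rigid fragments.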
4
Novel Approach to Scoring
● In PDB files the given coordinates are derived from a space and time average of the observed positions.
● The temperature factor describes the three-dimensional probability density of the atom's displacement from the specified coordinates.
● Therefore, rather than using the PDB coordinates directly, we used these probability functions to create a continuous function for interactions.
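For an isotropic temperature factor, the standard crystallographic relation B = 8π²⟨u²⟩ links B to the mean-square displacement ⟨u²⟩ along any one direction, so each atom can be modelled as a 3D Gaussian centred on its PDB coordinates. The sketch below assumes isotropic B-factors only (anisotropic records are ignored); `b_to_sigma` and `position_density` are hypothetical helper names.

```python
import math


def b_to_sigma(b_factor: float) -> float:
    """Per-axis standard deviation (Å) from an isotropic B-factor (Å^2).

    Uses B = 8*pi^2*<u^2>, where <u^2> is the mean-square displacement
    along a single direction.
    """
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


def position_density(point, centre, b_factor: float) -> float:
    """Isotropic 3D Gaussian density of finding the atom at `point`."""
    sigma = b_to_sigma(b_factor)
    sq_dist = sum((p - c) ** 2 for p, c in zip(point, centre))
    norm = (2.0 * math.pi * sigma ** 2) ** -1.5
    return norm * math.exp(-sq_dist / (2.0 * sigma ** 2))


# A B-factor of 20 Å^2 corresponds to a per-axis sigma of about 0.50 Å.
print(round(b_to_sigma(20.0), 3))  # 0.503
```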
5
Interaction Surface Point (ISP) Types
● Interactions cannot be described by distance alone: the angles to the surface points (α, β), shown as LP and H, as well as the torsion between them (δ), must also be considered.
[Figure: surface points H and LP with distance d, angles α and β, and torsion δ]
23 surface point types:
● METAL
● CHARGED_HPLUS
● PRIMARY_AMINE_HLP
● HDONOR
● WEAK_HDONOR
● CHARGED_LONEPAIR
● ACID_LONEPAIR
● LONEPAIR
● HYDROPHOB
● H_AROM_EDGE
● WS_LIPO
● NEUTRAL
● PI_AROMATIC
● PI_RESON_POLAR
● PI_RESON_CARBON
● AMBIVALENT_HLP
● ROTATABLE_H
● ROTATABLE_LP
● WEAK_LONEPAIR
● PI_SP2_POLAR
● PI_SP2_CARBON
● HALOGEN
● SULFUR
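The four descriptors can be computed directly from coordinates. The slide does not spell out the exact construction, so this sketch assumes: d is the distance between the two heavy atoms A and B, α is the angle at A between its surface point H and B, β is the angle at B between its surface point LP and A, and δ is the signed H-A-B-LP torsion. `isp_geometry` is a hypothetical name.

```python
import math

Vec = tuple[float, float, float]


def _sub(p: Vec, q: Vec) -> Vec:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(p: Vec, q: Vec) -> float:
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def _cross(p: Vec, q: Vec) -> Vec:
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def _angle(u: Vec, v: Vec) -> float:
    """Angle between two vectors in degrees."""
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def isp_geometry(h: Vec, a: Vec, b: Vec, lp: Vec):
    """Return (d, alpha, beta, delta) for an assumed H-A ... B-LP contact."""
    ab = _sub(b, a)
    d = math.sqrt(_dot(ab, ab))
    alpha = _angle(_sub(h, a), ab)            # at A: towards H vs. towards B
    beta = _angle(_sub(lp, b), _sub(a, b))    # at B: towards LP vs. towards A
    # Signed torsion: project A->H and B->LP onto the plane perpendicular
    # to the A-B axis and measure the angle between the projections.
    axis = tuple(c / d for c in ab)
    ah, blp = _sub(h, a), _sub(lp, b)
    v = _sub(ah, tuple(_dot(ah, axis) * c for c in axis))
    w = _sub(blp, tuple(_dot(blp, axis) * c for c in axis))
    delta = math.degrees(math.atan2(_dot(_cross(axis, v), w), _dot(v, w)))
    return d, alpha, beta, delta


# H and LP pointing to opposite sides of the A-B axis: a "trans" contact.
print(isp_geometry((0, 1, 0), (0, 0, 0), (3, 0, 0), (3, -1, 0)))
```

For this trans arrangement the call returns d = 3.0, α = β = 90° and |δ| = 180°.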
6
Interaction Surface Point (ISP) Types
7
Statistically derived empirical scoring function
● Interaction statistics were gathered from 2500 PDB structures (Gold-Astex/PDBbind, high resolution < 2.5 Å).
● The probability of the geometric descriptors (d, α, β, δ) falling into specific ranges is computed from the temperature factors using volumetric integrals.
● The integral values are summed over all observed interactions in the complexes and deposited into a 4D data array.
● Four-variable analytic functions are fitted to the 4D data array.
● These functions form the terms of the new scoring function.
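For the distance descriptor alone, the volumetric integral has a closed-form integrand. If both atoms are isotropic Gaussians (per-axis σ taken from their B-factors), their separation vector is Gaussian with variance σ_A² + σ_B² per axis, and its length follows a (scaled) noncentral chi distribution with three degrees of freedom. The sketch below integrates that density over each distance bin and adds the result to a 1D histogram: a reduced, hypothetical analogue of the 4D (d, α, β, δ) array. Bin edges, σ values and function names are assumptions.

```python
import math


def distance_pdf(r: float, mu: float, s: float) -> float:
    """Density of |X| for X ~ N(mu * e, s^2 I) in 3D (requires mu > 0)."""
    if r <= 0.0:
        return 0.0
    k = r / (mu * s * math.sqrt(2.0 * math.pi))
    return k * (math.exp(-(r - mu) ** 2 / (2 * s * s))
                - math.exp(-(r + mu) ** 2 / (2 * s * s)))


def bin_probability(lo: float, hi: float, mu: float, s: float,
                    steps: int = 200) -> float:
    """Integrate distance_pdf over [lo, hi] with Simpson's rule (even steps)."""
    h = (hi - lo) / steps
    total = distance_pdf(lo, mu, s) + distance_pdf(hi, mu, s)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * distance_pdf(lo + i * h, mu, s)
    return total * h / 3.0


def accumulate(histogram, edges, mu, sigma_a, sigma_b):
    """Add one observed contact (modelled distance mu) to the histogram."""
    s = math.sqrt(sigma_a ** 2 + sigma_b ** 2)
    for k in range(len(edges) - 1):
        histogram[k] += bin_probability(edges[k], edges[k + 1], mu, s)


edges = [2.0 + 0.25 * k for k in range(13)]  # bins from 2.0 Å to 5.0 Å
histogram = [0.0] * (len(edges) - 1)
accumulate(histogram, edges, mu=3.0, sigma_a=0.3, sigma_b=0.4)
```

Each contact thus spreads a total weight of about 1 over neighbouring bins instead of adding a single count at its nominal distance, which is what makes the resulting statistics continuous.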
8
Family-based Training
1. 2500 PDB complexes are chosen to represent a wide range of protein families.
2. The complexes are clustered automatically into 97 protein families, plus one default, global set.
3. The eHiTS training utility optimizes the scoring functions (weights) for each family.
4. The scoring functions for each family are output and used as the default scoring functions of eHiTS.
[Figure: 1420 PDB Complexes → eHiTS Training → eHiTS Scoring Functions]
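The slides do not say how receptors are clustered. As one possible illustration, the sketch below uses single-linkage clustering with a union-find structure over a hypothetical pairwise similarity score (for example, sequence identity), then looks up per-family weights with a fall-back to the global default set. The similarity function, threshold and all names are assumptions.

```python
def cluster(names, similarity, threshold):
    """Single-linkage clusters: link any pair with similarity >= threshold."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(a, b) >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def weights_for(receptor, family_of, family_weights, default_weights):
    """Use the receptor's family weights if known, else the global default set."""
    family = family_of.get(receptor)
    return family_weights.get(family, default_weights)


# Hypothetical pairwise identities between four receptors.
table = {frozenset(("p1", "p2")): 0.9, frozenset(("p2", "p3")): 0.8}


def sim(a, b):
    return table.get(frozenset((a, b)), 0.0)


print(cluster(["p1", "p2", "p3", "p4"], sim, 0.7))  # [['p1', 'p2', 'p3'], ['p4']]
```

Single linkage lets p1 and p3 share a family through p2 even though they are not directly similar; a stricter linkage rule would give tighter families at the cost of more singletons.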
9
Additional scoring terms
The 276 interaction functions are mapped to 6 weighting factors, which are varied during the family-based training. In addition, the weights of the following terms are also optimized on a per-family basis:
● steric clash (quadratic penalty function)
● depth within the binding pocket
● solvation
● family coverage
● conformational strain energy of the ligand
● intramolecular interactions within the ligand
● entropy loss due to frozen rotatable bonds
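One arithmetic observation (an inference, not stated on the slides): 23 surface point types give 23·24/2 = 276 unordered type pairs, which is consistent with one interaction function per pair. The sketch below shows how per-family weights might combine named terms into a single score, with the steric clash written as a quadratic penalty on overlaps. The term names, the overlap definition and the sign convention (lower is better) are assumptions.

```python
from math import comb

N_SPT = 23
# Unordered type pairs, including same-type pairs such as HDONOR-HDONOR.
assert comb(N_SPT, 2) + N_SPT == 276


def clash_penalty(distances, min_distances):
    """Quadratic penalty: sum of squared overlaps below the allowed distance."""
    return sum(max(0.0, dmin - d) ** 2 for d, dmin in zip(distances, min_distances))


def total_score(terms, weights):
    """Weighted sum of named score terms; terms without a weight count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in terms.items())


terms = {
    "interaction": -12.0,                              # hypothetical summed ISP terms
    "clash": clash_penalty([2.5, 3.4], [3.0, 3.0]),    # one 0.5 Å overlap -> 0.25
    "strain": 1.5,                                     # hypothetical strain energy
}
weights = {"interaction": 1.0, "clash": 10.0, "strain": 0.5}  # one family's weights
print(total_score(terms, weights))  # -8.75
```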
10
Tuning the component weights
● The goal function combines 4 terms:
  – convergence of local minimisation (funnel shape)
  – solution pose ranking (identify the low-RMSD pose as best)
  – correlation to experimental binding energy
  – separation of actives from decoys (enrichment)
● Stochastic (simulated annealing) + Powell engine
● Overfitting test: tune on one half, test on the other half
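The slides name simulated annealing combined with a Powell engine. The sketch below shows only a generic simulated-annealing loop over a weight vector, plus a random half/half split for the overfitting test. The toy goal function, step size and cooling schedule are placeholders, not the eHiTS settings, and the Powell stage is omitted.

```python
import math
import random


def anneal(goal, start, steps=5000, step_size=0.1, t0=1.0, t_end=1e-3, seed=0):
    """Minimise `goal` by simulated annealing; returns (best_weights, best_value)."""
    rng = random.Random(seed)
    current, current_val = list(start), goal(start)
    best, best_val = list(current), current_val
    cooling = (t_end / t0) ** (1.0 / steps)  # geometric cooling factor
    temp = t0
    for _ in range(steps):
        candidate = [w + rng.gauss(0.0, step_size) for w in current]
        val = goal(candidate)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if val < current_val or rng.random() < math.exp((current_val - val) / temp):
            current, current_val = candidate, val
            if val < best_val:
                best, best_val = list(candidate), val
        temp *= cooling
    return best, best_val


def split_half(items, seed=0):
    """Shuffle and split into (tuning half, held-out half)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


TARGET = [1.0, -2.0, 0.5]  # toy optimum, stands in for the real 4-term goal


def goal(weights):
    return sum((w - t) ** 2 for w, t in zip(weights, TARGET))


best, best_val = anneal(goal, [0.0, 0.0, 0.0])
```

In an overfitting test of this kind, the weights would be tuned with `goal` evaluated on the first half returned by `split_half` and then scored, unchanged, on the second half.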
11
Results: Docking
● 1568 complexes
  – resolution ≤ 2.5 Å
  – 97 protein families (5+)
  – 349 singletons
  – PDBbind 2004
  – Astex-GOLD validation
● Closest average: 0.73 Å
● TopRank average: 1.10 Å
[Figure: Closest vs. Top Rank]
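One reading of the two summary numbers (an interpretation of the labels, not stated on the slide): "Closest" is, per complex, the lowest RMSD among all returned poses, and "TopRank" is the RMSD of the best-scored pose; each is then averaged over complexes. The sketch assumes poses are already in the receptor frame with the same atom order as the X-ray ligand, so no superposition is applied.

```python
import math


def rmsd(pose, reference):
    """Atom RMSD without superposition (same atom order assumed)."""
    sq = sum((a - b) ** 2 for p, r in zip(pose, reference) for a, b in zip(p, r))
    return math.sqrt(sq / len(reference))


def closest_and_top(ranked_poses, reference):
    """Return (closest RMSD, RMSD of the rank-1 pose) for one complex."""
    values = [rmsd(p, reference) for p in ranked_poses]
    return min(values), values[0]


def averages(per_complex):
    """Average (closest, top-rank) RMSD over a list of complexes."""
    n = len(per_complex)
    return (sum(c for c, _ in per_complex) / n,
            sum(t for _, t in per_complex) / n)
```

By construction the closest average can never exceed the top-rank average; the gap between the two measures how often the scoring function fails to put the best pose first.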
12
Docking accuracy comparison
● eHiTS (far right) docked 59 of the 69 complexes within 1.5 Å of the X-ray pose and 67 of 69 within 3.5 Å, outperforming the published results[1] of the other 5 docking tools on this set of proteins.
[1] Maria Kontoyianni, Laura M. McClellan, and Glenn S. Sokol. Evaluation of Docking Performance: Comparative Data on Docking Algorithms. J. Med. Chem. 2004, 47, 558-565.
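Success rates like these are threshold counts over per-complex RMSD values. Which pose (top-ranked or closest) was counted is not stated, so the RMSD list is left to the caller; `success_rate` is a hypothetical helper.

```python
def success_rate(rmsds, threshold):
    """Fraction of complexes whose pose RMSD is within `threshold` Å."""
    return sum(1 for r in rmsds if r <= threshold) / len(rmsds)


# The slide's counts expressed as percentages.
print(f"{59 / 69:.1%} within 1.5 Å, {67 / 69:.1%} within 3.5 Å")
# 85.5% within 1.5 Å, 97.1% within 3.5 Å
```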
13
Correlation to binding affinity
884 PDB complexes: R = 0.75, q = 1.61
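R is presumably the Pearson correlation coefficient between predicted scores and experimental binding affinities; the slide does not define q, so it is not reproduced here. A minimal sketch (Python 3.10+ also offers `statistics.correlation` for the same quantity):

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Because the coefficient is invariant to linear rescaling, a score in arbitrary units can be compared with affinities in kcal/mol or pKd without a calibration step.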
14
VHTS Filter: eHiTS Filter
The eHiTS Filter is based on ligand surface points. All chemically interesting points on the surface of the ligand are assigned surface point types (SPT), indicated by triangles on the histidine ring shown. Each SPT has associated chemical properties (indicated by its color), such as H-bond donor, H-bond acceptor, hydrophobic, π-stacking, etc. The counts of each of the 23 surface point types form the feature vector for that ligand. The filter is based on the assumption that ligands with similar feature vectors have similar activity.
[Figure: workflow in three panels. Training eHiTS Filter: feature vectors of active and inactive ligands train a neural network, giving a trained network file. Screening with eHiTS Filter: feature vectors of the ligand database are scored with the trained network file to give a ranked list. Docking: ligands are docked with eHiTS (score + pose) and the docked poses are re-ranked.]
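Building the feature vector is a counting step. The sketch assumes the surface point types have already been assigned to a ligand (that assignment is the chemically difficult part and is not shown), and orders the vector by the 23 types listed on the ISP slide; that ordering and the names below are assumptions.

```python
from collections import Counter

# The 23 surface point types from the ISP slide, in a fixed order.
SPT_TYPES = [
    "METAL", "CHARGED_HPLUS", "PRIMARY_AMINE_HLP", "HDONOR", "WEAK_HDONOR",
    "CHARGED_LONEPAIR", "ACID_LONEPAIR", "LONEPAIR", "HYDROPHOB",
    "H_AROM_EDGE", "WS_LIPO", "NEUTRAL", "PI_AROMATIC", "PI_RESON_POLAR",
    "PI_RESON_CARBON", "AMBIVALENT_HLP", "ROTATABLE_H", "ROTATABLE_LP",
    "WEAK_LONEPAIR", "PI_SP2_POLAR", "PI_SP2_CARBON", "HALOGEN", "SULFUR",
]


def feature_vector(surface_points):
    """Count each surface point type; the order follows SPT_TYPES."""
    counts = Counter(surface_points)
    unknown = set(counts) - set(SPT_TYPES)
    if unknown:
        raise ValueError(f"unknown surface point types: {sorted(unknown)}")
    return [counts[t] for t in SPT_TYPES]
```

The resulting fixed-length integer vectors are what a classifier such as the neural network in the workflow would take as input.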
15
Diversity of actives and decoys
● For each set of actives, the average feature vector was calculated (represented by the blue star).
● The RMSD from this feature vector was calculated for each active and decoy. The plot shows the average RMSD for the actives and the decoys, as well as the maximum RMSD for the actives.
● For 15 of the 18 codes, even the maximum RMSD of the actives is less than the average RMSD of the decoys.
[Figure: actives (✶) and decoys (x) around the average active feature vector]
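A sketch of this diversity check, assuming "RMSD" means the root-mean-square difference between a ligand's feature vector and the mean active vector, taken over the vector's components. The function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rmsd_to(vector, centre):
    """Root-mean-square deviation of one feature vector from the centre."""
    return math.sqrt(sum((v - c) ** 2 for v, c in zip(vector, centre)) / len(centre))


def diversity_summary(actives, decoys):
    """Return (mean active RMSD, max active RMSD, mean decoy RMSD)."""
    centre = mean_vector(actives)
    act = [rmsd_to(v, centre) for v in actives]
    dec = [rmsd_to(v, centre) for v in decoys]
    return sum(act) / len(act), max(act), sum(dec) / len(dec)
```

The slide's criterion then reads: for a given target, the actives are well separated when the second value returned (max active RMSD) is below the third (mean decoy RMSD).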
16
Enrichment results of eHiTS_Filter
eHiTS_Filter was used to screen a dataset of 869 decoys plus between 5 and 20 actives. The results show strong enrichment across a wide range of receptor families: on average, ~80% of the actives were recovered in the top 10% of the ranked database.
Pham, T.A. and Jain, A.N. Parameter Estimation for Scoring Protein-Ligand Interactions Using Negative Training Data. J. Med. Chem., 2005, 10.1021
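The "actives recovered in the top 10%" figure is a recovery rate at a fixed fraction of the ranked database. A minimal sketch, assuming a higher filter score means "more likely active" and that tied scores keep their input order; the function name is hypothetical, and at least one active is required.

```python
def fraction_actives_in_top(scores, is_active, top_fraction=0.10):
    """Share of all actives that rank inside the top `top_fraction` of the list."""
    # Rank indices by descending score; sorted() is stable, so ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_top = max(1, round(top_fraction * len(scores)))
    found = sum(1 for i in order[:n_top] if is_active[i])
    return found / sum(is_active)
```

For a database of 869 decoys plus 20 actives, the top 10% is 89 compounds; recovering 16 of the 20 actives there would correspond to the ~80% quoted above.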
17
Scoring Function
Let's define some helper functions:
$$
\begin{aligned}
a(x) &= P_0 x + P_1 x^2 + P_2 \sqrt{x} + P_3 \\
b(x) &= P_4 x + P_5 x^2 + P_6 \sqrt{x} + P_7 \\
g(x) &= P_8 \, (x - P_9) \\
c(x) &= \begin{cases} \cos g(x) & \text{if } -\pi < g(x) < \pi \\ -1 & \text{otherwise} \end{cases} \\
d(x) &= P_{10} x + P_{11} x^2 + P_{12} x^3 + P_{13} \, g(x)^2 + P_{14} \, c(x) + P_{15} \\
t(x) &= P_{16} x + P_{17} x^2 + P_{18} \sqrt{x} + P_{19}
\end{aligned}
$$
Then the scoring function is:
$$
f(\,\cdot\,, \mathrm{dist}, \,\cdot\,) = a(\cdot) \, b(\cdot) \, d(\mathrm{dist}) \, t(\cdot)
$$
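A direct transcription of the helper functions into Python, for readers who want to evaluate or plot f. The fitted parameters P[0] to P[19] are not listed on the slides, and the slide text lost the angular arguments of f; mapping a, b and t to α, β and the torsion δ is an assumption, as is passing values with x ≥ 0 so that the square roots stay real. `make_score` is a hypothetical name.

```python
import math


def make_score(P):
    """Build f from the 20 fitted parameters P[0]..P[19] of the slide."""
    if len(P) != 20:
        raise ValueError("expected 20 parameters")

    def a(x):
        return P[0] * x + P[1] * x ** 2 + P[2] * math.sqrt(x) + P[3]

    def b(x):
        return P[4] * x + P[5] * x ** 2 + P[6] * math.sqrt(x) + P[7]

    def g(x):
        return P[8] * (x - P[9])

    def c(x):
        gx = g(x)
        return math.cos(gx) if -math.pi < gx < math.pi else -1.0

    def d(x):
        gx = g(x)
        return (P[10] * x + P[11] * x ** 2 + P[12] * x ** 3
                + P[13] * gx * gx + P[14] * c(x) + P[15])

    def t(x):
        return P[16] * x + P[17] * x ** 2 + P[18] * math.sqrt(x) + P[19]

    def f(alpha, dist, beta, delta):
        # Assumed argument mapping: a <- alpha, b <- beta, t <- torsion delta.
        return a(alpha) * b(beta) * d(dist) * t(delta)

    return f
```

Because cos(±π) = −1, the cut-off in c(x) is continuous: outside (−π, π) the cosine term simply stays at its minimum, giving d a single smooth well around x = P₉.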