Published by Hannah Lawrence. Modified over 4 years ago.

1
Domain of Applicability
A Cluster-Based Measure of Domain of Applicability of a QSAR Model
Robert Stanforth
6 September 2005

2
D C = D D + D M + D A - c © IDBS 2005
Overview
What is QSAR?
Motivation
Modelling the Dataset
Measure of Distance from Domain
Validation

3
What is QSAR?
Quantitative Structure-Activity Relationships
  BiologicalActivity = f(ChemicalStructure) + Error
Descriptor-based QSAR
  Descriptors measure chemical structure
  E.g. topological indices of chemical graph
Use Multivariate Linear Regression
  Regress activity onto high-dimensional descriptor space
  Problem of extrapolation
[Figure: example chemical graphs labelled with index values 3 c = 0, 0.289, 0.408, 0.667, 1.802]
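The regression step named on this slide can be sketched as an ordinary least-squares fit. This is a minimal illustration, not the presenter's implementation: the descriptor matrix `X`, the activities `y`, and the helper names `fit_linear_qsar` and `predict` are all hypothetical, and the data are synthetic.

```python
import numpy as np

def fit_linear_qsar(X, y):
    """Least-squares fit of activity y onto descriptors X, with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Predicted activity for each row of descriptors in X."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

# Synthetic training set (assumption): 40 compounds, 3 descriptors in [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(40, 3))
true_coef = np.array([0.5, 1.0, -2.0, 0.3])
y = true_coef[0] + X @ true_coef[1:] + rng.normal(0.0, 0.05, size=40)

coef = fit_linear_qsar(X, y)

# The model returns a number for any descriptor vector, even one far
# outside the range of the training data; nothing flags the extrapolation.
far_away = np.array([[25.0, -10.0, 40.0]])
extrapolated = predict(coef, far_away)
```

The last two lines show why a separate measure of applicability is useful: the fitted model itself gives no indication that `far_away` lies outside the region covered by the training set.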

4
Motivation
QSAR model only valid in domain of its training set
Measure membership of this domain of applicability
Provides assurance of:
  External test set
  k-fold cross validation
  Prediction

5
Existing Methods
Bounding Box
Convex Hull
Distance to Centroid
Nearest Neighbour and k-NN Methods
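Three of the existing methods listed here are simple enough to sketch in a few lines. The function names and the toy training set are hypothetical, Euclidean distance is an assumption, and the convex hull test is omitted because it needs a dedicated geometry routine.

```python
import numpy as np

def in_bounding_box(train, x):
    """True if x lies within the per-descriptor min/max range of the training set."""
    x = np.asarray(x, dtype=float)
    return bool(np.all((x >= train.min(axis=0)) & (x <= train.max(axis=0))))

def distance_to_centroid(train, x):
    """Euclidean distance from x to the mean of the training set."""
    x = np.asarray(x, dtype=float)
    return float(np.linalg.norm(x - train.mean(axis=0)))

def knn_distance(train, x, k=3):
    """Mean Euclidean distance from x to its k nearest training points."""
    x = np.asarray(x, dtype=float)
    d = np.sort(np.linalg.norm(train - x, axis=1))
    return float(d[:k].mean())

# Toy training set (assumption): four compounds, two descriptors.
train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

inside = in_bounding_box(train, [0.5, 0.5])   # within the box
outside = in_bounding_box(train, [2.0, 0.5])  # first descriptor out of range
```

The bounding box and centroid tests are cheap but treat the dataset as one convex region, while the k-NN distance follows the data more closely at the cost of a scan over every training point per query.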

6
K-Means for Clustering
Use clusters to model the shape of the dataset
K-Means algorithm iteratively adjusts partitioning into clusters to increase accuracy of the model
Computationally feasible
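The iterative adjustment described on this slide can be illustrated with the standard Lloyd iteration for K-Means. This is a generic sketch; the slides do not say which variant, initialisation, or stopping rule was used, so the function `k_means`, its random initialisation, and the toy data are assumptions.

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct training points (assumption).
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid in place
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Recompute labels so they match the returned centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

# Toy dataset (assumption): two well-separated groups of four points.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
centroids, labels = k_means(X, k=2)
```

Each iteration costs on the order of (number of compounds) x (number of clusters) distance evaluations, which is what makes the approach feasible for large descriptor sets.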

8
Measure of Distance
Use the K-Means Model
Base on distances to cluster centroids
Fuzzy cluster membership
Weighted average of distances to cluster centroids, weighted according to cluster membership
Computationally efficient
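The slide gives the idea of the measure but not its formula, so the following is only one plausible reading. It assumes Euclidean distances and borrows the membership rule from fuzzy c-means (weights proportional to an inverse power of distance, with fuzzifier `m`); the names `fuzzy_memberships` and `cluster_distance` are hypothetical and the presenter's actual weighting may differ.

```python
import numpy as np

def fuzzy_memberships(x, centroids, m=2.0, eps=1e-12):
    """Fuzzy c-means style memberships of x in each cluster (an assumed form)."""
    d = np.linalg.norm(centroids - np.asarray(x, dtype=float), axis=1)
    # Closer centroids receive larger weight; eps avoids division by zero.
    inv = 1.0 / np.maximum(d, eps) ** (2.0 / (m - 1.0))
    return inv / inv.sum(), d

def cluster_distance(x, centroids, m=2.0):
    """Membership-weighted average of the distances from x to the centroids."""
    u, d = fuzzy_memberships(x, centroids, m)
    return float(u @ d)

# Two hypothetical cluster centroids in a 2-D descriptor space.
centroids = np.array([[0.0, 0.0], [10.0, 0.0]])

at_centroid = cluster_distance([0.0, 0.0], centroids)  # close to zero
midway = cluster_distance([5.0, 0.0], centroids)       # equal memberships
```

Because the weights favour the nearest cluster, the measure stays small near any centroid and grows in the gaps between clusters, which is how a cluster-based measure can follow a non-convex dataset. A query costs one distance evaluation per cluster rather than one per training compound.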

9
Measure of Distance
Contour Plot
First contour defines boundary of applicability domain

12
Internal Validation
Assess stability of distance measure
Use k-fold cross validation
Leave out one group at a time
Retrain distance measure
Mean relative change in distance of compounds left out
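The procedure on this slide can be sketched as follows. The exact definition of the relative change is not given in the slides, so this sketch assumes |d_retrained - d_full| / d_full, averaged over all left-out compounds. The function names are hypothetical, and a distance-to-centroid measure stands in for whichever distance measure is being assessed.

```python
import numpy as np

def fit_centroid_distance(train):
    """Stand-in distance measure: Euclidean distance to the training centroid."""
    centroid = train.mean(axis=0)
    return lambda x: float(np.linalg.norm(x - centroid))

def averaged_relative_deviation(X, fit_distance, n_folds=5, seed=0):
    """Mean relative change in distance when each compound's fold is left out."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    full = fit_distance(X)  # measure trained on every compound
    changes = []
    for held_out in folds:
        mask = np.ones(len(X), dtype=bool)
        mask[held_out] = False
        retrained = fit_distance(X[mask])  # measure trained without this fold
        for i in held_out:
            d_full = full(X[i])
            if d_full > 0:  # skip points where the relative change is undefined
                changes.append(abs(retrained(X[i]) - d_full) / d_full)
    return float(np.mean(changes))

# Synthetic descriptors (assumption): 30 compounds, 2 descriptors.
X = np.random.default_rng(1).normal(size=(30, 2))
ard = averaged_relative_deviation(X, fit_centroid_distance)
```

A smaller value means the measure changes little when part of the training set is withheld, which is the sense of stability the slide refers to.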

13
Internal Validation
Method          Averaged Relative Deviation
Bounding Box    53.2%
Leverage        80.5%
k-NN            83.1%
Cluster-based   43.2%

14
External Validation
Assess relationship between distance and prediction error
Analyse mean-square prediction error over:
  50 new compounds
  Those inside domain
  Those outside domain
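The comparison described here amounts to splitting the test compounds by a distance threshold and averaging the squared prediction error in each group. A minimal sketch, in which the function name `mse_by_domain`, the threshold, and the four-compound example are all hypothetical:

```python
import numpy as np

def mse_by_domain(y_true, y_pred, distances, threshold):
    """Mean-square prediction error over all, inside-domain and outside-domain compounds.

    Returns a dict mapping each group to (mean-square error, compound count).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sq_err = (y_true - y_pred) ** 2
    inside = np.asarray(distances, dtype=float) <= threshold

    def summarise(mask):
        n = int(mask.sum())
        return (float(sq_err[mask].mean()) if n else float("nan"), n)

    return {
        "all": summarise(np.ones_like(inside)),
        "inside": summarise(inside),
        "outside": summarise(~inside),
    }

# Made-up values for four test compounds, for illustration only.
result = mse_by_domain(
    y_true=[1.0, 2.0, 3.0, 4.0],
    y_pred=[1.0, 2.0, 5.0, 1.0],
    distances=[0.1, 0.2, 0.9, 1.5],
    threshold=0.5,
)
```

If the distance measure is informative, the error for compounds outside the domain should be higher than for those inside it.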

15
External Validation
Mean Square Prediction Error (number of compounds in parentheses)
Method          All (50)   Inside Domain   Outside Domain
Bounding Box    2.76       3.08 (27)       2.40 (23)
Leverage        2.76       2.81 (48)       1.61 (2)
k-NN            2.76       2.73 (45)       3.11 (5)
Cluster-based   2.76       2.70 (46)       3.58 (4)

16
Conclusions
Need quantitative measure of applicability of a descriptor-based QSAR model to a structure
Existing methods are all either too crude or too slow
Our new method is computationally efficient, and copes well with non-convex domains
