Domain of Applicability: A Cluster-Based Measure of Domain of Applicability of a QSAR Model
Robert Stanforth
6 September 2005
© IDBS 2005

Overview
- What is QSAR?
- Motivation
- Modelling the dataset
- Measure of distance from the domain
- Validation
What is QSAR?
Quantitative Structure-Activity Relationships:
    BiologicalActivity = f(ChemicalStructure) + Error
Descriptor-based QSAR:
- Descriptors measure chemical structure, e.g. topological indices of the chemical graph
- Use multivariate linear regression: regress activity onto a high-dimensional descriptor space
- Problem of extrapolation
[Figure: example structures with their connectivity-index values, ranging from 0 to 1.802]
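The descriptor-based regression step above can be sketched as follows. This is a minimal illustration with invented data (the descriptor matrix, coefficients, and noise level are all hypothetical, not from the talk), using ordinary least squares via `np.linalg.lstsq`:

```python
import numpy as np

# Hypothetical data: 20 compounds described by 3 topological descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                     # descriptor matrix
true_coef = np.array([1.5, -0.7, 0.3])           # assumed "true" relationship
y = X @ true_coef + 0.01 * rng.normal(size=20)   # activity = f(structure) + error

# Append an intercept column and regress activity onto descriptor space.
A = np.hstack([X, np.ones((20, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef[:3])   # recovered descriptor coefficients
```

The extrapolation problem mentioned on the slide is exactly what such a fit cannot guard against on its own: the linear model will happily produce a prediction for a structure far outside the descriptor range it was trained on.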
Motivation
- A QSAR model is only valid within the domain of its training set
- We therefore want to measure membership of this domain of applicability
- This provides assurance for:
  - an external test set
  - k-fold cross validation
  - prediction of new compounds
Existing Methods
- Bounding box
- Convex hull
- Distance to centroid
- Nearest-neighbour and k-NN methods
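As an illustration of the simplest of these existing methods, here is a bounding-box membership check: a query structure is "in domain" only if every descriptor value lies within the [min, max] range seen in training. This sketch (data and function name are illustrative, not from the talk) also shows why the method is crude, since the box includes large empty corners:

```python
import numpy as np

def in_bounding_box(X_train, x_query):
    """True if every descriptor of x_query lies within the training range."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return bool(np.all((x_query >= lo) & (x_query <= hi)))

# Three training compounds in a 2-descriptor space.
X_train = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 1.0]])

print(in_bounding_box(X_train, np.array([1.0, 1.0])))  # inside the box
print(in_bounding_box(X_train, np.array([3.0, 1.0])))  # outside on descriptor 1
```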
K-Means for Clustering
- Use clusters to model the shape of the dataset
- The K-Means algorithm iteratively adjusts the partitioning into clusters to increase the accuracy of the model
- Computationally feasible
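The iterative adjustment described above can be sketched as Lloyd's algorithm, the standard K-Means procedure. This is a minimal illustration (the talk's actual implementation is not shown; the two-blob dataset is invented):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal Lloyd's-algorithm K-Means: alternate assignment and update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid's cluster.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each centroid moves to the mean of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Two well-separated groups of compounds; k=2 recovers their centres.
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
centroids, labels = kmeans(X, 2)
```

Each iteration only needs distances from every point to k centroids, which is what makes the approach computationally feasible on large datasets.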
Measure of Distance
- Use the K-Means model: base the measure on distances to the cluster centroids
- Fuzzy cluster membership: take a weighted average of the distances to the cluster centroids, weighted according to cluster membership
- Computationally efficient
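The membership-weighted distance can be sketched as below. The talk does not give its exact membership formula, so this sketch assumes fuzzy-c-means-style inverse-square-distance weights (an assumption, not the authors' stated method):

```python
import numpy as np

def domain_distance(centroids, x, eps=1e-12):
    """Fuzzy-membership-weighted average distance to the cluster centroids."""
    d = np.linalg.norm(centroids - x, axis=1)   # distance to each centroid
    w = 1.0 / (d**2 + eps)                      # assumed fuzzy membership weights
    w /= w.sum()                                # memberships sum to 1
    return float(np.sum(w * d))                 # weighted average distance

centroids = np.array([[0.0, 0.0], [10.0, 0.0]])

near = domain_distance(centroids, np.array([1.0, 0.0]))   # close to one cluster
far = domain_distance(centroids, np.array([5.0, 5.0]))    # far from both clusters
print(near, far)
```

Because the weights favour the nearest cluster, a point close to any one centroid scores a small distance even when the overall dataset is non-convex, which a single distance-to-centroid measure cannot do. Evaluation needs only k distance computations per query, hence the efficiency claim.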
Measure of Distance: Contour Plot
The first contour of the distance measure defines the boundary of the applicability domain.
Internal Validation
Assess the stability of the distance measure using k-fold cross validation:
- Leave out one group at a time
- Retrain the distance measure on the rest
- Record the mean relative change in distance of the compounds left out
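The validation protocol above can be sketched as follows. For brevity the "distance measure" here is plain distance-to-centroid rather than the cluster-based measure, and the dataset is invented; any of the measures in the talk could be plugged in at the marked line:

```python
import numpy as np

def mean_relative_deviation(X, k=5, seed=0):
    """Leave out each fold, retrain the measure, and average the relative
    change in distance for the left-out compounds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    full_centroid = X.mean(axis=0)              # measure trained on all data
    devs = []
    for fold in folds:
        train = np.delete(X, fold, axis=0)
        c = train.mean(axis=0)                  # measure retrained without this fold
        for i in fold:
            d_full = np.linalg.norm(X[i] - full_centroid)
            d_cv = np.linalg.norm(X[i] - c)
            if d_full > 0:
                devs.append(abs(d_cv - d_full) / d_full)
    return float(np.mean(devs))

X = np.random.default_rng(1).normal(size=(40, 3))   # illustrative dataset
print(mean_relative_deviation(X))                    # small value => stable measure
```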
Internal Validation

Method         | Averaged Relative Deviation
---------------|----------------------------
Bounding box   | 53.2%
Leverage       | 80.5%
k-NN           | 83.1%
Cluster-based  | 43.2%
External Validation
Assess the relationship between distance and prediction error. Analyse the mean-square prediction error over:
- all 50 new compounds
- those inside the domain
- those outside the domain
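The analysis described above can be sketched by splitting the new compounds at the domain threshold and comparing mean-square prediction error on each side. The distances, errors, and threshold below are all invented for illustration:

```python
import numpy as np

def mse_by_domain(distances, errors, threshold):
    """Mean-square prediction error overall, inside, and outside the domain."""
    distances = np.asarray(distances)
    sq = np.asarray(errors) ** 2
    inside = sq[distances <= threshold]
    outside = sq[distances > threshold]
    return sq.mean(), inside.mean(), outside.mean()

# Illustrative data: prediction error that grows with distance from the domain.
rng = np.random.default_rng(2)
d = rng.uniform(0.0, 2.0, size=50)
err = 0.3 + 0.8 * d + 0.05 * rng.normal(size=50)

all_mse, in_mse, out_mse = mse_by_domain(d, err, threshold=1.0)
print(all_mse, in_mse, out_mse)
```

A useful applicability measure should show exactly this pattern: markedly larger prediction error outside the domain than inside it.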
External Validation: Mean-Square Prediction Error
(compound counts in parentheses)

Method         | All (50) | Inside Domain | Outside Domain
---------------|----------|---------------|---------------
Bounding box   |          | (27)          | 2.40 (23)
Leverage       |          | (48)          | 1.61 (2)
k-NN           |          | (45)          | 3.11 (5)
Cluster-based  |          | (46)          | 3.58 (4)
Conclusions
- We need a quantitative measure of the applicability of a descriptor-based QSAR model to a structure
- Existing methods are all either too crude or too slow
- Our new method is computationally efficient and copes well with non-convex domains