CZ3253: Computer Aided Drug design Lecture 7: Drug Design Methods II: SVM Prof. Chen Yu Zong Tel: 6874-6877

CZ3253: Computer Aided Drug design Lecture 7: Drug Design Methods II: SVM Prof. Chen Yu Zong Tel: Room 07-24, level 7, SOC1, National University of Singapore

2 Classification of Drugs by SVM
A drug is classified as either belonging (+) or not belonging (-) to a class. Examples of drug classes: inhibitor of a protein, blood-brain barrier (BBB) penetrating, genotoxic. Examples of protein classes: enzyme EC3.4 family, DNA-binding. By screening against all classes, the properties of a drug or the function of a protein can be identified.
[Figure: a drug is screened by the Class-1, Class-2 and Class-3 SVMs to identify the family it belongs to]
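Screening a drug against every class amounts to running one binary classifier per class and collecting the positive answers. The sketch below rests on two assumptions. Each class SVM is taken to be linear and is summarized by a weight vector `w` and offset `b`, with decision value `w . x - b`. The panel and its numbers are made up for illustration. Trained SVMs in practice often use nonlinear kernels.

```python
# Hypothetical sketch: screen one drug against a panel of binary SVMs.
# ASSUMPTION: each class SVM is linear, summarized by (w, b), with decision
# value f(x) = w . x - b; f(x) > 0 means the drug belongs (+) to that class.
from typing import Dict, List, Tuple

LinearSVM = Tuple[List[float], float]  # (weight vector w, offset b)


def decision_value(model: LinearSVM, x: List[float]) -> float:
    """Signed score of feature vector x under one class SVM."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) - b


def screen(panel: Dict[str, LinearSVM], x: List[float]) -> List[str]:
    """Names of every class whose SVM labels x as a member (+)."""
    return [name for name, model in panel.items() if decision_value(model, x) > 0]


if __name__ == "__main__":
    # Made-up weights for three hypothetical class SVMs on 3 descriptors.
    panel = {
        "Class-1": ([1.0, -0.5, 0.0], 0.2),
        "Class-2": ([-1.0, 0.0, 1.0], -0.1),
        "Class-3": ([0.0, 1.0, 1.0], 2.5),
    }
    drug = [1.0, 0.0, 1.0]
    print(screen(panel, drug))  # ['Class-1', 'Class-2']
```

A drug may be assigned to several classes at once, or to none, because each class SVM answers independently.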

3 Classification of Drugs or Proteins by SVM
What is SVM? Support vector machines (SVMs) are a machine learning method that learns by example (statistical learning) to classify objects into one of two classes.
Advantages of SVM:
–Class members can be diverse (no bias against members that are structurally unlike the rest of their class).
–Structure-derived physico-chemical features are used as the basis for drug classification (the algorithm does not require structural similarity).

4 SVM References
–C. Burges, "A tutorial on support vector machines for pattern recognition", Data Mining and Knowledge Discovery, Kluwer Academic Publishers, 1998 (online).
–R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd edition, John Wiley, 2001 (section 5.11, hard copy).
–S. Gong et al., Dynamic Vision: From Images to Face Recognition, Imperial College Press, 2001 (sections 3.6.2, 3.7.2, hard copy).
–Online lecture notes (http://)
Publications on SVM drug prediction:
–J. Chem. Inf. Comput. Sci. 44, 1630 (2004)
–J. Chem. Inf. Comput. Sci. 44, 1497 (2004)
–Toxicol. Sci. 79, 170 (2004)

5 Machine Learning Method
Inductive learning: example-based learning.
[Figure: descriptors computed for positive examples and negative examples]

6 Machine Learning Method
Feature vectors: A=(1, 1, 1), B=(0, 1, 1), C=(1, 1, 1), D=(0, 1, 1), E=(0, 0, 0), F=(1, 0, 1)
[Figure: descriptors of the positive and negative examples encoded as feature vectors]

7 SVM Method
Feature vectors in input space: A=(1, 1, 1), B=(0, 1, 1), C=(1, 1, 1), D=(0, 1, 1), E=(0, 0, 0), F=(1, 0, 1)
[Figure: feature vectors A, B, E and F plotted in a 3-D input space with axes X, Y and Z]

8 SVM Method
[Figure: protein family members and nonmembers separated by a border in input space; after projection to a higher-dimensional space, a new border separates them]
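A small example shows why projecting to a higher-dimensional space can help. The sketch below uses made-up 1-D data with family members clustered near 0 and nonmembers on both sides. No single threshold on the line separates them. After the hypothetical projection `phi(x) = (x, x^2)`, a straight border in the 2-D feature space does. Both the toy data and the choice of `phi` are assumptions for illustration.

```python
# Sketch: 1-D data with no separating threshold becomes linearly separable
# after a hypothetical projection phi(x) = (x, x^2) into 2-D feature space.
from typing import List, Tuple

# Made-up toy data: family members (+1) near 0, nonmembers (-1) far out.
points = [-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0]
labels = [-1, -1, 1, 1, 1, -1, -1]


def phi(x: float) -> Tuple[float, float]:
    """Project a 1-D input into a higher (2-D) dimensional space."""
    return (x, x * x)


def new_border(z: Tuple[float, float]) -> float:
    """Linear border in feature space, w . z - b with w = (0, -1), b = -2."""
    return 0.0 * z[0] - 1.0 * z[1] + 2.0


def separable_by_threshold(xs: List[float], ys: List[int]) -> bool:
    """Brute force: does any single cut point t separate the 1-D data?"""
    s = sorted(xs)
    cuts = [s[0] - 1.0, s[-1] + 1.0]
    cuts += [(a + b) / 2 for a, b in zip(s, s[1:])]
    for t in cuts:
        for sign in (1, -1):
            if all((sign * (x - t) > 0) == (y > 0) for x, y in zip(xs, ys)):
                return True
    return False


print(separable_by_threshold(points, labels))  # False: no border in input space
print(all((new_border(phi(x)) > 0) == (y > 0)
          for x, y in zip(points, labels)))  # True: linear border in feature space
```

The border `x^2 = 2` is a straight line in the `(x, x^2)` plane. Mapped back to the original line, it is the pair of points `x = ±√2`, which no single linear threshold could reproduce.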

9 SVM Method
[Figure: protein family members and nonmembers separated by the new border; the support vectors are the training points lying closest to the border]

10 SVM Method
[Figure: protein family members, nonmembers, the new border and the support vectors]

11 Best Linear Separator?

12 Best Linear Separator?

13 Find Closest Points in Convex Hulls
[Figure: c and d are the closest points of the convex hulls of the two classes]

14 Plane Bisects Closest Points
[Figure: the separating plane bisects the segment joining the closest points d and c]
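Once the closest points are known, the bisecting plane follows directly. Its normal is `w = c - d`, and it passes through the midpoint of `c` and `d`. The sketch assumes `c` lies in the (+) hull and `d` in the (-) hull. The coordinates are made up.

```python
# Sketch: the separating plane that bisects the closest points c and d.
# ASSUMPTION: c is the closest point in the (+) class hull, d in the (-) hull.
# Normal w = c - d; the plane passes through the midpoint m = (c + d) / 2,
# so it is {x : w . x = b} with b = w . m.
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def bisecting_plane(c: Vector, d: Vector) -> Tuple[Vector, float]:
    """Return (w, b) for the plane w . x = b bisecting segment cd."""
    w = [ci - di for ci, di in zip(c, d)]
    midpoint = [(ci + di) / 2 for ci, di in zip(c, d)]
    return w, dot(w, midpoint)


def classify(w: Vector, b: float, x: Vector) -> int:
    """+1 on c's side of the plane, -1 on d's side."""
    return 1 if dot(w, x) - b > 0 else -1


c, d = [1.0, 1.0], [-1.0, 0.0]  # made-up closest points
w, b = bisecting_plane(c, d)
print(w, b)  # [2.0, 1.0] 0.5
print(classify(w, b, c), classify(w, b, d))  # 1 -1
```

Because the plane passes through the midpoint, `c` and `d` sit at equal and opposite signed distances from it.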

15 Find c and d Using a Quadratic Program
Many existing and new solvers are available.
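The slide does not write out the quadratic program. One standard way to state it is the closest-points (convex-hull) formulation below. The index sets $A^{+}$ and $A^{-}$ of positive and negative training points are notation introduced here, not taken from the slide.

```latex
\begin{aligned}
\min_{\alpha}\quad & \frac{1}{2}\Big\| \sum_{i \in A^{+}} \alpha_i x_i \;-\; \sum_{i \in A^{-}} \alpha_i x_i \Big\|_2^2 \\
\text{s.t.}\quad & \sum_{i \in A^{+}} \alpha_i = 1, \qquad \sum_{i \in A^{-}} \alpha_i = 1, \qquad \alpha_i \ge 0,
\end{aligned}
\qquad
c = \sum_{i \in A^{+}} \alpha_i x_i, \quad d = \sum_{i \in A^{-}} \alpha_i x_i .
```

The constraints make $c$ and $d$ convex combinations of each class's points, so they lie in the two convex hulls. The objective is half the squared distance between them.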

16 Best Linear Separator: Supporting Plane Method
Maximize the distance between two parallel supporting planes, $x \cdot w = b + 1$ and $x \cdot w = b - 1$:
Distance = “Margin” = $\dfrac{2}{\|w\|_2}$
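The sketch below is a hedged check of the supporting-plane idea. It tests whether a candidate `(w, b)` keeps every training point on or beyond its class's supporting plane, i.e. $y_i(x_i \cdot w - b) \ge 1$, and it reports the margin $2/\|w\|_2$. Maximizing the margin under those constraints is equivalent to minimizing $\frac{1}{2}\|w\|_2^2$, which is the quadratic program an SVM solver handles. The toy data and candidate planes are made up.

```python
# Sketch: supporting-plane feasibility and margin for a candidate (w, b).
# Supporting planes: x . w = b + 1 (+ class) and x . w = b - 1 (- class).
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def feasible(w, b, X, y, tol=1e-12) -> bool:
    """True if every point lies on or beyond its class's supporting plane."""
    return all(yi * (dot(xi, w) - b) >= 1 - tol for xi, yi in zip(X, y))


def margin(w) -> float:
    """Distance between the two parallel supporting planes: 2 / ||w||_2."""
    return 2.0 / math.sqrt(dot(w, w))


# Made-up, linearly separable toy data.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0)]
y = [1, 1, -1, -1]

for w in ([1.0, 1.0], [0.5, 0.5], [0.4, 0.4]):
    ok = feasible(w, 0.0, X, y)
    print(w, ok, round(margin(w), 3) if ok else "infeasible")
```

The widest feasible candidate, `w = (0.5, 0.5)`, gives a margin of 2√2 ≈ 2.828. That equals the distance between the closest points (1, 1) and (-1, -1), which illustrates why the closest-points view and the supporting-plane view lead to the same separator.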

17 Best Linear Separator?

18 SVM Method
The border line is nonlinear.

19 SVM Method
Nonlinear transformation: use of a kernel function.
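A kernel function computes inner products in a higher-dimensional space without building the transformed vectors explicitly. The sketch checks this for the degree-2 polynomial kernel, comparing it against its explicit feature map. It also defines the Gaussian RBF kernel, the kernel behind the "Bagged SVM (RBF)" results later in the lecture, and a kernel decision function $f(x) = \sum_i \alpha_i y_i K(x_i, x) - b$. The value of `gamma`, the sign convention for `b` and the test points are assumptions.

```python
# Sketch: the kernel trick. A kernel K(x, z) returns the inner product
# phi(x) . phi(z) in a higher-dimensional space without computing phi.
import math
from typing import Callable, Sequence

Vec = Sequence[float]


def poly2_kernel(x: Vec, z: Vec) -> float:
    """Homogeneous degree-2 polynomial kernel (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def phi(x: Vec):
    """Explicit feature map whose inner product equals poly2_kernel for 2-D x."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def rbf_kernel(x: Vec, z: Vec, gamma: float = 0.5) -> float:
    """Gaussian RBF kernel exp(-gamma * ||x - z||^2); gamma is a free choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def kernel_decision(sv, alpha, y, b, kernel: Callable[[Vec, Vec], float], x: Vec) -> float:
    """f(x) = sum_i alpha_i y_i K(sv_i, x) - b; its sign gives the class."""
    return sum(a * yi * kernel(s, x) for s, a, yi in zip(sv, alpha, y)) - b


x, z = (1.0, 2.0), (3.0, -1.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
print(poly2_kernel(x, z), explicit)  # both equal 1 (up to rounding)
```

The SVM only ever needs kernel values between pairs of points. This is why the RBF kernel, whose implicit feature space is infinite-dimensional, is still practical.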

20 SVM method Non-linear transformation

21 SVM Method

22 SVM Method

23 SVM Method

24 SVM Method

25 SVM for Classification of Drugs
How to represent a drug? Each structure is represented by a specific feature vector assembled from structural and physico-chemical properties:
–Simple molecular properties (molecular weight, number of rotatable bonds, etc.; 18 in total)
–Molecular connectivity and shape (28 in total)
–Electro-topological state polarity (84 in total)
–Quantum chemical properties (electric charge, polarizability, etc.; 13 in total)
–Geometrical properties (molecular size vector, van der Waals volume, molecular surface, etc.; 16 in total)
J. Chem. Inf. Comput. Sci. 44, 1630 (2004); J. Chem. Inf. Comput. Sci. 44, 1497 (2004); Toxicol. Sci. 79, 170 (2004).
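The five descriptor groups above add up to 18 + 28 + 84 + 13 + 16 = 159 features per compound. The sketch concatenates the groups in a fixed order and checks each group's size. The group sizes come from the slide. The descriptor values are made up. The min-max scaling step is an assumption: descriptors are commonly scaled before SVM training, but the slide does not say which scaling was used.

```python
# Sketch: assembling one drug's feature vector from the five descriptor
# groups on the slide (group sizes from the slide; values made up).
from typing import Dict, List

DESCRIPTOR_GROUPS: Dict[str, int] = {
    "simple molecular properties": 18,
    "molecular connectivity and shape": 28,
    "electro-topological state polarity": 84,
    "quantum chemical properties": 13,
    "geometrical properties": 16,
}


def assemble(values: Dict[str, List[float]]) -> List[float]:
    """Concatenate descriptor groups in a fixed order, checking each size."""
    vector: List[float] = []
    for group, size in DESCRIPTOR_GROUPS.items():
        part = values[group]
        if len(part) != size:
            raise ValueError(f"{group}: expected {size} values, got {len(part)}")
        vector.extend(part)
    return vector


def min_max_scale(rows: List[List[float]]) -> List[List[float]]:
    """ASSUMPTION: scale each descriptor to [0, 1] over the training set."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


fake = {group: [0.0] * size for group, size in DESCRIPTOR_GROUPS.items()}
print(len(assemble(fake)))  # 159 = 18 + 28 + 84 + 13 + 16
```

A fixed group order matters because the SVM treats position j of every vector as the same descriptor.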

26 SVM Feature Selection
CACO descriptors, average of 10 models: test Q² = 0.7073.
Q² is the mean squared error scaled by the variance: Q² = (mean squared error) / (variance of the true values), so lower is better.
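The Q² used in these slides is simply MSE divided by the variance of the observed values. The sketch below computes it. The slide does not say whether the variance divides by n or n - 1, so the population variance (divide by n) is an assumption here. The toy values are made up.

```python
# Sketch of the slide's Q2: mean squared error divided by the variance of the
# true values. ASSUMPTION: population variance (divide by n).
from typing import Sequence


def q2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mean = sum(y_true) / n
    variance = sum((t - mean) ** 2 for t in y_true) / n
    return mse / variance


y = [1.0, 2.0, 3.0, 4.0]
print(q2(y, [1.0, 2.0, 3.0, 5.0]))  # 0.25 / 1.25 = 0.2
print(q2(y, [2.5] * 4))             # predicting the mean gives 1.0
```

A model that always predicts the mean scores Q² = 1, and a perfect model scores 0. On this scale, the bagged RBF results later in the lecture (0.134, 0.166) are a large improvement over 0.7073.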

27 Feature Selection
Using a subset of the descriptors might greatly improve results. Perform feature selection using a linear SVM with 1-norm regularization.
[Figure: 1-norm vs. 2-norm]
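Why does a 1-norm penalty select features when a 2-norm penalty does not? The sketch compares the single-coordinate proximal step of each penalty, that is, $\arg\min_u \frac{1}{2}(u - v)^2 + \text{penalty}(u)$. It illustrates the effect of the penalties and is not the LP the slides use. The coefficients and λ are made up.

```python
# Sketch: why a 1-norm penalty yields exactly-zero coefficients while a
# 2-norm penalty only shrinks them. Each function is the proximal step
# argmin_u 0.5*(u - v)^2 + penalty(u), applied coordinate-wise.
import math
from typing import List


def prox_l1(v: List[float], lam: float) -> List[float]:
    """Penalty lam*|u|: soft-thresholding; small coefficients become 0."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]


def prox_l2(v: List[float], lam: float) -> List[float]:
    """Penalty (lam/2)*u^2: uniform shrinkage; nothing becomes exactly 0."""
    return [x / (1.0 + lam) for x in v]


weights = [0.5, -2.0, 0.1]  # made-up descriptor coefficients
print(prox_l1(weights, 1.0))  # [0.0, -1.0, 0.0]
print(prox_l2(weights, 1.0))  # [0.25, -1.0, 0.05]
```

Only the 1-norm step removes descriptors outright. This is why the following slide can keep just the descriptors with nonzero coefficients.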

28 Feature Selection via Sparse SVM/LP
Construct a linear ν-SVM using a 1-norm LP:
–Pick the best C and ν for the SVM
–Keep descriptors with nonzero coefficients
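The slide does not give the LP itself. The program below is one common 1-norm ν-support-vector-regression linear program of this kind, and it is an assumption rather than something taken from the slide. It is written for training pairs $(x_i, y_i)$, $i = 1, \dots, \ell$, with $d$ descriptors.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*,\,\varepsilon}\quad & \sum_{j=1}^{d} |w_j| \;+\; C \sum_{i=1}^{\ell} (\xi_i + \xi_i^*) \;+\; C\,\nu\,\varepsilon \\
\text{s.t.}\quad & y_i - (x_i \cdot w - b) \le \varepsilon + \xi_i, \\
& (x_i \cdot w - b) - y_i \le \varepsilon + \xi_i^*, \\
& \xi_i,\ \xi_i^* \ge 0, \quad \varepsilon \ge 0, \qquad i = 1, \dots, \ell .
\end{aligned}
```

Writing $w_j = u_j - v_j$ with $u_j, v_j \ge 0$ replaces $|w_j|$ by $u_j + v_j$, which turns the problem into a linear program. Descriptors whose $w_j$ is zero at the optimum are dropped.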

29 Bagged Feature Selection
[Flowchart: partition the training data into a training set and a validation set; run the linear SVM algorithm for feature selection, with an added random variable r, to obtain a linear regression model; repeat B times; bag the B models and obtain the subset of features]
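The loop in the flowchart can be sketched as follows, under stated assumptions. The `fit` argument stands in for the sparse linear SVM, and a toy covariance fitter is used so the sketch runs without a solver. The random variable r is treated as a probe column. The voting rule is hypothetical and not the slide's exact procedure: a feature earns a vote in a bag when its weight exceeds the probe's in magnitude, and features with at least `min_votes` votes are kept.

```python
# Sketch of bagged feature selection with a random probe variable r.
# ASSUMPTIONS (hypothetical, not the slide's exact procedure):
#  - `fit` returns one linear weight per column (a sparse linear SVM in the
#    slides; any linear model can be plugged in here),
#  - a feature gets a vote when |w_j| exceeds the probe's |w_r|,
#  - features with at least `min_votes` votes over B bags are kept.
import random
from typing import Callable, List, Sequence

Fit = Callable[[List[List[float]], List[float]], List[float]]


def bagged_feature_selection(X: Sequence[Sequence[float]], y: Sequence[float], fit: Fit,
                             B: int = 10, train_frac: float = 0.8,
                             min_votes: int = 1, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    votes = [0] * d
    for _ in range(B):
        idx = list(range(n))
        rng.shuffle(idx)
        # Partition: idx[:cut] trains; idx[cut:] would be the validation set
        # used to tune C and nu in a full pipeline (not used in this sketch).
        cut = int(train_frac * n)
        train = idx[:cut]
        # Append the random probe variable r as an extra last column.
        X_bag = [list(X[i]) + [rng.random()] for i in train]
        y_bag = [y[i] for i in train]
        w = fit(X_bag, y_bag)
        probe = abs(w[-1])
        for j in range(d):
            if abs(w[j]) > probe:
                votes[j] += 1
    return [j for j in range(d) if votes[j] >= min_votes]


def covariance_weights(X: List[List[float]], y: List[float]) -> List[float]:
    """Toy stand-in for `fit`: covariance of each column with y (NOT an SVM)."""
    n = len(y)
    y_mean = sum(y) / n
    weights = []
    for col in zip(*X):
        c_mean = sum(col) / n
        weights.append(sum((c - c_mean) * (t - y_mean) for c, t in zip(col, y)) / n)
    return weights


# Made-up data: column 0 equals the label, column 1 is constant zero.
y = [1.0, -1.0] * 10
X = [[t, 0.0] for t in y]
print(bagged_feature_selection(X, y, covariance_weights, B=5, min_votes=5))
```

The probe column sets a data-driven cutoff. A descriptor that cannot beat pure noise in a given bag is treated as uninformative for that bag.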

30 Bagged SVM (RBF)
CACO descriptors: test Q² = 0.134

31 Starplot Caco Descriptors

32 Chemistry In/Out Modeling
[Flowchart: data + descriptors → feature selection → visualize features → assess chemistry → construct nonlinear SVM model → apply the SVM model to test data to predict bioactivities → chemistry interpretation]

33 Bagged SVM (RBF)
CACO descriptors: test Q² = 0.166

34 CACO2: 15 Variables
a.don, KB54, SMR.VSA2, ANGLEB45, DRNB10, ABSDRN6, PEOE.VSA.FPPOS, DRNB00, PEOE.VSA.FNEG, ABSKMIN, SIKIA, pmiZ, BNPB31, FUKB14, SlogP.VSA0

35 Chemical Insights
–Hydrophobicity: a.don
–Size and shape: ABSDRN6, SMR.VSA2, ANGLEB45, pmiZ. Large is bad, flat is bad, globular is good.
–Polarity: PEOE.VSA.FPPOS, PEOE.VSA.FNEG. Negative partial charge is good.
These correspond to conventional wisdom (Lipinski's rule of 5).

36 Hybrid TAE/SHAPE
Shape is an important overall factor:
–DRNB10, DRNB00: del rho dot N
–BNPB31: bare nuclear potential
–KB54: kinetic energy descriptors; very large lipophilic molecules don’t work
–FUKB14: Fukui surface
Interpretations are difficult but point to chemistry challenges and hypotheses.

37 Final SVM Approach
Construct a large set of descriptors.
Perform feature selection:
–Sensitivity analysis or SVM-LP
Construct many SVM models:
–Optimize using QP or LP
–Evaluate by validation set or leave-one-out
–Select the best models by grid or pattern search
Bag the best k models to create the final function.
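The last two steps, selecting the best models and bagging them, can be sketched as keeping the k candidates with the lowest validation error and averaging their predictions. The sketch assumes each candidate arrives as a `(validation_error, predict_function)` pair produced by the grid or pattern search. The constant "models" are made up so the averaging is easy to follow.

```python
# Sketch of the last two steps: keep the k candidate models with the lowest
# validation (or leave-one-out) error and average their predictions.
# ASSUMPTION: each candidate is (validation_error, predict_function), produced
# by a grid or pattern search over hyperparameters such as C and gamma.
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def bag_best_k(candidates: List[Tuple[float, Model]], k: int) -> Model:
    """Return a model that averages the k lowest-error candidates."""
    best = [model for _, model in sorted(candidates, key=lambda c: c[0])[:k]]

    def bagged(x: Sequence[float]) -> float:
        return sum(model(x) for model in best) / len(best)

    return bagged


# Made-up candidates: constant predictors with different validation errors.
candidates = [
    (0.30, lambda x: 1.0),
    (0.10, lambda x: 2.0),
    (0.20, lambda x: 4.0),
]
final = bag_best_k(candidates, k=2)
print(final([0.0]))  # averages the 0.10 and 0.20 models: (2.0 + 4.0) / 2 = 3.0
```

Averaging several good models tends to reduce the variance of any single model's predictions. This fits the small, noisy QSAR data sets in these slides.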

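38 Drug Discovery Results (LOO)
Table columns: data set; number of samples; number of variables (full); number of variables after feature selection (average); Q² (full); Q² (FS)
Data sets: Caco, Barrier, HIV, Cancer, LCCK, Aquasol
[Table values not recoverable]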

SVM-based drug design and property prediction software
Useful for inhibitor/activator/substrate prediction, drug safety and pharmacokinetic prediction.
[Figure: your drug's chemical structure is input on a local machine (option 1) or through the internet (option 2) and sent to a computer loaded with SVMProt, which has a support vector machine classifier for every drug class; the identified classes answer which class your drug belongs to (drug designed or property predicted)]
J. Chem. Inf. Comput. Sci. 44, 1630 (2004); J. Chem. Inf. Comput. Sci. 44, 1497 (2004); Toxicol. Sci. 79, 170 (2004).

SVM Drug Prediction Results
Protein inhibitor/activator/substrate prediction:
–86% of 129 estrogen receptor activators and 84% of 101 non-activators correctly predicted
–81% of 116 P-glycoprotein substrates and 79% of 85 non-substrates correctly predicted
Drug toxicity prediction:
–97% of 102 TdP+ and 84% of 243 TdP- agents correctly predicted
–73% of 229 genotoxic and 93% of 631 non-genotoxic agents correctly predicted
Pharmacokinetics prediction:
–95% of 276 BBB+ and 82% of 139 BBB- agents correctly predicted
–90% of 131 human intestinal absorption and 80% of 65 non-absorption agents correctly predicted
J. Chem. Inf. Comput. Sci. 44, 1630 (2004); J. Chem. Inf. Comput. Sci. 44, 1497 (2004); Toxicol. Sci. 79, 170 (2004).
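Each result above reports two class-wise rates: sensitivity on the positive class and specificity on the negative class. The sketch combines them into a single overall accuracy, weighted by class size, and into a balanced accuracy. The percentages on the slide are rounded, so these combined figures are approximate.

```python
# Sketch: overall and balanced accuracy from the per-class rates on the slide.
# The slide's percentages are rounded, so these figures are approximate.
def overall_accuracy(sensitivity: float, n_pos: int, specificity: float, n_neg: int) -> float:
    """Fraction of all compounds predicted correctly."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of the two class-wise rates; insensitive to class imbalance."""
    return (sensitivity + specificity) / 2


# Estrogen receptor activators: 86% of 129 and 84% of 101 non-activators.
print(round(overall_accuracy(0.86, 129, 0.84, 101), 3))  # 0.851
print(round(balanced_accuracy(0.86, 0.84), 3))           # 0.85
```

The two measures diverge when the classes are imbalanced. For the genotoxicity data (73% of 229, 93% of 631), overall accuracy is about 0.877, but balanced accuracy is 0.83.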