CZ3253: Computer Aided Drug design Lecture 7: Drug Design Methods II: SVM Prof. Chen Yu Zong Tel: 6874-6877


1 CZ3253: Computer Aided Drug Design
Lecture 7: Drug Design Methods II: SVM
Prof. Chen Yu Zong
Tel: 6874-6877, Email: csccyz@nus.edu.sg, http://xin.cz3.nus.edu.sg
Room 07-24, level 7, SOC1, National University of Singapore

2 Classification of Drugs by SVM
A drug is classified as either belonging (+) or not belonging (-) to a class.
Examples of drug classes: inhibitor of a protein, blood-brain barrier (BBB) penetrating, genotoxic.
Examples of protein classes: enzyme EC 3.4 family, DNA-binding.
By screening against all classes, the property of a drug or the function of a protein can be identified.
Diagram: a drug is screened by the Class-1 SVM, Class-2 SVM, Class-3 SVM, and so on (outputs: - - + - -); the drug belongs to Family-3.
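The screening scheme on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three linear classifiers, their weights, and the helper names `make_linear_classifier` and `screen` are hypothetical stand-ins for trained class SVMs, not the lecture's software.

```python
from typing import Callable, Dict, List, Sequence

# A binary classifier maps a feature vector to +1 (member) or -1 (non-member).
Classifier = Callable[[Sequence[float]], int]


def make_linear_classifier(w: Sequence[float], b: float) -> Classifier:
    """Return a classifier computing sign(w . x + b). Weights are made up."""
    def classify(x: Sequence[float]) -> int:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1 if score >= 0 else -1
    return classify


def screen(x: Sequence[float], classifiers: Dict[str, Classifier]) -> List[str]:
    """Run x through every class-specific classifier; keep classes that return +."""
    return [name for name, clf in classifiers.items() if clf(x) == 1]


# Hypothetical per-class classifiers (illustrative weights only).
classifiers = {
    "Class-1": make_linear_classifier([1.0, -1.0, 0.0], -0.5),
    "Class-2": make_linear_classifier([1.0, 0.0, -1.0], -0.5),
    "Class-3": make_linear_classifier([0.0, 1.0, 1.0], -1.5),
}

drug = [0.0, 1.0, 1.0]  # hypothetical feature vector of one drug
print(screen(drug, classifiers))
```

With these made-up weights only the Class-3 classifier returns +, mirroring the "- - +" pattern in the diagram.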

3 Classification of Drugs or Proteins by SVM
What is SVM? Support vector machines are a machine learning method (learning by examples, statistical learning) that classifies objects into one of two classes.
Advantages of SVM:
–Tolerates diversity among class members (members of a class need not resemble one another).
–Uses structure-derived physico-chemical features as the basis for drug classification (the algorithm requires no structural similarity).

4 SVM References
–C. Burges, "A tutorial on support vector machines for pattern recognition", Data Mining and Knowledge Discovery, Kluwer Academic Publishers, 1998 (on-line).
–R. Duda, P. Hart, and D. Stork, Pattern Classification, John Wiley, 2nd edition, 2001 (section 5.11, hard copy).
–S. Gong et al., Dynamic Vision: From Images to Face Recognition, Imperial College Press, 2001 (sections 3.6.2, 3.7.2, hard copy).
–Online lecture notes (http://www.cs.unr.edu/~bebis/MathMethods/SVM/lecture.pdf).
Publications on SVM drug prediction:
–J. Chem. Inf. Comput. Sci. 44, 1630 (2004)
–J. Chem. Inf. Comput. Sci. 44, 1497 (2004)
–Toxicol. Sci. 79, 170 (2004)

5 Machine Learning Method
Inductive learning: example-based learning.
Figure labels: descriptor, positive examples, negative examples.

6 Machine Learning Method
Feature vectors: A=(1, 1, 1), B=(0, 1, 1), C=(1, 1, 1), D=(0, 1, 1), E=(0, 0, 0), F=(1, 0, 1)
Figure labels: descriptor, feature vector, positive examples, negative examples.

7 SVM Method
Feature vectors in input space: A=(1, 1, 1), B=(0, 1, 1), C=(1, 1, 1), D=(0, 1, 1), E=(0, 0, 0), F=(1, 0, 1)
Figure: feature vectors A, B, E, F plotted in the input space with axes X, Y, Z.

8 SVM Method
Project to a higher dimensional space.
Figure labels: border, new border, protein family members, nonmembers.

9 SVM Method
Figure labels: support vector, new border, protein family members, nonmembers.

10 SVM Method
Figure labels: protein family members, nonmembers, new border, support vector.

11 Best Linear Separator?

12 Best Linear Separator?

13 Find Closest Points in Convex Hulls
Figure: closest points c and d.

14 Plane Bisects Closest Points
Figure: plane bisecting the closest points c and d.
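Once the closest points c and d are known, the bisecting plane follows directly: its normal is c - d and it passes through the midpoint of the segment from d to c. The sketch below assumes c and d are already given; the function names and the example coordinates are illustrative, not from the slides.

```python
from typing import List, Sequence, Tuple


def bisecting_plane(c: Sequence[float], d: Sequence[float]) -> Tuple[List[float], float]:
    """Return (w, b) of the plane w . x + b = 0 that perpendicularly bisects segment c-d."""
    w = [ci - di for ci, di in zip(c, d)]              # normal points from d towards c
    midpoint = [(ci + di) / 2 for ci, di in zip(c, d)]
    b = -sum(wi * mi for wi, mi in zip(w, midpoint))   # plane passes through the midpoint
    return w, b


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Positive on the side of c, negative on the side of d, zero on the plane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical closest points of the two convex hulls.
c = (2.0, 2.0)
d = (0.0, 0.0)
w, b = bisecting_plane(c, d)
print(w, b, decision_value(w, b, c), decision_value(w, b, d))
```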

15 Find using quadratic program
Many existing and new solvers.
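One common way to write the closest-point problem as a quadratic program is shown below. The notation is assumed here rather than taken from the slide: x_i are the training points, I_+ and I_- index the two classes, and the coefficients alpha_i express c and d as convex combinations of the points in each class.

```latex
\begin{aligned}
\min_{\alpha}\quad & \tfrac{1}{2}\,\lVert c - d \rVert^{2} \\
\text{where}\quad & c = \sum_{i \in I_{+}} \alpha_i x_i, \qquad d = \sum_{j \in I_{-}} \alpha_j x_j \\
\text{subject to}\quad & \sum_{i \in I_{+}} \alpha_i = 1, \quad \sum_{j \in I_{-}} \alpha_j = 1, \quad \alpha_i \ge 0 \ \text{for all } i .
\end{aligned}
```

The objective is quadratic in alpha and the constraints are linear, which is why general-purpose QP solvers apply.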

16 Best Linear Separator: Supporting Plane Method
Maximize the distance between two parallel supporting planes.
Distance = “Margin” =
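The slide's own expression for the margin did not survive in this transcript. Under the standard normalization, which is an assumption here and may differ from the symbols on the original slide, the two supporting planes and the resulting margin can be written as:

```latex
\begin{aligned}
&\text{Supporting planes:}\quad w \cdot x + b = +1, \qquad w \cdot x + b = -1 \\
&\text{Margin} = \frac{(+1) - (-1)}{\lVert w \rVert} = \frac{2}{\lVert w \rVert} \\
&\max_{w,b}\ \frac{2}{\lVert w \rVert}
\;\Longleftrightarrow\;
\min_{w,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to}\quad y_i\,(w \cdot x_i + b) \ge 1 \ \text{for all } i .
\end{aligned}
```

Here y_i is +1 or -1 according to the class of x_i, so the constraint keeps every training point on or outside its own supporting plane.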

17 Best Linear Separator?

18 SVM Method
The border line is nonlinear.

19 SVM Method
Non-linear transformation: use of a kernel function.
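A small example may help show what the kernel function buys. The sketch below uses a made-up one-dimensional data set that no single threshold can separate, an explicit map `phi` into two dimensions where a linear border works, and a kernel that returns the same inner product without building `phi` explicitly. The data, the map, and the chosen border are all illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def phi(x: float) -> Tuple[float, float]:
    """Explicit non-linear map from the 1-D input space to a 2-D feature space."""
    return (x, x * x)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def kernel(x: float, z: float) -> float:
    """K(x, z) = x*z + (x*z)^2, which equals phi(x) . phi(z)."""
    return x * z + (x * z) ** 2


# Hypothetical 1-D data: members lie on both sides of the nonmembers,
# so no single threshold on x separates them.
members = [-2.0, 2.0]
nonmembers = [-0.5, 0.5]


def classify(x: float) -> int:
    """Linear border in feature space: w = (0, 1), b = -2, i.e. x^2 = 2."""
    w, b = (0.0, 1.0), -2.0
    return 1 if dot(w, phi(x)) + b > 0 else -1


print([classify(x) for x in members], [classify(x) for x in nonmembers])
print(kernel(-2.0, 0.5), dot(phi(-2.0), phi(0.5)))
```

The border x^2 = 2 is a straight line in the feature space but corresponds to two cut points in the original input space.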

20 SVM Method
Non-linear transformation.

21 SVM Method

22 SVM Method

23 SVM Method

24 SVM Method

25 SVM for Classification of Drugs
How to represent a drug? Each structure is represented by a specific feature vector assembled from structural and physico-chemical properties:
–Simple molecular properties (molecular weight, no. of rotatable bonds, etc.; 18 in total)
–Molecular connectivity and shape (28 in total)
–Electro-topological state polarity (84 in total)
–Quantum chemical properties (electric charge, polarizability, etc.; 13 in total)
–Geometrical properties (molecular size vector, van der Waals volume, molecular surface, etc.; 16 in total)
J. Chem. Inf. Comput. Sci. 44, 1630 (2004); J. Chem. Inf. Comput. Sci. 44, 1497 (2004); Toxicol. Sci. 79, 170 (2004).
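The five descriptor groups listed on this slide add up to a 159-component feature vector (18 + 28 + 84 + 13 + 16). A minimal sketch of the assembly step follows; the group sizes come from the slide, while the function name, the dictionary layout, and the placeholder values are assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Group sizes as listed on the slide.
DESCRIPTOR_GROUPS = {
    "simple molecular properties": 18,
    "molecular connectivity and shape": 28,
    "electro-topological state": 84,
    "quantum chemical properties": 13,
    "geometrical properties": 16,
}


def assemble_feature_vector(groups: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the descriptor groups in a fixed order, checking each group's size."""
    vector: List[float] = []
    for name, size in DESCRIPTOR_GROUPS.items():
        values = groups[name]
        if len(values) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
        vector.extend(values)
    return vector


# Placeholder values only; real descriptors would be computed from the structure.
placeholder = {name: [0.0] * size for name, size in DESCRIPTOR_GROUPS.items()}
feature_vector = assemble_feature_vector(placeholder)
print(len(feature_vector))
```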

26 SVM Feature Selection
CACO2 - 718 descriptors. Average of 10 models. Test Q2 = 0.7073.
Q2 is MSE scaled by variance: Q2 = (mean square error) / (true variance), so lower values are better.
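The Q2 definition on this slide translates directly into code. The sketch below follows that definition (mean square error divided by the variance of the true values); the function name and the toy numbers are illustrative. Under this definition a perfect model scores 0 and a model that always predicts the mean scores 1.

```python
from typing import Sequence


def q2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Q2 = (mean square error) / (variance of the true values); lower is better."""
    n = len(y_true)
    mean = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    variance = sum((t - mean) ** 2 for t in y_true) / n
    if variance == 0:
        raise ValueError("true values have zero variance; Q2 is undefined")
    return mse / variance


# Toy values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
print(q2(y_true, y_true))                  # perfect predictions
print(q2(y_true, [2.5, 2.5, 2.5, 2.5]))    # always predicting the mean
```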

27 Feature Selection
Using a subset of descriptors might greatly improve results.
Do feature selection using a linear SVM with 1-norm regularization.
Figure labels: 1-norm, 2-norm.

28 Feature Selection via Sparse SVM/LP
Construct linear  -SVM using 1-norm LP.
Pick best C,  for SVM.
Keep descriptors with nonzero coefficients.
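The last step on this slide, keeping descriptors with nonzero coefficients, can be sketched as follows. The descriptor names are borrowed from the CACO2 slides, but the weights are invented for illustration and are not the output of an actual 1-norm LP solve; the tolerance `tol` is likewise an assumption, since numerical solvers rarely return exact zeros.

```python
from typing import List, Sequence


def select_descriptors(names: Sequence[str], weights: Sequence[float],
                       tol: float = 1e-8) -> List[str]:
    """Keep the descriptors whose linear-model coefficient is nonzero (beyond tol)."""
    return [name for name, w in zip(names, weights) if abs(w) > tol]


# Hypothetical sparse weight vector from a 1-norm regularized linear model.
names = ["a.don", "KB54", "SMR.VSA2", "pmiZ", "SlogP.VSA0"]
weights = [-0.42, 0.0, 0.17, 0.0, 1e-12]

selected = select_descriptors(names, weights)
print(selected)
```

The 1-norm penalty is what drives many coefficients to exactly zero, which is why the filtering step is this simple.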

29 Bagged Feature Selection
Flowchart: partition the training data into a training set and a validation set; apply the linear SVM algorithm for feature selection to obtain a linear regression model; repeat B times; bag the B models and obtain a subset of features.
Additional label: random variable r.

30 Bagged SVM (RBF)
CACO2 - 31 descriptors. Test Q2 = 0.134.

31 Starplot Caco2 - 31 Descriptors

32 Chemistry In/Out Modeling
Flowchart labels: data + descriptors; feature selection; visualize features; assess chemistry; construct SVM nonlinear model; SVM model; test data; predict bioactivities; chemistry interpretation.

33 Bagged SVM (RBF)
CACO2 - 15 descriptors. Test Q2 = 0.166.

34 CACO2 – 15 Variables
a.don, KB54, SMR.VSA2, ANGLEB45, DRNB10, ABSDRN6, PEOE.VSA.FPPOS, DRNB00, PEOE.VSA.FNEG, ABSKMIN, SIKIA, pmiZ, BNPB31, FUKB14, SlogP.VSA0

35 Chemical Insights
–Hydrophobicity: a.don
–Size and shape: ABSDRN6, SMR.VSA2, ANGLEB45, pmiZ. Large is bad. Flat is bad. Globular is good.
–Polarity: PEOE.VSA.FPPOS, PEOE.VSA.FNEG. Negative partial charge is good.
These insights correspond to conventional wisdom (rule of 5).

36 Hybrid TAE/SHAPE
Shape is an important overall factor:
–DRNB10, DRNB00: del rho dot N
–BNP31: bare nuclear potential
–KB54: kinetic energy descriptors. Very large lipophilic molecules don't work.
–FUKB14: Fukui surface
Interpretations are difficult. They point to chemistry challenges/hypotheses.

37 Final SVM Approach
Construct a large set of descriptors.
Perform feature selection:
–Sensitivity analysis or SVM-LP
Construct many SVM models:
–Optimize using QP or LP
–Evaluate by validation set or leave-one-out
–Select best models by grid or pattern search
Bag the best k models to create the final function.
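The final two steps, selecting the best models and bagging them, can be sketched as below. This is a simplified illustration under stated assumptions: the three candidate "models" are hand-written one-dimensional functions rather than trained SVMs, selection uses validation mean square error, and bagging is taken to mean averaging the selected models' predictions.

```python
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def validation_mse(model: Model, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean square error of one model on the validation set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_best(models: Sequence[Model], xs: Sequence[float],
                ys: Sequence[float], k: int) -> List[Model]:
    """Keep the k models with the lowest validation error."""
    return sorted(models, key=lambda m: validation_mse(m, xs, ys))[:k]


def bag(models: Sequence[Model]) -> Model:
    """Final function: the average of the selected models' predictions."""
    def predict(x: float) -> float:
        return sum(m(x) for m in models) / len(models)
    return predict


# Hypothetical candidate models standing in for trained SVM regressors.
def model_a(x: float) -> float:
    return 2.0 * x


def model_b(x: float) -> float:
    return 2.0 * x + 1.0


def model_c(x: float) -> float:
    return -x


x_val, y_val = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # toy validation set
best = select_best([model_c, model_b, model_a], x_val, y_val, k=2)
final = bag(best)
print(final(4.0))
```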

38 Drug Discovery Results (LOO)
Data      # Samples   # Var. Full   # Var. FS (Avg)   Q2 Full   Q2 FS
Caco2     27          713           41                0.33      0.29
Barrier   62          569           51                0.31      0.28
HIV       64          561           17                0.46      0.40
Cancer    46          362           34                0.50      0.16
LCCK      66          350           69                0.40      0.37
Aquasol   197         525           57                0.08      0.06

39 SVM-based drug design and property prediction software
Useful for inhibitor/activator/substrate prediction, drug safety and pharmacokinetic prediction.
Diagram: the chemical structure of your drug is input either through the internet or on a local machine (Options 1 and 2) and sent to the classifier, a computer loaded with SVMProt that holds a support vector machine classifier for every drug class. The identified classes answer the question "Which class does your drug belong to?" and give the drug design or property prediction.
http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi
J. Chem. Inf. Comput. Sci. 44, 1630 (2004); J. Chem. Inf. Comput. Sci. 44, 1497 (2004); Toxicol. Sci. 79, 170 (2004).

40 SVM Drug Prediction Results
Protein inhibitor/activator/substrate prediction:
–86% of 129 estrogen receptor activators and 84% of 101 non-activators correctly predicted
–81% of 116 P-glycoprotein substrates and 79% of 85 non-substrates correctly predicted
Drug toxicity prediction:
–97% of 102 TdP+ and 84% of 243 TdP- agents correctly predicted
–73% of 229 genotoxic and 93% of 631 non-genotoxic agents correctly predicted
Pharmacokinetics prediction:
–95% of 276 BBB+ and 82% of 139 BBB- agents correctly predicted
–90% of 131 human intestine absorption and 80% of 65 non-absorption agents correctly predicted
J. Chem. Inf. Comput. Sci. 44, 1630 (2004); J. Chem. Inf. Comput. Sci. 44, 1497 (2004); Toxicol. Sci. 79, 170 (2004).
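The per-class percentages above can be combined into an overall accuracy by weighting each rate by its class size. The sketch below does that arithmetic; the counts and rates are the ones on this slide, while the function name and dictionary layout are illustrative. Because the slide reports rounded percentages, the results are approximate.

```python
def overall_accuracy(n_pos: int, pos_rate: float, n_neg: int, neg_rate: float) -> float:
    """Overall fraction correct from the per-class sizes and per-class accuracies."""
    correct = n_pos * pos_rate + n_neg * neg_rate
    return correct / (n_pos + n_neg)


# (positives, fraction correct, negatives, fraction correct), as reported on the slide.
RESULTS = {
    "estrogen receptor activators": (129, 0.86, 101, 0.84),
    "P-glycoprotein substrates": (116, 0.81, 85, 0.79),
    "TdP": (102, 0.97, 243, 0.84),
    "genotoxicity": (229, 0.73, 631, 0.93),
    "BBB penetration": (276, 0.95, 139, 0.82),
    "human intestine absorption": (131, 0.90, 65, 0.80),
}

for name, (n_pos, pos_rate, n_neg, neg_rate) in RESULTS.items():
    print(f"{name}: {overall_accuracy(n_pos, pos_rate, n_neg, neg_rate):.3f}")
```

Reporting both per-class rates, as the slide does, is more informative than the single overall figure when the classes are unbalanced, as in the genotoxicity set.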

