
1 An Introduction to Support Vector Machine Classification Bioinformatics Lecture 7/2/2003 by Pierre Dönnes

2 Outline
– What do we mean by classification, and why is it useful?
– Machine learning – basic concepts
– Support Vector Machines (SVM)
  – Linear SVM – basic terminology and some formulas
  – Non-linear SVM – the kernel trick
– An example: predicting protein subcellular location with SVM
– Performance measurements

3 Classification
Every day, all the time, we classify things. E.g. crossing the street:
– Is there a car coming?
– At what speed?
– How far is it to the other side?
– Classification: safe to walk or not!

4 Decision tree learning
IF (Outlook = Sunny) ^ (Humidity = High) THEN PlayTennis = No
IF (Outlook = Sunny) ^ (Humidity = Normal) THEN PlayTennis = Yes

Training examples:
Day  Outlook   Temp.  Humidity  Wind    PlayTennis
D1   Sunny     Hot    High      Weak    No
D2   Overcast  Hot    High      Strong  Yes
...

Learned tree:
Outlook?
– Sunny    → Humidity? (High → No, Normal → Yes)
– Overcast → Yes
– Rain     → Wind? (Strong → No, Weak → Yes)
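
A minimal sketch of how such a tree could be learned in practice, here with scikit-learn (not a tool from the lecture; the four rows below are illustrative stand-ins for the full PlayTennis table):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up fragment of the PlayTennis data; only D1 and D2 appear on the slide.
data = pd.DataFrame({
    "Outlook":    ["Sunny", "Overcast", "Rain", "Sunny"],
    "Humidity":   ["High",  "High",     "High", "Normal"],
    "Wind":       ["Weak",  "Strong",   "Weak", "Weak"],
    "PlayTennis": ["No",    "Yes",      "Yes",  "Yes"],
})

# Encode the categorical attributes as numbers and fit a tree.
X = OrdinalEncoder().fit_transform(data[["Outlook", "Humidity", "Wind"]])
tree = DecisionTreeClassifier().fit(X, data["PlayTennis"])

# Print the learned IF/THEN structure.
print(export_text(tree, feature_names=["Outlook", "Humidity", "Wind"]))
```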

5 Classification tasks in bioinformatics
Learning task
– Given: expression profiles of leukemia patients and of healthy persons.
– Compute: a model that distinguishes, from expression data, whether a person has leukemia.
Classification task
– Given: the expression profile of a new patient + a learned model.
– Determine: whether the patient has leukemia or not.

6 Problems in classifying biological data
– The data are often high-dimensional, which makes it hard to set up simple rules.
– The sheer amount of data demands automated ways of dealing with it.
– Use computers: data processing, statistical analysis, and trying to learn patterns from the data (machine learning).

7 Examples of machine learning methods:
– Support Vector Machines
– Artificial Neural Networks
– Boosting
– Hidden Markov Models

8 Black box view of machine learning
Training data → magic black box (learning machine) → model
– Training data: expression patterns of some cancer patients + expression data from healthy persons.
– Model: can distinguish between healthy and sick persons, and can be used for prediction.
Test data → model → prediction = cancer or not
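
The same black-box workflow as a code sketch, using scikit-learn's SVC purely for illustration; the expression data below are random stand-ins, not real patient profiles:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 1000))   # 40 persons x 1000 genes (synthetic)
y_train = rng.integers(0, 2, size=40)   # 1 = cancer, 0 = healthy (synthetic)

model = SVC(kernel="linear").fit(X_train, y_train)  # the "magic black box"

X_test = rng.normal(size=(5, 1000))     # new, unseen persons
print(model.predict(X_test))            # prediction = cancer or not
```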

9 Tennis example 2
[Scatter plot of Humidity vs. Temperature, with each point marked "play tennis" or "do not play tennis".]

10 Linear Support Vector Machines
[Scatter plot in the $(x_1, x_2)$ plane with the two classes labelled +1 and -1.]
Data: $\{(x_i, y_i)\}$, $i = 1, \ldots, l$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$.

11 Linear SVM 2
Data: $\{(x_i, y_i)\}$, $i = 1, \ldots, l$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$.
All hyperplanes in $\mathbb{R}^d$ are parameterized by a vector $w$ and a constant $b$, and can be expressed as $w \cdot x + b = 0$ (remember the equation for a hyperplane from algebra!).
Our aim is to find such a hyperplane that $f(x) = \mathrm{sign}(w \cdot x + b)$ correctly classifies our data.
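
The decision function is simple enough to write out directly; a small sketch with an arbitrary, made-up hyperplane $(w, b)$:

```python
import numpy as np

def f(x, w, b):
    # The decision function f(x) = sign(w·x + b):
    # +1 or -1 depending on which side of the hyperplane x lies.
    return np.sign(np.dot(w, x) + b)

w, b = np.array([2.0, -1.0]), 0.5     # made-up example hyperplane
print(f(np.array([1.0, 1.0]), w, b))  # w·x + b = 1.5, so prints 1.0
```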

12 Definitions
Define the hyperplane H such that:
– $x_i \cdot w + b \ge +1$ when $y_i = +1$
– $x_i \cdot w + b \le -1$ when $y_i = -1$
$d_+$ = the shortest distance to the closest positive point.
$d_-$ = the shortest distance to the closest negative point.
The margin of a separating hyperplane is $d_+ + d_-$.
H1 and H2 are the planes $x_i \cdot w + b = +1$ and $x_i \cdot w + b = -1$. The points on the planes H1 and H2 are the support vectors.

13 Maximizing the margin
We want a classifier with as big a margin as possible.
Recall that the distance from a point $(x_0, y_0)$ to the line $Ax + By + C = 0$ is $|Ax_0 + By_0 + C| / \sqrt{A^2 + B^2}$.
The distance between H and H1 is $|w \cdot x + b| / \lVert w \rVert = 1 / \lVert w \rVert$, so the distance between H1 and H2 is $2 / \lVert w \rVert$.
To maximize the margin we therefore minimize $\lVert w \rVert$, under the condition that there are no data points between H1 and H2:
– $x_i \cdot w + b \ge +1$ when $y_i = +1$
– $x_i \cdot w + b \le -1$ when $y_i = -1$
These can be combined into $y_i (x_i \cdot w + b) \ge 1$.
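
A sketch of these quantities on a toy separable dataset, fitted with scikit-learn's linear SVC (a large C approximates the hard-margin case described here; the data points are invented):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.5], [1.5, 0.5],   # class -1
              [5.0, 5.0], [6.0, 4.5], [5.5, 6.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin 2/||w|| =", 2 / np.linalg.norm(w))
```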

14 The Lagrangian trick
Reformulate the optimization problem: a "trick" often used in optimization is a Lagrangian formulation of the problem. The constraints are replaced by constraints on the Lagrange multipliers $\alpha_i$, and the training data then occur only as dot products.
This gives us the task: maximize
$L_D = \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$
subject to $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$.
What we need to see: the input vectors $x_i$ and $x_j$ appear only in the form of a dot product – we will soon see why that is important.
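
One consequence of the dot-product form can be shown directly: an SVM can be trained from the matrix of pairwise dot products (the Gram matrix) alone. A sketch using scikit-learn's precomputed-kernel mode on invented data:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.5], [5.0, 5.0], [6.0, 4.5]])
y = np.array([-1, -1, 1, 1])

gram = X @ X.T                                # all pairwise dot products x_i · x_j
clf = SVC(kernel="precomputed").fit(gram, y)  # the solver never sees X itself

X_new = np.array([[5.5, 5.5]])
print(clf.predict(X_new @ X.T))               # dot products with the training points
```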

15 Problems with linear SVM
What if the decision function is not linear?
[Plot of two classes, +1 and -1, that no straight line can separate.]

16 Non-linear SVM 1: the kernel trick
Imagine a function $\Phi$ that maps the data into another space $H$: $\Phi : \mathbb{R}^d \to H$.
Remember the function we want to optimize, $L_D = \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$, where $x_i$ and $x_j$ appear as a dot product. In the non-linear case we will instead have $\Phi(x_i) \cdot \Phi(x_j)$.
If there is a "kernel function" K such that $K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$, we do not need to know $\Phi$ explicitly.
[Figure: one example of such a mapping from $\mathbb{R}^d$ to $H$.]
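
A sketch verifying the kernel identity for one concrete case: for the degree-2 polynomial kernel on $\mathbb{R}^2$ there is an explicit 6-dimensional feature map $\Phi$ with $K(x, z) = \Phi(x) \cdot \Phi(z)$, yet evaluating K never constructs that space:

```python
import numpy as np

def phi(x):
    # Explicit feature map for K(x, z) = (x·z + 1)^2 on R^2.
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def K(x, z):
    return (np.dot(x, z) + 1) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(K(x, z), np.dot(phi(x), phi(z)))  # both print 4.0
```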

17 Non-linear SVM 2
The function we end up optimizing is: maximize
$L_D = \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, K(x_i, x_j)$
subject to $w = \sum_i \alpha_i y_i \Phi(x_i)$ and $\sum_i \alpha_i y_i = 0$.
Another kernel example: the polynomial kernel $K(x_i, x_j) = (x_i \cdot x_j + 1)^p$, where p is a tunable parameter. Evaluating K requires only one addition and one exponentiation more than the original dot product.
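
A sketch of the polynomial kernel in use, on invented data that no straight line separates (a ring of +1 points around a central cluster of -1 points); in scikit-learn, degree=p and coef0=1 give exactly $(x_i \cdot x_j + 1)^p$:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
inner = rng.normal(scale=0.5, size=(30, 2))                       # class -1: central blob
outer = rng.normal(size=(30, 2))
outer = 3 * outer / np.linalg.norm(outer, axis=1, keepdims=True)  # class +1: ring of radius 3
X = np.vstack([inner, outer])
y = np.array([-1] * 30 + [1] * 30)

clf = SVC(kernel="poly", degree=2, coef0=1).fit(X, y)
print("training accuracy:", clf.score(X, y))
```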

18 Solving the optimization problem
In many cases, any general-purpose optimization package that solves linearly constrained problems will do, e.g.:
– Newton's method
– Conjugate gradient descent
Other methods involve nonlinear programming techniques.

19 Overtraining/overfitting
A well-known problem with machine learning methods is overtraining. This means that we have learned the training data very well, but cannot classify unseen examples correctly.
An example: a botanist who really knows his trees. Every time he sees a new tree, he claims it is not a tree.

20 Overtraining/overfitting 2
It can be shown that the portion n of unseen data that will be misclassified is bounded by:
$n \le$ (number of support vectors) / (number of training examples)
This is a measure of the risk of overtraining with SVM (there are also other measures).
Ockham's razor principle: simpler systems are better than more complex ones. In the SVM case, fewer support vectors mean a simpler representation of the hyperplane.
Example: understanding a certain cancer is easier if it can be described by one gene than if we have to describe it with 5000.
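
The ingredients of the bound can be read off a fitted model; a sketch on synthetic scikit-learn data (clf.support_ holds the indices of the support vectors):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

n_sv = len(clf.support_)
print(f"{n_sv} support vectors / {len(X)} training examples = {n_sv / len(X):.2f}")
```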

21 A practical example: protein localization
Proteins are synthesized in the cytosol and transported into different subcellular locations, where they carry out their functions.
Aim: to predict in what location a certain protein will end up!

22 Subcellular Locations

23 Method
Hypothesis: the amino acid composition of proteins from different compartments should differ.
– Extract proteins with known subcellular location from SWISSPROT.
– Calculate the amino acid composition of the proteins.
– Try to differentiate between cytosol, extracellular, mitochondrial, and nuclear proteins using SVM.

24 Input encoding
Prediction of nuclear proteins: label the known nuclear proteins as +1 and all others as -1.
The input vector $x_i$ represents the amino acid composition, e.g. $x_i = (4.2, 6.7, 12, \ldots, 0.5)$ for the amino acids (A, C, D, ..., Y).
Nuclear + all others → SVM → model
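
A sketch of this encoding: computing the 20-dimensional amino acid composition vector from a protein sequence (the sequence below is made up):

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the fixed order A, C, D, ..., Y

def composition(seq):
    # Percentage of each amino acid in seq, in the fixed order above.
    seq = seq.upper()
    return [100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

x_i = composition("MKLVFFAEDVGSNKGAIIGLMVGGVV")  # made-up example sequence
print(x_i)  # 20 percentages summing to 100
```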

25 Cross-validation
Cross-validation: split the data into n sets, train on n-1 sets, and test on the set left out of training.
[Diagram: the nuclear and "all others" data are each split into sets 1, 2, 3; set 1 serves as the test set while sets 2 and 3 form the training set, and the roles rotate.]
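
In scikit-learn terms, a sketch with cross_val_score, which handles the splitting and rotation automatically (synthetic data stand in for the protein composition vectors):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=150, n_features=20, random_state=0)

# n = 3: train on 2 sets, test on the held-out one, rotating the test set.
scores = cross_val_score(SVC(kernel="linear"), X, y, cv=3)
print("accuracy per fold:", scores, "mean:", scores.mean())
```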

26 Performance measurements
Run the model on test data with known labels (+1/-1) and count true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).
SP = TP / (TP + FP): the fraction of predicted +1 that actually are +1.
SE = TP / (TP + FN): the fraction of the true +1 that are predicted as +1.
In this case: SP = 5/(5+1) = 0.83 and SE = 5/(5+2) = 0.71.
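
The two measures are one-liners; a sketch reproducing the numbers from this slide (5 TP, 1 FP, 2 FN):

```python
def sp_se(tp, fp, fn):
    sp = tp / (tp + fp)  # fraction of predicted +1 that actually are +1
    se = tp / (tp + fn)  # fraction of the true +1 that are predicted as +1
    return sp, se

sp, se = sp_se(tp=5, fp=1, fn=2)
print(f"SP = {sp:.2f}, SE = {se:.2f}")  # SP = 0.83, SE = 0.71
```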

27 Results
We definitely get some predictive power out of our models: there seems to be a difference in the composition of proteins from different subcellular locations.
Another question: what about nuclear proteins – is there a difference between DNA-binding proteins and others?

28 Conclusions
We have (hopefully) learned some basic concepts and terminology of SVMs.
We know about the risk of overtraining and how to put a measure on the risk of bad generalization.
SVMs can be useful, for example, in predicting the subcellular location of proteins.

29 You can't input just anything into a learning machine!
Image classification of tanks, with autofire when an enemy tank is spotted. Input data: photos of the army's own tanks and of enemy tanks. The classifier worked really well on the training set used, but in reality it failed completely. The reason: all enemy tank photos had been taken in the morning and all photos of their own tanks at dusk – the classifier had learned to recognize dusk from dawn!

30 References
– http://www.kernel-machines.org/
– N. Cristianini and J. Shawe-Taylor: An Introduction to Support Vector Machines (and other kernel-based learning methods). Cambridge University Press, 2000. ISBN 0-521-78019-5. http://www.support-vector.net/
– Papers by Vapnik.
– C. J. C. Burges: A tutorial on Support Vector Machines. Data Mining and Knowledge Discovery 2:121-167, 1998.

