Support Vector Machines & Kernel Machines


Support Vector Machines & Kernel Machines
Ata Kaban, The University of Birmingham

Remember the XOR problem? Today we learn how to solve it.

Support Vector Machines (SVM)
Method for supervised learning problems:
- Classification
- Regression
Two key ideas:
- Assuming linearly separable classes, learn a separating hyperplane with maximum margin.
- Expand the input into a high-dimensional space to deal with linearly non-separable cases (such as the XOR problem).
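
As a quick illustration of both ideas, here is a minimal sketch assuming scikit-learn is available: a linear SVM cannot reproduce the XOR labelling, while an RBF-kernel SVM (which implicitly expands the input space) can.

```python
# Minimal sketch: the XOR problem is not linearly separable, but becomes
# separable after a non-linear kernel expansion (assumes scikit-learn).
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, +1, +1, -1])            # XOR labels

linear_svm = SVC(kernel="linear", C=1e6).fit(X, y)
rbf_svm    = SVC(kernel="rbf", gamma=1.0, C=1e6).fit(X, y)

print(linear_svm.predict(X))   # some points misclassified: no linear separator exists
print(rbf_svm.predict(X))      # expected [-1  1  1 -1]: XOR solved in the expanded space
```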

Separating hyperplane
Training set: (xi, yi), i = 1, 2, …, N;  yi ∈ {+1, −1}
Hyperplane: w·x + b = 0, which is fully determined by (w, b),
where x = (x1, x2, …, xd), w = (w1, w2, …, wd), and
w·x = w1x1 + w2x2 + … + wdxd  (dot product).
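
A hyperplane classifier simply thresholds this dot product plus bias; a small sketch with illustrative (not learned) values of w and b:

```python
# Sketch of a hyperplane decision rule; w and b are illustrative values,
# not parameters learned from any particular data set.
import numpy as np

w = np.array([2.0, -1.0, 0.5])   # normal vector of the hyperplane (hypothetical)
b = -0.25                        # bias / offset (hypothetical)

def classify(x):
    """Return +1 if x lies on the positive side of w.x + b = 0, else -1."""
    return 1 if np.dot(w, x) + b > 0 else -1

print(classify(np.array([1.0, 0.0, 0.0])))   # +1, since w.x + b = 1.75 > 0
```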

Maximum margin
According to a theorem from Learning Theory, among all possible linear decision functions, the one that maximises the margin of the training set minimises (a bound on) the generalisation error. [Of course, given enough data points and assuming the data is not noisy!]

Maximum margin
[Figure: hyperplane w·x + b = 0 with the regions w·x + b > 0 and w·x + b < 0 on either side, and normal vector w.]
Note 1: with c a positive constant, the decision functions (w, b) and (cw, cb) define the same hyperplane.
Note 2: but the margins, as measured by the outputs of the function x ↦ w·x + b, are not the same if we take (cw, cb).
Definition: the geometric margin is the margin given by the canonical decision function, i.e. when c = 1/||w||.
Strategy:
1) maximise the geometric margin (cf. the result from learning theory),
2) subject to the constraint that training examples are classified correctly.

Maximum margin
According to Note 1, we can demand the function output for the nearest points to be +1 and −1 on the two sides of the decision function. This removes the scaling freedom. Denoting a nearest positive example x+ and a nearest negative example x−, this is:
  w·x+ + b = +1  and  w·x− + b = −1
Computing the geometric margin (which has to be maximised):
  margin = ½ (w/||w||)·(x+ − x−) = 1/||w||
And here are the constraints: w·xi + b ≥ +1 for yi = +1, and w·xi + b ≤ −1 for yi = −1; equivalently, yi(w·xi + b) ≥ 1 for all i.
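
A small numerical check of the margin formula, with hypothetical w, b and nearest points chosen to satisfy the two constraints above:

```python
# Numerical check: for a canonical solution with w.x+ + b = +1 and w.x- + b = -1,
# the geometric margin equals 1/||w||.  (w, b and the points are hypothetical.)
import numpy as np

w = np.array([1.0, 1.0])
b = -3.0
x_plus = np.array([2.5, 1.5])     # satisfies w.x + b = +1
x_minus = np.array([1.0, 1.0])    # satisfies w.x + b = -1

margin = 0.5 * np.dot(w / np.linalg.norm(w), x_plus - x_minus)
print(margin, 1.0 / np.linalg.norm(w))   # both equal 1/sqrt(2), about 0.7071
```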

Maximum margin – summing up
Given a linearly separable training set (xi, yi), i = 1, 2, …, N; yi ∈ {+1, −1}:
  Minimise ||w||²
  Subject to yi(w·xi + b) ≥ 1, for i = 1, …, N.
This is a quadratic programming problem with linear inequality constraints; there are well-known procedures for solving it.
[Figure: the margin is bounded by the hyperplanes w·x + b = +1 and w·x + b = −1, with the decision boundary w·x + b = 0 in between.]
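
A sketch of this quadratic programme solved numerically; scipy's general-purpose SLSQP routine is used here as a stand-in for a dedicated QP solver, and the toy data is hypothetical:

```python
# Maximum-margin primal: minimise ||w||^2 subject to y_i (w.x_i + b) >= 1.
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy set (hypothetical data)
X = np.array([[2.0, 2.0], [2.5, 3.0], [0.0, 0.0], [0.5, -1.0]])
y = np.array([+1, +1, -1, -1])
d = X.shape[1]

def objective(z):
    w = z[:d]
    return np.dot(w, w)                    # ||w||^2

constraints = [{"type": "ineq",
                "fun": lambda z, i=i: y[i] * (np.dot(z[:d], X[i]) + z[d]) - 1.0}
               for i in range(len(y))]      # y_i (w.x_i + b) - 1 >= 0

result = minimize(objective, x0=np.ones(d + 1),   # arbitrary starting point
                  constraints=constraints, method="SLSQP")
w, b = result.x[:d], result.x[d]
print("w =", w, "b =", b)                  # the maximum-margin separating hyperplane
```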

Support vectors The training points that are nearest to the separating function are called support vectors. What is the output of our decision function for these points?

Solving (not req for exam)
Construct and minimise the Lagrangian.
Take derivatives w.r.t. w and b, and equate them to 0.
The Lagrange multipliers αi are called 'dual variables'; each training point has an associated dual variable.
The parameters w are expressed as a linear combination of the training points; only the SVs will have non-zero αi.
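
For reference, the standard form of the Lagrangian and the two stationarity conditions this step refers to:

```latex
% Standard primal Lagrangian for the maximum-margin problem
L(\mathbf{w}, b, \boldsymbol{\alpha})
  = \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
  - \sum_{i=1}^{N} \alpha_i \left[ y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \right],
  \qquad \alpha_i \ge 0.

% Setting the derivatives w.r.t. w and b to zero gives
\frac{\partial L}{\partial \mathbf{w}} = 0
  \;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i,
\qquad
\frac{\partial L}{\partial b} = 0
  \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0.
```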

(not req for exam)

Solving (not req for exam)
Plug this back into the Lagrangian to obtain the dual formulation (homework).
The resulting dual, which is solved for α by using a QP solver:
  maximise W(α) = Σi αi − ½ Σi Σj αi αj yi yj (xi·xj)
  subject to αi ≥ 0 and Σi αi yi = 0.
The b does not appear in the dual, so it is determined separately from the initial constraints (homework).
Data enters only in the form of dot products!
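
A sketch of the dual solved numerically, again using scipy's SLSQP as a stand-in for a dedicated QP solver on the same hypothetical toy data:

```python
# Dual of the maximum-margin problem (sketch):
# maximise  sum_i a_i - 1/2 sum_ij a_i a_j y_i y_j (x_i . x_j)
# subject to  a_i >= 0  and  sum_i a_i y_i = 0
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [2.5, 3.0], [0.0, 0.0], [0.5, -1.0]])   # same toy data as above
y = np.array([+1.0, +1.0, -1.0, -1.0])
N = len(y)

Q = (y[:, None] * y[None, :]) * (X @ X.T)      # Q_ij = y_i y_j (x_i . x_j)

def neg_dual(a):                               # minimise the negative dual objective
    return 0.5 * a @ Q @ a - np.sum(a)

res = minimize(neg_dual, x0=np.zeros(N), method="SLSQP",
               bounds=[(0, None)] * N,
               constraints=[{"type": "eq", "fun": lambda a: np.dot(a, y)}])
alpha = res.x
w = (alpha * y) @ X                            # w = sum_i alpha_i y_i x_i
sv = np.argmax(alpha)                          # any point with alpha_i > 0 is a support vector
b = y[sv] - np.dot(w, X[sv])                   # from y_sv (w.x_sv + b) = 1
print("alpha =", alpha, "w =", w, "b =", b)
```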

Classifying new data points (not req for exam)
Once the parameters (α*, b*) are found by solving the required quadratic optimisation on the training set of points, the SVM is ready to be used for classifying new points.
Given a new point x, its class membership is sign[f(x, α*, b*)], where
  f(x, α*, b*) = Σi αi* yi (xi·x) + b*
Data enters only in the form of dot products!
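
A minimal sketch of this decision rule; the argument names are illustrative, and α* and b* would come from the dual solution (e.g. the sketch above):

```python
# Linear SVM decision rule: class(x) = sign( sum_i alpha_i* y_i (x_i . x) + b* ).
import numpy as np

def svm_predict(x_new, X_train, y_train, alpha_star, b_star):
    """Class label of x_new given the training points, their labels,
    the optimal dual variables alpha* and the bias b*."""
    dots = X_train @ x_new                       # data enters only via dot products
    return np.sign(np.sum(alpha_star * y_train * dots) + b_star)
```

Because w = Σi αi* yi xi, this reproduces sign(w·x + b*) in the linear case.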

Solution
The solution of the SVM, i.e. of the quadratic programming problem with linear inequality constraints, has the nice property that the data enters only in the form of dot products!
Dot product (notation & memory refreshing): given x = (x1, x2, …, xn) and y = (y1, y2, …, yn), the dot product of x and y is x·y = x1y1 + x2y2 + … + xnyn.
This is nice because it allows us to make SVMs non-linear without complicating the algorithm; see the next slide.
If you want to use SVMs in practice, many software packages are available; see e.g. the Resources page at the back of this handout. If you want to understand what the software does, then you need to master the previous slides marked as 'not req for exam'.

Non-linear SVMs
Transform x → φ(x).
The linear algorithm depends only on x·xi, hence the transformed algorithm depends only on φ(x)·φ(xi).
Use a kernel function K(x, y) such that K(x, y) = φ(x)·φ(y).
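
A sketch of the kernelised decision rule: it is identical to the linear rule above except that every dot product xi·x is replaced by K(xi, x). The RBF kernel is used here purely as an example choice:

```python
# Kernelised decision rule: the dot product x_i . x is replaced by K(x_i, x).
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian / RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_svm_predict(x_new, X_train, y_train, alpha_star, b_star, kernel=rbf_kernel):
    k = np.array([kernel(x_i, x_new) for x_i in X_train])   # K(x_i, x) for every x_i
    return np.sign(np.sum(alpha_star * y_train * k) + b_star)
```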

Examples of kernels
Example 1 (square of the dot product): for x, y points in the 2D input space, K(x, y) = (x·y)² corresponds to the 3D feature space φ(x) = (x1², √2 x1x2, x2²), since (x·y)² = φ(x)·φ(y).
Example 2: e.g. the Gaussian (RBF) kernel K(x, y) = exp(−||x − y||² / (2σ²)); the φ that corresponds to this kernel has infinite dimension.
Note: not every function is a proper kernel. There is a theorem, Mercer's Theorem, that characterises proper kernels.
To test a new input x when working with kernels: f(x) = Σi αi yi K(xi, x) + b.
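
A quick numerical check of Example 1, using the usual explicit 2D-to-3D feature map for the squared dot product:

```python
# Check that K(x, y) = (x . y)^2 equals phi(x) . phi(y)
# for the explicit 2D -> 3D feature map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2).
import numpy as np

def phi(v):
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

print(np.dot(x, y) ** 2)          # kernel evaluated in the 2D input space: (3 - 2)^2 = 1
print(np.dot(phi(x), phi(y)))     # same value, computed in the 3D feature space
```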

Making new kernels from old
New kernels can be made from valid kernels by allowed operations; e.g. addition, multiplication, and rescaling (by a positive constant) of kernels gives a proper kernel, as long as the resulting Gram matrix is positive semi-definite. Also, given a real-valued function f(x) over inputs x, the following is a valid kernel:
  K(x, y) = f(x) f(y).
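
A small sketch of this check: build a Gram matrix from a combined kernel on a few (hypothetical) sample points and confirm that it is positive semi-definite:

```python
# Gram matrix of a combined kernel (sum of a linear and a rescaled squared kernel)
# on a few sample points; all eigenvalues should be non-negative (PSD).
import numpy as np

def combined_kernel(a, b):
    return np.dot(a, b) + 2.0 * np.dot(a, b) ** 2     # sum + positive rescaling of kernels

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 2))                      # hypothetical sample points

G = np.array([[combined_kernel(p, q) for q in points] for p in points])
print(np.linalg.eigvalsh(G) >= -1e-9)                 # all True: the Gram matrix is PSD
```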

Using SVM for classification
- Prepare the data matrix.
- Select the kernel function to use.
- Execute the training algorithm using a QP solver to obtain the αi values.
- Unseen data can be classified using the αi values and the support vectors.
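
An end-to-end sketch of this workflow, assuming scikit-learn and its bundled iris data (restricted to two classes so the task is binary):

```python
# End-to-end sketch: prepare data, choose a kernel, train (QP under the hood),
# then classify unseen data using the learned dual coefficients and support vectors.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
mask = iris.target < 2                                  # keep two classes only
X, y = iris.data[mask], iris.target[mask]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf", gamma="scale", C=1.0)           # step 2: choose the kernel
clf.fit(X_tr, y_tr)                                     # step 3: training yields the alphas

print("number of support vectors:", clf.support_vectors_.shape[0])
print("alpha_i * y_i (dual coefficients):", clf.dual_coef_)
print("test accuracy:", clf.score(X_te, y_te))          # step 4: classify unseen data
```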

Applications
- Handwritten digit recognition (of interest to the US Postal Service): 4% error was obtained, and only about 4% of the training data were SVs.
- Text categorisation
- Face detection
- DNA analysis
- …many others

Discriminative versus generative classification methods
SVMs learn the discrimination boundary; they are called discriminative approaches. This is in contrast to learning a model for each class, as e.g. Bayesian classification does; this latter approach is called the generative approach. The SVM tries to avoid overfitting in high-dimensional spaces (cf. regularisation).

Conclusions
- SVMs learn linear decision boundaries (cf. perceptrons).
- They pick the hyperplane that maximises the margin.
- The optimal hyperplane turns out to be a linear combination of support vectors.
- Transform nonlinear problems to a higher-dimensional space using kernel functions; then there is more chance that, in the transformed space, the classes will be linearly separable.

Resources
- Software and a practical guide to SVM for beginners: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
- Kernel machines website: http://www.kernel-machines.org/
- Burges, C. J. C.: A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, Vol. 2, No. 2, pp. 121-167, 1998. Available from http://svm.research.bell-labs.com/SVMdoc.html
- Cristianini & Shawe-Taylor: SVM book (in the School library)