Kernel Methods: Support Vector Machines. Maximum Margin Classifiers and Support Vector Machines.

Generalized Linear Discriminant Functions. A linear discriminant function g(x) can be written as g(x) = w_0 + Σ_{i=1}^{d} w_i x_i, where d is the number of features. We could add additional terms to obtain a quadratic discriminant function: g(x) = w_0 + Σ_i w_i x_i + Σ_i Σ_j w_ij x_i x_j. The quadratic discriminant function introduces an additional d(d+1)/2 coefficients, corresponding to the products of pairs of attributes. The resulting decision surfaces are therefore more complicated (hyperquadric surfaces).

Generalized Linear Discriminant Functions. We could add even more terms, such as w_ijk x_i x_j x_k, and obtain the class of polynomial discriminant functions. The generalized form is g(x) = Σ_i a_i φ_i(x) = a^T φ(x), where the summation runs over all functions φ_i(x). The φ_i(x) are called the phi (φ) functions. The discriminant is now linear in the φ_i(x). The φ functions map the d-dimensional x-space into a d′-dimensional space. Example: g(x) = a_1 + a_2 x + a_3 x^2, with φ(x) = (1, x, x^2)^T.
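
Below is a minimal numpy sketch (not part of the original slides) of the example above: the discriminant g(x) = a_1 + a_2 x + a_3 x^2 is quadratic in x but linear in the features φ(x) = (1, x, x^2)^T. The coefficient values are made up for illustration.

```python
import numpy as np

def phi(x):
    """Map a scalar x to phi-space: phi(x) = (1, x, x^2)^T."""
    return np.array([1.0, x, x ** 2])

# Hypothetical coefficients: g(x) = -1 + x^2, i.e., a = (-1, 0, 1)^T.
a = np.array([-1.0, 0.0, 1.0])

for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    g = a @ phi(x)          # a dot product: linear in phi-space
    print(f"x = {x:+.1f}   g(x) = {g:+.2f}   sign = {np.sign(g):+.0f}")
```

The decision regions in x-space (|x| > 1 versus |x| < 1) cannot be produced by a linear function of x alone, yet the classifier is a plain linear discriminant in φ-space.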

[Figure 5.5 (Duda et al.): the φ mapping takes the low-dimensional example above into a higher-dimensional space in which the classes can be separated by a hyperplane.]

Support Vector Machines. What are support vector machines (SVMs)? A very popular classifier based on the previously discussed concepts of linear discriminants, together with the new concept of a margin. To begin, SVMs preprocess the data by representing all examples in a higher-dimensional space. With a sufficiently high-dimensional mapping, the classes can be separated by a hyperplane.

The Margin

The Goal in Support Vector Machines. Let t be +1 or -1 depending on whether the example x belongs to the positive or the negative class. A separating hyperplane ensures that t g(x) >= 0 for every training example. The goal in support vector machines is to find the separating hyperplane with the largest margin, where the margin is the distance between the hyperplane and the closest example to it.

The Support Vectors. The distance from a pattern x to the hyperplane is g(x) / ||w||. So let us change the objective to finding a vector w that maximizes the margin m in the inequality t g(x) / ||w|| >= m. Because we can rescale w (and w_0) without moving the hyperplane, we can fix the scale so that t g(x) = 1 for the patterns closest to the hyperplane; these patterns are the support vectors. Support vectors are all equally close to the hyperplane. They are the patterns that are most difficult to classify, and therefore the most "informative" ones.
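
As a small illustration (not from the slides), the sketch below computes the signed distance t g(x) / ||w|| for a toy data set and a hand-picked hyperplane, and flags the closest patterns; the data, w, and b are all made up for the example.

```python
import numpy as np

# Toy 2-D data (made up) and a candidate hyperplane g(x) = w.x + b.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 0.5],
              [-1.0, -1.0], [-2.0, -2.5], [0.0, -2.0]])
t = np.array([+1, +1, +1, -1, -1, -1])
w = np.array([1.0, 1.0])
b = 0.0

g = X @ w + b                        # value of the discriminant at each pattern
dist = t * g / np.linalg.norm(w)     # distance of each pattern to the hyperplane
margin = dist.min()                  # margin = distance of the closest pattern(s)

print("distances:", np.round(dist, 2))
print("margin   :", round(margin, 2))
print("closest patterns (candidate support vectors):",
      np.where(np.isclose(dist, margin))[0])
```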

The Support Vectors. We want a vector w that satisfies t g(x) / ||w|| >= m with the margin m as large as possible. With the rescaling above (t g(x) = 1 at the support vectors), maximizing the margin amounts to maximizing ||w||^-1, i.e., minimizing ||w||, under constraints. So we have the following optimization problem: argmin_w (1/2) ||w||^2 subject to t g(x) >= 1 for every training example. This can be solved using Lagrange multipliers.

The Support Vectors. What happens when there are unavoidable errors? argmin_w (1/2) ||w||^2 + λ Σ_i e_i subject to t_i g(x_i) >= 1 - e_i and e_i >= 0, where e_i is the error incurred by example x_i. The e_i are known as slack variables.
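
In practice this soft-margin problem is usually handed to a library solver. Here is a sketch using scikit-learn's SVC (assumed available; the data is synthetic), where the regularization parameter C plays the role of λ above, penalizing the slack terms:

```python
import numpy as np
from sklearn.svm import SVC

# Two slightly overlapping Gaussian blobs (synthetic data for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1.5, 1.0, size=(50, 2)),
               rng.normal(-1.5, 1.0, size=(50, 2))])
t = np.array([+1] * 50 + [-1] * 50)

# Linear soft-margin SVM; C corresponds to the slide's lambda on the slack terms.
clf = SVC(kernel="linear", C=1.0).fit(X, t)

print("support vectors per class:", clf.n_support_)
print("w =", clf.coef_[0], "  b =", clf.intercept_[0])
```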

The Support Vectors. We can write this in dual form (via the Karush-Kuhn-Tucker construction): max Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j t_i t_j (x_i · x_j) subject to 0 <= α_i <= λ and Σ_i α_i t_i = 0.
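
One way to solve this dual numerically (a sketch, assuming the cvxopt package; not part of the slides) is to hand it to a generic quadratic-programming solver. The solver minimizes, so the objective is negated:

```python
import numpy as np
from cvxopt import matrix, solvers

def svm_dual_linear(X, t, lam=1.0):
    """Solve the soft-margin SVM dual as a QP:
    min (1/2) a'Qa - 1'a   s.t.   0 <= a_i <= lam,   sum_i a_i t_i = 0,
    where Q_ij = t_i t_j (x_i . x_j)."""
    n = len(t)
    Q = (t[:, None] * t[None, :]) * (X @ X.T)
    P = matrix(Q.astype(float))
    q = matrix(-np.ones(n))
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))    # encodes -a <= 0 and a <= lam
    h = matrix(np.hstack([np.zeros(n), lam * np.ones(n)]))
    A = matrix(t.astype(float).reshape(1, n))         # encodes sum_i a_i t_i = 0
    b = matrix(0.0)
    alpha = np.array(solvers.qp(P, q, G, h, A, b)["x"]).ravel()
    return alpha

# Hypothetical usage, recovering w from the dual solution (see the next slide):
#   alpha = svm_dual_linear(X, t, lam=1.0)
#   w = ((alpha * t)[:, None] * X).sum(axis=0)
```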

The Support Vectors. The final result is a set of α_i, one for each training example. The optimal hyperplane can be expressed in the dual representation as f(x) = Σ_i α_i t_i (x_i · x) + b, where w = Σ_i α_i t_i x_i.

The Support Vectors. We can use kernel functions to map from the original space to a new space: max Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j t_i t_j (φ(x_i) · φ(x_j)) subject to 0 <= α_i <= λ and Σ_i α_i t_i = 0.

The Support Vectors. Computing the dot product in the transformed space is simplified. For the degree-2 polynomial kernel, φ(x_i) · φ(x_j) expands into a sum over a constant term, the (scaled) original attributes, and all (scaled) degree-2 products of attributes. Fortunately, that expansion is exactly equal to (1 + x_i · x_j)^2 = K(x_i, x_j). In general, all we need is the dot product between every pair of examples in the original space; collecting these values gives the Gram matrix K.
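
A quick numerical check (a sketch, not from the slides) that the explicit degree-2 feature map and the kernel give the same value. The feature map below is the standard one for K(x, z) = (1 + x · z)^2 in two dimensions, the same φ used in the XOR example later:

```python
import numpy as np

def phi2(x):
    """Explicit degree-2 feature map for 2-D input:
    phi(x) = (1, sqrt(2) x1, sqrt(2) x2, sqrt(2) x1 x2, x1^2, x2^2)."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, s * x1 * x2, x1 ** 2, x2 ** 2])

def poly_kernel(x, z):
    """Degree-2 polynomial kernel K(x, z) = (1 + x . z)^2."""
    return (1.0 + np.dot(x, z)) ** 2

rng = np.random.default_rng(1)
x, z = rng.normal(size=2), rng.normal(size=2)
print(np.dot(phi2(x), phi2(z)))   # dot product computed in phi-space
print(poly_kernel(x, z))          # same value, computed in the original space
```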

The Support Vectors. The final formulation is as follows: max Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j t_i t_j K(x_i, x_j) subject to 0 <= α_i <= λ and Σ_i α_i t_i = 0.
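
Relative to the linear dual sketched earlier, the only change is that the matrix of dot products X Xᵀ is replaced by the Gram matrix of the chosen kernel; the box constraints and Σ_i α_i t_i = 0 stay the same. A minimal sketch of building the Gram matrix (function names are hypothetical):

```python
import numpy as np

def gram_matrix(X, kernel):
    """Gram matrix K with K[i, j] = kernel(x_i, x_j)."""
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

# In the QP of the earlier sketch, replace  Q = (t t^T) * (X @ X.T)
# with                                      Q = (t t^T) * gram_matrix(X, kernel).
```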

Historical Background. Vladimir Vapnik. Publications: 6 books and over a hundred research papers. Developed the statistical learning theory of risk minimization (bounding the expected risk of a classifier). Invented Support Vector Machines.

Historical Background. Alexey Chervonenkis. Together with Vladimir Vapnik, developed the concept of the Vapnik-Chervonenkis (VC) dimension.

An Example. The XOR problem is known to be linearly non-separable: the points (1, 1) and (-1, -1) belong to one class, (1, -1) and (-1, 1) to the other, and no line in the original (x_1, x_2) space separates them. We use the phi functions φ(x) = (1, √2 x_1, √2 x_2, √2 x_1 x_2, x_1^2, x_2^2)^T, hidden inside the kernel function K(x_i, x_j) = (1 + x_i · x_j)^2.

An Example. The optimal decision boundary is found to be g(x_1, x_2) = x_1 x_2 = 0: positive on one class, negative on the other. In the transformed φ-space this is a hyperplane, and the resulting margin is √2 ≈ 1.41.
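
As a check (a sketch, not part of the slides), scikit-learn's SVC with a degree-2 polynomial kernel configured as K(x, z) = (1 + x · z)^2 separates XOR perfectly:

```python
import numpy as np
from sklearn.svm import SVC

# The four XOR points: x1 * x2 > 0 is one class, x1 * x2 < 0 the other.
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
t = np.array([+1, +1, -1, -1])

# kernel='poly' with gamma=1, coef0=1, degree=2 implements K(x, z) = (1 + x.z)^2.
clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=10.0).fit(X, t)

print(clf.predict(X))      # expected: [ 1  1 -1 -1]
print(clf.n_support_)      # typically all four points are support vectors here
```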

Benefits of SVMs. Benefits:
- The complexity of the classifier depends on the number of support vectors rather than on the dimensionality of the transformed feature space.
- This makes the algorithm less prone to overfitting.