Support Vector Machines Introduction to Data Mining, 2nd Edition by

Slides:



Advertisements
Similar presentations
Introduction to Support Vector Machines (SVM)
Advertisements

Support Vector Machine
Christoph F. Eick Questions and Topics Review Nov. 30, Give an example of a problem that might benefit from feature creation 2.How does DENCLUE.
Lecture 9 Support Vector Machines
ECG Signal processing (2)
SVM - Support Vector Machines A new classification method for both linear and nonlinear data It uses a nonlinear mapping to transform the original training.
Classification / Regression Support Vector Machines
Christoph F. Eick Questions and Topics Review Nov. 22, Assume you have to do feature selection for a classification task. What are the characteristics.
An Introduction of Support Vector Machine
Support Vector Machines
SVM—Support Vector Machines
Support vector machine
CSCI 347 / CS 4206: Data Mining Module 07: Implementations Topic 03: Linear Models.
Classification and Decision Boundaries
Support Vector Machines (SVMs) Chapter 5 (Duda et al.)
University of Texas at Austin Machine Learning Group Department of Computer Sciences University of Texas at Austin Support Vector Machines.
CES 514 – Data Mining Lecture 8 classification (contd…)
Support Vector Classification (Linearly Separable Case, Primal) The hyperplanethat solves the minimization problem: realizes the maximal margin hyperplane.
Support Vector Machine (SVM) Classification
Support Vector Machines and Kernel Methods
Support Vector Machines
1 Computational Learning Theory and Kernel Methods Tianyi Jiang March 8, 2004.
Lecture outline Support vector machines. Support Vector Machines Find a linear hyperplane (decision boundary) that will separate the data.
Lecture 10: Support Vector Machines
Greg GrudicIntro AI1 Support Vector Machine (SVM) Classification Greg Grudic.
An Introduction to Support Vector Machines Martin Law.
Ch. Eick: Support Vector Machines: The Main Ideas Reading Material Support Vector Machines: 1.Textbook 2. First 3 columns of Smola/Schönkopf article on.
Support Vector Machine & Image Classification Applications
计算机学院 计算感知 Support Vector Machines. 2 University of Texas at Austin Machine Learning Group 计算感知 计算机学院 Perceptron Revisited: Linear Separators Binary classification.
Kernel Methods A B M Shawkat Ali 1 2 Data Mining ¤ DM or KDD (Knowledge Discovery in Databases) Extracting previously unknown, valid, and actionable.
Support Vector Machines Reading: Ben-Hur and Weston, “A User’s Guide to Support Vector Machines” (linked from class web page)
An Introduction to Support Vector Machines (M. Law)
1 Chapter 6. Classification and Prediction Overview Classification algorithms and methods Decision tree induction Bayesian classification Lazy learning.
Kernels Usman Roshan CS 675 Machine Learning. Feature space representation Consider two classes shown below Data cannot be separated by a hyperplane.
CISC667, F05, Lec22, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines I.
CS 478 – Tools for Machine Learning and Data Mining SVM.
Kernel Methods: Support Vector Machines Maximum Margin Classifiers and Support Vector Machines.
Support vector machine LING 572 Fei Xia Week 8: 2/23/2010 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A 1.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
Support Vector Machine Debapriyo Majumdar Data Mining – Fall 2014 Indian Statistical Institute Kolkata November 3, 2014.
Support Vector Machines. Notation Assume a binary classification problem. –Instances are represented by vector x   n. –Training examples: x = (x 1,
Final Exam Review CS479/679 Pattern Recognition Dr. George Bebis 1.
Support Vector Machines Reading: Ben-Hur and Weston, “A User’s Guide to Support Vector Machines” (linked from class web page)
Greg GrudicIntro AI1 Support Vector Machine (SVM) Classification Greg Grudic.
Kernel Methods: Support Vector Machines Maximum Margin Classifiers and Support Vector Machines.
SVMs in a Nutshell.
Support Vector Machine: An Introduction. (C) by Yu Hen Hu 2 Linear Hyper-plane Classifier For x in the side of o : w T x + b  0; d = +1; For.
Support Vector Machines Reading: Textbook, Chapter 5 Ben-Hur and Weston, A User’s Guide to Support Vector Machines (linked from class web page)
Support Vector Machines (SVMs) Chapter 5 (Duda et al.) CS479/679 Pattern Recognition Dr. George Bebis.
Support Vector Machine Slides from Andrew Moore and Mingyue Tan.
Support Vector Machine
CS 9633 Machine Learning Support Vector Machines
PREDICT 422: Practical Machine Learning
Support Vector Machine
Support Vector Machines
Geometrical intuition behind the dual problem
Support Vector Machines
An Introduction to Support Vector Machines
Kernels Usman Roshan.
Statistical Learning Dong Liu Dept. EEIS, USTC.
CS 2750: Machine Learning Support Vector Machines
Support Vector Machines Most of the slides were taken from:
COSC 4335: Other Classification Techniques
Support Vector Machines
Machine Learning Week 3.
Lecture 18. SVM (II): Non-separable Cases
Usman Roshan CS 675 Machine Learning
COSC 4368 Machine Learning Organization
CISC 841 Bioinformatics (Fall 2007) Kernel Based Methods (I)
SVMs for Document Ranking
Presentation transcript:

Support Vector Machines Introduction to Data Mining, 2nd Edition by Tan, Steinbach, Karpatne, Kumar

Support Vector Machines Find a linear hyperplane (decision boundary) that will separate the data

Support Vector Machines One Possible Solution

Support Vector Machines Another possible solution

Support Vector Machines Other possible solutions

Support Vector Machines Which one is better? B1 or B2? How do you define better?

Support Vector Machines Find hyperplane maximizes the margin => B1 is better than B2

Support Vector Machines

Linear SVM Linear model: Learning the model is equivalent to determining the values of How to find from training data?

Learning Linear SVM Objective is to maximize: Which is equivalent to minimizing: Subject to the following constraints: or This is a constrained optimization problem Solve it using Lagrange multiplier method

Example of Linear SVM Support vectors

Learning Linear SVM Decision boundary depends only on support vectors If you have data set with same support vectors, decision boundary will not change How to classify using SVM once w and b are found? Given a test record, xi

Support Vector Machines What if the problem is not linearly separable?

Support Vector Machines What if the problem is not linearly separable? Introduce slack variables Need to minimize: Subject to: If k is 1 or 2, this leads to same objective function as linear SVM but with different constraints (see textbook)

Support Vector Machines Find the hyperplane that optimizes both factors

Nonlinear Support Vector Machines What if decision boundary is not linear?

Nonlinear Support Vector Machines Trick: Transform data into higher dimensional space Decision boundary:

Learning Nonlinear SVM Optimization problem: Which leads to the same set of equations (but involve (x) instead of x)

Learning NonLinear SVM Issues: What type of mapping function  should be used? How to do the computation in high dimensional space? Most computations involve dot product (xi) (xj) Curse of dimensionality?

Learning Nonlinear SVM Kernel Trick: (xi) (xj) = K(xi, xj) K(xi, xj) is a kernel function (expressed in terms of the coordinates in the original space) Examples:

Example of Nonlinear SVM SVM with polynomial degree 2 kernel

Learning Nonlinear SVM Advantages of using kernel: Don’t have to know the mapping function  Computing dot product (xi) (xj) in the original space avoids curse of dimensionality Not all functions can be kernels Must make sure there is a corresponding  in some high-dimensional space Mercer’s theorem (see textbook)

Characteristics of SVM Since the learning problem is formulated as a convex optimization problem, efficient algorithms are available to find the global minima of the objective function (many of the other methods use greedy approaches and find locally optimal solutions) Overfitting is addressed by maximizing the margin of the decision boundary, but the user still needs to provide the type of kernel function and cost function Difficult to handle missing values Robust to noise High computational complexity for building the model