Ch. Eick: Support Vector Machines: The Main Ideas
Reading Material on Support Vector Machines:
1. Textbook
2. First 3 columns of the Smola/Schölkopf article on SV Regression
3.

Likelihood- vs. Discriminant-based Classification
Likelihood-based: assume a model for p(x|Ci) and use Bayes' rule to calculate P(Ci|x); gi(x) = log P(Ci|x).
Discriminant-based: assume a model for the discriminant gi(x|Φi) directly; no density estimation.
Prototype-based: make classification decisions based on the nearest prototypes without constructing decision boundaries at all (kNN, k-means approach).
Estimating the boundaries is enough; there is no need to accurately estimate the densities/probabilities away from the boundaries. We are only interested in learning decision boundaries (surfaces on which the densities of the two classes are the same), and many popular classification techniques learn decision boundaries without explicitly constructing density functions.
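To make the three approaches concrete, here is a minimal sketch (assuming scikit-learn; the toy dataset and classifier choices are illustrative, not from the original slides) that fits one likelihood-based, one discriminant-based, and one prototype-based classifier to the same data:

```python
# Minimal sketch: one likelihood-based, one discriminant-based, and one
# prototype-based classifier on the same toy data (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB          # likelihood-based: models p(x|Ci)
from sklearn.svm import LinearSVC                   # discriminant-based: models gi(x) directly
from sklearn.neighbors import KNeighborsClassifier  # prototype-based: nearest prototypes

X, y = make_classification(n_samples=300, n_features=2, n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for clf in (GaussianNB(), LinearSVC(), KNeighborsClassifier(n_neighbors=5)):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))
```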

Support Vector Machines
SVMs use a single hyperplane; one possible solution.

Support Vector Machines
Another possible solution.

Support Vector Machines
Other possible solutions.

Support Vector Machines
Which one is better: B1 or B2? How do you define "better"?

Support Vector Machines
Find the hyperplane that maximizes the margin => B1 is better than B2.
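To make "maximizing the margin" concrete: the distance between the two margin hyperplanes is 2/||w||, so the better boundary in the sense of this slide is the one with the smaller ||w||. A minimal sketch (assuming scikit-learn; a very large C is used to approximate the hard-margin case, and the toy dataset is illustrative):

```python
# Minimal sketch: fit a (nearly) hard-margin linear SVM and report its margin width 2/||w||.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.8, random_state=4)
clf = SVC(kernel="linear", C=1e6).fit(X, y)   # huge C ~ hard margin

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
```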

Key Properties of Support Vector Machines
1. SVMs use a single hyperplane that subdivides the space into two half-spaces, one occupied by Class 1 and the other by Class 2.
2. They maximize the margin of the decision boundary, using quadratic optimization techniques to find the optimal hyperplane.
3. When used in practice, SVM approaches frequently map the examples (using a function Φ) into a higher-dimensional space and find margin-maximal hyperplanes in the mapped space, obtaining decision boundaries that are not hyperplanes in the original space.
4. Moreover, versions of SVMs exist that can be used when linear separability cannot be accomplished.

Support Vector Machines
Examples are of the form (x1, …, xn, y), i.e., n attribute values plus a class label y ∈ {-1, 1}.
L2 norm: ||x|| = sqrt(x·x) = sqrt(x1² + … + xn²)
Dot product: u·v = u1·v1 + … + un·vn
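For reference, the two quantities used throughout, the L2 norm and the dot product, computed with plain NumPy (the two example vectors are arbitrary):

```python
# The two basic quantities used throughout: L2 norm and dot product.
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([1.0, 2.0])

print(np.linalg.norm(u))        # L2 norm: sqrt(3^2 + 4^2) = 5.0
print(np.dot(u, v))             # dot product: 3*1 + 4*2 = 11.0
print(np.sqrt(np.dot(u, u)))    # the norm expressed via the dot product: also 5.0
```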

Support Vector Machines
We want to maximize the margin 2/||w||.
This is equivalent to minimizing L(w) = ||w||²/2,
subject to the following N constraints: yi (w·xi + b) ≥ 1 for each training example (xi, yi), i = 1, …, N.
This is a constrained convex quadratic optimization problem that can be solved in polynomial time. Numerical approaches to solve it (e.g., quadratic programming) exist. The function to be optimized has only a single minimum => no local-minimum problem.
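To illustrate that this really is an ordinary constrained optimization problem, the following sketch solves the primal directly on a tiny hand-made separable dataset using SciPy's general-purpose SLSQP solver (the data are made up for illustration; a dedicated QP solver would normally be used instead):

```python
# Minimal sketch: solve the hard-margin primal
#   minimize 1/2 ||w||^2   subject to   y_i (w . x_i + b) >= 1
# on a tiny linearly separable dataset, using scipy's SLSQP for illustration.
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],      # class +1
              [0.0, 0.0], [0.5, -0.5], [-0.5, 0.5]])   # class -1
y = np.array([1, 1, 1, -1, -1, -1])

def objective(params):                 # params = (w1, w2, b)
    w = params[:2]
    return 0.5 * np.dot(w, w)

constraints = [{"type": "ineq",        # y_i (w . x_i + b) - 1 >= 0
                "fun": lambda p, xi=xi, yi=yi: yi * (np.dot(p[:2], xi) + p[2]) - 1}
               for xi, yi in zip(X, y)]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b, "margin =", 2 / np.linalg.norm(w))
```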

Support Vector Machines
What if the problem is not linearly separable?

Linear SVM for Non-linearly Separable Problems
What if the problem is not linearly separable? => Introduce slack variables ξi.
Minimize: L(w) = ||w||²/2 + C·(ξ1 + … + ξN)
Subject to (i = 1, …, N): yi (w·xi + b) ≥ 1 - ξi, with ξi ≥ 0.
The term ||w||²/2 is the inverse of the size of the margin between the hyperplanes, the sum of the slack variables measures the prediction error, and the parameter C trades the two off: a slack variable ξi allows the corresponding constraint to be violated to a certain degree. C is chosen using a validation set, trying to keep the margin wide while keeping the training error low. (No kernel is used yet.)
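A minimal sketch of how C is chosen in practice (scikit-learn assumed; the grid of C values and the noisy toy dataset are illustrative), with cross-validation standing in for the separate validation set mentioned on the slide:

```python
# Minimal sketch: pick the slack penalty C by (cross-)validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=5, flip_y=0.1, random_state=0)

grid = GridSearchCV(SVC(kernel="linear"),
                    param_grid={"C": [0.01, 0.1, 1, 10, 100]},
                    cv=5)
grid.fit(X, y)
print("best C:", grid.best_params_["C"], "cv accuracy:", grid.best_score_)
```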

Nonlinear Support Vector Machines
What if the decision boundary is not linear?
Alternative 1: Use a technique that employs non-linear decision boundaries (a non-linear decision function).
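As an illustrative sketch of Alternative 1 (scikit-learn assumed; the concentric-circles dataset and the choice of k-NN are illustrative assumptions, not from the slides): a classifier with a natively non-linear decision boundary copes with data that no single hyperplane can separate:

```python
# Minimal sketch of Alternative 1: use a classifier whose decision boundary is non-linear.
from sklearn.datasets import make_circles
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)

print("linear SVM accuracy:", LinearSVC().fit(X, y).score(X, y))              # poor: not linearly separable
print("k-NN accuracy:      ", KNeighborsClassifier().fit(X, y).score(X, y))   # good: non-linear boundary
```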

Nonlinear Support Vector Machines
Alternative 2: Transform the data into a higher-dimensional attribute space and find linear decision boundaries in that space:
1. Transform the data into a higher-dimensional space.
2. Find the best hyperplane using the methods introduced earlier.
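A minimal sketch of Alternative 2 (scikit-learn assumed; the degree-2 feature map is one illustrative choice of Φ): the concentric-circles data from above becomes linearly separable once each point (x1, x2) is mapped to the space of degree-2 monomials:

```python
# Minimal sketch of Alternative 2: map the data into a higher-dimensional space
# with an explicit transformation, then fit an ordinary linear SVM there.
from sklearn.datasets import make_circles
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)

# Phi(x1, x2) = (1, x1, x2, x1^2, x1*x2, x2^2) -- all monomials up to degree 2
model = make_pipeline(PolynomialFeatures(degree=2), LinearSVC())
print("linear SVM in the transformed space:", model.fit(X, y).score(X, y))
```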

Nonlinear Support Vector Machines
1. Choose a non-linear function φ that transforms the data into a different, usually higher-dimensional, attribute space.
2. Minimize ||w||²/2, subject to the following N constraints: yi (w·φ(xi) + b) ≥ 1 for i = 1, …, N;
i.e., find a good hyperplane in the transformed space.
Remark: The soft-margin SVM can be generalized similarly.

Example: Polynomial Kernel Function
Polynomial kernel function: φ(x1, x2) = (x1², x2², √2·x1·x2, √2·x1, √2·x2, 1)
K(u, v) = φ(u)·φ(v) = (u·v + 1)²
A support vector machine with a polynomial kernel function classifies a new example z as follows:
sign((Σi αi yi φ(xi)·φ(z)) + b) = sign((Σi αi yi (xi·z + 1)²) + b)
Remark: the αi and b are determined using the methods for linear SVMs that were discussed earlier.
Kernel function trick: perform the computations in the original space, although we solve an optimization problem in the transformed space => more efficient; more details in Topic 14.
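A quick numeric check of the identity K(u, v) = φ(u)·φ(v) = (u·v + 1)² for the mapping above (plain NumPy; the two test vectors are arbitrary):

```python
# Verify numerically that phi(u).phi(v) == (u.v + 1)^2 for the degree-2 polynomial kernel.
import numpy as np

def phi(x):
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2)*x1*x2, np.sqrt(2)*x1, np.sqrt(2)*x2, 1.0])

def K(u, v):
    return (np.dot(u, v) + 1) ** 2

u, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(phi(u), phi(v)))   # explicit mapping, then dot product in the 6-d space: 4.0
print(K(u, v))                  # kernel trick: same value, computed in the original 2-d space: 4.0
```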

Other Material on SVMs
Support Vector Machines in RapidMiner
Links to some good SVM tutorials
Adaboost/SVM relationship lecture

Summary: Support Vector Machines
Support vector machines learn hyperplanes that separate two classes while maximizing the margin between them (the empty space between the instances of the two classes).
Support vector machines introduce slack variables, for the case that the classes are not linearly separable, trying to maximize the margin while keeping the training error low.
The most popular versions of SVMs use non-linear kernel functions and map the attribute space into a higher-dimensional space to facilitate finding "good" linear decision boundaries in the modified space.
Support vector machines find "margin-optimal" hyperplanes by solving a convex quadratic optimization problem. However, this optimization process is quite slow, and support vector machines tend to fail as the number of examples grows beyond 500/5,000/50,000…
In general, support vector machines achieve quite high accuracies compared to other techniques.
In the last 10 years, support vector machines have been generalized to other tasks such as regression, PCA, and outlier detection.

Kernels: What can they do for you?
Some machine learning and statistical problems depend only on the dot products of the objects in the dataset O = {x1, …, xn} and not on other characteristics of the objects; in other words, those techniques depend only on the Gram matrix of O, which stores x1·x1, x1·x2, …, xn·xn.
These techniques can be generalized by mapping the dataset into a higher-dimensional space, as long as the non-linear mapping φ can be kernelized; that is, a kernel function K can be found such that K(u, v) = φ(u)·φ(v). In this case the results are computed in the mapped space based on K(x1, x1), K(x1, x2), …, K(xn, xn), which is called the kernel trick.
Kernels have been used successfully to generalize PCA, k-means, support vector machines, and many other techniques, allowing them to use non-linear coordinate systems, more complex decision boundaries, or more complex cluster boundaries.
We will revisit kernels later, discussing transparencies 13-25 of the Vasconcelos lecture.
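A minimal sketch of the kernel trick in this Gram-matrix form (scikit-learn assumed; the RBF kernel and toy dataset are illustrative choices): the SVM never sees any mapped points, only the matrix of pairwise kernel values.

```python
# Minimal sketch: work only with the Gram (kernel) matrix, never with phi(x) itself.
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

K_train = rbf_kernel(X_tr, X_tr)     # Gram matrix: K(x_i, x_j) for all training pairs
K_test = rbf_kernel(X_te, X_tr)      # kernel values between test and training points

clf = SVC(kernel="precomputed").fit(K_train, y_tr)
print("accuracy with a precomputed RBF Gram matrix:", clf.score(K_test, y_te))
```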