Lecture notes for Stat 231: Pattern Recognition and Machine Learning. A.L. Yuille. Fall 2004. Practical Issues with SVM: Handwritten Digits.

Lecture notes for Stat 231: Pattern Recognition and Machine Learning 1. Stat 231. A.L. Yuille. Fall 2004. Practical Issues with SVM. Handwritten Digits: US Post Office, MNIST Datasets. No Handout. For people seriously interested in this material, see Learning with Kernels by B. Schoelkopf and A.J. Smola, MIT Press.

Lecture notes for Stat 231: Pattern Recognition and Machine Learning 2. Practical SVM. Support Vector Machines first showed major success on the task of handwritten digit/character recognition: the US Post Office database and the MNIST database. Issues with real problems: (1) Multiclassification – not just yes/no. (2) Large datasets – quadratic programming becomes impractical. (3) Invariances in the data; prior knowledge. (4) Which kernels? When do kernels generalize?

Lecture notes for Stat 231: Pattern Recognition and Machine Learning 3. Multiclassification. Two solutions for M classes. (A) One versus Rest. For each class i = 1, ..., M construct a binary classifier f_i(x) = w_i · x + b_i, trained with labels y_a^i = +1 if sample x_a belongs to class i and y_a^i = -1 otherwise (n = no. of data samples). Classify a new input x by assigning it to the class arg max_i f_i(x). Comment: simple, but heuristic. (A sketch of this procedure appears below.)
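A minimal one-versus-rest sketch, assuming scikit-learn's SVC as the underlying binary classifier; the lecture does not prescribe an implementation, and the function names here are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

def train_one_vs_rest(X, y, M, **svm_params):
    """Train M binary SVMs; classifier i separates class i from all other classes."""
    classifiers = []
    for i in range(M):
        labels = np.where(y == i, 1, -1)  # +1 for class i, -1 for the rest
        classifiers.append(SVC(kernel="linear", **svm_params).fit(X, labels))
    return classifiers

def classify_one_vs_rest(classifiers, X):
    """Assign each sample to the class whose classifier gives the largest score f_i(x)."""
    scores = np.column_stack([clf.decision_function(X) for clf in classifiers])
    return np.argmax(scores, axis=1)
```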

Lecture notes for Stat 231: Pattern Recognition and Machine Learning 4. Multiclassification (B): Hyperplanes for each class label, i.e. learn (w_i, b_i) for i = 1, ..., M jointly. Data and Slack Variables: training data {(x_a, y_a) : a = 1, ..., n}, with slack variables xi_a^i >= 0 for every sample a and every incorrect class i != y_a. Quadratic Programming: minimize (1/2) sum_i |w_i|^2 + C sum_a sum_{i != y_a} xi_a^i, subject to w_{y_a} · x_a + b_{y_a} >= w_i · x_a + b_i + 2 - xi_a^i for all a and all i != y_a.
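A hedged sketch of training one joint multiclass SVM of this flavour, using scikit-learn's LinearSVC with the Crammer-Singer multiclass loss. This is a close relative of the formulation above, not necessarily the exact objective on the slide, and the multi_class option's availability depends on the scikit-learn version:

```python
from sklearn.svm import LinearSVC

# One joint QP over all class hyperplanes (w_i, b_i) at once,
# rather than M independent one-versus-rest binary problems.
clf = LinearSVC(multi_class="crammer_singer", C=1.0)
# clf.fit(X_train, y_train)
# predicted_classes = clf.predict(X_test)
```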

Lecture notes for Stat 231: Pattern Recognition and Machine Learning 5. Multiclass and Data Size. Empirically, methods (A) and (B) give results of similar quality. Method (B) is more attractive, but its solution is more computationally intensive. This leads to issue (2): Large Datasets. The Quadratic Programming problem is most easily formulated in terms of the dual variables alpha_a, one per data point: maximize sum_a alpha_a - (1/2) sum_{a,b} alpha_a alpha_b y_a y_b K(x_a, x_b), subject to 0 <= alpha_a <= C and sum_a alpha_a y_a = 0. For large datasets n is enormous, so the n x n kernel matrix makes Quadratic Programming computationally expensive (see the sketch below).
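A small illustration of where the cost comes from: the dual works with a dense n-by-n Gram (kernel) matrix, so memory and work grow at least quadratically in n. The kernel argument below is a placeholder, not something fixed by the lecture:

```python
import numpy as np

def gram_matrix(X, kernel):
    """Dense n-by-n kernel matrix K[a, b] = k(x_a, x_b) used by the dual QP."""
    n = X.shape[0]
    K = np.empty((n, n))
    for a in range(n):
        for b in range(n):
            K[a, b] = kernel(X[a], X[b])
    return K

# Memory alone is quadratic: for n = 60,000 training digits, a dense
# double-precision Gram matrix needs about 60000**2 * 8 bytes, roughly 29 GB.
```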

Lecture notes for Stat 231: Pattern Recognition and Machine Learning 6. Large Datasets. Chunking is the favored solution. Observe that the dual variables alpha_a will be non-zero only for the support vectors. "Chunk" the training data into k sets of size n/k. Train on these k sets and keep the support vectors of each set. Then train on the combined support vectors of the k sets. Note: check the original data to make sure that it is correctly classified; if not, add the misclassified points as further support vectors and retrain. (A sketch of this procedure appears below.)
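A minimal chunking sketch, assuming scikit-learn's SVC; the helper name and the omission of the final re-check loop are simplifications of mine:

```python
import numpy as np
from sklearn.svm import SVC

def chunked_svm(X, y, k, **svm_params):
    """Chunking sketch: train on k subsets, pool their support vectors, retrain."""
    chunks = np.array_split(np.arange(len(y)), k)
    sv_idx = []
    for idx in chunks:
        clf = SVC(**svm_params).fit(X[idx], y[idx])
        sv_idx.extend(idx[clf.support_])           # keep this chunk's support vectors
    sv_idx = np.asarray(sv_idx)
    final = SVC(**svm_params).fit(X[sv_idx], y[sv_idx])
    # As the slide notes, one should then re-check the full training set and add any
    # misclassified points before retraining; that loop is omitted here for brevity.
    return final
```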

Lecture notes for Stat 231: Pattern Recognition and Machine Learning 7. Large Datasets. Chunking is successful and computationally efficient provided the number of support vectors is small. This happens if there is a hyperplane/hypersurface with a large margin separating the classes. It is harder if data from the classes overlap – e.g. when a large number of data points need non-zero slack variables (i.e. are support vectors). In either case, more support vectors are needed for the combined multiclass formulation, case (B), than for the heuristic one-versus-rest, case (A). Note: other approximate methods exist for when chunking fails.

Lecture notes for Stat 231: Pattern Recognition and Machine Learning 8. Invariances and Priors. (3) Invariances in the Data. Recognizing handwritten digits: the classifier should be insensitive to small changes to the data – for example, small rotations and small translations.

Lecture notes for Stat 231: Pattern Recognition and Machine Learning 9. Invariances and Priors. Virtual Support Vectors (VSV). Strategy: (i) Train on the original dataset to get the support vectors. (ii) Generate artificial examples by applying the transformations to the support vectors. (iii) Train again on the "virtual examples" generated in (ii). (A sketch of this procedure appears below.)
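A hedged sketch of the VSV strategy for digit images, assuming flattened 28x28 MNIST-style inputs, scikit-learn's SVC, and 1-pixel translations via scipy.ndimage.shift; the transformation set and function names are illustrative, not prescribed by the lecture:

```python
import numpy as np
from scipy.ndimage import shift
from sklearn.svm import SVC

def train_with_virtual_svs(X, y, **svm_params):
    """VSV sketch: train, perturb the support vectors, retrain on the virtual examples."""
    clf = SVC(**svm_params).fit(X, y)                    # (i) train on the original data
    sv_X, sv_y = X[clf.support_], y[clf.support_]

    virtual_X, virtual_y = [sv_X], [sv_y]
    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:    # (ii) 1-pixel translations
        imgs = sv_X.reshape(-1, 28, 28)                  # assumes flattened 28x28 images
        moved = shift(imgs, (0, dy, dx), order=0, mode="constant")
        virtual_X.append(moved.reshape(len(sv_X), -1))
        virtual_y.append(sv_y)

    # (iii) retrain on the original support vectors plus their virtual copies
    return SVC(**svm_params).fit(np.vstack(virtual_X), np.concatenate(virtual_y))
```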

Lecture notes for Stat 231: Pattern Recognition and Machine Learning 10. Virtual Support Vectors

Lecture notes for Stat 231: Pattern Recognition and Machine Learning 11. Invariances and Priors. Other methods include: (i) Hand-designing features which are invariant for the problem. (ii) Training on virtual examples before constructing support vectors (computationally expensive). (iii) Designing criteria allowing for data transformations. (iv) Learning features which are invariant (TPA). In general, it is best to select your input features using as much prior knowledge as you have about the problem.

Lecture notes for Stat 231: Pattern Recognition and Machine Learning 12. MNIST Results. MNIST dataset of handwritten digits. Summary of results: page 341 of Schoelkopf and Smola. The best classifier uses a polynomial kernel. "8 VSV" means 8 invariance samples per data point (1-pixel translations plus rotation). The MNIST dataset has 60,000 training digits (plus 10,000 test digits). LeNet is a multilayer network with special training plus boosting.

Lecture notes for Stat 231: Pattern Recognition and Machine Learning 13. MNIST Results

Lecture notes for Stat 231: Pattern Recognition and Machine Learning 14. Summary. Applying SVMs to real problems requires: Multiclass – method (A) One-versus-Rest, or method (B) full joint solutions. Computational practicality – chunking, by dividing the dataset into subsets and using the support vectors from each set. Invariance – generate new samples by applying transformations (e.g. translations) to the support vectors, producing virtual support vectors. Very successful on the MNIST and US Post Office datasets, and simpler than the LeNet approach (its closest rival).