Data mining and statistical learning - lecture 13 Separating hyperplane.

Presentation transcript:

Separating hyperplane

Optimal separating hyperplane - support vector classifier
Find the hyperplane that creates the biggest margin between the training points for class 1 and class -1. (Figure: the separating hyperplane with the margin between the two classes marked.)

Formulation of the optimization problem
The classifier is based on the signed distance to the decision border; y = 1 for one of the groups and y = -1 for the other.
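The formulas on this slide did not survive transcription; a hedged reconstruction of the standard notation (following Hastie, Tibshirani and Friedman) is

$$ f(x) = \beta_0 + x^T\beta, \qquad \text{signed distance of } x \text{ to } \{x: f(x)=0\} = \frac{f(x)}{\lVert\beta\rVert}, $$

so with the coding y ∈ {-1, +1} the quantity y_i f(x_i)/||β|| is positive exactly when x_i lies on the correct side of the decision border.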

Two equivalent formulations of the optimization problem
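The two formulations appeared as equations on the slide; a hedged reconstruction of the standard forms is

$$ \max_{\beta,\,\beta_0,\,\lVert\beta\rVert=1} M \quad \text{subject to}\quad y_i(x_i^T\beta+\beta_0) \ge M, \quad i=1,\dots,N, $$

and, equivalently, dropping the norm constraint and setting M = 1/||β||,

$$ \min_{\beta,\,\beta_0} \tfrac{1}{2}\lVert\beta\rVert^2 \quad \text{subject to}\quad y_i(x_i^T\beta+\beta_0) \ge 1, \quad i=1,\dots,N. $$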

Optimal separating hyperplane - overlapping classes
Find the hyperplane that creates the biggest margin, subject to a bound on the total slack. (Figure: overlapping classes with slack variables ξ1, ξ2, ξ3 marking points that fall on the wrong side of their margin.)
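A hedged reconstruction of the soft-margin problem the slide refers to (the constraints were shown as an equation):

$$ \min_{\beta,\,\beta_0} \tfrac{1}{2}\lVert\beta\rVert^2 \quad \text{subject to}\quad y_i(x_i^T\beta+\beta_0) \ge 1-\xi_i,\ \ \xi_i \ge 0,\ \ \sum_{i=1}^N \xi_i \le \text{constant}, $$

where the slack variables ξ_i measure how far the individual points overlap the margin.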

Characteristics of the support vector classifier
Points well inside their class boundary do not play a big role in the shaping of the decision border.
Cf. linear discriminant analysis (LDA), for which the decision boundary is determined by the covariance matrix of the class distributions and their centroids.

Support vector machines using basis expansions (polynomials, splines)
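As an illustrative sketch only (not part of the lecture), the idea of an implicit basis expansion can be tried out with a polynomial-kernel SVM in scikit-learn; the data set and parameter values below are assumptions chosen for the example.

```python
# Illustrative sketch: a support vector classifier with a polynomial kernel,
# which corresponds to fitting a linear boundary in an enlarged feature space.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)  # assumed toy data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# degree-3 polynomial kernel; C controls the margin / slack trade-off
clf = SVC(kernel="poly", degree=3, C=1.0)
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
print("support vectors per class:", clf.n_support_)
```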

Characteristics of support vector machines
- The dimension of the enlarged feature space can be very large
- Overfitting is prevented by a built-in shrinkage of the beta coefficients
- Irrelevant inputs can create serious problems

The SVM as a penalization method
Misclassification: f(x) < 0 when y = 1, or f(x) > 0 when y = -1.
Loss function, and loss function + penalty: see the reconstruction below.
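The loss function and the penalized criterion were shown as formulas; a hedged reconstruction of the standard hinge-loss form is

$$ L(y, f(x)) = \bigl[1 - y\,f(x)\bigr]_+, \qquad \min_{\beta_0,\,\beta}\ \sum_{i=1}^N \bigl[1 - y_i f(x_i)\bigr]_+ + \frac{\lambda}{2}\lVert\beta\rVert^2, $$

where [z]_+ denotes the positive part of z.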

The SVM as a penalization method
Minimizing the loss function + penalty is equivalent to fitting a support vector machine to the data.
The penalty factor λ is a function of the constant that provides an upper bound on the total slack Σ ξ_i.

Some characteristics of different learning methods

Characteristic                                       Neural networks   SVM    Trees   MARS
Natural handling of data of "mixed" type             Poor              Poor   Good    Good
Handling of missing values                           Poor              Poor   Good    Good
Robustness to outliers in input space                Poor              Poor   Good    Poor
Insensitive to monotone transformations of inputs    Poor              Poor   Good    Poor
Computational scalability (large N)                  Poor              Poor   Good    Good
Ability to deal with irrelevant inputs               Poor              Poor   Good    Good
Ability to extract linear combinations of features   Good              Good   Poor    Poor
Interpretability                                     Poor              Poor   Fair    Good
Predictive power                                     Good              Good   Poor    Fair

The ε-insensitive error function
(Figure: the loss is zero inside the interval [-ε, ε] and increases linearly outside it.)
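A hedged reconstruction of the ε-insensitive loss and the corresponding fitting criterion (shown as formulas on the slides):

$$ V_\varepsilon(r) = \begin{cases} 0, & |r| < \varepsilon,\\ |r| - \varepsilon, & \text{otherwise,} \end{cases} \qquad H(\beta,\beta_0) = \sum_{i=1}^N V_\varepsilon\bigl(y_i - f(x_i)\bigr) + \frac{\lambda}{2}\lVert\beta\rVert^2. $$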

SVMs for linear regression
Estimate the regression coefficients by minimizing the ε-insensitive loss plus a quadratic penalty (the criterion reconstructed above).
(i) The fitting is less sensitive than OLS to outliers.
(ii) Errors of size less than ε are ignored.
(iii) Typically, the parameter estimates are functions of only a minor subset of the observations.
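An illustrative sketch (not from the lecture) of ε-insensitive regression; the simulated data and the parameter values are assumptions for the example only.

```python
# Illustrative sketch: support vector regression with an epsilon-insensitive loss.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)      # assumed inputs
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)    # assumed noisy response

# errors smaller than epsilon are ignored; C penalizes larger deviations
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1)
svr.fit(X, y)

# only the observations outside (or on) the epsilon tube become support vectors
print("fraction of observations used as support vectors:",
      len(svr.support_) / len(X))
```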

Ensemble methods
- Bootstrapping (Chapter 8)
- Bagging (Chapter 8)
- Boosting (Chapter 10)
- Bagging and boosting in SAS EM

Major types of ensemble methods
- Manipulation of the model
- Manipulation of the data set

Terminology
- Bagging = manipulation of the data set
- Boosting = manipulation of the model

The bootstrap
We would like to determine a functional F(P) of an unknown probability distribution P.
The bootstrap: compute F(P*), where P* is an approximation of P (the empirical distribution of the observed data).

Resampling techniques - the bootstrap method
Sampling with replacement. (Figure: the observed data set is resampled with replacement into bootstrap data sets.)

The bootstrap for assessing the accuracy of an estimate or prediction
Bootstrap samples are generated by sampling with replacement from the observed data.
1. Generate N bootstrap samples and compute the statistic of interest T_k for each sample.
2. Compute the sample variance of the T_k.
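A minimal sketch (not from the lecture) of the two steps above, assuming the statistic of interest T is the sample median and using an artificial data set.

```python
# Illustrative sketch: bootstrap estimate of the variance of a statistic T.
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=100)   # assumed observed data

B = 1000                                      # number of bootstrap samples
T = np.empty(B)
for k in range(B):
    boot = rng.choice(data, size=len(data), replace=True)  # sample with replacement
    T[k] = np.median(boot)                                  # T_k for bootstrap sample k

print("bootstrap estimate of Var(T):", T.var(ddof=1))
```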

Bagging - using the bootstrap to improve a prediction
Question: given the model Y = f(X) + ε and a set of observed values Z = {(X_i, Y_i), i = 1, ..., N}, what is the expected value E_P[f̂(x)] of the fitted function, where P denotes the distribution of (X, Y)?
Solution: replace P with P*. Produce B bootstrap samples, compute the fitted function f̂*b(x) for each sample b, and compute the sample mean by averaging over the bootstrap functions: f̂_bag(x) = (1/B) Σ_b f̂*b(x).

Bagging
Formula: f̂_bag(x) = (1/B) Σ_b f̂*b(x). Construct the fitted curves (graphs) for the individual bootstrap samples and compute their average.

Properties of bagging
- Bagging of fitted functions reduces the variance
- Bagging can make good predictions better and bad predictions worse
- If the fitted function is linear, it will asymptotically coincide with the bagged estimate (as B → ∞)

Bagging for classification
Given a K-class classification problem with Z = {(X_i, Y_i), i = 1, ..., N} and a fitted indicator function (or vector of class probabilities), we produce a bagged estimate by averaging over the bootstrap samples and predict the class label as the one with the largest bagged value (majority vote).
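A minimal sketch (not from the lecture) of bagging a classification tree by majority vote; the data set, the number of bootstrap samples B and the base learner are assumptions for the example.

```python
# Illustrative sketch: bagging a classification tree by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)  # assumed data

B = 25
votes = np.zeros((len(X), 2))                       # vote counts (2-class example)
for b in range(B):
    idx = rng.integers(0, len(X), size=len(X))      # bootstrap sample of the indices
    tree = DecisionTreeClassifier(random_state=b).fit(X[idx], y[idx])
    pred = tree.predict(X)
    votes[np.arange(len(X)), pred] += 1             # each tree votes for a class

y_bag = votes.argmax(axis=1)                        # bagged prediction = majority vote
print("training accuracy of the bagged trees:", (y_bag == y).mean())
```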

Boosting - basic idea
Consider a 2-class problem with Y ∈ {-1, 1} and a classifier G(x). Produce a sequence of classifiers G_1(x), ..., G_M(x) and combine them. The weights of misclassified observations are increased to force the algorithm to classify them correctly at the next step.
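A minimal sketch (not from the lecture) of the AdaBoost.M1 weight-updating idea, using decision stumps as the base classifiers; the data set and the number of boosting steps M are assumptions for the example.

```python
# Illustrative sketch: AdaBoost.M1 with decision stumps, showing how the
# weights of misclassified observations are increased at each step.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=300, n_features=10, random_state=0)  # assumed data
y = 2 * y01 - 1                          # recode the labels to {-1, +1}

M = 20                                   # number of boosting steps
w = np.full(len(X), 1.0 / len(X))        # initial observation weights
F = np.zeros(len(X))                     # combined (weighted) classifier score

for m in range(M):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = max(w[pred != y].sum() / w.sum(), 1e-12)   # weighted training error
    alpha = np.log((1 - err) / err)                  # weight of classifier m
    w *= np.exp(alpha * (pred != y))                 # boost misclassified observations
    F += alpha * pred                                # add classifier m to the ensemble

print("training error of the combined classifier:", (np.sign(F) != y).mean())
```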

Boosting

Boosting - comments
- Boosting can be modified for regression
- AdaBoost.M1 can be modified to handle categorical output

Bagging and boosting in EM
- Create a diagram: Input node (define target!) - Partition node - Group processing node - Your model - Ensemble node
- Comment: boosting works only for classification (categorical output)

Group processing: General
Modes:
- Unweighted resampling for bagging
- Weighted resampling for boosting

Group processing - unweighted resampling for bagging
Specify the sample size.

Group processing - weighted resampling for boosting
Specify the target.

Ensemble results