Support Vector Machines in Marketing
Georgi Nalbantov, MICC, Maastricht University

2/20 Contents
- Purpose
- Linear Support Vector Machines
- Nonlinear Support Vector Machines
- (Theoretical justifications of SVM)
- Marketing Examples
- Conclusion and Q & A (some extensions)

3/20 Purpose
Task to be solved (the classification task): classify cases (customers) into "type 1" or "type 2" on the basis of some known attributes (characteristics).
Chosen tool to solve this task: Support Vector Machines.

4/20 The Classification Task
Given data on explanatory and explained variables, where the explained variable can take the two values {±1}, find a function that gives the "best" separation between the "−1" cases and the "+1" cases:
Given: (x₁, y₁), …, (x_m, y_m) ∈ ℝⁿ × {±1}
Find: f : ℝⁿ → {±1}
"Best" function = one whose expected error on unseen data (x_{m+1}, y_{m+1}), …, (x_{m+k}, y_{m+k}) is minimal.
Existing techniques to solve the classification task:
- Linear and Quadratic Discriminant Analysis
- Logit choice models (logistic regression)
- Decision trees, neural networks, least squares SVM
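A minimal sketch of this setup in Python with scikit-learn; the synthetic data, split ratio, and linear kernel are illustrative assumptions, not from the slides. The point is that the "best" function is judged by its error on held-out, unseen cases:

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
import numpy as np

# Synthetic stand-in data: m = 300 cases, n = 4 attributes, labels in {-1, +1}
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

# "Best" is judged on unseen data, so hold out part of the sample as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
f = SVC(kernel="linear").fit(X_train, y_train)  # the learned function f
print("error on unseen data:", 1 - f.score(X_test, y_test))
```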

5/20 Support Vector Machines: Definition
- Support Vector Machines are a non-parametric tool for classification/regression.
- Support Vector Machines are used for prediction rather than description purposes.
- Support Vector Machines were developed by Vapnik and co-workers.

6/20 Linear Support Vector Machines
A direct marketing company wants to sell a new book: "The Art History of Florence" (Nissan Levin and Jacob Zahavi, in Lattin, Carroll and Green, 2003).
Problem: how to identify buyers and non-buyers using two variables:
- Months since last purchase
- Number of art books purchased
[Scatter plot: number of art books purchased vs. months since last purchase; ∆ buyers, ● non-buyers]

7/20 Linear SVM: Separable Case
Main idea of SVM: separate the two groups by a line. However, there are infinitely many lines that have zero training error… which line shall we choose?
[Scatter plot: ∆ buyers and ● non-buyers, number of art books purchased vs. months since last purchase]

8/20 Linear SVM: Separable Case
SVMs use the idea of a margin around the separating line. The thinner the margin, the more complex the model. The best line is the one with the largest margin.
[Scatter plot: ∆ buyers and ● non-buyers with the separating line and its margin]

9/20 Linear SVM: Separable Case
The line having the largest margin is: w₁x₁ + w₂x₂ + b = 0, where
x₁ = months since last purchase
x₂ = number of art books purchased
Note:
w₁xᵢ₁ + w₂xᵢ₂ + b ≥ +1 for i ∈ ∆
w₁xⱼ₁ + w₂xⱼ₂ + b ≤ −1 for j ∈ ●
[Plot: the separating line w₁x₁ + w₂x₂ + b = 0 flanked by the margin boundaries w₁x₁ + w₂x₂ + b = ±1]

10/20 Linear SVM: Separable Case
The width of the margin is given by 2/‖w‖.
Note: maximizing the margin amounts to minimizing ½‖w‖².
[Plot: the separating line and margin boundaries w₁x₁ + w₂x₂ + b = 0, ±1, with the margin width marked]
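The slide states the formula without derivation; for completeness, here is the standard two-line argument behind it:

```latex
% Two points x_+ and x_- lying on the two margin boundaries satisfy
%   w^T x_+ + b = +1  and  w^T x_- + b = -1,
% so subtracting gives w^T (x_+ - x_-) = 2. Projecting x_+ - x_- onto the
% unit normal w / ||w|| of the separating line yields the margin width:
\[
\text{margin width} \;=\; \frac{w^\top (x_+ - x_-)}{\lVert w \rVert} \;=\; \frac{2}{\lVert w \rVert},
\qquad\text{hence}\qquad
\max_{w,b} \frac{2}{\lVert w \rVert} \;\Longleftrightarrow\; \min_{w,b} \tfrac{1}{2}\lVert w \rVert^2 .
\]
```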

11/20 Linear SVM: Separable Case
The optimization problem for SVM is:
minimize ½‖w‖² (i.e., maximize the margin)
subject to:
w₁xᵢ₁ + w₂xᵢ₂ + b ≥ +1 for i ∈ ∆
w₁xⱼ₁ + w₂xⱼ₂ + b ≤ −1 for j ∈ ●
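This quadratic program can be written down almost verbatim with the cvxpy modeling library. The toy data below are hypothetical (the slides' actual coordinates are not given), so this is a sketch of the technique rather than the presentation's implementation:

```python
import cvxpy as cp
import numpy as np

# Hypothetical, linearly separable toy data:
# column 1 = months since last purchase, column 2 = number of art books purchased
X = np.array([[2., 5.], [3., 6.], [1., 4.],     # buyers (∆), y = +1
              [10., 1.], [12., 0.], [9., 2.]])  # non-buyers (●), y = -1
y = np.array([1., 1., 1., -1., -1., -1.])

w = cp.Variable(2)
b = cp.Variable()

# Maximize the margin 2/||w|| by minimizing (1/2)||w||^2 ...
objective = cp.Minimize(0.5 * cp.sum_squares(w))
# ... subject to each point lying on the correct side of its margin boundary;
# y_i (w^T x_i + b) >= 1 encodes both inequality families from the slide at once.
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value)
```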

12/20 Linear SVM: Separable Case
"Support vectors" are those points that lie on the boundaries of the margin. The decision surface (line) is determined only by the support vectors; all other points are irrelevant.
[Plot: the support vectors highlighted on the two margin boundaries]
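The same fact can be checked with scikit-learn's SVC, which exposes the support vectors of a fitted model (data again hypothetical, and a very large C is used to approximate the separable, hard-margin case):

```python
from sklearn.svm import SVC
import numpy as np

# Same hypothetical toy data as in the sketch above
X = np.array([[2., 5.], [3., 6.], [1., 4.],
              [10., 1.], [12., 0.], [9., 2.]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ≈ hard margin

print("support vector indices:", clf.support_)        # points on the margin boundaries
print("support vectors:", clf.support_vectors_)
print("w =", clf.coef_[0], "b =", clf.intercept_[0])  # the separating line
```

Deleting any point that is not in `clf.support_` and refitting leaves `coef_` and `intercept_` unchanged, which is exactly what "all other points are irrelevant" means.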

13/20 Linear SVM: Nonseparable Case
Non-separable case: there is no line that separates the two groups without error. (Training set: 1000 targeted customers.)
Here, SVM minimizes L(w, C) = Complexity + Errors (maximize the margin and minimize the training errors):
minimize L(w, C) = ½‖w‖² + C Σᵢ ξᵢ
subject to:
w₁xᵢ₁ + w₂xᵢ₂ + b ≥ +1 − ξᵢ for i ∈ ∆
w₁xⱼ₁ + w₂xⱼ₂ + b ≤ −1 + ξⱼ for j ∈ ●
ξᵢ, ξⱼ ≥ 0
[Scatter plot: overlapping ∆ buyers and ● non-buyers]
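A sketch of this soft-margin problem in cvxpy, with slack variables ξ measuring by how much each point violates its margin boundary; the overlapping toy data are assumed for illustration:

```python
import cvxpy as cp
import numpy as np

# Hypothetical overlapping (non-separable) toy data
X = np.array([[2., 5.], [3., 6.], [1., 4.], [9., 3.],    # buyers, y = +1
              [10., 1.], [12., 0.], [9., 2.], [2., 4.]])  # non-buyers, y = -1
y = np.array([1., 1., 1., 1., -1., -1., -1., -1.])
C = 5.0  # trade-off between margin width (complexity) and training errors

w, b = cp.Variable(2), cp.Variable()
xi = cp.Variable(len(y), nonneg=True)  # slack: each point's margin violation

# L(w, C) = Complexity + Errors
objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value, "total slack =", xi.value.sum())
```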

14/20 Linear SVM: The Role of C
Bigger C (thinner margin): smaller number of training errors (better fit on the data), increased complexity.
Smaller C (wider margin): bigger number of training errors (worse fit on the data), decreased complexity.
C thus varies both the complexity and the empirical error, by affecting the optimal w and the optimal number of training errors.
[Two plots: the fitted line and margin for C = 5 and for C = 1]
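A quick way to see this trade-off numerically: refit a linear SVM for several values of C and watch the training errors and the number of support vectors move (the data and the C grid are illustrative):

```python
from sklearn.svm import SVC
import numpy as np

X = np.array([[2., 5.], [3., 6.], [1., 4.], [9., 3.],
              [10., 1.], [12., 0.], [9., 2.], [2., 4.]])
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

for C in (0.01, 1.0, 5.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    errors = (clf.predict(X) != y).sum()
    # Small C: wide margin, many support vectors, more training errors.
    # Big C: thin margin, fewer support vectors, better fit on the training data.
    print(f"C={C:>6}: training errors={errors}, support vectors={len(clf.support_)}")
```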

15/20 Nonlinear SVM: Nonseparable Case
Mapping into a higher-dimensional space.
Optimization task: minimize L(w, C) subject to the same margin constraints, now written in the transformed space.
[Scatter plot: ∆ and ● groups that no straight line in the original space can separate well]

16/20 Nonlinear SVM: Nonseparable Case
Map the data into a higher-dimensional space: ℝ² → ℝ³.
[Plot: the four points (1,1), (−1,1), (1,−1), (−1,−1) in an XOR-style configuration, ∆ on two opposite corners and ● on the other two, so no line in ℝ² separates them]
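The slides do not give the mapping formula, so the sketch below assumes one common choice, φ(x₁, x₂) = (x₁, x₂, x₁x₂); under the XOR-style labeling above, the third coordinate alone separates the two groups in ℝ³:

```python
import numpy as np

# The four XOR-style points from the slide: not linearly separable in R^2
X = np.array([[1., 1.], [-1., -1.],    # class ∆ (assumed labeling)
              [1., -1.], [-1., 1.]])   # class ●
y = np.array([1, 1, -1, -1])

# One possible map R^2 -> R^3 (an assumption; the slide only shows the picture):
# phi(x1, x2) = (x1, x2, x1 * x2)
Phi = np.column_stack([X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])
print(Phi)
# Third coordinate: +1 for the ∆ class, -1 for the ● class,
# so the plane z = 0 separates the two groups in R^3.
```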

17/20 Nonlinear SVM: Nonseparable Case
Find the optimal hyperplane in the transformed space.
[Plot: the mapped points in ℝ³ with the separating hyperplane]

18/20 Nonlinear SVM: Nonseparable Case
Observe the decision surface in the original space (optional).
[Plot: the resulting nonlinear decision boundary back in ℝ²]

19/20 Nonlinear SVM: Nonseparable Case
Dual formulation of the (primal) SVM minimization problem:
Primal: minimize ½‖w‖² + C Σᵢ ξᵢ subject to yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ and ξᵢ ≥ 0, i = 1, …, m
Dual: maximize Σᵢ αᵢ − ½ Σᵢ Σⱼ αᵢαⱼ yᵢyⱼ (xᵢ·xⱼ) subject to 0 ≤ αᵢ ≤ C and Σᵢ αᵢyᵢ = 0

20/20 Nonlinear SVM: Nonseparable Case
Dual formulation of the (primal) SVM minimization problem:
Dual: maximize Σᵢ αᵢ − ½ Σᵢ Σⱼ αᵢαⱼ yᵢyⱼ K(xᵢ, xⱼ) subject to 0 ≤ αᵢ ≤ C and Σᵢ αᵢyᵢ = 0
The data enter the dual only through the dot products xᵢ·xⱼ, which can be replaced by a kernel function K(xᵢ, xⱼ): the dot product in the transformed space.
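A small numerical check of what the kernel function buys: for the degree-2 polynomial kernel K(x, z) = (xᵀz)², the kernel value equals the dot product under an explicit map φ : ℝ² → ℝ³, so the dual can work in the transformed space without ever computing φ. This is a standard identity, shown here as a sketch:

```python
import numpy as np

def phi(x):
    # Explicit map R^2 -> R^3 whose dot products the kernel reproduces
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, z):
    # Degree-2 polynomial kernel, evaluated directly in R^2
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(phi(x) @ phi(z))  # 1.0  (dot product after mapping)
print(k(x, z))          # 1.0  (same number, without ever mapping)
```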


22/20 Strengths and Weaknesses of SVM
Strengths of SVM:
- Training is relatively easy: no local minima
- It scales relatively well to high-dimensional data
- The trade-off between classifier complexity and error can be controlled explicitly via C
- Robustness of the results
- The "curse of dimensionality" is avoided
Weaknesses of SVM:
- What is the best trade-off parameter C?
- A good transformation of the original space is needed

23/20 The Ketchup Marketing Problem
Two types of ketchup: Heinz and Hunts.
Seven attributes:
- Feature Heinz
- Feature Hunts
- Display Heinz
- Display Hunts
- Feature & Display Heinz
- Feature & Display Hunts
- Log price difference between Heinz and Hunts
Training data: 2498 cases (Heinz is chosen in 89.11%)
Test data: 300 cases (Heinz is chosen in 88.33%)

24/20 The Ketchup Marketing Problem
Choose a kernel mapping:
- Linear kernel
- Polynomial kernel
- RBF kernel
Do a (5-fold) cross-validation procedure to find the best combination of the manually adjustable parameters (here: C and σ); see the sketch below.
[Heatmap: cross-validation mean squared errors over a grid of C and σ values, SVM with RBF kernel, min/max color scale]
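A sketch of this model-selection step with scikit-learn's GridSearchCV. The data here are random stand-ins for the seven ketchup attributes (the real 2498-case dataset is not reproduced in the slides), and note that scikit-learn parameterizes the RBF kernel by gamma, which corresponds to the slide's σ via gamma = 1/(2σ²):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
import numpy as np

# Hypothetical stand-ins: X_train has the seven attributes, y_train the chosen brand
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 7))
y_train = rng.integers(0, 2, size=200)

# Grid over the two manually adjustable parameters, C and gamma (i.e., sigma)
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)  # 5-fold cross-validation
search.fit(X_train, y_train)
print("best parameters:", search.best_params_)
```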

25/20 The Ketchup Marketing Problem – Training Set
Model: Linear Discriminant Analysis (predicted group membership, row percentages)
Original group | Predicted Hunts | Predicted Heinz | Total
Hunts          |          25.00% |          75.00% | 100.00%
Heinz          |           2.61% |          97.39% | 100.00%

26/20 The Ketchup Marketing Problem – Training Set
Model: Logit Choice Model (predicted group membership, row percentages)
Original group | Predicted Hunts | Predicted Heinz | Total
Hunts          |          78.68% |          21.32% | 100.00%
Heinz          |          22.33% |          77.67% | 100.00%

27/20 The Ketchup Marketing Problem – Training Set
Model: Support Vector Machines (predicted group membership, row percentages)
Original group | Predicted Hunts | Predicted Heinz | Total
Hunts          |          93.75% |           6.25% | 100.00%
Heinz          |           0.27% |          99.73% | 100.00%

28/20 The Ketchup Marketing Problem – Training Set
Model: Majority Voting (predicted group membership, row percentages)
Original group | Predicted Hunts | Predicted Heinz | Total
Hunts          |           0.00% |         100.00% | 100.00%
Heinz          |           0.00% |         100.00% | 100.00%
(Every case, including all 2226 Heinz cases, is assigned to the majority class, Heinz.)

29/20 The Ketchup Marketing Problem – Test Set
Model: Linear Discriminant Analysis (predicted group membership, row percentages)
Original group | Predicted Hunts | Predicted Heinz | Total
Hunts          |           8.57% |          91.43% | 100.00%
Heinz          |           1.13% |          98.87% | 100.00%

30/20 The Ketchup Marketing Problem – Test Set
Model: Logit Choice Model (predicted group membership, row percentages)
Original group | Predicted Hunts | Predicted Heinz | Total
Hunts          |          82.86% |          17.14% | 100.00%
Heinz          |          23.77% |          76.23% | 100.00%

31/20 The Ketchup Marketing Problem – Test Set
Model: Support Vector Machines (predicted group membership, row percentages)
Original group | Predicted Hunts | Predicted Heinz | Total
Hunts          |          71.43% |          28.57% | 100.00%
Heinz          |           1.13% |          98.87% | 100.00%

32/20 Conclusion
- Support Vector Machines (SVM) can be applied to binary and multi-class classification problems.
- SVM behave robustly in multivariate problems.
- Further research in various marketing areas is needed to justify or refute the applicability of SVM.
- Support Vector Regression (SVR) can also be applied; see the sketch below.
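For the SVR remark, a minimal scikit-learn sketch on synthetic data (a toy regression, not one of the marketing applications above):

```python
from sklearn.svm import SVR
import numpy as np

# Hypothetical regression data (e.g., predicting a purchase amount)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

# epsilon sets the width of the tube within which errors are ignored
reg = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
print("R^2 on training data:", reg.score(X, y))
```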