1 Ch 7. Sparse Kernel Machines
Pattern Recognition and Machine Learning, C. M. Bishop, 2006.
Summarized by S. Kim
Biointelligence Laboratory, Seoul National University

2 Contents
- Maximum Margin Classifiers
  - Overlapping Class Distributions
  - Relation to Logistic Regression
  - Multiclass SVMs
  - SVMs for Regression
- Relevance Vector Machines
  - RVM for Regression
  - Analysis of Sparsity
  - RVMs for Classification

3 Maximum Margin Classifiers
Problem setting
- Two-class classification using a linear model y(x) = w^T \phi(x) + b
- Assume the training data set is linearly separable
Support vector machine approach
- The decision boundary is chosen to be the one for which the margin, the distance to the closest training points, is maximized; those closest points are the support vectors

4 Maximum Margin Solution
- For all data points, t_n y(x_n) > 0, so the distance of a point x_n to the decision surface is t_n y(x_n) / \|w\| = t_n (w^T \phi(x_n) + b) / \|w\|
- The maximum margin solution solves \arg\max_{w,b} \{ \frac{1}{\|w\|} \min_n [ t_n (w^T \phi(x_n) + b) ] \}
- Rescaling so that t_n (w^T \phi(x_n) + b) = 1 for the closest points turns this into the equivalent quadratic program: minimize \frac{1}{2}\|w\|^2 subject to t_n (w^T \phi(x_n) + b) \geq 1 for all n

5 Dual Representation
- Introducing Lagrange multipliers a_n \geq 0 gives the Lagrangian L(w, b, a) = \frac{1}{2}\|w\|^2 - \sum_n a_n \{ t_n (w^T \phi(x_n) + b) - 1 \}
- Minimum points satisfy the derivatives of L w.r.t. w and b equal 0: w = \sum_n a_n t_n \phi(x_n) and \sum_n a_n t_n = 0
- Dual representation: find a maximizing \tilde{L}(a) = \sum_n a_n - \frac{1}{2} \sum_n \sum_m a_n a_m t_n t_m k(x_n, x_m) subject to a_n \geq 0 and \sum_n a_n t_n = 0, where k(x, x') = \phi(x)^T \phi(x')
- See Appendix E for more details

6 Classifying New Data
- New data are classified by the sign of y(x) = \sum_n a_n t_n k(x, x_n) + b
- The dual optimization, subject to the constraints above, is a quadratic programming problem; solving it directly costs O(N^3)
- Karush-Kuhn-Tucker (KKT) conditions (Appendix E): a_n \geq 0, t_n y(x_n) - 1 \geq 0, and a_n \{ t_n y(x_n) - 1 \} = 0
- Hence for every point either a_n = 0 or t_n y(x_n) = 1; the points with a_n > 0 are the support vectors
- b is found by averaging over the support vector set S: b = \frac{1}{N_S} \sum_{n \in S} ( t_n - \sum_{m \in S} a_m t_m k(x_n, x_m) )

7 Example of Separable Data Classification
[Figure 7.2: synthetic two-class data in two dimensions, showing contours of constant y(x) from an SVM with a Gaussian kernel, the decision boundary, the margins, and circled support vectors]

8 Overlapping Class Distributions
- When the class distributions overlap, allow some examples to be misclassified → soft margin
- Introduce slack variables \xi_n \geq 0, one per data point, with \xi_n = 0 for points on or inside the correct margin boundary and \xi_n = |t_n - y(x_n)| otherwise; the constraints become t_n y(x_n) \geq 1 - \xi_n

9 Soft Margin Solution
- Minimize C \sum_n \xi_n + \frac{1}{2}\|w\|^2, where the parameter C > 0 sets the trade-off between minimizing training errors and controlling model complexity
- KKT conditions: a_n \geq 0, t_n y(x_n) - 1 + \xi_n \geq 0, a_n \{ t_n y(x_n) - 1 + \xi_n \} = 0, \mu_n \geq 0, \xi_n \geq 0, \mu_n \xi_n = 0
- Again, either a_n = 0 or t_n y(x_n) = 1 - \xi_n; the points with a_n > 0 are the support vectors
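As a usage sketch (our own example, not the slides'), scikit-learn's SVC exposes this C parameter directly; smaller C tolerates more margin violations and typically retains more support vectors:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian classes in two dimensions
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(1.5, 1.0, (50, 2))])
t = np.array([-1] * 50 + [1] * 50)

for C in (0.1, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=C).fit(X, t)
    print(f"C={C:6.1f}  support vectors: {clf.n_support_.sum()}")
```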

10 Dual Representation
- The dual representation is the same as before: maximize \tilde{L}(a) = \sum_n a_n - \frac{1}{2} \sum_n \sum_m a_n a_m t_n t_m k(x_n, x_m), now subject to the box constraints 0 \leq a_n \leq C and \sum_n a_n t_n = 0
- New data are classified exactly as for hard-margin classifiers, y(x) = \sum_n a_n t_n k(x, x_n) + b, with b obtained by averaging over the support vectors for which 0 < a_n < C

11 Alternative Formulation
- ν-SVM (Schölkopf et al., 2000): maximize \tilde{L}(a) = -\frac{1}{2} \sum_n \sum_m a_n a_m t_n t_m k(x_n, x_m) subject to 0 \leq a_n \leq 1/N, \sum_n a_n t_n = 0, and \sum_n a_n \geq \nu
- The parameter ν is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors
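Scikit-learn's NuSVC implements this formulation, so the bound on the support-vector fraction can be checked empirically; a short sketch on our own toy data:

```python
import numpy as np
from sklearn.svm import NuSVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(1.5, 1.0, (100, 2))])
t = np.array([-1] * 100 + [1] * 100)

for nu in (0.1, 0.3, 0.5):
    clf = NuSVC(nu=nu, kernel="rbf").fit(X, t)
    frac_sv = clf.n_support_.sum() / len(t)
    print(f"nu={nu:.1f}  fraction of support vectors: {frac_sv:.2f}  (>= nu)")
```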

12 Example of Nonseparable Data Classification (ν-SVM)
[Figure 7.4: the ν-SVM with a Gaussian kernel applied to nonseparable data in two dimensions, showing the decision boundary, the margins, and circled support vectors]

13 Solutions of the QP Problem
- Chunking (Vapnik, 1982). Idea: the value of the Lagrangian is unchanged if we remove the rows and columns of the kernel matrix corresponding to Lagrange multipliers that have value zero
- Protected conjugate gradients (Burges, 1998)
- Decomposition methods (Osuna et al., 1996)
- Sequential minimal optimization (Platt, 1999): considers just two Lagrange multipliers at a time, so each step is solvable in closed form

14 Relation to Logistic Regression (Section 4.3.2)
- For data points on the correct side of the margin boundary, \xi_n = 0; for the remaining points, \xi_n = 1 - y_n t_n
- The soft-margin objective can therefore be written \sum_n E_{SV}(y_n t_n) + \lambda \|w\|^2, where E_{SV}(y t) = [1 - y t]_+ is the hinge error function and [\cdot]_+ denotes the positive part

15 Relation to Logistic Regression (Cont'd)
- For maximum likelihood logistic regression with targets t \in \{-1, 1\}, p(t|y) = \sigma(y t), so the negative log-likelihood gives the error E_{LR}(y t) = \ln(1 + \exp(-y t))
- With a quadratic regularizer, the error function becomes \sum_n E_{LR}(y_n t_n) + \lambda \|w\|^2, the same form as the SVM objective with the hinge loss replaced by the logistic loss

16 Comparison of Error Functions
Plotted as functions of z = y t (Figure 7.5):
- Hinge error function [1 - z]_+
- Error function for logistic regression, rescaled by 1/\ln 2 so that it passes through (0, 1)
- Misclassification (0-1) error
- Squared error (z - 1)^2
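The four curves are easy to tabulate as functions of z = y t; a plain numpy sketch (our own code, not from the slides):

```python
import numpy as np

z = np.linspace(-2.0, 2.0, 9)
hinge = np.maximum(0.0, 1.0 - z)               # E_SV(z) = [1 - z]_+
logistic = np.log1p(np.exp(-z)) / np.log(2.0)  # rescaled to pass through (0, 1)
misclass = (z < 0.0).astype(float)             # ideal 0-1 loss
squared = (z - 1.0) ** 2                       # squared error

for name, e in [("hinge", hinge), ("logistic", logistic),
                ("0-1", misclass), ("squared", squared)]:
    print(f"{name:8s}", np.round(e, 2))
```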

17 Multiclass SVMs
- One-versus-the-rest: K separate SVMs
  - Can lead to inconsistent results (Figure 4.2)
  - Imbalanced training sets
  - Variant: positive class targets +1, negative class targets -1/(K-1) (Lee et al., 2001)
  - An objective function for training all K SVMs simultaneously (Weston and Watkins, 1999)
- One-versus-one: K(K-1)/2 SVMs, combined by voting
- Error-correcting output codes (Allwein et al., 2000): a generalization of the voting scheme of the one-versus-one approach
A usage sketch of the first two constructions follows below.
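Both constructions are available as generic wrappers in scikit-learn; a brief sketch (our own example):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                          # K = 3 classes
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)  # K SVMs
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)   # K(K-1)/2 SVMs
print(len(ovr.estimators_), len(ovo.estimators_))          # 3 and 3 here
```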

18 SVMs for Regression
- Simple linear regression minimizes a regularized quadratic error; to obtain sparse solutions, the quadratic error is replaced by the ε-insensitive error function, e.g. E_\epsilon(y(x) - t) = 0 if |y(x) - t| < \epsilon, and |y(x) - t| - \epsilon otherwise
- Minimize C \sum_n E_\epsilon(y(x_n) - t_n) + \frac{1}{2}\|w\|^2, which pays no penalty inside a tube of width ε around the prediction

19 SVMs for Regression (Cont'd)
- Introduce two slack variables per data point: \xi_n \geq 0 for points above the tube (t_n > y(x_n) + \epsilon) and \hat{\xi}_n \geq 0 for points below it (t_n < y(x_n) - \epsilon)
- The conditions become t_n \leq y(x_n) + \epsilon + \xi_n and t_n \geq y(x_n) - \epsilon - \hat{\xi}_n
- Minimize C \sum_n (\xi_n + \hat{\xi}_n) + \frac{1}{2}\|w\|^2 subject to these constraints
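Scikit-learn's SVR exposes both C and ε; a hedged usage sketch on our own sinusoidal toy data, where only points on or outside the ε-tube become support vectors:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0.0, 1.0, 50))[:, None]
t = np.sin(2 * np.pi * X).ravel() + rng.normal(0.0, 0.1, 50)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, t)
print("support vectors:", len(svr.support_), "of", len(t))
```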

20 Dual Problem
- Introducing multipliers a_n \geq 0 and \hat{a}_n \geq 0 and eliminating w, b, \xi_n, \hat{\xi}_n gives the dual: maximize \tilde{L}(a, \hat{a}) = -\frac{1}{2} \sum_n \sum_m (a_n - \hat{a}_n)(a_m - \hat{a}_m) k(x_n, x_m) - \epsilon \sum_n (a_n + \hat{a}_n) + \sum_n (a_n - \hat{a}_n) t_n
- Subject to the box constraints 0 \leq a_n \leq C and 0 \leq \hat{a}_n \leq C, and \sum_n (a_n - \hat{a}_n) = 0

21 Predictions
- Predictions are made with y(x) = \sum_n (a_n - \hat{a}_n) k(x, x_n) + b
- KKT conditions (from derivatives of the Lagrangian): a_n (\epsilon + \xi_n + y_n - t_n) = 0, \hat{a}_n (\epsilon + \hat{\xi}_n - y_n + t_n) = 0, (C - a_n) \xi_n = 0, (C - \hat{a}_n) \hat{\xi}_n = 0
- The support vectors are the points with a_n \neq 0 or \hat{a}_n \neq 0, i.e. points on or outside the ε-tube; b can be found from any point with 0 < a_n < C via b = t_n - \epsilon - \sum_m (a_m - \hat{a}_m) k(x_n, x_m)

22 Alternative Formulation
- ν-SVM for regression (Schölkopf et al., 2000): replace ε by a parameter ν that controls the fraction of points lying outside the tube
- At most νN data points fall outside the tube, while at least νN data points are support vectors
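Scikit-learn's NuSVR implements this variant; a short sketch on the same toy data as above, showing that larger ν admits more points outside the tube (and hence more support vectors):

```python
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0.0, 1.0, 50))[:, None]
t = np.sin(2 * np.pi * X).ravel() + rng.normal(0.0, 0.1, 50)

for nu in (0.1, 0.5):
    m = NuSVR(kernel="rbf", C=10.0, nu=nu).fit(X, t)
    print(f"nu={nu}: {len(m.support_)} support vectors of {len(t)}")
```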

23 Example of ν-SVM Regression
[Figure 7.8: ν-SVM regression applied to a sinusoidal data set with Gaussian kernels, showing the predicted curve, the ε-insensitive tube, and circled support vectors]

24 Relevance Vector Machines
Limitations of the SVM
- Outputs are decisions rather than posterior probabilities
- The extension to K > 2 classes is problematic
- There is a complexity parameter C that must be tuned, e.g. by cross-validation
- Kernel functions are centered on training data points and required to be positive definite
RVM
- A Bayesian sparse kernel technique
- Typically gives much sparser models, with comparable generalization
- Hence faster performance on test data

25 RVM for Regression
- The RVM for regression is a linear model of the form studied in Chapter 3 with a modified prior: p(t|x, w, \beta) = N(t | y(x), \beta^{-1}) with y(x) = \sum_{n=1}^N w_n k(x, x_n) + b
- Instead of a single shared hyperparameter, the prior introduces a separate hyperparameter \alpha_i for each weight: p(w|\alpha) = \prod_i N(w_i | 0, \alpha_i^{-1}) (automatic relevance determination, ARD)

26 RVM for Regression (Cont'd)
- From the result (3.49) for linear regression models, the posterior is Gaussian: p(w|t, X, \alpha, \beta) = N(w | m, \Sigma) with m = \beta \Sigma \Phi^T t and \Sigma = (A + \beta \Phi^T \Phi)^{-1}, where A = diag(\alpha_i)
- \alpha and \beta are determined using the evidence approximation (type-2 maximum likelihood) (Section 3.5)
- Maximize the log marginal likelihood \ln p(t|X, \alpha, \beta) = \ln N(t | 0, C) = -\frac{1}{2} \{ N \ln(2\pi) + \ln |C| + t^T C^{-1} t \}, where C = \beta^{-1} I + \Phi A^{-1} \Phi^T

27 RVM for Regression (Cont'd)
Two approaches to the maximization
- Setting the derivatives of the marginal likelihood to zero gives the re-estimation equations \alpha_i^{new} = \gamma_i / m_i^2 and (\beta^{new})^{-1} = \|t - \Phi m\|^2 / (N - \sum_i \gamma_i), where \gamma_i = 1 - \alpha_i \Sigma_{ii}
- EM algorithm → Section 9.3.4
Predictive distribution (Section 3.3.2)
- p(t | x, X, t, \alpha^*, \beta^*) = N(t | m^T \phi(x), \sigma^2(x)) with \sigma^2(x) = (\beta^*)^{-1} + \phi(x)^T \Sigma \phi(x)
A sketch of the re-estimation loop is given below.
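A minimal numpy sketch of the direct re-estimation approach, assuming a Gaussian-kernel design matrix and omitting the bias term for brevity; this is our own illustrative implementation, not the book's reference code:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50
x = np.sort(rng.uniform(0.0, 1.0, N))
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, N)

# Design matrix: one Gaussian basis function centred on each training point
Phi = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.1 ** 2))

alpha = np.ones(N)                  # one ARD hyperparameter per weight
beta = 10.0                         # noise precision
for _ in range(200):
    Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)  # posterior cov
    m = beta * Sigma @ Phi.T @ t                                # posterior mean
    gamma = np.clip(1.0 - alpha * np.diag(Sigma), 0.0, 1.0)     # well-determinedness
    alpha = np.minimum(gamma / (m ** 2 + 1e-12), 1e12)          # alpha_i = gamma_i / m_i^2
    beta = (N - gamma.sum()) / np.sum((t - Phi @ m) ** 2)       # noise re-estimate

print("relevance vectors:", (alpha < 1e11).sum(), "of", N)
# Predictive mean at x*: m @ phi(x*); variance: 1/beta + phi(x*) @ Sigma @ phi(x*)
```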

28 Example of RVM Regression
- The resulting model is much more compact than the corresponding SVM (far fewer relevance vectors than support vectors)
- Hyperparameters are determined automatically, with no cross-validation
- Requires more training time than the SVM
[Figures: RVM regression and ν-SVM regression on the sinusoidal data set, with relevance/support vectors circled]

29 Mechanism for Sparsity
- Consider the marginal likelihood with a single basis function \phi: if \phi is poorly aligned with the data vector t, the evidence is maximized at \alpha = \infty, leaving only isotropic noise and pruning the basis function
- A finite value of \alpha is preferred only when \phi is sufficiently well aligned with t (Figure 7.10)

30 Sparse Solution
- Pull out the contribution from \alpha_i in C: C = \beta^{-1} I + \sum_{j \neq i} \alpha_j^{-1} \phi_j \phi_j^T + \alpha_i^{-1} \phi_i \phi_i^T = C_{-i} + \alpha_i^{-1} \phi_i \phi_i^T
- Using (C.7) and (C.15) in Appendix C for |C| and C^{-1}, the log marginal likelihood separates as L(\alpha) = L(\alpha_{-i}) + \lambda(\alpha_i)

31 Sparse Solution (Cont'd)
- The term involving \alpha_i in the log marginal likelihood L is \lambda(\alpha_i) = \frac{1}{2} [ \ln \alpha_i - \ln(\alpha_i + s_i) + q_i^2 / (\alpha_i + s_i) ], where s_i = \phi_i^T C_{-i}^{-1} \phi_i and q_i = \phi_i^T C_{-i}^{-1} t
- Stationary points of the marginal likelihood w.r.t. \alpha_i: if q_i^2 > s_i, \alpha_i = s_i^2 / (q_i^2 - s_i); otherwise \alpha_i = \infty and \phi_i is pruned
- Sparsity s_i: measures the extent to which \phi_i overlaps with the other basis vectors
- Quality q_i: measures the alignment of the basis vector \phi_i with the error between t and the prediction y_{-i} of the model with \phi_i removed

32 Sequential Sparse Bayesian Learning Algorithm
1. If solving a regression problem, initialize \beta
2. Initialize using a single basis function \phi_1, with \alpha_1 = s_1^2 / (q_1^2 - s_1) and the remaining \alpha_j = \infty
3. Evaluate \Sigma and m, together with q_i and s_i, for all basis functions
4. Select a candidate basis function \phi_i
5. If q_i^2 > s_i and \alpha_i < \infty (\phi_i is already in the model), update \alpha_i
6. If q_i^2 > s_i and \alpha_i = \infty, add \phi_i to the model, and evaluate \alpha_i
7. If q_i^2 \leq s_i and \alpha_i < \infty, remove \phi_i from the model, and set \alpha_i = \infty
8. If solving a regression problem, update \beta
9. Go to 3 until converged
A compact sketch of this procedure is given below.
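The following is our own numpy sketch of the algorithm above, for the regression case with the noise precision β held fixed for brevity; the candidate in step 4 is chosen by simple round-robin, and a practical implementation would update C incrementally rather than rebuilding it:

```python
import numpy as np

def sequential_rvm(Phi, t, beta=25.0, n_sweeps=20):
    N, M = Phi.shape
    alpha = np.full(M, np.inf)                       # start with all pruned
    # Step 2: with C_{-i} = (1/beta) I, s_i = beta |phi_i|^2, q_i = beta phi_i^T t
    i0 = int(np.argmax((Phi.T @ t) ** 2 / np.sum(Phi ** 2, axis=0)))
    s0 = beta * Phi[:, i0] @ Phi[:, i0]
    q0 = beta * Phi[:, i0] @ t
    alpha[i0] = s0 ** 2 / (q0 ** 2 - s0)
    for sweep in range(n_sweeps):
        for i in range(M):                           # step 4: round-robin candidate
            active = np.isfinite(alpha)
            Phi_a = Phi[:, active]
            C = np.eye(N) / beta + (Phi_a / alpha[active]) @ Phi_a.T
            Cinv = np.linalg.inv(C)
            Q = Phi.T @ (Cinv @ t)                   # quality, all basis functions
            S = np.sum(Phi * (Cinv @ Phi), axis=0)   # sparsity, all basis functions
            s, q = S.copy(), Q.copy()
            a = alpha[active]                        # convert to s_i, q_i for basis
            s[active] = a * S[active] / (a - S[active])  # functions already in
            q[active] = a * Q[active] / (a - S[active])  # the model
            if q[i] ** 2 > s[i]:                     # steps 5-6: add or update
                alpha[i] = s[i] ** 2 / (q[i] ** 2 - s[i])
            else:                                    # step 7: remove
                alpha[i] = np.inf
    return alpha

x = np.sort(np.random.default_rng(4).uniform(0.0, 1.0, 40))
t = np.sin(2 * np.pi * x) + np.random.default_rng(5).normal(0.0, 0.1, 40)
Phi = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.1 ** 2))
alpha = sequential_rvm(Phi, t)
print("relevance vectors:", np.isfinite(alpha).sum(), "of", len(x))
```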

33 RVM for Classification
- A probabilistic linear classification model (Chapter 4), y(x, w) = \sigma(w^T \phi(x)), with the ARD prior over w
- The weights can no longer be integrated out analytically, so the algorithm alternates:
  - Initialize \alpha
  - Build a Gaussian (Laplace) approximation to the posterior distribution over w
  - Obtain an approximation to the marginal likelihood
  - Maximize the marginal likelihood (re-estimate \alpha) until converged

34 RVM for Classification (Cont'd)
- The mode of the posterior distribution is obtained by maximizing \ln p(w|t, \alpha) = \sum_n \{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \} - \frac{1}{2} w^T A w + const
- This is done by iterative reweighted least squares (IRLS) from Section 4.3.3
- The resulting Gaussian approximation to the posterior has mean w^* = A^{-1} \Phi^T (t - y) and covariance \Sigma = (\Phi^T B \Phi + A)^{-1}, where B is diagonal with b_n = y_n (1 - y_n)

35 RVM for Classification (Cont'd)
- The marginal likelihood is approximated using the Laplace approximation (Section 4.4): p(t|\alpha) \simeq p(t|w^*) p(w^*|\alpha) (2\pi)^{M/2} |\Sigma|^{1/2}
- Setting the derivative of this approximate marginal likelihood w.r.t. \alpha_i to zero and rearranging gives \alpha_i^{new} = \gamma_i / (w_i^*)^2 with \gamma_i = 1 - \alpha_i \Sigma_{ii}
- If we define \hat{t} = \Phi w^* + B^{-1} (t - y), the approximate marginal likelihood takes the same form as in the regression case, with \hat{t} playing the role of the targets, so the sparsity analysis carries over
A numpy sketch of the procedure follows.
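This is our own illustrative implementation, combining the inner IRLS loop with the outer α re-estimation, using a Gaussian-kernel design matrix and targets t_n ∈ {0, 1}:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def rvm_classify(Phi, t, n_outer=50, n_irls=10):
    N, M = Phi.shape
    alpha, w = np.ones(M), np.zeros(M)
    for _ in range(n_outer):
        for _ in range(n_irls):                  # IRLS: find posterior mode w*
            y = sigmoid(Phi @ w)
            B = y * (1.0 - y)                    # diagonal of B
            H = Phi.T @ (Phi * B[:, None]) + np.diag(alpha)  # negative Hessian
            g = Phi.T @ (t - y) - alpha * w      # gradient of log posterior
            w = w + np.linalg.solve(H, g)        # Newton step
        y = sigmoid(Phi @ w)
        B = y * (1.0 - y)
        Sigma = np.linalg.inv(Phi.T @ (Phi * B[:, None]) + np.diag(alpha))
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 0.0, 1.0)
        alpha = np.minimum(gamma / (w ** 2 + 1e-12), 1e12)   # re-estimate alpha
    return w, alpha

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0.0, 1.0, (40, 2)), rng.normal(2.0, 1.0, (40, 2))])
t = np.array([0.0] * 40 + [1.0] * 40)
Phi = np.exp(-np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2) / 2.0)
w, alpha = rvm_classify(Phi, t)
print("relevance vectors:", (alpha < 1e11).sum(), "of", len(t))
```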

36 Example of RVM Classification
[Figure: the RVM with a Gaussian kernel applied to a synthetic two-class data set, showing the decision boundary and circled relevance vectors, which tend to lie away from the decision boundary]

