
Slide 1: Learning Larger Margin Machine Locally and Globally
Kaizhu Huang (kzhuang@cse.cuhk.edu.hk), Haiqin Yang, Irwin King, Michael R. Lyu
Dept. of Computer Science and Engineering, The Chinese University of Hong Kong
ICML 2004, Banff, Alberta, Canada. July 5, 2004.

Slide 2: Outline
- Contributions
- Background: Linear Binary Classification; Motivation
- Maxi-Min Margin Machine (M4): Model Definition; Geometrical Interpretation; Solving Methods; Connections with Other Models; Nonseparable Case; Kernelization
- Experimental Results
- Future Work
- Conclusion

Slide 3: Contributions
Theory: a unified model of the Support Vector Machine (SVM), the Minimax Probability Machine (MPM), and Linear Discriminant Analysis (LDA).
Practice: the model is solved as a sequential Second Order Cone Programming problem.

Slide 4: Background: Linear Binary Classification
Given two classes of data sampled from x and y, we try to find a linear decision plane w^T z + b = 0 that correctly discriminates x from y:
- if w^T z + b > 0, z is classified as x;
- if w^T z + b < 0, z is classified as y.
Only partial information (a finite sample) is available, so we need a criterion for choosing among the candidate hyperplanes.
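A minimal sketch of this decision rule; w and b are assumed to come from whichever training criterion is chosen on the following slides.

```python
import numpy as np

def classify(w, b, z):
    """Assign z to class x if w^T z + b > 0, otherwise to class y."""
    return "x" if float(np.dot(w, z) + b) > 0.0 else "y"
```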

Slide 5: Background: Support Vector Machine
Support Vector Machine (SVM): the optimal hyperplane w^T z + b = 0 is the one that maximizes the margin between the two classes of data. The boundary of the SVM is determined exclusively by a few critical points called support vectors; all other points are irrelevant to the decision plane. In this sense, SVM discards global information.

Slide 6: Learning Locally and Globally
Along the dashed axis, the y data have a larger data trend (spread) than the x data. Therefore, a more reasonable hyperplane may lie closer to the x data, rather than locating itself in the middle of the two classes as the SVM does.

Slide 7: M4: Learning Locally and Globally
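The formula image on this slide is not preserved in the transcript. As a point of reference for the later slides, here is a reconstruction of the M4 optimization problem from the accompanying ICML 2004 paper, where Σx and Σy denote the covariance matrices of the two classes:

```latex
\max_{\rho,\;\mathbf{w}\neq\mathbf{0},\;b}\ \rho
\quad\text{s.t.}\quad
\frac{\mathbf{w}^{T}\mathbf{x}_{i}+b}{\sqrt{\mathbf{w}^{T}\Sigma_{x}\mathbf{w}}}\ \ge\ \rho,
\ \ i=1,\dots,N_{x},
\qquad
\frac{-(\mathbf{w}^{T}\mathbf{y}_{j}+b)}{\sqrt{\mathbf{w}^{T}\Sigma_{y}\mathbf{w}}}\ \ge\ \rho,
\ \ j=1,\dots,N_{y}.
```

Every point's distance to the hyperplane, measured relative to its own class's data trend, must be at least ρ. This is the sense in which M4 is both local (every data point constrains the boundary) and global (the class covariances enter the constraints).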

Slide 8: M4: Geometric Interpretation

Slide 9: M4: Solving Method
Divide and conquer: if we fix ρ to a specific value ρ_n, the problem reduces to checking whether ρ_n satisfies the constraints from slide 7. If yes, we increase ρ_n; otherwise, we decrease it. Each such feasibility check is a Second Order Cone Programming problem.
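A minimal sketch of this feasibility check, assuming the slide-7 reconstruction of the constraints and using cvxpy as a stand-in for the SeDuMi package cited on slide 23; the helper name is_feasible and the jitter term are illustrative choices, not the authors' code. It maximizes a common slack t, so ρ_n is achievable by a nontrivial w exactly when the optimum t is positive.

```python
import cvxpy as cp
import numpy as np

def is_feasible(X, Y, rho_n, eps=1e-8):
    """SOCP feasibility check: can every point clear margin level rho_n?"""
    d = X.shape[1]
    # Cholesky factors of the class covariances, so that
    # ||S.T @ w|| = sqrt(w^T Sigma w); eps jitter keeps them positive definite.
    Sx = np.linalg.cholesky(np.cov(X, rowvar=False) + eps * np.eye(d))
    Sy = np.linalg.cholesky(np.cov(Y, rowvar=False) + eps * np.eye(d))
    w, b, t = cp.Variable(d), cp.Variable(), cp.Variable()
    cons = [X @ w + b >= rho_n * cp.norm(Sx.T @ w) + t,
            -(Y @ w + b) >= rho_n * cp.norm(Sy.T @ w) + t,
            cp.norm(w) <= 1]  # normalize away the free scale of w
    prob = cp.Problem(cp.Maximize(t), cons)
    prob.solve()
    return prob.status == cp.OPTIMAL and t.value > 0
```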

Slide 10: M4: Solving Method (Cont'd)
Iterate the following two divide-and-conquer steps: (1) fix ρ_n and solve the SOCP feasibility check; (2) update ρ_n by a line search, increasing it if feasible and decreasing it otherwise. The overall procedure is thus a sequential Second Order Cone Programming problem.

Slide 11: M4: Solving Method (Cont'd)
Flowchart: fix ρ_n and ask whether the constraints can be satisfied. Yes: increase ρ_n. No: decrease ρ_n. Repeat until ρ converges.
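A minimal bisection driver for this loop, assuming the is_feasible() sketch from slide 9; the bracket [0, rho_hi] and the tolerance are illustrative choices.

```python
def solve_m4_margin(X, Y, rho_hi=10.0, tol=1e-4):
    """Return the largest margin level rho found by bisection."""
    lo, hi = 0.0, rho_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(X, Y, mid):
            lo = mid   # mid is achievable: search for a larger margin
        else:
            hi = mid   # mid is not achievable: search lower
    return lo
```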

Slide 12: M4: Links with MPM
Sum the M4 constraints over all the data points of each class and add the two resulting inequalities together: the result is exactly the MPM optimization problem.
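In formulas (using the slide-7 reconstruction): averaging the first set of constraints over i, the second over j, and adding the two inequalities gives

```latex
\mathbf{w}^{T}\bar{\mathbf{x}}+b \ \ge\ \rho\sqrt{\mathbf{w}^{T}\Sigma_{x}\mathbf{w}},\qquad
-(\mathbf{w}^{T}\bar{\mathbf{y}}+b) \ \ge\ \rho\sqrt{\mathbf{w}^{T}\Sigma_{y}\mathbf{w}}
\;\Longrightarrow\;
\mathbf{w}^{T}(\bar{\mathbf{x}}-\bar{\mathbf{y}}) \ \ge\ \rho\Bigl(\sqrt{\mathbf{w}^{T}\Sigma_{x}\mathbf{w}}+\sqrt{\mathbf{w}^{T}\Sigma_{y}\mathbf{w}}\Bigr),
```

where x̄ and ȳ are the class means. Maximizing ρ subject to this single aggregated constraint is exactly the MPM problem.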

Slide 13: M4: Links with MPM (Cont'd)
Remarks: the procedure is not reversible, i.e., MPM is a special case of M4. MPM builds its decision boundary GLOBALLY: it depends exclusively on the means and covariances. However, the means and covariances may not be accurately estimated.

Slide 14: M4: Links with SVM
If one assumes Σx = Σy = I, the data-trend terms √(w^T Σ w) reduce to ||w||. Since the magnitude of w can scale up without influencing the optimization, ρ||w|| can be fixed to 1, and maximizing ρ becomes minimizing ||w||: exactly the Support Vector Machine. SVM is the special case of M4 under this assumption.
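The reduction in formulas, again using the slide-7 reconstruction:

```latex
\Sigma_{x}=\Sigma_{y}=I \;\Rightarrow\;
\mathbf{w}^{T}\mathbf{x}_{i}+b \ \ge\ \rho\,\lVert\mathbf{w}\rVert,\qquad
-(\mathbf{w}^{T}\mathbf{y}_{j}+b) \ \ge\ \rho\,\lVert\mathbf{w}\rVert.
```

Because only the direction of w matters, the free scale can be fixed so that ρ||w|| = 1; maximizing ρ then becomes minimizing ||w|| subject to w^T x_i + b ≥ 1 and −(w^T y_j + b) ≥ 1, the standard hard-margin SVM.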

Slide 15: M4: Links with SVM (Cont'd)
SVM thus implicitly makes two assumptions: (Assumption 1) the two classes share the same covariance, and (Assumption 2) that common covariance is the identity, Σx = Σy = I. Both assumptions are inappropriate for real data, where the two classes typically have different, non-isotropic data trends.

Slide 16: M4: Links with LDA
If one assumes Σx = Σy = (Σ*x + Σ*y)/2, i.e., both classes share the average of the two estimated class covariance matrices, and performs a procedure similar to the MPM derivation, M4 reduces to LDA.
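A sketch of that step under the same reconstruction: with the shared covariance Σ = (Σ*x + Σ*y)/2, averaging and adding the constraints as on slide 12 gives

```latex
\mathbf{w}^{T}(\bar{\mathbf{x}}-\bar{\mathbf{y}}) \ \ge\ 2\rho\sqrt{\mathbf{w}^{T}\Sigma\,\mathbf{w}},
\qquad \Sigma=\tfrac{1}{2}\bigl(\Sigma^{*}_{x}+\Sigma^{*}_{y}\bigr),
```

so maximizing ρ amounts to maximizing w^T(x̄ − ȳ)/√(w^T(Σ*x + Σ*y)w), which is the Fisher criterion optimized by LDA.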

Slide 17: M4: Links with LDA (Cont'd)
The LDA assumption Σx = Σy = (Σ*x + Σ*y)/2 is weaker than SVM's, but still inappropriate: the two classes are still forced to share a single covariance structure.

Slide 18: Nonseparable Case
For nonseparable data, slack variables are introduced into the constraints. How to solve it? Line search + Second Order Cone Programming, as before.
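The slide's formula image is missing; the following is one standard way to write the slack-variable version consistent with the slide-7 reconstruction. The trade-off parameter C and the linear penalty are assumptions, not confirmed by the transcript:

```latex
\max_{\rho,\;\mathbf{w}\neq\mathbf{0},\;b,\;\boldsymbol{\xi}\ge 0}\ \rho - C\sum_{k}\xi_{k}
\quad\text{s.t.}\quad
\mathbf{w}^{T}\mathbf{x}_{i}+b \ \ge\ \rho\sqrt{\mathbf{w}^{T}\Sigma_{x}\mathbf{w}}-\xi_{i},
\qquad
-(\mathbf{w}^{T}\mathbf{y}_{j}+b) \ \ge\ \rho\sqrt{\mathbf{w}^{T}\Sigma_{y}\mathbf{w}}-\xi_{N_{x}+j}.
```

For a fixed ρ this is still an SOCP, so the same line search applies.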

Slide 19: Nonlinear Classifier: Kernelization
Map the data into a higher-dimensional feature space R^f: x_i → φ(x_i), y_j → φ(y_j). Construct the linear decision plane f(γ, b) = γ^T φ(z) + b in the feature space R^f, with γ ∈ R^f and b ∈ R. In R^f we need to solve the same optimization problem. However, we do not want to solve it in an explicit form of φ; instead, we solve it in a kernelized form through K(z1, z2) = φ(z1)^T φ(z2).
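A minimal sketch of the kernel evaluation above: the optimization is rewritten entirely in terms of the Gram matrix K(z1, z2) = φ(z1)^T φ(z2). The RBF kernel and the gamma parameter are assumed choices for illustration; beyond "Gaussian" on slide 23, the transcript does not specify the kernel used.

```python
import numpy as np

def rbf_gram(Z1, Z2, gamma=1.0):
    """Gram matrix with K[i, j] = exp(-gamma * ||Z1[i] - Z2[j]||^2)."""
    sq_dists = (np.sum(Z1 ** 2, axis=1)[:, None]
                + np.sum(Z2 ** 2, axis=1)[None, :]
                - 2.0 * Z1 @ Z2.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negatives
```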

Slide 20: Nonlinear Classifier: Kernelization (Cont'd)

Slide 21: Nonlinear Classifier: Kernelization: Notation

Slide 22: Experimental Results
Toy example: two Gaussian classes with different data trends.

Slide 23: Experimental Results
Data sets: UCI Machine Learning Repository. Procedure: 10-fold cross validation. Solving packages: Libsvm 2.4 for SVM, SeDuMi 1.05 for M4, MPM 1.0 for MPM.
In linear cases, M4 outperforms SVM and MPM. With Gaussian kernels, M4 is slightly better than or comparable to SVM, likely because (1) sparsity in the feature space results in inaccurate estimation of the covariance matrices, and (2) kernelization may not preserve the topology of the original data: maximizing the margin in the feature space does not necessarily maximize the margin in the original space.
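A minimal sketch of the evaluation protocol: 10-fold cross validation on a UCI-style data set. scikit-learn's SVC is used here as a stand-in for the MATLAB toolchain cited above (Libsvm 2.4 / SeDuMi 1.05 / MPM 1.0); this is an assumption for illustration, not the authors' setup.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def ten_fold_accuracy(X, labels, kernel="linear"):
    """Mean and std of 10-fold cross-validated accuracy."""
    scores = cross_val_score(SVC(kernel=kernel), X, labels, cv=10)
    return scores.mean(), scores.std()
```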

Slide 24: Experimental Results (Cont'd)
An example illustrating that maximizing the margin in the feature space does not necessarily maximize the margin in the original space (from Simon Tong et al., Restricted Bayes Optimal Classifiers, AAAI 2000).

Slide 25: Future Work
- Speeding up M4: it contains support vectors; can we exploit this sparsity as has been done in SVM? Can we remove redundant points?
- How to impose constraints on the kernelization so that the topology of the data is preserved?
- Generalization error bound? SVM and MPM both have error bounds.
- How to extend M4 to multi-category classification?

Slide 26: Conclusion
- Proposed a new large margin classifier, M4, which learns the decision boundary both locally and globally.
- Built theoretical connections with other models: a unified model of SVM, MPM, and LDA.
- Developed a sequential Second Order Cone Programming algorithm for M4.
- Experimental results demonstrated the advantages of the new model.

Slide 27: Thanks!

