ICML 2004, Banff, Alberta, Canada. Learning Larger Margin Machine Locally and Globally. Kaizhu Huang, Haiqin Yang, Irwin King, Michael R. Lyu.

Learning Larger Margin Machine Locally and Globally
Kaizhu Huang, Haiqin Yang, Irwin King, Michael R. Lyu
Dept. of Computer Science and Engineering, The Chinese University of Hong Kong
ICML 2004, Banff, Alberta, Canada. July 5, 2004

Outline
- Contributions
- Background: Linear Binary Classification; Motivation
- Maxi-Min Margin Machine (M4): Model Definition; Geometrical Interpretation; Solving Methods; Connections with Other Models; Nonseparable Case; Kernelization
- Experimental Results
- Future Work
- Conclusion

Contributions
- Theory: a unified model of the Support Vector Machine (SVM), the Minimax Probability Machine (MPM), and Linear Discriminant Analysis (LDA).
- Practice: a sequential conic programming problem.

Background: Linear Binary Classification
Given two classes of data sampled from x and y, we try to find a linear decision hyperplane w^T z + b = 0 that correctly discriminates x from y: if w^T z + b < 0, z is classified as y; if w^T z + b > 0, z is classified as x. Since only partial information is available, we need a criterion for selecting among the many possible separating hyperplanes.
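As a concrete illustration of the decision rule above, here is a minimal sketch (the function and variable names are hypothetical, not from the slides):

```python
import numpy as np

def classify(w, b, Z):
    """Apply the decision hyperplane w^T z + b = 0: points with
    w^T z + b > 0 are assigned to class x (+1), the rest to class y (-1)."""
    return np.where(Z @ w + b > 0, 1, -1)

# Two test points on opposite sides of the plane z1 - z2 = 0.
w, b = np.array([1.0, -1.0]), 0.0
Z = np.array([[2.0, 0.0], [0.0, 2.0]])
print(classify(w, b, Z))  # [ 1 -1]
```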

Background: Support Vector Machine
The optimal hyperplane of the Support Vector Machine (SVM) is the one that maximizes the margin between the two classes of data. The boundary is determined exclusively by a few critical points called support vectors; all other points are irrelevant to the decision plane. In this sense, SVM discards global information.

Learning Locally and Globally
Along the dashed axis, the y data have a larger data trend than the x data. Therefore, a more reasonable hyperplane may lie closer to the x data, rather than locating itself in the middle of the two classes as SVM does.

M4: Learning Locally and Globally
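The model-definition formulas on this slide were images and did not survive transcription; in the notation of the paper, the M4 optimization can be reconstructed as:

  max_{ρ, w ≠ 0, b} ρ   subject to
    (w^T x_i + b) / sqrt(w^T Σ_x w) ≥ ρ,   i = 1, ..., N_x
    −(w^T y_j + b) / sqrt(w^T Σ_y w) ≥ ρ,   j = 1, ..., N_y

where Σ_x and Σ_y are the covariance matrices of the two classes. Every point constrains the plane (global information), while each margin is measured relative to its own class's covariance, i.e., the local data trend.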

M4: Geometric Interpretation

M4: Solving Method
Divide and conquer: if we fix ρ to a specific value ρ_n, the problem reduces to checking whether ρ_n satisfies the constraints. If yes, we increase ρ_n; otherwise, we decrease it. Each check is a second-order cone programming (SOCP) problem.

M4: Solving Method (Cont'd)
Iterating the two divide-and-conquer steps yields a sequential second-order cone programming problem.

M4: Solving Method (Cont'd)
Fix ρ_n and check whether it can satisfy the constraints: if yes, increase ρ_n; if no, decrease it. Repeat until convergence.
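The line search over ρ can be sketched as follows; `is_feasible` is a hypothetical stand-in for one SOCP feasibility check (solved by a conic solver such as SeDuMi in the paper's experiments):

```python
def max_rho(is_feasible, lo=0.0, hi=10.0, tol=1e-6):
    """Bisection line search for the largest feasible margin rho.
    Each call to is_feasible(rho) represents one SOCP feasibility check."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid):
            lo = mid   # rho achievable: search higher
        else:
            hi = mid   # infeasible: search lower
    return lo

# Toy stand-in: suppose any rho <= 2.5 is feasible.
print(round(max_rho(lambda r: r <= 2.5), 3))  # 2.5
```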

M4: Links with MPM
Spanning all the data points of a class, adding the M4 constraints together yields exactly the MPM optimization problem.
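The formulas of this step were images lost in transcription; a reconstruction under the M4 constraints: summing (w^T x_i + b) ≥ ρ sqrt(w^T Σ_x w) over all N_x points of class x and dividing by N_x gives

  w^T x̄ + b ≥ ρ sqrt(w^T Σ_x w),

and similarly −(w^T ȳ + b) ≥ ρ sqrt(w^T Σ_y w) for class y. These averaged constraints depend only on the class means and covariances, which is exactly the MPM optimization problem.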

M4: Links with MPM (Cont'd)
Remarks: the procedure from M4 to MPM is not reversible; MPM is a special case of M4. MPM builds its decision boundary GLOBALLY, i.e., it depends exclusively on the means and covariances. However, the means and covariances may not be accurately estimated.

M4: Links with SVM
If one assumes Σ = I, the magnitude of w can scale up without influencing the optimization, and the problem reduces to the Support Vector Machine: SVM is a special case of M4.
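The slide's formulas were images; the reduction can be reconstructed as follows. With Σ_x = Σ_y = I the M4 constraints become

  (w^T x_i + b) ≥ ρ ||w||   and   −(w^T y_j + b) ≥ ρ ||w||.

Since scaling w and b does not change the classifier, one may fix ρ ||w|| = 1; maximizing ρ is then equivalent to minimizing ||w|| subject to y_i (w^T z_i + b) ≥ 1, i.e., the standard SVM formulation.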

M4: Links with SVM (Cont'd)
SVM thus implicitly makes two assumptions: Assumption 1, that the two classes share a common covariance, and Assumption 2, that this covariance is Σ = I. These two assumptions are inappropriate in general.

M4: Links with LDA
If one assumes Σ_x = Σ_y = (Σ*_x + Σ*_y)/2 and performs a procedure similar to that for MPM, the problem reduces to LDA.

M4: Links with LDA (Cont'd)
The assumption Σ_x = Σ_y = (Σ*_x + Σ*_y)/2 is still inappropriate in general.
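A reconstruction of the reduction (the slide's formulas were images): with the common covariance Σ = (Σ*_x + Σ*_y)/2, averaging the M4 constraints as in the MPM derivation gives w^T (x̄ − ȳ) ≥ 2ρ sqrt(w^T Σ w), so maximizing ρ amounts to maximizing

  w^T (x̄ − ȳ) / sqrt(w^T (Σ*_x + Σ*_y) w),

whose maximizer coincides with the Fisher/LDA direction.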

Nonseparable Case
Introduce slack variables to relax the constraints. How to solve it? Line search combined with second-order cone programming.
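A plausible form of the relaxed problem (reconstructed, since the slide's formulas were images): with slack variables ξ_k ≥ 0 and a trade-off parameter C,

  max_{ρ, w, b, ξ}  ρ − C Σ_k ξ_k   subject to
    (w^T x_i + b) ≥ ρ sqrt(w^T Σ_x w) − ξ_i,
    −(w^T y_j + b) ≥ ρ sqrt(w^T Σ_y w) − ξ_{N_x + j}.

For each fixed ρ the feasibility check is again an SOCP, so a line search over ρ solves the problem.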

Nonlinear Classifier: Kernelization
Map the data to a higher-dimensional feature space R^f: x_i → φ(x_i), y_i → φ(y_i). Construct the linear decision plane f(γ, b) = γ^T z + b in the feature space R^f, with γ ∈ R^f and b ∈ R. In R^f we need to solve the same optimization problem; however, we do not want to solve it with an explicit form of φ. Instead, we solve it in a kernelized form using K(z_1, z_2) = φ(z_1)^T φ(z_2).
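As an illustration of computing inner products in the feature space without an explicit φ, here is a minimal Gaussian-kernel sketch (the kernel choice and the names are illustrative, not taken from the slides):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2): each entry is
    the inner product phi(a_i)^T phi(b_j) for an implicit feature map phi."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = rbf_kernel(X, X)
print(K.shape)            # (2, 2)
print(round(K[0, 1], 4))  # 0.3679
```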

Nonlinear Classifier: Kernelization (Cont'd)

Nonlinear Classifier: Kernelization (Notation)

Experimental Results
Toy example: two Gaussian data sets with different data trends.

Experimental Results
- Data sets: UCI Machine Learning Repository.
- Procedure: 10-fold cross-validation.
- Solving packages: SVM: LIBSVM 2.4; M4: SeDuMi 1.05; MPM: MPM 1.0.
- In the linear case, M4 outperforms SVM and MPM.
- In the Gaussian-kernel case, M4 is slightly better than or comparable to SVM, because (1) sparsity in the feature space results in inaccurate estimation of the covariance matrices, and (2) kernelization may not preserve the topology of the original data: maximizing the margin in the feature space does not necessarily maximize the margin in the original space.
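A minimal sketch of the 10-fold split used in the evaluation protocol above (the function name and seed are hypothetical):

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Shuffle n sample indices and split them into k roughly equal folds;
    each fold serves once as the test set in k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

folds = kfold_indices(50, k=10)
print(len(folds), len(folds[0]))  # 10 5
```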

From Simon Tong et al., Restricted Bayes Optimal Classifiers, AAAI: an example illustrating that maximizing the margin in the feature space does not necessarily maximize the margin in the original space.

Future Work
- Speeding up M4: it contains support vectors; can we exploit this sparsity as has been done in SVM? Can we reduce redundant points?
- How can we impose constraints on the kernelization to preserve the topology of the data?
- Generalization error bound? SVM and MPM both have error bounds.
- How to extend M4 to multi-category classification?

Conclusion
- Proposed a new large-margin classifier, M4, which learns the decision boundary both locally and globally.
- Built theoretical connections with other models: a unified model of SVM, MPM, and LDA.
- Developed a sequential second-order cone programming algorithm for M4.
- Experimental results demonstrated the advantages of the new model.

Thanks!