MURI Meeting, July 2002. Gert Lanckriet, L. El Ghaoui, M. Jordan, C. Bhattacharyya, N. Cristianini, P. Bartlett.

Presentation transcript:

MURI Meeting, July 2002. Gert Lanckriet, L. El Ghaoui, M. Jordan, C. Bhattacharyya, N. Cristianini, P. Bartlett, U.C. Berkeley. Convex Optimization in Machine Learning

Advanced Convex Optimization in Machine Learning: LP, QP, QCQP, SOCP, SDP

Linear Programming (LP)

Quadratic Programming (QP)

Quadratically Constrained Quadratic Programming (QCQP)

Second Order Cone Programming (SOCP)

Semidefinite Programming (SDP)

Advanced Convex Optimization in Machine Learning

MPM: Problem Sketch (1) a^T z = b: decision hyperplane

MPM: Problem Sketch (2)

MPM: Problem Sketch (3) The probability of misclassification for the worst-case class-conditional density should be minimized!
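The problem sketched above can be written out as a worked formula. This is a sketch based on the published minimax probability machine formulation, not a copy of the slide. The symbols \bar{x}, \bar{y} (class means), \Sigma_x, \Sigma_y (class covariances) and \alpha (lower bound on the probability of correct classification) are assumptions introduced here, and the infimum runs over all class-conditional distributions with the given mean and covariance.

```latex
\max_{\alpha,\; a \neq 0,\; b} \ \alpha
\quad \text{s.t.} \quad
\inf_{x \sim (\bar{x}, \Sigma_x)} \Pr\{ a^T x \ge b \} \ge \alpha,
\qquad
\inf_{y \sim (\bar{y}, \Sigma_y)} \Pr\{ a^T y \le b \} \ge \alpha
```

Maximizing \alpha is the same as minimizing the worst-case misclassification probability 1 - \alpha.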

MPM: Main Result (1) Marshall & Olkin / Popescu & Bertsimas ??

MPM: Main Result (2)

MPM: Main Result (3) Lemma

MPM: Main Result (4) Lemma: the probabilistic constraint is replaced by a deterministic constraint
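The step from a probabilistic to a deterministic constraint can be illustrated numerically. The sketch below assumes the form of the lemma given in the published MPM work: the worst-case constraint inf Pr{a^T y <= b} >= alpha holds exactly when b - a^T y_bar >= kappa(alpha) * sqrt(a^T Sigma a), with kappa(alpha) = sqrt(alpha / (1 - alpha)). The function names `kappa` and `worst_case_accuracy` are hypothetical helpers, and the covariance is assumed positive definite.

```python
import math
import numpy as np


def kappa(alpha):
    """kappa(alpha) = sqrt(alpha / (1 - alpha)), defined for 0 <= alpha < 1."""
    return math.sqrt(alpha / (1.0 - alpha))


def worst_case_accuracy(a, b, mean, cov):
    """Largest alpha such that inf Pr{a^T y <= b} >= alpha over all
    distributions with the given mean and covariance (assumed lemma form)."""
    margin = b - a @ mean
    if margin <= 0:
        # The mean is on the wrong side of the hyperplane: no guarantee.
        return 0.0
    d2 = margin ** 2 / (a @ cov @ a)  # squared normalized distance to the hyperplane
    return d2 / (1.0 + d2)


def deterministic_constraint_holds(a, b, mean, cov, alpha):
    """Check b - a^T mean >= kappa(alpha) * sqrt(a^T cov a)."""
    return b - a @ mean >= kappa(alpha) * math.sqrt(a @ cov @ a)
```

For a unit-covariance class centred at the origin and the hyperplane z_1 = 2, the normalized distance is 2, so the guaranteed accuracy is 4 / (1 + 4) = 0.8 under this form of the lemma.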

MPM: Main Result (5)

MPM: Geometric Interpretation

MPM: Link with FDA (1)

MPM: Link with FDA (2)

MPM: Link with FDA (3)

Robustness to Estimation Errors: Robust MPM (R-MPM)

MPM: Convex Optimization to solve the problem. Linear classifier; nonlinear classifier by kernelizing. Convex optimization: Second Order Cone Program (SOCP), hence competitive with Quadratic Program (QP) SVMs. Lemma
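A minimal sketch of the linear case, assuming the SOCP takes the form known from the published MPM work: minimize sqrt(a^T S_x a) + sqrt(a^T S_y a) subject to a^T (x_bar - y_bar) = 1. A dedicated SOCP solver is swapped out here for SciPy's general-purpose SLSQP routine, which is adequate for small illustrations only. The function name `fit_linear_mpm` is hypothetical, and the data are assumed to have at least two features with positive definite sample covariances.

```python
import numpy as np
from scipy.optimize import minimize


def fit_linear_mpm(X, Y):
    """Fit a linear MPM on two classes given as (n_samples, n_features) arrays.

    Returns (a, b, alpha): hyperplane a^T z = b and the worst-case
    lower bound alpha on the probability of correct classification.
    """
    x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)
    S_x = np.cov(X, rowvar=False)
    S_y = np.cov(Y, rowvar=False)
    d = x_bar - y_bar

    def objective(a):
        # Sum of the two class standard deviations along direction a.
        return np.sqrt(a @ S_x @ a) + np.sqrt(a @ S_y @ a)

    # Feasible starting point: a0^T d = 1.
    a0 = d / (d @ d)
    result = minimize(
        objective,
        a0,
        method="SLSQP",
        constraints=[{"type": "eq", "fun": lambda a: a @ d - 1.0}],
    )
    a = result.x
    kappa = 1.0 / objective(a)
    alpha = kappa ** 2 / (1.0 + kappa ** 2)
    b = a @ x_bar - kappa * np.sqrt(a @ S_x @ a)
    return a, b, alpha
```

A new point z would be assigned to the first class when a^T z >= b and to the second class otherwise. The nonlinear (kernelized) classifier mentioned on the slide is not covered by this sketch.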

MPM: Empirical results. Lower bound on accuracy (1 minus the worst-case probability of misclassification) and TSA (test-set accuracy) of the MPM, compared to BPB (best performance in Breiman's report, "Arcing classifiers", 1996) and SVMs (averages over 50 random partitions into 90% training and 10% test sets). Results are comparable with the existing literature and with SVMs. The lower bound is indeed smaller than the test-set accuracy in all cases (consistent with its role as a worst-case bound on the probability of misclassification). Kernelizing leads to more powerful decision boundaries (bound for the linear decision boundary < bound for the nonlinear decision boundary with a Gaussian kernel).

Conclusions

Future directions

Advanced Convex Optimization in Machine Learning

The idea (1) Machine learning Kernel-based machine learning

The idea (2)

The idea (3)

The idea (4) Training set (labelled), test set (unlabelled)

The idea (5)

Hard margin SVM classifiers (1)

Hard margin SVM classifiers (2)

Hard margin SVM classifiers (3)

Hard margin SVM classifiers (4)

Hard margin SVM classifiers (5) SDP!

Hard margin SVM classifiers (6) Optimization; learning. Learning the kernel matrix!

Hard margin SVM classifiers (7) Training set (labelled), test set (unlabelled). Learning the kernel matrix!

Hard margin SVM classifiers (8) ?

Hard margin SVM classifiers (9)

Hard margin SVM classifiers (10)

Hard margin SVM classifiers (11) Learning the kernel matrix with SDP!

Empirical results hard margin SVMs

Conclusions and future directions
