Direct Convex Relaxations of Sparse SVM. Antoni B. Chan, Nuno Vasconcelos, and Gert R. G. Lanckriet. The 24th International Conference on Machine Learning (ICML 2007).

Presentation transcript:

Direct Convex Relaxations of Sparse SVM Antoni B. Chan, Nuno Vasconcelos, and Gert R. G. Lanckriet The 24th International Conference on Machine Learning (ICML 2007) Presented by Shuiwang Ji

Outline: Introduction; the Quadratically Constrained Quadratic Programming (QCQP) formulation; the Semidefinite Programming (SDP) formulation; Experiments.

Sparsity of SVM The standard SVM solution is sparse with respect to the data points (only the support vectors matter), but not sparse with respect to the features x1, …, xd.

Motivation & Related Work Features may be noisy or redundant; sparsity enhances interpretability. Related work: Sparse PCA (Zou et al.; d'Aspremont et al.); Sparse Eigen Methods by D.C. Programming (ICML 2007).

An Example

Vector Norms The l0 "norm", ||x||_0, counts the number of nonzero entries in x; it is non-convex, and the l1 norm is its standard convex surrogate.
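As a concrete illustration (not from the slides), the three norms that appear in this talk can be computed with NumPy; the l0 "norm" is simply a count of nonzero entries:

```python
import numpy as np

x = np.array([3.0, 0.0, -4.0, 0.0, 0.0])

l0 = np.count_nonzero(x)      # number of nonzero entries: 2
l1 = np.sum(np.abs(x))        # |3| + |-4| = 7
l2 = np.sqrt(np.sum(x ** 2))  # sqrt(9 + 16) = 5

print(l0, l1, l2)
```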

C-SVM Primal and Dual (2-norm regularization)
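The slide's equations are not reproduced in this transcript; for reference, the standard C-SVM primal and dual with the 2-norm regularizer are:

```latex
% C-SVM primal (2-norm regularization)
\min_{w,\,b,\,\xi} \;\; \tfrac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \;\; y_i (w^\top x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0

% C-SVM dual
\max_{\alpha} \;\; \sum_{i=1}^{n} \alpha_i
  - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \, x_i^\top x_j
\quad \text{s.t.} \;\; 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0
```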

LP-SVM Primal and Dual (1-norm regularization)
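Again the slide's formulas are missing from the transcript; the standard 1-norm SVM replaces the quadratic regularizer with $\|w\|_1$, which makes the primal a linear program:

```latex
% LP-SVM primal (1-norm regularization)
\min_{w,\,b,\,\xi} \;\; \|w\|_1 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \;\; y_i (w^\top x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0
```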

Convex QCQP Relaxation

Interpretations of QCQP-SSVM Problems 6 and 7 are equivalent; QCQP-SSVM is a combination of C-SVM and LP-SVM: the 1-norm encourages sparsity, while the 2-norm encourages a large margin.
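This is not the paper's QCQP; as an illustrative sketch only, a hinge-loss classifier trained with a combined 1-norm plus 2-norm penalty (elastic-net style) shows the same interplay: the 1-norm drives feature weights to zero while the 2-norm preserves the margin. The function name, hyperparameters, and toy data below are all assumptions for illustration.

```python
import numpy as np

def train_sparse_svm(X, y, lam1=0.1, lam2=0.1, lr=0.01, epochs=500):
    """Illustrative sketch (not the paper's QCQP-SSVM): minimize
    average hinge loss + lam1*||w||_1 + lam2*||w||_2^2 by subgradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1                      # examples violating the margin
        grad_w = -(y[mask, None] * X[mask]).sum(axis=0) / n
        grad_b = -y[mask].sum() / n
        grad_w += lam1 * np.sign(w) + 2 * lam2 * w  # l1 + l2 subgradients
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# toy data: only feature 0 is informative, the other 4 are noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0])
w, b = train_sparse_svm(X, y)
print(np.round(w, 3))  # weight on feature 0 dominates; noise weights shrink
```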

QCQP-SSVM Dual

QCQP-SSVM QCQP-SSVM automatically learns an adaptive soft-threshold on the original SVM hyperplane.
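The soft-threshold operation referred to above can be sketched as follows; note that the threshold value t here is hand-chosen for illustration, whereas in QCQP-SSVM it is learned adaptively from the data:

```python
import numpy as np

def soft_threshold(w, t):
    """Shrink each coefficient toward zero by t; entries with |w_i| <= t become exactly 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

w = np.array([0.9, -0.05, 0.3, -0.6, 0.02])
print(soft_threshold(w, 0.1))  # small coefficients are zeroed out
```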

SDP Relaxation
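The slide's derivation is not in the transcript; sketched here from the general lifting technique (an assumption about the construction, not copied from the paper), one obtains an SDP relaxation by introducing a matrix variable in place of the rank-one product $w w^\top$ and relaxing via a Schur complement:

```latex
% lift: W = w w^\top (rank-one, non-convex)
% relax: drop the rank constraint, keep W \succeq w w^\top, i.e.
W \succeq w w^\top
\;\;\Longleftrightarrow\;\;
\begin{pmatrix} W & w \\ w^\top & 1 \end{pmatrix} \succeq 0
```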

SDP-SSVM Dual The optimal weighting matrix increases the influence of the relevant features while down-weighting the less relevant ones; SDP-SSVM learns a weighting on the inner product such that the hyperplane in feature space is sparse.
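A weighted inner product of this kind can be illustrated with a simple diagonal example; the weights below are hand-chosen for illustration, whereas in SDP-SSVM the weighting matrix is learned:

```python
import numpy as np

# diagonal feature weighting: relevant features keep large weights,
# an irrelevant feature (index 1) is suppressed entirely
D = np.diag([1.0, 0.0, 0.5])

x = np.array([1.0, 2.0, 3.0])
z = np.array([4.0, 5.0, 6.0])

weighted = x @ D @ z  # = 1.0*1*4 + 0.0*2*5 + 0.5*3*6 = 13.0
print(weighted)
```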

Results on Synthetic Data

Results on 15 UCI data sets