AdaMKL: A Novel Biconvex Multiple Kernel Learning Approach
Ziming Zhang*, Ze-Nian Li, Mark Drew
School of Computing Science, Simon Fraser University, Vancouver, Canada
* This work was done when the author was at SFU.

Outline
- Introduction
- Adaptive Multiple Kernel Learning
- Experiments
- Conclusion

Introduction
Kernel: given a set of data $X = \{x_1, \dots, x_n\}$ and a feature mapping function $\phi$, a kernel matrix $K$ can be defined as the inner product of each pair of feature vectors of $X$: $K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$.
Multiple Kernel Learning (MKL): aims to learn an optimal kernel, as well as the support vectors, by combining a set of base kernels linearly, $K = \sum_m \theta_m K_m$, where the $\theta_m \ge 0$ are the kernel coefficients.
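To make these definitions concrete, here is a minimal numpy sketch (not from the slides; the RBF feature map and the bandwidths are illustrative assumptions) that builds base kernel matrices and combines them linearly with coefficients theta:

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix: K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

# Toy data and a bank of base kernels with different bandwidths (illustrative).
X = np.random.randn(50, 2)
base_kernels = [rbf_kernel(X, g) for g in (0.1, 1.0, 10.0)]

# MKL combines the base kernels linearly with non-negative coefficients theta.
theta = np.array([0.2, 0.5, 0.3])
K = sum(t * Km for t, Km in zip(theta, base_kernels))
```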

Introduction – Multiple Kernel Learning (the general MKL formulation was shown as an equation on this slide).

Introduction
Example: L_p-norm Multiple Kernel Learning [1], a convex formulation that combines the traditional SVM constraints with kernel coefficient constraints (a sketch of the formulation follows below). Learning alternates between two steps:
- train a traditional SVM with the kernel coefficients fixed;
- learn the kernel coefficients with $w$ fixed.
[1] M. Kloft, et al. Efficient and accurate Lp-norm multiple kernel learning. In NIPS'09.
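For reference, the L_p-norm MKL primal of [1] has the following standard form (reproduced from the literature, not from the slide; notation may differ slightly from the authors'):

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\theta} \ge 0}\;
\frac{1}{2} \sum_{m=1}^{M} \frac{\lVert \mathbf{w}_m \rVert^2}{\theta_m}
+ C \sum_{i=1}^{n} \max\!\Big(0,\; 1 - y_i \big( \textstyle\sum_m \langle \mathbf{w}_m, \phi_m(x_i) \rangle + b \big)\Big)
\quad \text{s.t.} \quad \lVert \boldsymbol{\theta} \rVert_p \le 1 .
```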

Introduction
Motivation: the L_p-norm kernel coefficient constraint makes learning the kernel coefficients difficult, especially when p > 1.
Intuition: solve the MKL problem without considering the kernel coefficient constraints explicitly.
Contributions:
- propose a family of biconvex optimization formulations for MKL;
- can handle arbitrary norms of the kernel coefficients;
- easy and fast to optimize.

Adaptive Multiple Kernel Learning (the AdaMKL objective was shown as an equation on this slide, with annotations marking the kernel weighting term and noting that the objective is a biconvex function).

Adaptive Multiple Kernel Learning
Biconvex functions: $f(x, y)$ is biconvex if, for each fixed $y$, $f_y(x) = f(x, y)$ is convex in $x$, and, for each fixed $x$, $f_x(y) = f(x, y)$ is convex in $y$. Example: $f(x, y) = x^2 + y^2 - 3xy$.
Biconvex optimization: at least one function among the objectives and constraints is biconvex, and the others are convex; such problems generally yield only local optima.
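A quick numerical illustration (not from the slides) of why the example is biconvex but not jointly convex: each partial second derivative is positive, while the joint Hessian is indefinite:

```python
import numpy as np

# f(x, y) = x^2 + y^2 - 3xy, the example from the slide.
# Convex in x alone: d2f/dx2 = 2 > 0. Convex in y alone: d2f/dy2 = 2 > 0.
# But the joint Hessian is constant and indefinite, so f is not jointly convex.
H = np.array([[2.0, -3.0],
              [-3.0, 2.0]])
print(np.linalg.eigvalsh(H))  # [-1.  5.] -> one negative eigenvalue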

Adaptive Multiple Kernel Learning
Adaptive Multiple Kernel Learning (AdaMKL) aims to simplify the MKL learning process while keeping discriminative power similar to MKL, using biconvex optimization. It is formulated for binary classification.

Adaptive Multiple Kernel Learning – objective function (shown as an equation on this slide).

Adaptive Multiple Kernel Learning
Optimization (alternating minimization; a sketch of the control flow follows below):
- learn $w$ with $\theta$ fixed, using the $N_p(\theta)$ norm;
- learn $\theta$ with $w$ fixed, using the $L_1$ or $L_2$ norm of $\theta$;
- repeat the two steps until convergence.
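A high-level Python sketch of the alternating scheme. The helper functions `solve_w_step` and `solve_theta_step` are hypothetical placeholders standing in for the QP and theta updates described on the following slides; this illustrates the control flow only, not the authors' implementation:

```python
import numpy as np

def adamkl(base_kernels, y, solve_w_step, solve_theta_step,
           C=1e5, max_iters=100, tol=1e-6):
    """Alternating (biconvex) optimization skeleton for AdaMKL.

    solve_w_step(K, y, C) -> (alpha, obj): SVM-style QP with the combined
        kernel K and theta held fixed (step 1).
    solve_theta_step(base_kernels, alpha, y) -> theta: update of the kernel
        weights with w (represented by alpha) held fixed (step 2).
    Both solvers are hypothetical: the slides only state that each step is
    convex and solvable by quadratic programming.
    """
    theta = np.ones(len(base_kernels)) / len(base_kernels)  # uniform init (assumption)
    prev_obj = np.inf
    for _ in range(max_iters):
        K = sum(t * Km for t, Km in zip(theta, base_kernels))
        alpha, obj = solve_w_step(K, y, C)                 # learn w, theta fixed
        theta = solve_theta_step(base_kernels, alpha, y)   # learn theta, w fixed
        if abs(prev_obj - obj) < tol:  # objective stopped improving: local minimum
            break
        prev_obj = obj
    return alpha, theta
```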

Adaptive Multiple Kernel Learning – learning $w$: the dual (shown as an equation on this slide) is a quadratic program.
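With $\theta$ fixed, the $w$-step reduces to an SVM-style QP. For reference, the standard SVM dual with the combined kernel $K = \sum_m \theta_m K_m$ looks as follows (the slide's exact dual may differ in its $\theta$-dependent terms):

```latex
\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{n} \alpha_i
- \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j
  \Big( \sum_{m} \theta_m K_m(x_i, x_j) \Big)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0 .
```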

Adaptive Multiple Kernel Learning – learning $\theta$ (the update was shown as an equation on this slide).

Adaptive Multiple Kernel Learning
Computational complexity: the same as quadratic programming.
Convergence:
- if the hard-margin case (C = +∞) can be solved at the initialization stage, then AdaMKL will converge to a local minimum;
- if the objective function converges at either step, then AdaMKL has converged to a local minimum.

Adaptive Multiple Kernel Learning – comparison with L_p-norm MKL:
- Objective: L_p-norm MKL is convex; AdaMKL is biconvex.
- Kernel coefficients: L_p-norm MKL imposes an explicit norm condition; AdaMKL hides the conditions in the dual.
- Optimization: L_p-norm MKL uses gradient search, semi-infinite programming (SIP), etc.; AdaMKL uses quadratic programming.

Experiments
Four specific AdaMKL variants: N_0L_1, N_1L_1, N_1L_2, N_2L_2, where "N" and "L" denote the types of norm used for learning $w$ and $\theta$, respectively.
Two experiments:
1. Toy example: C = 10^5 without tuning, 10 Gaussian kernels; data randomly sampled from 2D Gaussian distributions (a data-generation sketch follows below).
- Positive samples: mean [0 0], covariance [0.3 0; 0 0.3], 100 samples.
- Negative samples: means [-1 -1] and [1 1], covariances [0.1 0; 0 0.1] and [0.2 0; 0 0.2], 100 samples, respectively.
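The toy data described above can be reproduced with a short numpy sketch (the slide's "100 samples, respectively" is read here as 100 per negative component, which is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# Positive class: one 2D Gaussian (parameters from the slide).
X_pos = rng.multivariate_normal([0.0, 0.0], [[0.3, 0.0], [0.0, 0.3]], size=100)

# Negative class: two 2D Gaussians (parameters from the slide).
X_neg1 = rng.multivariate_normal([-1.0, -1.0], [[0.1, 0.0], [0.0, 0.1]], size=100)
X_neg2 = rng.multivariate_normal([1.0, 1.0], [[0.2, 0.0], [0.0, 0.2]], size=100)

X = np.vstack([X_pos, X_neg1, X_neg2])
y = np.concatenate([np.ones(100), -np.ones(200)])
```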

Experiments (continued)
2. Benchmark datasets: breast-cancer, heart, thyroid, and titanic (downloaded from …).
- Gaussian kernels + polynomial kernels: 100, 140, 60, and 40 kernels for the corresponding datasets, respectively (a kernel-bank sketch follows below).
- Compared with convex-optimization-based MKL: GMKL [2] and SMKL [3].
[2] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. More efficiency in multiple kernel learning. In ICML'07.
[3] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. JMLR, 9:2491–2521, 2008.
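A minimal sketch of building such a mixed kernel bank (the bandwidths and polynomial degrees are illustrative assumptions; the slides do not list them):

```python
import numpy as np

def rbf_kernel(X, gamma):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def poly_kernel(X, degree, coef0=1.0):
    return (X @ X.T + coef0) ** degree

def build_kernel_bank(X, gammas=(0.01, 0.1, 1.0, 10.0), degrees=(1, 2, 3)):
    """Bank of Gaussian + polynomial base kernels (parameters are assumptions)."""
    return ([rbf_kernel(X, g) for g in gammas]
            + [poly_kernel(X, d) for d in degrees])
```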

Experiments – toy example (results shown as figures on two slides).

Experiments – benchmark datasets (results shown as figures; the ranges given on the slide):
(a) Breast-Cancer: [69.64 ~ 75.23]
(b) Heart: [79.71 ~ 84.05]
(c) Thyroid: [95.20 ~ 95.80]
(d) Titanic: [76.02 ~ 77.58]

Experiments – benchmark datasets (additional figures for (a) Breast-Cancer, (b) Heart, (c) Thyroid, and (d) Titanic).

Conclusion
- Biconvex optimization for MKL: the kernel coefficient constraints (non-negativity and the L_p (p ≥ 1) norm) are hidden in the dual, without explicit treatment.
- Easy to optimize and fast to converge: lower computational time, with performance similar to traditional convex-optimization-based MKL.

Thank you!