1 Kernel-class Jan. 13, 2005

2 Recap: Feature Spaces. A non-linear mapping $\Phi: X \rightarrow F$ takes the input data into a feature space F, which may be:
1. a high-dimensional space,
2. an infinite-dimensional countable space (e.g. $\ell_2$),
3. a function space (a Hilbert space).
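As a concrete example of such a non-linear map (a sketch; this quadratic feature map is an illustrative choice, not necessarily the slide's own example), mapping $R^2$ into $R^3$:

```python
import numpy as np

def phi(x):
    """Quadratic feature map R^2 -> R^3: (x1, x2) |-> (x1^2, x2^2, sqrt(2)*x1*x2)."""
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
print(phi(x))   # [1. 4. 2.8284...] -- a point in the 3-dimensional feature space F
```

The next slide's trick exploits the fact that inner products in such a feature space can often be computed directly in the input space.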

3 Recap: Kernel Trick. Note: in the dual representation we used the Gram matrix to express the solution. Kernel trick: replace every inner product $\langle \Phi(x), \Phi(y) \rangle$ by a kernel evaluation $k(x, y)$. If we use algorithms that depend on the data only through the Gram matrix G, then we never have to know (or compute) the actual features. This is the crucial point of kernel methods.
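A minimal numerical sketch of the trick, assuming the degree-2 polynomial kernel $k(x,y) = (x^\top y)^2$ and its explicit feature map (illustrative choices, not the slide's):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))          # 5 points in R^2

def phi(x):
    # Explicit feature map for k(x, y) = (x . y)^2
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

# Gram matrix computed from explicit features ...
G_features = np.array([[phi(a) @ phi(b) for b in X] for a in X])
# ... and directly from the kernel, without ever forming phi(x)
G_kernel = (X @ X.T) ** 2

assert np.allclose(G_features, G_kernel)
# Any algorithm that only touches G can therefore skip the feature map entirely.
```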

4 Recap: Properties of a Kernel. Definition: a finitely positive semi-definite function $k(x,y)$ is a symmetric function of its arguments for which the matrices formed by restricting it to any finite subset of points are positive semi-definite. Theorem: a function $k(x,y)$ can be written as $k(x,y) = \langle \Phi(x), \Phi(y) \rangle$, where $\Phi$ is a feature map, if and only if $k(x,y)$ satisfies the finitely positive semi-definiteness property. Relevance: we can now check whether $k(x,y)$ is a proper kernel using only properties of $k(x,y)$ itself, i.e. without needing to know the feature map!
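A sketch of the practical consequence (the Gaussian kernel and the random sample are assumptions for illustration): on any finite set of points, a proper kernel must produce a positive semi-definite Gram matrix, which we can check numerically.

```python
import numpy as np

def gaussian_kernel(X, c=1.0):
    """Gram matrix of k(x, y) = exp(-||x - y||^2 / c) on the rows of X."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / c)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
G = gaussian_kernel(X)

# Symmetric and all eigenvalues >= 0 (up to round-off) => PSD on this sample.
eigvals = np.linalg.eigvalsh(G)
print(np.allclose(G, G.T), eigvals.min() >= -1e-10)   # True True
```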

5 Reproducing Kernel Hilbert Spaces. The proof of the above theorem proceeds by constructing a very special feature map (note that more than one feature map may give rise to the same kernel): we map each point to a function, $\Phi: x \mapsto k(\cdot, x)$, i.e. we map into a function space. Definition of the function space: $\mathcal{F} = \{\, f(\cdot) = \sum_i \alpha_i\, k(\cdot, x_i) \,\}$. Reproducing property: $\langle f, k(\cdot, x) \rangle = f(x)$.
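A numerical sketch of the reproducing property for functions in the span of $\{k(\cdot, x_i)\}$ (the Gaussian kernel and the random points are assumptions for illustration):

```python
import numpy as np

def k(x, y, c=1.0):
    """Gaussian kernel (an illustrative choice of kernel)."""
    return np.exp(-np.sum((x - y) ** 2) / c)

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 2))          # expansion points x_i
alpha = rng.normal(size=4)           # coefficients of f = sum_i alpha_i k(., x_i)

def f(x):
    """Evaluate f at the point x."""
    return sum(a * k(xi, x) for a, xi in zip(alpha, X))

def inner(alpha, X, beta, Z):
    """RKHS inner product of sum_i alpha_i k(., x_i) and sum_j beta_j k(., z_j)."""
    return sum(a * b * k(xi, zj) for a, xi in zip(alpha, X) for b, zj in zip(beta, Z))

x_new = np.array([0.3, -0.7])
# Reproducing property: <f, k(., x)> = f(x), with Phi(x) = k(., x) as the feature map.
print(np.isclose(inner(alpha, X, np.array([1.0]), x_new[None, :]), f(x_new)))  # True
```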

6 Mercer's Theorem. Theorem: let X be compact and $k(x,y)$ a symmetric continuous function such that the integral operator $(T_k f)(x) = \int_X k(x,y)\, f(y)\, dy$ is positive semi-definite, i.e. $\int\!\!\int k(x,y)\, f(x)\, f(y)\, dx\, dy \geq 0$ for all $f \in L_2(X)$. Then there exists an orthonormal basis of eigen-functions $\{\phi_i\}$ with non-negative eigenvalues $\{\lambda_i\}$ such that $k(x,y) = \sum_i \lambda_i\, \phi_i(x)\, \phi_i(y)$, so the feature map $\Phi(x) = (\sqrt{\lambda_1}\,\phi_1(x), \sqrt{\lambda_2}\,\phi_2(x), \ldots)$ satisfies $k(x,y) = \langle \Phi(x), \Phi(y) \rangle$. Hence $k(x,y)$ is a proper kernel. Note: here we construct feature vectors in $\ell_2$, whereas the RKHS construction was in a function space.
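A finite-sample analogue of this construction (a sketch: the eigendecomposition of the Gram matrix on a sample plays the role of the eigenfunction expansion, with the Gaussian kernel as an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
G = np.exp(-sq / 1.0)                      # Gram matrix of a Gaussian kernel

# Eigendecomposition G = V diag(lam) V^T with lam >= 0 (finite-sample Mercer expansion)
lam, V = np.linalg.eigh(G)
lam = np.clip(lam, 0.0, None)              # clip tiny negative round-off

# Feature vectors in l2: row i of Phi is (sqrt(lam_1) v_1(x_i), sqrt(lam_2) v_2(x_i), ...)
Phi = V * np.sqrt(lam)

# Their inner products reproduce the kernel values on the sample:
assert np.allclose(Phi @ Phi.T, G, atol=1e-8)
```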

7 Modularity. Kernel methods consist of two modules:
1) the choice of kernel (this is non-trivial)
2) the algorithm which takes kernels as input
Modularity: any kernel can be used with any kernel-algorithm. Some kernels: for example the Gaussian kernel discussed below. Some kernel algorithms:
- support vector machine
- Fisher discriminant analysis
- kernel regression
- kernel PCA
- kernel CCA
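A minimal sketch of the modularity point, using kernel ridge regression as the kernel algorithm (the kernels, data, and regularization strength are assumptions for illustration): the algorithm only ever sees Gram matrices, so swapping the kernel requires no change to the algorithm.

```python
import numpy as np

def krr_fit_predict(G_train, y, G_test_train, lam=1e-2):
    """Kernel ridge regression: alpha = (G + lam*I)^-1 y; predictions = G_test,train @ alpha."""
    alpha = np.linalg.solve(G_train + lam * np.eye(len(y)), y)
    return G_test_train @ alpha

rng = np.random.default_rng(0)
X, Xt = rng.normal(size=(40, 1)), rng.normal(size=(10, 1))
y = np.sin(3 * X[:, 0])

def gaussian(A, B, c=0.5):
    return np.exp(-np.sum((A[:, None] - B[None, :]) ** 2, axis=-1) / c)

def poly(A, B, d=3):
    return (A @ B.T + 1.0) ** d

# Same algorithm, two different kernels: only the Gram matrices change.
for kern in (gaussian, poly):
    preds = krr_fit_predict(kern(X, X), y, kern(Xt, X))
    print(kern.__name__, preds[:3])
```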

8 Niceties and Challenges. Niceties:
- Kernel algorithms are typically constrained convex optimization problems, solved with either spectral methods or convex optimization tools.
- Efficient algorithms exist in most cases.
- The similarity to linear methods facilitates analysis.
- There are strong generalization bounds on test error.
Challenges:
- You need to choose the appropriate kernel.
- Kernel learning is prone to over-fitting.
- All information must pass through the kernel bottleneck.

9 Regularization. Demo: Trevor Hastie. Regularization is very important! Regularization parameters are typically determined by out-of-sample measures (cross-validation, leave-one-out). Example: the Gaussian kernel with width c:
- if c is very small: G = I (all data points look dissimilar): over-fitting
- if c is very large: G = all-ones matrix (all data points look very similar): under-fitting
In the RKHS view we compute the overlap between two Gaussians of width c.
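A sketch of the two extremes described above, assuming the parametrization $k(x,y) = \exp(-\|x-y\|^2 / c)$ (one common way to write a Gaussian kernel of "width c"):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)

G_small = np.exp(-sq / 1e-8)   # c very small: off-diagonals vanish, G -> I (over-fitting regime)
G_large = np.exp(-sq / 1e8)    # c very large: all entries -> 1, G -> all-ones (under-fitting regime)

print(np.allclose(G_small, np.eye(5), atol=1e-6))        # True
print(np.allclose(G_large, np.ones((5, 5)), atol=1e-6))  # True
# In practice c (and any regularization strength) would be chosen by cross-validation.
```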

10 Learning Kernels. All information is tunneled through the Gram-matrix information bottleneck. The real art is to pick an appropriate kernel for the data domain. Warning: since kernels can over-fit, we need to regularize. Solution: we need to learn the kernel. Here are some ways to combine kernels $k_1$ and $k_2$ to improve them: conic combinations $\alpha k_1 + \beta k_2$ with $\alpha, \beta \geq 0$, and more generally any positive polynomial in $k_1$ and $k_2$. The parameters can be set by i) cross-validation, ii) Bayesian methods, or iii) test-error bound minimization.
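A sketch of two such combinations (a conic combination and an elementwise product, which is one of the positive polynomial combinations), checking on a random sample that the result is still positive semi-definite; the kernels and weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(15, 2))

sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
G1 = np.exp(-sq / 1.0)                 # Gaussian kernel
G2 = (X @ X.T + 1.0) ** 2              # degree-2 polynomial kernel

def is_psd(G, tol=1e-10):
    return np.linalg.eigvalsh(G).min() >= -tol

# Conic combination a*k1 + b*k2 with a, b >= 0, and elementwise product k1*k2,
# both remain valid (PSD) kernels on this sample.
print(is_psd(2.0 * G1 + 0.5 * G2), is_psd(G1 * G2))   # True True
```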