Kernel Methods
Jong Cheol Jeong

Outline
6.1 One-Dimensional Kernel Smoothers
    6.1.1 Local Linear Regression
    6.1.2 Local Polynomial Regression
6.2 Selecting the Width of the Kernel
6.3 Local Regression in R^p
6.4 Structured Local Regression Models in R^p
6.5 Local Likelihood and Other Models
6.6 Kernel Density Estimation and Classification
6.7 Radial Basis Functions and Kernels
6.8 Mixture Models for Density Estimation and Classification

Kernel Function: a kernel is a weighting function that assigns weights to the data points near a target point x_0 when forming a local estimate; the closer a point x_i is to x_0, the larger its weight.

One-Dimensional Kernel Smoothers
The k-nearest-neighbor average
\hat{f}(x) = \mathrm{Ave}(y_i \mid x_i \in N_k(x))    (6.1)
estimates the regression function E(Y | X = x), but the fit is discontinuous because the neighborhood N_k(x) changes in discrete jumps as x moves.

One-Dimensional Kernel Smoothers
Nadaraya-Watson kernel-weighted average:
\hat{f}(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}    (6.2)
with the Epanechnikov quadratic kernel
K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{\lambda}\right)    (6.3)
D(t) = \begin{cases} \frac{3}{4}(1 - t^2) & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}    (6.4)
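To make (6.2)-(6.4) concrete, here is a minimal Python/NumPy sketch of the Nadaraya-Watson smoother with the Epanechnikov kernel; the function names, the fixed bandwidth lam, and the toy data are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov quadratic kernel D(t), eq. (6.4)."""
    return np.where(np.abs(t) <= 1, 0.75 * (1.0 - t**2), 0.0)

def nadaraya_watson(x0, x, y, lam=0.2):
    """Kernel-weighted average f_hat(x0), eq. (6.2), with metric width lam (eq. 6.3)."""
    w = epanechnikov(np.abs(x - x0) / lam)   # K_lambda(x0, x_i)
    return np.sum(w * y) / np.sum(w)         # weighted average of the responses

# Toy usage: smooth noisy samples of sin(4x) on [0, 1].
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 100))
y = np.sin(4.0 * x) + rng.normal(scale=0.3, size=100)
fit = np.array([nadaraya_watson(x0, x, y) for x0 in x])
```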

One-Dimensional Kernel Smoothers
Adaptive neighborhoods with kernels:
K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{h_\lambda(x_0)}\right)    (6.5)
For k-nearest neighborhoods, h_k(x_0) = |x_0 - x_{[k]}|, where x_{[k]} is the kth closest x_i to x_0.
Tri-cube kernel:
D(t) = \begin{cases} (1 - |t|^3)^3 & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}    (6.6)
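A small sketch of the adaptive (k-nearest-neighbor) bandwidth of (6.5) together with the tri-cube kernel (6.6); the helper names and the default k are assumptions for illustration.

```python
import numpy as np

def tricube(t):
    """Tri-cube kernel D(t), eq. (6.6)."""
    return np.where(np.abs(t) <= 1, (1.0 - np.abs(t)**3)**3, 0.0)

def knn_width(x0, x, k=30):
    """Adaptive width h_k(x0) = |x0 - x_[k]|: distance to the k-th closest x_i."""
    return np.sort(np.abs(x - x0))[k - 1]

def adaptive_weights(x0, x, k=30):
    """Kernel weights K_lambda(x0, x_i) with the k-NN width, eq. (6.5)."""
    return tricube(np.abs(x - x0) / knn_width(x0, x, k))
```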

One-Dimensional Kernel Smoothers
Nearest-neighbor kernel vs. Epanechnikov kernel: the metric-width Epanechnikov smoother yields a continuous fit, whereas the nearest-neighbor average is discontinuous and bumpy because the neighborhood changes abruptly.

Local Linear Regression

Locally weighted linear regression solves, at each target point x_0,
\min_{\alpha(x_0),\, \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\, [y_i - \alpha(x_0) - \beta(x_0) x_i]^2    (6.7)
and uses only the fit at the single point x_0:
\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0)\, x_0    (6.8)
In matrix form, with b(x)^T = (1, x), regression matrix B (rows b(x_i)^T), and W(x_0) the diagonal matrix of kernel weights, the estimate is linear in the y_i with "equivalent kernel" weights l_i(x_0):
\hat{f}(x_0) = b(x_0)^T \left(B^T W(x_0) B\right)^{-1} B^T W(x_0)\, y = \sum_{i=1}^{N} l_i(x_0)\, y_i    (6.9)
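The weighted least squares problem (6.7) has the closed form (6.9), which the following Python/NumPy sketch implements directly; the function name and the bandwidth default are illustrative assumptions.

```python
import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1, 0.75 * (1.0 - t**2), 0.0)

def local_linear(x0, x, y, lam=0.2):
    """Locally weighted linear fit at x0, eqs. (6.7)-(6.9)."""
    w = epanechnikov(np.abs(x - x0) / lam)          # kernel weights K_lambda(x0, x_i)
    B = np.column_stack([np.ones_like(x), x])       # regression matrix, rows b(x_i)^T = (1, x_i)
    W = np.diag(w)                                  # W(x0)
    b0 = np.array([1.0, x0])                        # b(x0)
    l = b0 @ np.linalg.solve(B.T @ W @ B, B.T @ W)  # equivalent kernel l(x0), eq. (6.9)
    return l @ y                                    # f_hat(x0) = sum_i l_i(x0) y_i
```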

Local Polynomial Regression
Local quadratic regression fits a degree-d polynomial (d = 2) at each x_0:
\min_{\alpha(x_0),\, \beta_j(x_0),\, j=1,\dots,d} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\, \Big[y_i - \alpha(x_0) - \sum_{j=1}^{d} \beta_j(x_0)\, x_i^j\Big]^2    (6.11)
Local linear fits tend to "trim the hills and fill the valleys" (they are biased in regions of curvature); local quadratic regression corrects this bias at the cost of increased variance.
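Extending the local linear sketch above to an arbitrary degree only changes the basis; with degree=2 this is local quadratic regression as in (6.11). Names and defaults are again illustrative assumptions.

```python
import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1, 0.75 * (1.0 - t**2), 0.0)

def local_poly(x0, x, y, lam=0.2, degree=2):
    """Local polynomial fit of the given degree at x0, eq. (6.11)."""
    w = epanechnikov(np.abs(x - x0) / lam)
    B = np.vander(x, N=degree + 1, increasing=True)                   # rows (1, x_i, ..., x_i^d)
    b0 = np.vander(np.array([x0]), N=degree + 1, increasing=True)[0]  # basis at x0
    WB = B * w[:, None]                                               # W(x0) B without forming diag(w)
    beta = np.linalg.solve(B.T @ WB, WB.T @ y)                        # weighted normal equations
    return b0 @ beta
```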

Local Polynomial Regression
There is a bias-variance tradeoff in selecting the polynomial degree: higher-degree fits reduce bias due to curvature in the interior of the domain, but increase the variance of the estimate.

Selecting the Width of the Kernel
There is a bias-variance tradeoff in selecting the width λ:
If the window is narrow, \hat{f}(x_0) is an average of a small number of y_i, so its variance will be relatively large, while the bias will tend to be small.
If the window is wide, the variance will be relatively small, but the bias will tend to be higher, because the estimate averages responses from points x_i that may be far from x_0.
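One common way to navigate this tradeoff is to choose λ by cross-validation. The sketch below scores candidate bandwidths for the Nadaraya-Watson smoother by leave-one-out prediction error; the grid values and helper names are assumptions, and for linear smoothers cheaper shortcuts via the smoother matrix exist.

```python
import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1, 0.75 * (1.0 - t**2), 0.0)

def loo_cv_error(lam, x, y):
    """Leave-one-out squared prediction error of the Nadaraya-Watson smoother for width lam."""
    err = 0.0
    for i in range(len(x)):
        w = epanechnikov(np.abs(x - x[i]) / lam)
        w[i] = 0.0                      # hold out the i-th point
        if w.sum() == 0.0:              # window too narrow: no neighbors left
            return np.inf
        err += (y[i] - np.dot(w, y) / w.sum()) ** 2
    return err / len(x)

# Pick lambda on a (purely illustrative) grid:
# lams = np.linspace(0.05, 0.5, 20)
# best_lam = min(lams, key=lambda lam: loo_cv_error(lam, x, y))
```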

Local Regression in R^p
Local regression generalizes to p dimensions: at each x_0 in R^p, fit
\min_{\beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\, \left(y_i - b(x_i)^T \beta(x_0)\right)^2    (6.12)
where b(x) is a vector of polynomial terms of maximum degree d. D can be a radial function, such as the Epanechnikov or tri-cube kernel applied to the Euclidean norm (the predictors should be standardized first, since the norm depends on their units):
K_\lambda(x_0, x) = D\!\left(\frac{\|x - x_0\|}{\lambda}\right)    (6.13)
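A sketch of local linear regression in R^p with the radial kernel (6.13); the names and defaults are assumptions, and the predictor matrix is assumed to have been standardized beforehand.

```python
import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1, 0.75 * (1.0 - t**2), 0.0)

def radial_kernel(x0, X, lam):
    """K_lambda(x0, x) = D(||x - x0|| / lam), eq. (6.13)."""
    return epanechnikov(np.linalg.norm(X - x0, axis=1) / lam)

def local_linear_p(x0, X, y, lam=1.0):
    """Local linear fit at x0 in R^p, eq. (6.12), with b(x) = (1, x_1, ..., x_p)."""
    w = radial_kernel(x0, X, lam)
    B = np.column_stack([np.ones(len(X)), X])   # rows b(x_i)^T
    b0 = np.concatenate([[1.0], x0])            # b(x0)
    WB = B * w[:, None]                         # W(x0) B
    beta = np.linalg.solve(B.T @ WB, WB.T @ y)
    return b0 @ beta
```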

Structured Local Regression Models in R^p
When the dimension-to-sample-size ratio is unfavorable, local regression does not help us much unless we are willing to make structural assumptions about the model; downgrading or omitting coordinates can reduce the error.
Structured kernels: equation (6.13) gives equal weight to each coordinate, so we can modify the kernel with a positive semidefinite matrix A to control the weight placed on each coordinate:
K_{\lambda, A}(x_0, x) = D\!\left(\frac{(x - x_0)^T A (x - x_0)}{\lambda}\right)    (6.14)
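A minimal sketch of the structured kernel (6.14); the diagonal matrix A shown in the usage comment is a hypothetical example of downweighting or dropping coordinates.

```python
import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1, 0.75 * (1.0 - t**2), 0.0)

def structured_kernel(x0, X, A, lam):
    """K_{lambda,A}(x0, x) = D((x - x0)^T A (x - x0) / lam), eq. (6.14)."""
    d = X - x0
    q = np.einsum('ij,jk,ik->i', d, A, d)   # quadratic form (x_i - x0)^T A (x_i - x0) per row
    return epanechnikov(q / lam)

# e.g. with p = 3, a diagonal A downweights the second coordinate and omits the third:
# A = np.diag([1.0, 0.2, 0.0])
```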

Structured Regression Functions
We are trying to fit a regression function E(Y | X) = f(X_1, ..., X_p), in which every level of interaction is potentially present.
ANOVA decompositions express f as a sum of terms, each depending on only a subset of the variables:
f(X_1, X_2, \dots, X_p) = \alpha + \sum_{j} g_j(X_j) + \sum_{k < \ell} g_{k\ell}(X_k, X_\ell) + \cdots    (6.15)
Structure is then imposed by eliminating some of the higher-order terms (e.g., additive models keep only the main-effect terms g_j).

Structured Regression Functions
Varying coefficient models are a special case of structured models: divide the p predictors in X into a set (X_1, ..., X_q) with q < p, and collect the remainder in a vector Z. Conditionally on Z, the model is linear:
f(X) = \alpha(Z) + \beta_1(Z) X_1 + \cdots + \beta_q(Z) X_q    (6.16)
The coefficient functions are fit by locally weighted least squares, with the kernel applied in Z:
\min_{\alpha(z_0),\, \beta(z_0)} \sum_{i=1}^{N} K_\lambda(z_0, z_i)\, \left(y_i - \alpha(z_0) - x_{1i}\beta_1(z_0) - \cdots - x_{qi}\beta_q(z_0)\right)^2    (6.17)
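The fit (6.17) is an ordinary weighted least squares problem in which the weights depend on Z only; a sketch, with names and defaults as assumptions:

```python
import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1, 0.75 * (1.0 - t**2), 0.0)

def varying_coef_fit(z0, Z, Xq, y, lam=1.0):
    """Fit alpha(z0), beta_1(z0), ..., beta_q(z0) by locally weighted least squares, eq. (6.17).

    Z  : (N,) or (N, r) array of conditioning variables
    Xq : (N, q) array of predictors that enter the model linearly
    """
    Z = np.asarray(Z, dtype=float).reshape(len(y), -1)
    w = epanechnikov(np.linalg.norm(Z - z0, axis=1) / lam)   # kernel applied in z only
    B = np.column_stack([np.ones(len(y)), Xq])               # rows (1, x_1i, ..., x_qi)
    WB = B * w[:, None]
    coef = np.linalg.solve(B.T @ WB, WB.T @ y)
    return coef[0], coef[1:]                                 # alpha(z0), beta(z0)
```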

Questions
Section 6.2 details how we may select the optimal bandwidth λ for a kernel. How do we select the optimal kernel function? Are there kernels that tend to outperform others in most cases? If not, are there ways to determine a kernel that may perform well without running experiments?

Questions
One benefit of using kernels with SVMs is that we implicitly map the data into a higher-dimensional feature space, making it more likely that a separating hyperplane with a hard margin exists. But Section 6.3 says that for local regression, the fraction of points close to the boundary of the domain increases to one as the dimension grows, so the predictions we make will have even more bias. Is there a compromise solution that will work, or is the kernel trick best applied to classification problems?

Questions?