Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.


Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University

Outline
– One-Dimensional Kernel Smoothers
– Local Regression
– Local Likelihood
– Kernel Density Estimation
– Naive Bayes
– Radial Basis Functions
– Mixture Models and EM

One-Dimensional Kernel Smoothers
k-NN average: f̂(x) = Ave(yᵢ | xᵢ ∈ Nₖ(x)). The 30-NN curve is bumpy, since the neighborhood Nₖ(x) is discontinuous in x: as x moves, points enter and leave the neighborhood one at a time, so the average changes in a discrete way, leading to a discontinuous fit f̂(x).

One-Dimensional Kernel Smoothers
Nadaraya-Watson kernel-weighted average:
f̂(x₀) = Σᵢ K_λ(x₀, xᵢ) yᵢ / Σᵢ K_λ(x₀, xᵢ)
with the Epanechnikov quadratic kernel K_λ(x₀, x) = D(|x − x₀| / λ), where D(t) = (3/4)(1 − t²) for |t| ≤ 1 and 0 otherwise.
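The Nadaraya-Watson average above can be sketched in a few lines of plain NumPy. The function names and the synthetic sine data are illustrative, not from the slides:

```python
import numpy as np

def epanechnikov(t):
    # D(t) = (3/4)(1 - t^2) for |t| <= 1, else 0
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def nadaraya_watson(x0, x, y, lam):
    # Kernel-weighted average: f(x0) = sum_i K(x0, xi) yi / sum_i K(x0, xi)
    w = epanechnikov(np.abs(x - x0) / lam)
    return np.sum(w * y) / np.sum(w)

x = np.linspace(0, 1, 101)
y = np.sin(4 * x)                       # noiseless test function
fhat = nadaraya_watson(0.5, x, y, lam=0.2)
```

At x₀ = 0.5 the estimate sits close to sin(2) ≈ 0.91, pulled down slightly by the local curvature, which is exactly the bias that local linear regression later corrects.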

One-Dimensional Kernel Smoothers
More general kernel: K_λ(x₀, x) = D(|x − x₀| / h_λ(x₀))
– h_λ(x₀) is a width function that determines the width of the neighborhood at x₀.
– For the quadratic kernel, h_λ(x₀) = λ is constant.
– For the k-NN kernel, h_k(x₀) = |x₀ − x_[k]|, the distance to the k-th nearest point, so the variance of the estimate is roughly constant while the bias varies.
– The Epanechnikov kernel has compact support.

One-Dimensional Kernel Smoothers
Three popular kernels for local smoothing:
– The Epanechnikov kernel and the tri-cube kernel D(t) = (1 − |t|³)³ (for |t| ≤ 1) both have compact support, but the tri-cube kernel has two continuous derivatives at the boundary of its support.
– The Gaussian kernel D(t) = φ(t) has infinite support.
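The three kernels can be written down directly; a small NumPy sketch with my own function names:

```python
import numpy as np

def epanechnikov(t):
    # Compact support on [-1, 1]; discontinuous first derivative at |t| = 1.
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def tricube(t):
    # Compact support, and two continuous derivatives at |t| = 1.
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

def gaussian(t):
    # Standard Gaussian density: infinite support.
    return np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

t = np.linspace(-2, 2, 401)
profiles = {"epanechnikov": epanechnikov(t),
            "tricube": tricube(t),
            "gaussian": gaussian(t)}
```

Plotting the three `profiles` against `t` reproduces the usual comparison figure: two compact bumps and one Gaussian with tails.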

Local Linear Regression
Boundary issues:
– The kernel average is badly biased on the boundaries of the domain, because of the asymmetry of the kernel in that region.
– Fitting straight lines locally, rather than constants, removes this bias exactly to first order.

Local Linear Regression
Locally weighted linear regression makes a first-order correction. Solve a separate weighted least squares problem at each target point x₀:
min over α(x₀), β(x₀): Σᵢ K_λ(x₀, xᵢ) [yᵢ − α(x₀) − β(x₀)xᵢ]²
The estimate:
f̂(x₀) = α̂(x₀) + β̂(x₀)x₀ = b(x₀)ᵀ (BᵀW(x₀)B)⁻¹ BᵀW(x₀) y
where b(x)ᵀ = (1, x), B is the N×2 regression matrix with i-th row b(xᵢ)ᵀ, and W(x₀) is the N×N diagonal matrix with i-th diagonal element K_λ(x₀, xᵢ).
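The weighted least squares problem above can be solved directly with the normal equations. A minimal sketch (plain NumPy; the names and the linear test data are mine). On exactly linear data, local linear regression is exact even at the boundary, illustrating the first-order bias correction:

```python
import numpy as np

def local_linear(x0, x, y, lam):
    # Weighted least squares fit of (1, x) at the target point x0.
    t = np.abs(x - x0) / lam
    w = np.where(t <= 1, 0.75 * (1 - t**2), 0.0)   # Epanechnikov weights
    B = np.column_stack([np.ones_like(x), x])       # N x 2 regression matrix
    W = np.diag(w)
    beta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)
    return beta[0] + beta[1] * x0                   # b(x0)^T beta

x = np.linspace(0, 1, 101)
y = 2 * x + 1                                       # exactly linear target
fhat = local_linear(0.0, x, y, lam=0.3)             # evaluate at the boundary
```

A plain kernel average at x₀ = 0 would be biased upward (all neighbors lie to the right); the local linear fit recovers 1.0 exactly.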

Local Linear Regression
The estimate is linear in the yᵢ: f̂(x₀) = Σᵢ lᵢ(x₀) yᵢ. The weights lᵢ(x₀) combine the weighting kernel and the least squares operations; they define the equivalent kernel.

Local Linear Regression
The expansion for E f̂(x₀), using the linearity of local regression and a series expansion of the true function f around x₀:
E f̂(x₀) = Σᵢ lᵢ(x₀) f(xᵢ) = f(x₀) Σᵢ lᵢ(x₀) + f′(x₀) Σᵢ (xᵢ − x₀) lᵢ(x₀) + (f″(x₀)/2) Σᵢ (xᵢ − x₀)² lᵢ(x₀) + R
For local linear regression, Σᵢ lᵢ(x₀) = 1 and Σᵢ (xᵢ − x₀) lᵢ(x₀) = 0, so the bias E f̂(x₀) − f(x₀) depends only on the quadratic and higher-order terms in the expansion of f.

Local Polynomial Regression
Fit local polynomials of any degree d:
min over βⱼ(x₀), j = 0, …, d: Σᵢ K_λ(x₀, xᵢ) [yᵢ − Σⱼ βⱼ(x₀) xᵢʲ]²
with the estimate f̂(x₀) = Σⱼ β̂ⱼ(x₀) x₀ʲ.

Local Polynomial Regression
The bias now has components only of degree d+1 and higher. This reduction in bias, however, is paid for with increased variance.
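The degree-d fit generalizes the local linear sketch by one line. This is an illustrative NumPy implementation (names and test data are mine); on a quadratic target, the local linear fit (d = 1) is biased by the curvature while the local quadratic fit (d = 2) is exact:

```python
import numpy as np

def local_poly(x0, x, y, lam, d):
    # Weighted degree-d polynomial fit in centered coordinates (x - x0);
    # the intercept of the centered fit is the estimate at x0.
    t = np.abs(x - x0) / lam
    w = np.where(t <= 1, 0.75 * (1 - t**2), 0.0)       # Epanechnikov weights
    B = np.vander(x - x0, d + 1, increasing=True)       # 1, (x-x0), ..., (x-x0)^d
    W = np.diag(w)
    beta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)
    return beta[0]

x = np.linspace(0, 1, 201)
y = x**2                                # convex target: curvature bias is visible
f1 = local_poly(0.5, x, y, lam=0.2, d=1)   # local linear: biased upward here
f2 = local_poly(0.5, x, y, lam=0.2, d=2)   # local quadratic: exact on a quadratic
```

The gap f1 − 0.25 is precisely the second-order term that the degree-2 fit removes, at the price of a larger variance when noise is present.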

Selecting the Width of the Kernel
In the kernel K_λ, λ is the parameter that controls the width:
– For a kernel with compact support, λ is the radius of the support region.
– For the Gaussian kernel, λ is the standard deviation.
– For k-nearest neighborhoods, λ is the proportion k/N.
The window width entails a bias-variance tradeoff:
– A narrow window gives an estimate with high variance but small bias.
– A wide window gives an estimate with low variance but large bias.
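One standard way to navigate this tradeoff in practice (not stated on the slide, but a common companion to it) is leave-one-out cross-validation over a grid of widths. A rough NumPy sketch with my own names and synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 80))
y = np.sin(4 * x) + rng.normal(0, 0.3, 80)

def nw(x0, xs, ys, lam):
    # Nadaraya-Watson average with Epanechnikov weights; NaN if window is empty.
    t = np.abs(xs - x0) / lam
    w = np.where(t <= 1, 0.75 * (1 - t**2), 0.0)
    s = w.sum()
    return np.nan if s == 0 else (w * ys).sum() / s

def loo_error(lam):
    # Leave-one-out squared error: narrow windows track noise (variance),
    # wide windows flatten the curve (bias).
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        f = nw(x[i], x[mask], y[mask], lam)
        if not np.isnan(f):
            errs.append((y[i] - f) ** 2)
    return np.mean(errs)

lams = [0.05, 0.1, 0.2, 0.4]
scores = {lam: loo_error(lam) for lam in lams}
best = min(scores, key=scores.get)
```

The minimizer of `scores` is the width whose bias and variance errors are best balanced for this sample.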

Structured Local Regression
– Structured kernels: introduce structure by imposing appropriate restrictions on the matrix A in the kernel K_{λ,A}(x₀, x) = D(((x − x₀)ᵀA(x − x₀))^{1/2} / λ), e.g., downweighting or omitting coordinates.
– Structured regression functions: introduce structure by eliminating some of the higher-order interaction terms, as in additive or varying-coefficient models.

Local Likelihood & Other Models
Any parametric model can be made local:
– Parameter associated with yᵢ: the linear predictor θ(xᵢ) = xᵢᵀβ.
– Log-likelihood: l(β) = Σᵢ l(yᵢ, xᵢᵀβ).
– Model likelihood local to x₀: l(β(x₀)) = Σᵢ K_λ(x₀, xᵢ) l(yᵢ, xᵢᵀβ(x₀)).
– Letting the coefficients vary smoothly with another variable gives a varying coefficient model.

Local Likelihood & Other Models
Local logistic regression:
– Local log-likelihood for the J-class model:
Σᵢ K_λ(x₀, xᵢ) { β_{gᵢ0} + β_{gᵢ}ᵀ(xᵢ − x₀) − log[1 + Σ_{k=1}^{J−1} exp(β_{k0} + β_kᵀ(xᵢ − x₀))] }
– Centering the local regressions at x₀ means the fitted posterior probabilities there depend only on the intercepts: p̂ⱼ(x₀) = exp(β̂_{j0}) / (1 + Σ_{k=1}^{J−1} exp(β̂_{k0})).

Kernel Density Estimation
A natural local estimate is f̂(x₀) = #{xᵢ ∈ N(x₀)} / (Nλ), where N(x₀) is a small metric neighborhood of width λ around x₀. The smooth Parzen estimate replaces the count with a kernel sum:
f̂(x₀) = (1/(Nλ)) Σᵢ K_λ(x₀, xᵢ)
– For the Gaussian kernel, K_λ(x₀, x) = φ(|x₀ − x| / λ).
– The estimate becomes f̂(x₀) = (1/N) Σᵢ φ_λ(x₀ − xᵢ), the convolution of the empirical sample distribution with the Gaussian density φ_λ.
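The Parzen estimate with a Gaussian kernel is a one-liner per evaluation point. A minimal NumPy sketch (names and data are mine; the data are standard normal draws, so the estimate near zero should sit slightly below the true density 0.399, reflecting the smoothing):

```python
import numpy as np

def parzen_gaussian(x0, x, lam):
    # f(x0) = (1/(N*lam)) * sum_i phi((x0 - xi)/lam)
    t = (x0 - x) / lam
    return np.mean(np.exp(-t**2 / 2) / (np.sqrt(2 * np.pi) * lam))

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 5000)
f0 = parzen_gaussian(0.0, x, lam=0.3)
```

Since the estimate is the sample convolved with a Gaussian of standard deviation λ, its expectation at 0 is the N(0, 1 + λ²) density there, a hair under the true value.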

Kernel Density Estimation
A kernel density estimate for systolic blood pressure. The density estimate at each point is the average contribution from each of the kernels at that point.

Kernel Density Classification
Fit nonparametric density estimates f̂ⱼ(X) separately in each class, estimate the class priors π̂ⱼ, and apply Bayes' theorem:
P̂(G = j | X = x₀) = π̂ⱼ f̂ⱼ(x₀) / Σₖ π̂ₖ f̂ₖ(x₀)
The estimate for CHD uses the tri-cube kernel with a k-NN bandwidth.

Kernel Density Classification
The population class densities and the resulting posterior probabilities.
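Putting the two preceding slides together: per-class Parzen estimates plus Bayes' theorem give a kernel density classifier. A sketch in NumPy (Gaussian kernel for simplicity rather than the tri-cube/k-NN choice used in the CHD figure; all names and data are mine):

```python
import numpy as np

def kde(x0, x, lam):
    # One-dimensional Gaussian Parzen estimate at x0.
    t = (x0 - x) / lam
    return np.mean(np.exp(-t**2 / 2) / (np.sqrt(2 * np.pi) * lam))

rng = np.random.default_rng(2)
x0s = rng.normal(-1, 1, 300)   # samples from class 0
x1s = rng.normal(2, 1, 300)    # samples from class 1
pri0 = pri1 = 0.5              # estimated class priors

def posterior1(x0, lam=0.4):
    # Bayes: P(G=1 | x) = pi1 f1(x) / (pi0 f0(x) + pi1 f1(x))
    f0 = kde(x0, x0s, lam)
    f1 = kde(x0, x1s, lam)
    return pri1 * f1 / (pri0 * f0 + pri1 * f1)

p_low = posterior1(-1.0)   # deep inside class 0
p_high = posterior1(2.0)   # deep inside class 1
```

Far from the decision boundary the posterior is near 0 or 1; near x ≈ 0.5 it crosses 1/2, which is where the classifier switches labels.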

Naive Bayes
The naive Bayes model assumes that, given a class G = j, the features Xₖ are independent: fⱼ(X) = Πₖ fⱼₖ(Xₖ).
– Each fⱼₖ is a one-dimensional kernel density estimate, or a Gaussian, for coordinate Xₖ in class j.
– If Xₖ is categorical, use a histogram estimate instead.
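The independence assumption reduces a p-dimensional density estimate to p one-dimensional ones. A minimal sketch using the Gaussian variant for each coordinate (NumPy; the class data and helper names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
# Two classes in 2-D; naive Bayes treats each coordinate independently per class.
X0 = rng.normal([-1, -1], 1.0, size=(200, 2))
X1 = rng.normal([2, 2], 1.0, size=(200, 2))

def gauss_logpdf(x, mu, var):
    return -0.5 * np.log(2 * np.pi * var) - (x - mu)**2 / (2 * var)

def nb_score(x, X):
    # Sum of per-coordinate Gaussian log-densities: log f_j(x) under independence.
    mu = X.mean(axis=0)
    var = X.var(axis=0)
    return gauss_logpdf(x, mu, var).sum()

def predict(x):
    # Equal priors, so compare class-conditional scores directly.
    return int(nb_score(x, X1) > nb_score(x, X0))

pred_a = predict(np.array([-1.0, -1.0]))
pred_b = predict(np.array([2.0, 2.0]))
```

Swapping `gauss_logpdf` for a log of a one-dimensional kernel density estimate gives the nonparametric version mentioned on the slide.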

Radial Basis Functions & Kernels
Radial basis functions combine the locality of kernel methods with the flexibility of basis expansions.
– Each basis element is indexed by a location (prototype) parameter ξⱼ and a scale parameter λⱼ: f(x) = Σⱼ K_{λⱼ}(ξⱼ, x) βⱼ.
– For D, a popular choice is the standard Gaussian density function.

Radial Basis Functions & Kernels
For simplicity, focus on least squares methods for regression, and use the Gaussian kernel. The RBF network model:
f(x) = β₀ + Σⱼ βⱼ exp(−‖x − ξⱼ‖² / (2λⱼ²))
A common simplification estimates the {ξⱼ, λⱼ} separately from the βⱼ, which then solve a simple least squares problem. An undesirable side effect of using a single fixed scale is the creation of holes: regions of IRᵖ where none of the kernels has appreciable support.

Radial Basis Functions & Kernels
Gaussian radial basis functions with fixed width can leave holes. Renormalized Gaussian radial basis functions,
hⱼ(x) = D(‖x − ξⱼ‖ / λ) / Σₖ D(‖x − ξₖ‖ / λ),
avoid this and produce basis functions similar in some respects to B-splines. The expansion in renormalized RBFs is f(x) = Σⱼ hⱼ(x) βⱼ.
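A renormalized RBF network with fixed centers reduces to ordinary least squares on the basis features. A rough NumPy sketch (the centers, scale, and sine data are my own choices; since the renormalized features sum to one at every x, no point falls in a hole):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 200)
y = np.sin(4 * x) + rng.normal(0, 0.1, 200)

centers = np.linspace(0, 1, 10)   # prototype parameters, fixed in advance
lam = 0.15                        # single common scale

def rbf_features(x):
    # Gaussian basis, renormalized so the features sum to one at every x
    # (no region is left without appreciable basis support).
    Phi = np.exp(-((x[:, None] - centers[None, :])**2) / (2 * lam**2))
    return Phi / Phi.sum(axis=1, keepdims=True)

Phi = rbf_features(x)
beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)      # least squares for the betas
fhat = float((rbf_features(np.array([0.5])) @ beta)[0])
resid = np.mean((Phi @ beta - y) ** 2)
```

Estimating the centers and scale separately (e.g., by clustering) and then solving least squares for β is exactly the simplification described on the previous slide.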

Mixture Models & EM
Gaussian mixture model: f(x) = Σₘ αₘ φ(x; μₘ, Σₘ), where the αₘ are the mixture proportions, Σₘ αₘ = 1.
EM algorithm for mixtures (two-component, one-dimensional case):
– Log-likelihood: l(θ; Z) = Σᵢ log[(1 − π) φ_{θ₁}(yᵢ) + π φ_{θ₂}(yᵢ)], which is hard to maximize directly.
– Suppose we observed latent binary variables Δᵢ: Δᵢ = 0 means yᵢ comes from component 1, Δᵢ = 1 from component 2. The complete-data maximum likelihood estimates would then be simple means and variances within each group.

Mixture Models & EM
Since the Δᵢ are not observed, iterate:
– E-step: given the current parameters θ, compute the responsibilities γᵢ = E(Δᵢ | θ, Z) = π φ_{θ₂}(yᵢ) / [(1 − π) φ_{θ₁}(yᵢ) + π φ_{θ₂}(yᵢ)].
– M-step: update μ̂₁, σ̂₁², μ̂₂, σ̂₂² as γ-weighted means and variances, and set π̂ = Σᵢ γᵢ / N.
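The two-component E and M steps above fit in a short loop. A NumPy sketch on synthetic data (the seed, sample sizes, and initial guesses are mine; `s1`, `s2` hold variances):

```python
import numpy as np

rng = np.random.default_rng(5)
# Two-component 1-D Gaussian mixture data: N(0,1) and N(4,1), equal proportions.
y = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1, 300)])

# Initial guesses for pi, means, and variances.
pi, mu1, mu2, s1, s2 = 0.5, 1.0, 3.0, 1.0, 1.0

for _ in range(50):
    # E-step: responsibilities gamma_i = P(Delta_i = 1 | theta, y_i)
    phi1 = np.exp(-(y - mu1)**2 / (2 * s1)) / np.sqrt(2 * np.pi * s1)
    phi2 = np.exp(-(y - mu2)**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
    g = pi * phi2 / ((1 - pi) * phi1 + pi * phi2)
    # M-step: gamma-weighted means, variances, and mixing proportion.
    mu1 = np.sum((1 - g) * y) / np.sum(1 - g)
    mu2 = np.sum(g * y) / np.sum(g)
    s1 = np.sum((1 - g) * (y - mu1)**2) / np.sum(1 - g)
    s2 = np.sum(g * (y - mu2)**2) / np.sum(g)
    pi = np.mean(g)
```

After convergence the parameters sit near the generating values (means near 0 and 4, π near 1/2), and the responsibilities `g` give the soft assignments used for classification on the later slides.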

Mixture Models & EM
Application of mixtures to the heart disease risk factor study.

Mixture Models & EM
Mixture model used for classification of the simulated data.