Kernel methods - overview


Kernel methods - overview

Kernel smoothers
Local regression
Kernel density estimation
Radial basis functions

Introduction

Kernel methods are regression techniques used to estimate a response function from noisy data.

Properties:
Different models are fitted at each query point, and only those observations close to that point are used to fit the model.
The resulting function is smooth.
The models require only a minimum of training.

A simple one-dimensional kernel smoother

$$\hat f(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}$$

where

$$K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{\lambda}\right)$$
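As a concrete illustration, here is a minimal sketch of such a smoother in Python (not code from the course), using the Epanechnikov kernel defined later in these slides; all function and variable names are illustrative.

```python
import numpy as np

def epanechnikov(t):
    """Kernel profile D(t) = 3/4 (1 - t^2) for |t| <= 1, 0 otherwise."""
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t ** 2), 0.0)

def nadaraya_watson(x0, x, y, lam):
    """Kernel-weighted average of y at the query point x0, window size lam."""
    w = epanechnikov((x - x0) / lam)      # weights K_lambda(x0, x_i)
    return np.sum(w * y) / np.sum(w)      # weighted average of the y_i

# Example: smooth noisy observations of a sine curve.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 100))
y = np.sin(x) + rng.normal(0.0, 0.3, 100)
f_hat = np.array([nadaraya_watson(x0, x, y, lam=0.5) for x0 in x])
```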

Kernel methods, splines and ordinary least squares regression (OLS)

OLS: a single model is fitted to all data.
Splines: different models are fitted to different subintervals (cuboids) of the input domain.
Kernel methods: different models are fitted at each query point.

Kernel-weighted averages and moving averages

The Nadaraya-Watson kernel-weighted average

$$\hat f(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}$$

where λ indicates the window size and the function D shows how the weights change with distance within this window. The estimated function is smooth!

k-nearest neighbours: average over the k observations closest to x_0 instead. The estimated function is piecewise constant!

Examples of one-dimensional kernel smoothers

Epanechnikov kernel: $D(t) = \tfrac{3}{4}(1 - t^2)$ for $|t| \le 1$, 0 otherwise

Tri-cube kernel: $D(t) = (1 - |t|^3)^3$ for $|t| \le 1$, 0 otherwise
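A small sketch of the two profiles in code, mainly to show their compact support (both vanish outside |t| <= 1); the tri-cube is flatter on top and differentiable at the boundary of its support:

```python
import numpy as np

def epanechnikov(t):
    # D(t) = 3/4 (1 - t^2) for |t| <= 1, 0 otherwise
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t ** 2), 0.0)

def tricube(t):
    # D(t) = (1 - |t|^3)^3 for |t| <= 1, 0 otherwise
    return np.where(np.abs(t) <= 1, (1 - np.abs(t) ** 3) ** 3, 0.0)
```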

Issues in kernel smoothing

The smoothing parameter λ has to be determined.
When there are ties at x_i: compute an average y value and introduce weights representing the number of points.
Boundary issues.
Varying density of observations: the bias is constant, but the variance is inversely proportional to the density.

Boundary effects of one-dimensional kernel smoothers

Locally weighted averages can be badly biased at the boundaries if the response function has a significant slope there. Remedy: apply local linear regression.

Local linear regression

Find the intercept and slope parameters by solving

$$\min_{\alpha(x_0),\, \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i) \left[ y_i - \alpha(x_0) - \beta(x_0)\, x_i \right]^2$$

The solution is a linear combination of the y_i:

$$\hat f(x_0) = \sum_{i=1}^{N} l_i(x_0)\, y_i$$
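A minimal sketch of this fit at a single query point, solving the weighted normal equations directly (illustrative names; assumes enough points carry positive weight for the 2x2 system to be non-singular):

```python
import numpy as np

def local_linear(x0, x, y, lam):
    """Local linear fit at x0: minimize sum_i K(x0, x_i) (y_i - a - b x_i)^2."""
    t = (x - x0) / lam
    w = np.where(np.abs(t) <= 1, 0.75 * (1 - t ** 2), 0.0)  # Epanechnikov weights
    B = np.column_stack([np.ones_like(x), x])               # design matrix (1, x_i)
    BW = B.T * w                                            # B^T W without forming W
    a, b = np.linalg.solve(BW @ B, BW @ y)                  # weighted normal equations
    return a + b * x0                                       # fitted value at x0
```

Because the fitted value is linear in y, collecting the weights l_i(x0) over a grid of query points yields the smoother matrix used when fixing the degrees of freedom.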

Kernel smoothing vs local linear regression

Kernel smoothing (a locally constant fit): solve the minimization problem

$$\min_{\theta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i) \left[ y_i - \theta(x_0) \right]^2$$

Local linear regression:

$$\min_{\alpha(x_0),\, \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i) \left[ y_i - \alpha(x_0) - \beta(x_0)\, x_i \right]^2$$

Properties of local linear regression

Automatically modifies the kernel weights to correct for bias.
The remaining bias depends only on the terms of order higher than one in the expansion of f.

Local polynomial regression

Fitting local polynomials of degree d instead of straight lines:

$$\min_{\alpha(x_0),\, \beta_j(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i) \left[ y_i - \alpha(x_0) - \sum_{j=1}^{d} \beta_j(x_0)\, x_i^j \right]^2$$

[Figure: behavior of the estimated response function]
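A sketch of the degree-d generalization, reusing numpy.polyfit; its w argument weights the unsquared residuals, so we pass the square root of the kernel weights (illustrative, not the course's code):

```python
import numpy as np

def local_poly(x0, x, y, lam, degree=2):
    t = (x - x0) / lam
    k = np.where(np.abs(t) <= 1, 0.75 * (1 - t ** 2), 0.0)  # Epanechnikov weights
    m = k > 0                                               # keep points inside the window
    # polyfit minimizes sum_i (w_i (y_i - p(x_i)))^2, so w = sqrt(k)
    # reproduces the kernel-weighted least-squares criterion
    coefs = np.polyfit(x[m], y[m], degree, w=np.sqrt(k[m]))
    return np.polyval(coefs, x0)
```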

Local polynomial vs local linear regression

Advantages: reduces the "trimming of hills and filling of valleys".
Disadvantages: higher variance (the tails are more wiggly).

Selecting the width of the kernel

Bias-variance tradeoff: selecting a narrow window leads to high variance and low bias, whilst selecting a wide window leads to high bias and low variance.

Selecting the width of the kernel

Automatic selection (e.g., cross-validation).
Fixing the degrees of freedom.
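As an illustration of automatic selection, a minimal leave-one-out cross-validation sketch for λ (assumes the Epanechnikov-based smoother sketched earlier; all names are illustrative):

```python
import numpy as np

def loocv_error(x, y, lam):
    """Mean squared leave-one-out prediction error of the kernel smoother."""
    errs = []
    for i in range(len(x)):
        t = (np.delete(x, i) - x[i]) / lam
        w = np.where(np.abs(t) <= 1, 0.75 * (1 - t ** 2), 0.0)
        if w.sum() == 0:                       # empty window: skip this point
            continue
        pred = np.sum(w * np.delete(y, i)) / w.sum()
        errs.append((y[i] - pred) ** 2)
    return np.mean(errs)

# Choose lambda by minimizing the LOOCV error over a grid, e.g.
# best_lam = min(np.linspace(0.1, 2.0, 20), key=lambda l: loocv_error(x, y, l))
```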

Local regression in R^p

The one-dimensional approach is easily extended to p dimensions by
using the Euclidean norm as a measure of distance in the kernel,

$$K_\lambda(x_0, x) = D\!\left(\frac{\lVert x - x_0 \rVert}{\lambda}\right)$$

and modifying the polynomial (e.g., adding cross-terms).

Local regression in R^p

"The curse of dimensionality":
The fraction of points close to the boundary of the input domain increases with its dimension.
Observed data do not cover the whole input domain.

Structured local regression models

Structured kernels (e.g., to standardize each variable):

$$K_{\lambda, A}(x_0, x) = D\!\left(\frac{\sqrt{(x - x_0)^T A\, (x - x_0)}}{\lambda}\right)$$

Note: A is positive semidefinite.
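A small sketch of such a kernel, replacing |x - x0| by the quadratic form determined by A; taking A as the inverse covariance of the inputs standardizes each coordinate and downweights directions of high spread (names illustrative):

```python
import numpy as np

def structured_weight(x0, x, A, lam):
    """K_{lam,A}(x0, x) = D(sqrt((x - x0)^T A (x - x0)) / lam), Epanechnikov D."""
    d = x - x0
    t = np.sqrt(d @ A @ d) / lam
    return 0.75 * (1 - t ** 2) if t <= 1 else 0.0

# e.g. A = np.linalg.inv(np.cov(X, rowvar=False)) for an n x p data matrix X
```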

Structured local regression models

Structured regression functions:
ANOVA decompositions (e.g., additive models); backfitting algorithms can be used.
Varying coefficient models: partition X into (X_1, ..., X_q) with the remaining coordinates Z, and let the coefficients vary with Z,

$$f(X) = \alpha(Z) + \beta_1(Z)\, X_1 + \dots + \beta_q(Z)\, X_q,$$

fitted by locally weighted least squares (formula 6.17):

$$\min_{\alpha(z_0),\, \beta(z_0)} \sum_{i=1}^{N} K_\lambda(z_0, z_i) \left( y_i - \alpha(z_0) - x_{1i}\, \beta_1(z_0) - \dots - x_{qi}\, \beta_q(z_0) \right)^2$$

Structured local regression models

Varying coefficient models (example).

Local methods

Assumption: the model is locally linear. Maximize the log-likelihood locally at x_0:

$$l(\beta(x_0)) = \sum_{i=1}^{N} K_\lambda(x_0, x_i)\, l\big(y_i, x_i^T \beta(x_0)\big)$$

Example: an autoregressive time series

$$y_t = \beta_0 + \beta_1 y_{t-1} + \dots + \beta_k y_{t-k} + \varepsilon_t,$$

i.e., $y_t = z_t^T \beta + \varepsilon_t$ with $z_t = (1, y_{t-1}, \dots, y_{t-k})$. Fit by local least squares with kernel $K(z_0, z_t)$.
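A hedged sketch of the autoregressive example: build the lag vectors z_t, weight the observations by a kernel around the query lag vector z0 (a Gaussian kernel is assumed here), and solve weighted least squares for β:

```python
import numpy as np

def local_ar_fit(y, k, z0, lam):
    """Locally weighted AR(k) fit: y_t = z_t^T beta + e_t near lag vector z0."""
    n = len(y)
    Z = np.array([[y[t - j] for j in range(1, k + 1)] for t in range(k, n)])
    target = y[k:]                                               # the y_t being predicted
    w = np.exp(-np.sum((Z - z0) ** 2, axis=1) / (2 * lam ** 2))  # K(z0, z_t), Gaussian
    B = np.column_stack([np.ones(len(Z)), Z])                    # add the intercept beta_0
    BW = B.T * w
    return np.linalg.solve(BW @ B, BW @ target)                  # (beta_0, ..., beta_k)
```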

Kernel density estimation

Straightforward estimates of the density are bumpy. Instead, Parzen's smooth estimate is preferred:

$$\hat f(x_0) = \frac{1}{N\lambda} \sum_{i=1}^{N} K_\lambda(x_0, x_i)$$

Normally, Gaussian kernels are used.
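A minimal sketch of the Gaussian Parzen estimate (the standard estimator; names are illustrative):

```python
import numpy as np

def parzen_density(x0, x, lam):
    """f_hat(x0) = (1 / (N lam)) sum_i phi((x0 - x_i) / lam), phi = N(0,1) pdf."""
    u = (x0 - x) / lam
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)  # standard Gaussian density
    return phi.sum() / (len(x) * lam)
```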

Radial basis functions and kernels

Using the idea of basis expansions, we treat kernel functions as basis functions:

$$f(x) = \sum_{j=1}^{M} K_{\lambda_j}(\xi_j, x)\, \beta_j = \sum_{j=1}^{M} D\!\left(\frac{\lVert x - \xi_j \rVert}{\lambda_j}\right) \beta_j$$

where ξ_j is a prototype (location) parameter and λ_j is a scale parameter.

Radial basis functions and kernels

Choosing the parameters: estimate {λ_j, ξ_j} separately from the β_j (often by using the distribution of X alone), then solve a least-squares problem for the β_j.
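A hedged sketch of this two-stage recipe: prototypes ξ_j chosen from the distribution of x alone (here, quantiles), a single shared scale λ, and ordinary least squares for the β_j; every concrete choice below is illustrative:

```python
import numpy as np

def rbf_design(x, centers, lam):
    """Gaussian basis functions exp(-(x - xi_j)^2 / (2 lam^2)), one column per xi_j."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * lam ** 2))

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(0.0, 0.2, 200)

centers = np.quantile(x, np.linspace(0.05, 0.95, 10))  # prototypes xi_j from X alone
lam = (x.max() - x.min()) / 10                         # common scale parameter
H = rbf_design(x, centers, lam)                        # N x M basis matrix
beta, *_ = np.linalg.lstsq(H, y, rcond=None)           # least squares for beta_j
f_hat = H @ beta
```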