Model Selection and the Bias–Variance Tradeoff


Model Selection and the Bias–Variance Tradeoff
All of the models described have a smoothing or complexity parameter that has to be considered: the multiplier of the penalty term, the width of the kernel, or the number of basis functions.

Model Selection and the Bias–Variance Tradeoff
The smoothing spline parameter λ indexes models ranging from a straight-line fit to the interpolating model. Likewise, a local degree-m polynomial model ranges between a degree-m global polynomial when the window size is infinitely large, and an interpolating fit when the window size shrinks to zero. We therefore cannot use the residual sum of squares on the training data to determine these parameters: we would pick parameters that give interpolating fits and hence zero residuals, which are unlikely to predict future data well at all.
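A minimal numerical sketch of that point (the sinusoidal target function, the noise level, and the use of NumPy's polyfit with polynomial degree as the complexity parameter are all assumptions made for illustration):

```python
# Training RSS always favors the most flexible model: with degree n-1 the
# polynomial interpolates the n training points and the training RSS drops
# to (numerically) zero, even though such a fit predicts new data poorly.
import numpy as np

rng = np.random.default_rng(0)
n = 10
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)   # Y = f(X) + eps

for degree in [1, 3, 5, n - 1]:
    coeffs = np.polyfit(x, y, degree)                # least-squares fit
    resid = y - np.polyval(coeffs, x)
    print(f"degree {degree:2d}: training RSS = {resid @ resid:.3e}")
```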

Model Selection and the Bias–Variance Tradeoff
The k-nearest-neighbor regression fit fˆ_k(x0) provides a simple illustration of the competing forces that affect the predictive ability of such approximations. Suppose the data arise from a model Y = f(X) + ε, with E(ε) = 0 and Var(ε) = σ². For simplicity, assume that the values of the xi in the sample are fixed in advance (nonrandom). The expected prediction error (EPE) at x0, also known as the test or generalization error, can be decomposed:
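Written out, the decomposition is the standard result for the k-NN fit under these assumptions, where x_(l) denotes the l-th nearest neighbor of x0:

$$
\begin{aligned}
\mathrm{EPE}_k(x_0) &= \mathrm{E}\left[\left(Y - \hat f_k(x_0)\right)^2 \mid X = x_0\right] \\
&= \sigma^2 + \left[\mathrm{Bias}^2\!\left(\hat f_k(x_0)\right) + \mathrm{Var}_{\mathcal T}\!\left(\hat f_k(x_0)\right)\right] \\
&= \sigma^2 + \left[f(x_0) - \frac{1}{k}\sum_{l=1}^{k} f\!\left(x_{(l)}\right)\right]^2 + \frac{\sigma^2}{k}.
\end{aligned}
$$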

Model Selection and the Bias–Variance Tradeoff
The subscripts in parentheses, (l), indicate the sequence of nearest neighbors to x0. There are three terms in this expression. The first, σ², is the irreducible error: the variance of the new test target. It is beyond our control, even if we know the true f(x0).

Model Selection and the Bias–Variance Tradeoff
The second and third terms are under our control, and together they make up the mean squared error of fˆ_k(x0) in estimating f(x0), broken down into a bias component and a variance component. The bias term is the squared difference between the true mean f(x0) and the expected value of the estimate, [E_T(fˆ_k(x0)) − f(x0)]², where the expectation averages over the randomness in the training data. This term will most likely increase with k, if the true function is reasonably smooth.

Model Selection and the Bias–Variance Tradeoff
The second term is the bias term. For small k, the few closest neighbors will have values f(x_(l)) close to f(x0), so their average should be close to f(x0). As k grows, the neighbors are farther away, and then anything can happen.

Model Selection and the Bias–Variance Tradeoff
The third term is the variance term: the variance of an average, which decreases as the inverse of k. So as k varies, there is a bias–variance tradeoff. In general, as the model complexity of our procedure is increased, the variance tends to increase and the squared bias tends to decrease; the opposite behavior occurs as the model complexity is decreased. For k-nearest neighbors, the model complexity is controlled by k.
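A small Monte Carlo sketch of this tradeoff (the target function, noise level, sample size, and test point are assumptions chosen for the example):

```python
# Estimate the squared bias and variance of a k-NN fit at a fixed test point
# x0, under Y = f(X) + eps with fixed training inputs, by averaging over many
# independently drawn training sets.
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(2 * np.pi * x)       # assumed "true" regression function
sigma = 0.3                               # noise standard deviation
x_train = np.linspace(0, 1, 50)           # fixed (nonrandom) training inputs
x0 = 0.37                                 # test point

order = np.argsort(np.abs(x_train - x0))  # neighbors of x0, nearest first
for k in [1, 5, 15, 40]:
    idx = order[:k]
    preds = []
    for _ in range(2000):                 # one k-NN estimate per training set
        y_train = f(x_train) + rng.normal(0, sigma, x_train.size)
        preds.append(y_train[idx].mean())
    preds = np.array(preds)
    bias2 = (preds.mean() - f(x0)) ** 2
    var = preds.var()
    print(f"k={k:2d}  bias^2={bias2:.4f}  variance={var:.4f}  "
          f"sigma^2/k={sigma**2 / k:.4f}")
# The variance tracks sigma^2/k, while the squared bias grows as larger k
# pulls in neighbors farther from x0.
```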

Model Selection and the Bias–Variance Tradeoff
Typically we would choose the model complexity to trade bias off against variance in such a way as to minimize the test error. An obvious estimate of the test error is the training error, (1/N) Σ_i (y_i − ŷ_i)², but the training error is not a good estimate of the test error, as it does not properly account for model complexity.

The training error tends to decrease whenever we increase the model complexity. With too much fitting, however, the model adapts itself too closely to the training data and will not generalize well (large test error); it is overfit. If the model is not complex enough it will underfit, and may have large bias, again resulting in poor generalization.
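A sketch of the same point for k-NN regression, where 1/k plays the role of model complexity (the data-generating function, noise level, and sample sizes are assumptions for illustration):

```python
# Training error is smallest at k=1 (each training point is its own nearest
# neighbor, so the fit reproduces the training targets exactly), while the
# held-out error is minimized at an intermediate k.
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(2 * np.pi * x)

def sample(n):
    x = rng.uniform(0, 1, n)
    return x, f(x) + rng.normal(0, 0.3, n)

def knn_predict(x_query, x_train, y_train, k):
    # average the targets of the k nearest training points to each query point
    d = np.abs(x_query[:, None] - x_train[None, :])
    nn = np.argsort(d, axis=1)[:, :k]
    return y_train[nn].mean(axis=1)

x_tr, y_tr = sample(50)
x_te, y_te = sample(1000)

for k in [1, 3, 7, 15, 30, 50]:
    train_err = np.mean((y_tr - knn_predict(x_tr, x_tr, y_tr, k)) ** 2)
    test_err = np.mean((y_te - knn_predict(x_te, x_tr, y_tr, k)) ** 2)
    print(f"k={k:2d}: train MSE = {train_err:.4f}, held-out MSE = {test_err:.4f}")
```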

Linear Methods for Regression
A linear regression model assumes that the regression function E(Y|X) is linear in the inputs X1, ..., Xp. Linear models are simple and often provide an adequate and interpretable description of how the inputs affect the output. For prediction purposes they can sometimes outperform fancier nonlinear models, especially in situations with small numbers of training cases, low signal-to-noise ratio, or sparse data. Linear methods can also be applied to transformations of the inputs, which considerably expands their scope; these generalizations are sometimes called basis-function methods.
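A brief sketch of a basis-function method (the particular basis, target function, and noise level below are assumptions for illustration):

```python
# The model stays linear in its coefficients, but the single input x is first
# expanded into several basis functions, so the fitted curve is nonlinear in x.
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 40))
y = np.exp(-3 * x) * np.sin(4 * np.pi * x) + rng.normal(0, 0.05, x.size)

# Basis expansion h(x) = (1, x, x^2, x^3): an ordinary cubic polynomial basis.
H = np.column_stack([np.ones_like(x), x, x**2, x**3])
beta, *_ = np.linalg.lstsq(H, y, rcond=None)      # least-squares coefficients

y_hat = H @ beta
print("coefficients:", np.round(beta, 3))
print("training RSS:", round(float(np.sum((y - y_hat) ** 2)), 4))
```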

Linear Regression Models and Least Squares
The linear model either assumes that the regression function E(Y|X) is linear, or that the linear model is a reasonable approximation to it. The βj's are the unknown parameters or coefficients.
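For reference, the linear model and the residual sum of squares that least squares minimizes are (standard forms):

$$
f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j,
\qquad
\mathrm{RSS}(\beta) = \sum_{i=1}^{N}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^2 .
$$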

Linear Regression Models and Least Squares Variables Xj can come from different sources: