Bias-Variance Analysis in Regression
Thomas G. Dietterich, Department of Computer Science, Oregon State University, Corvallis, Oregon

Bias-Variance Analysis in Regression  True function is y = f(x) +  where  is normally distributed with zero mean and standard deviation .  Given a set of training examples, {(x i, y i )}, we fit an hypothesis h(x) = w ¢ x + b to the data to minimize the squared error  i [y i – h(x i )] 2

Example: 20 points y = x + 2 sin(1.5x) + N(0,0.2)
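This toy problem is easy to reproduce. A minimal sketch in Python (assuming NumPy; the sampling interval [0, 10] for x and the random seed are assumptions, since the slides do not specify them):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # True function from the slide: f(x) = x + 2 sin(1.5 x)
    return x + 2.0 * np.sin(1.5 * x)

# Draw 20 training points: y = f(x) + N(0, 0.2)  (std. dev. 0.2)
x = rng.uniform(0.0, 10.0, size=20)
y = f(x) + rng.normal(0.0, 0.2, size=20)

# Fit h(x) = w*x + b by least squares, minimizing sum_i [y_i - h(x_i)]^2
w, b = np.polyfit(x, y, deg=1)
print(f"h(x) = {w:.3f} x + {b:.3f}")
```

Repeating the draw-and-fit steps on 50 fresh samples reproduces the "50 fits" picture on the next slide.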

50 fits (20 examples each)

Bias-Variance Analysis  Now, given a new data point x* (with observed value y* = f(x*) +  ), we would like to understand the expected prediction error E[ (y* – h(x*)) 2 ]

Classical Statistical Analysis
• Imagine that our particular training sample S is drawn from some population of possible training samples according to P(S).
• Compute E_P[(y* − h(x*))²].
• Decompose this into "bias", "variance", and "noise".

Lemma  Let Z be a random variable with probability distribution P(Z)  Let Z = E P [ Z ] be the average value of Z.  Lemma: E[ (Z – Z) 2 ] = E[Z 2 ] – Z 2 E[ (Z – Z) 2 ] = E[ Z 2 – 2 Z Z + Z 2 ] = E[Z 2 ] – 2 E[Z] Z + Z 2 = E[Z 2 ] – 2 Z 2 + Z 2 = E[Z 2 ] – Z 2  Corollary: E[Z 2 ] = E[ (Z – Z) 2 ] + Z 2

Bias-Variance-Noise Decomposition
E[(h(x*) − y*)²]
= E[h(x*)² − 2 h(x*) y* + y*²]
= E[h(x*)²] − 2 E[h(x*)] E[y*] + E[y*²]
= E[(h(x*) − h̄(x*))²] + h̄(x*)²    (lemma)
  − 2 h̄(x*) f(x*)
  + E[(y* − f(x*))²] + f(x*)²    (lemma)
= E[(h(x*) − h̄(x*))²] +    [variance]
  (h̄(x*) − f(x*))² +    [bias²]
  E[(y* − f(x*))²]    [noise]
(Here h̄(x*) = E[h(x*)] is the average prediction; E[h(x*) y*] factors because h(x*) and y* are independent, and E[y*] = f(x*) since ε has zero mean.)

Derivation (continued)
E[(h(x*) − y*)²]
= E[(h(x*) − h̄(x*))²] + (h̄(x*) − f(x*))² + E[(y* − f(x*))²]
= Var(h(x*)) + Bias(h(x*))² + E[ε²]
= Var(h(x*)) + Bias(h(x*))² + σ²
Expected prediction error = Variance + Bias² + Noise²

Bias, Variance, and Noise
• Variance: E[(h(x*) − h̄(x*))²]. Describes how much h(x*) varies from one training set S to another.
• Bias: [h̄(x*) − f(x*)]. Describes the average error of h(x*).
• Noise: E[(y* − f(x*))²] = E[ε²] = σ². Describes how much y* varies from f(x*).
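The decomposition can be checked by simulation: draw many training sets, record each fitted line's prediction at a fixed test point x*, and compare the directly measured error E[(y* − h(x*))²] against Variance + Bias² + Noise². This sketch reuses the earlier example; the interval [0, 10], the 2000 training sets, and x* = 5.0 are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.2

def f(x):
    return x + 2.0 * np.sin(1.5 * x)

x_star = 5.0
n_train, n_sets = 20, 2000

# One prediction h(x*) per simulated training set
preds = np.empty(n_sets)
for i in range(n_sets):
    x = rng.uniform(0.0, 10.0, size=n_train)
    y = f(x) + rng.normal(0.0, sigma, size=n_train)
    w, b = np.polyfit(x, y, deg=1)
    preds[i] = w * x_star + b

variance = preds.var()                       # E[(h(x*) - h̄(x*))²]
bias2 = (preds.mean() - f(x_star)) ** 2      # (h̄(x*) - f(x*))²
noise = sigma ** 2                           # E[ε²]

# Directly estimated expected prediction error at x*
y_star = f(x_star) + rng.normal(0.0, sigma, size=n_sets)
err = np.mean((y_star - preds) ** 2)
print(err, variance + bias2 + noise)  # the two estimates should be close
```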

50 fits (20 examples each)

Bias

Variance

Noise

50 fits (20 examples each)

Distribution of predictions at x=2.0

50 fits (20 examples each)

Distribution of predictions at x=5.0

Measuring Bias and Variance
• In practice (unlike in theory), we have only ONE training set S.
• We can simulate multiple training sets by bootstrap replicates: S′ = {x | x is drawn at random with replacement from S}, with |S′| = |S|.

Procedure for Measuring Bias and Variance
• Construct B bootstrap replicates of S (e.g., B = 200): S_1, …, S_B
• Apply the learning algorithm to each replicate S_b to obtain hypothesis h_b
• Let T_b = S \ S_b be the data points that do not appear in S_b (out of bag points)
• Compute the predicted value h_b(x) for each x in T_b

Estimating Bias and Variance (continued)
• For each data point x, we will now have the observed corresponding value y and several predictions y_1, …, y_K.
• Compute the average prediction h̄.
• Estimate bias as (h̄ − y).
• Estimate variance as Σ_k (y_k − h̄)² / (K − 1).
• Assume noise is 0.
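The procedure above can be sketched in Python (assuming NumPy; the linear learner and the data-generating function from the earlier example are stand-ins for whatever learner and data set are actually at hand):

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    return x + 2.0 * np.sin(1.5 * x)

# The single training set S we actually have
n = 20
x = rng.uniform(0.0, 10.0, size=n)
y = f(x) + rng.normal(0.0, 0.2, size=n)

B = 200                             # number of bootstrap replicates
oob_preds = [[] for _ in range(n)]  # out-of-bag predictions per point

for _ in range(B):
    idx = rng.integers(0, n, size=n)          # bootstrap replicate S_b
    oob = np.setdiff1d(np.arange(n), idx)     # T_b = S \ S_b
    w, c = np.polyfit(x[idx], y[idx], deg=1)  # hypothesis h_b
    for i in oob:
        oob_preds[i].append(w * x[i] + c)

bias2, var = [], []
for i in range(n):
    p = np.asarray(oob_preds[i])
    if p.size < 2:                  # need at least two predictions
        continue
    h_bar = p.mean()                # average prediction h̄
    bias2.append((h_bar - y[i]) ** 2)  # bias estimate (noise assumed 0)
    var.append(p.var(ddof=1))       # Σ_k (y_k - h̄)² / (K - 1)

print("bias^2 ~", np.mean(bias2), " variance ~", np.mean(var))
```

With B = 200, each point is out of bag in roughly a third of the replicates, so every point typically gets enough predictions for the variance estimate.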

Approximations in this Procedure
• Bootstrap replicates are not real data.
• We ignore the noise:
  If we have multiple data points with the same x value, then we can estimate the noise.
  We can also estimate noise by pooling y values from nearby x values.