
1 Model Selection and the Bias–Variance Tradeoff
All models described have a smoothing or complexity parameter that has to be determined:
- the multiplier of the penalty term
- the width of the kernel
- the number of basis functions
A sketch of the first case follows.
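A minimal sketch of the first case, the multiplier of a penalty term, using ridge regression (the data and coefficient values here are made up for illustration):

```python
# Ridge regression: lam multiplies the penalty term and controls complexity.
# Minimizes ||y - X b||^2 + lam * ||b||^2; larger lam shrinks b toward zero.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=50)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X'X + lam I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

for lam in (0.0, 1.0, 100.0):
    print(lam, ridge_fit(X, y, lam))  # coefficients shrink as lam grows
```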

2 Model Selection and the Bias–Variance Tradeoff
Smoothing spline: the parameter λ indexes models ranging from a straight-line fit (large λ) to the interpolating model (λ = 0).
Local degree-m polynomial: the model ranges between a degree-m global polynomial when the window size is infinitely large, and an interpolating fit when the window size shrinks to zero.
RSS on the training data cannot be used to determine these parameters: it would pick parameters that gave interpolating fits and hence zero residuals, and such fits are unlikely to predict future data well at all. The sketch below illustrates this with k-nearest neighbors.
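A minimal sketch of the last point, assuming a k-nearest-neighbor regression (the data and the helper knn_predict are made up for illustration): the training RSS is always minimized by the interpolating fit, k = 1.

```python
# Selecting k by training RSS always picks k = 1: with distinct x-values,
# each training point is its own nearest neighbor, so residuals are zero.
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 40))
y = np.sin(x) + rng.normal(scale=0.3, size=40)

def knn_predict(x_train, y_train, x0, k):
    """Average the y-values of the k nearest training points to x0."""
    idx = np.argsort(np.abs(x_train - x0))[:k]
    return y_train[idx].mean()

for k in (1, 5, 15):
    rss = sum((yi - knn_predict(x, y, xi, k)) ** 2 for xi, yi in zip(x, y))
    print(k, rss)  # k = 1 gives RSS = 0, yet need not generalize well
```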

3 Model Selection and the Bias–Variance Tradeoff
The k-nearest-neighbor regression fit $\hat f_k(x_0)$ illustrates the competing forces that affect the predictive ability of such approximations. Suppose the data arise from a model $Y = f(X) + \varepsilon$, with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$. For simplicity, assume that the values of the $x_i$ in the sample are fixed in advance (nonrandom). The EPE at $x_0$, also known as the test or generalization error, can be decomposed:

$$\mathrm{EPE}_k(x_0) = E\big[(Y - \hat f_k(x_0))^2 \mid X = x_0\big] = \sigma^2 + \left[ f(x_0) - \frac{1}{k} \sum_{l=1}^{k} f(x_{(l)}) \right]^2 + \frac{\sigma^2}{k}$$
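The decomposition can be checked numerically. A minimal sketch, with f, σ, and the fixed design chosen arbitrarily for illustration:

```python
# Monte Carlo check of EPE(x0) = sigma^2 + bias^2 + sigma^2/k for k-NN
# with a fixed (nonrandom) design, redrawing the training responses each time.
import numpy as np

rng = np.random.default_rng(2)
f, sigma, k, x0 = np.sin, 0.3, 7, 5.0
x = np.linspace(0, 10, 100)            # fixed design points
nn = np.argsort(np.abs(x - x0))[:k]    # indices of the k nearest neighbors of x0

preds = []
for _ in range(20000):                 # fresh training responses each round
    y = f(x) + rng.normal(scale=sigma, size=x.size)
    preds.append(y[nn].mean())         # k-NN fit at x0
preds = np.array(preds)

bias2, var = (preds.mean() - f(x0)) ** 2, preds.var()
print("simulated sigma^2 + bias^2 + var:", sigma**2 + bias2 + var)
print("closed form:", sigma**2 + (f(x[nn]).mean() - f(x0)) ** 2 + sigma**2 / k)
```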

4 Model Selection and the Bias–Variance Tradeoff
The subscripts in parentheses, $(l)$, indicate the sequence of nearest neighbors to $x_0$. There are three terms in this expression.
The first, $\sigma^2$, is the irreducible error: the variance of the new test target. It is beyond our control, even if we know the true $f(x_0)$.

5 Model Selection and the Bias–Variance Tradeoff
The second and third terms are under our control, and together they make up the mean squared error of $\hat f_k(x_0)$ in estimating $f(x_0)$.
The second term is the squared bias: the squared difference between the true mean $f(x_0)$ and the expected value of the estimate, $[E_T(\hat f_k(x_0)) - f(x_0)]^2$, where the expectation averages over the randomness in the training data. It will most likely increase with k, if the true function is reasonably smooth.

6 Model Selection and the Bias–Variance Tradeoff
Second term (bias): for small k, the few closest neighbors have values $f(x_{(l)})$ close to $f(x_0)$, so their average should be close to $f(x_0)$. As k grows, the neighbors are farther away, and then anything can happen.

7 Third term (variance): the variance of an average, which decreases as the inverse of k.
So as k varies, there is a bias–variance tradeoff. In general, as the model complexity of our procedure is increased, the variance tends to increase and the squared bias tends to decrease; the opposite behavior occurs as the model complexity is decreased. For k-nearest neighbors, the model complexity is controlled by k (see the sketch below).
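Under the fixed-design assumption both terms have closed forms, so the tradeoff can be tabulated directly. A minimal sketch with an illustrative f, σ, and design:

```python
# For k-NN at x0 with a fixed design: bias = mean_l f(x_(l)) - f(x0),
# variance = sigma^2 / k. Bias^2 grows with k, variance shrinks as 1/k.
import numpy as np

f, sigma, x0 = np.sin, 0.3, 5.0
x = np.linspace(0, 10, 100)
order = np.argsort(np.abs(x - x0))     # design points sorted by distance to x0

for k in (1, 5, 20, 50, 100):
    bias2 = (f(x[order[:k]]).mean() - f(x0)) ** 2  # farther neighbors -> more bias
    var = sigma**2 / k                             # averaging -> less variance
    print(k, bias2, var, bias2 + var)              # the sum is minimized in between
```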

8 Model Selection and the Bias–Variance Tradeoff
Typically we would choose our model complexity to trade bias off with variance in such a way as to minimize the test error. An obvious estimate of the test error is the training error $\frac{1}{N} \sum_i (y_i - \hat y_i)^2$, but the training error is not a good estimate of the test error, as it does not properly account for model complexity.
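A minimal sketch of why the training error misleads, using polynomial degree as a stand-in for model complexity (the data-generating function here is made up):

```python
# Training MSE keeps falling as the degree grows; test MSE eventually rises.
import numpy as np

rng = np.random.default_rng(3)
x_tr = rng.uniform(-1, 1, 30)
y_tr = np.sin(3 * x_tr) + rng.normal(scale=0.2, size=30)
x_te = rng.uniform(-1, 1, 300)
y_te = np.sin(3 * x_te) + rng.normal(scale=0.2, size=300)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    train_mse = np.mean((y_tr - np.polyval(coeffs, x_tr)) ** 2)
    test_mse = np.mean((y_te - np.polyval(coeffs, x_te)) ** 2)
    print(degree, train_mse, test_mse)  # train_mse decreases monotonically
```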

9 [Figure: training and test error as a function of model complexity]

10 Training error tends to decrease whenever we increase the model complexity. With too much fitting, however, the model adapts itself too closely to the training data and will not generalize well (large test error). If the model is not complex enough, it will underfit and may have large bias, again resulting in poor generalization.

11 Linear Methods for Regression
A linear regression model assumes that the regression function $E(Y \mid X)$ is linear in the inputs $X_1, \ldots, X_p$. Linear models are simple and often provide an adequate and interpretable description of how the inputs affect the output. For prediction purposes they can sometimes outperform fancier nonlinear models, especially in situations with small numbers of training cases, low signal-to-noise ratio, or sparse data. Linear methods can also be applied to transformations of the inputs, which considerably expands their scope; these generalizations are sometimes called basis-function methods.
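A minimal sketch of a basis-function method (the target function and basis are made up for illustration): the model stays linear in the parameters even though the fitted curve is nonlinear in x.

```python
# Least squares on transformed inputs: the basis 1, x, x^2, x^3 turns a
# linear model into a cubic fit while keeping the estimation problem linear.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 2, 50)
y = np.exp(x) + rng.normal(scale=0.1, size=50)

B = np.column_stack([np.ones_like(x), x, x**2, x**3])  # basis-function design matrix
beta, *_ = np.linalg.lstsq(B, y, rcond=None)           # ordinary least squares
print(beta)  # coefficients of a model linear in the transformed inputs
```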

12 Linear Regression Models and Least Squares
The linear model either assumes that the regression function $E(Y \mid X)$ is linear, or that the linear model is a reasonable approximation:

$$f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j$$

The $\beta_j$'s are unknown parameters or coefficients.
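A minimal sketch of estimating the $\beta_j$'s by least squares (the data and true coefficients are made up):

```python
# Fit f(X) = beta_0 + sum_j X_j beta_j by ordinary least squares.
import numpy as np

rng = np.random.default_rng(5)
N, p = 100, 3
X = rng.normal(size=(N, p))
y = 2.0 + X @ np.array([1.0, -1.0, 0.5]) + rng.normal(scale=0.2, size=N)

X1 = np.column_stack([np.ones(N), X])            # column of 1s carries beta_0
beta_hat = np.linalg.lstsq(X1, y, rcond=None)[0]
print(beta_hat)  # roughly [2.0, 1.0, -1.0, 0.5]
```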

13 Linear Regression Models and Least Squares
The variables $X_j$ can come from different sources:
- quantitative inputs
- transformations of quantitative inputs, such as log, square-root, or square
- basis expansions, such as $X_2 = X_1^2$, $X_3 = X_1^3$, leading to a polynomial representation
- numeric or "dummy" coding of the levels of qualitative inputs
- interactions between variables, for example $X_3 = X_1 \cdot X_2$
A design matrix drawing on these sources is sketched below.
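A minimal sketch of building columns from these sources; the raw variables (income, region) and all names are invented for illustration.

```python
# Assemble a design matrix from a quantitative input and a 3-level
# qualitative input, using transformations, dummy coding, and an interaction.
import numpy as np

rng = np.random.default_rng(6)
income = rng.uniform(20, 100, 8)       # quantitative input
region = rng.integers(0, 3, 8)         # qualitative input with 3 levels

log_income = np.log(income)            # transformation of a quantitative input
income_sq = income**2                  # basis expansion
dummies = np.eye(3)[region]            # dummy coding of the qualitative levels
interaction = income * dummies[:, 0]   # interaction between two variables

X = np.column_stack([income, log_income, income_sq, dummies[:, 1:], interaction])
print(X.shape)  # one row per observation, one column per constructed X_j
```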


