© 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions.

Data Analysis Overview Experimental environment prototype real sys exec- driven sim trace- driven sim stochastic sim Workload parameters System Config parameters Factor levels Raw Data Samples...... Predictor values x, factor levels Samples of response... y1y2yny1y2yn Regression models response var = f (predictors)

© 1998, Geoff Kuenning What Is a (Good) Model? For correlated data, model predicts response given an input Model should be equation that fits data Standard definition of “fits” is least-squares –Minimize squared error –While keeping mean error zero –Minimizes variance of errors

© 1998, Geoff Kuenning Variants of Linear Regression Some non-linear relationships can be handled by transformations –For y = ae bx take logarithm of y, do regression on log(y) = b 0 +b 1 x, let b = b 1, –For y = a+b log(x), take log of x before fitting parameters, let b = b 1, a = b 0 –For y = ax b, take log of both x and y, let b = b 1,

© 1998, Geoff Kuenning Allocating Variation If no regression, best guess of y is Observed values of y differ from, giving rise to errors (variance) Regression gives better guess, but there are still errors We can evaluate quality of regression by allocating sources of errors

© 1998, Geoff Kuenning The Sum of Squares from Regression Recall that regression error is Error without regression is SST So regression explains SSR = SST - SSE Regression quality measured by coefficient of determination.

© 1998, Geoff Kuenning Example of Coefficient of Determination For previous regression example –  y = 11.60,  y 2 = 29.79,  xy = 88.54, –SSE = 29.79-(0.35)(11.60)-(0.29)(88.54) = 0.05 –SST = 29.79-26.9 = 2.89 –SSR = 2.89-.05 = 2.84 –R 2 = (2.89-0.05)/2.89 = 0.98

© 1998, Geoff Kuenning Standard Deviation of Errors Variance of errors is SSE divided by degrees of freedom –DOF is n  2 because we’ve calculated 2 regression parameters from the data –So variance (mean squared error, MSE) is SSE/(n  2) Standard deviation of errors is square root:

© 1998, Geoff Kuenning Checking Degrees of Freedom Degrees of freedom always equate: –SS0 has 1 (computed from ) –SST has n  1 (computed from data and, which uses up 1) –SSE has n  2 (needs 2 regression parameters) –So

© 1998, Geoff Kuenning Example of Standard Deviation of Errors For our regression example, SSE was 0.05, so MSE is 0.05/3 = 0.017 and s e = 0.13 Note high quality of our regression: –R 2 = 0.98 –s e = 0.13 –Why such a nice straight-line fit?

© 1998, Geoff Kuenning Confidence Intervals for Regressions Regression is done from a single population sample (size n) –Different sample might give different results –True model is y =  0 +  1 x –Parameters b 0 and b 1 are really means taken from a population sample

© 1998, Geoff Kuenning Confidence Intervals for Nonlinear Regressions For nonlinear fits using exponential transformations: –Confidence intervals apply to transformed parameters –Not valid to perform inverse transformation on intervals

© 1998, Geoff Kuenning Confidence Intervals for Predictions Previous confidence intervals are for parameters –How certain can we be that the parameters are correct? Purpose of regression is prediction –How accurate are the predictions? –Regression gives mean of predicted response, based on sample we took

© 1998, Geoff Kuenning Predicting m Samples Standard deviation for mean of future sample of m observations at x p is Note deviation drops as m  Variance minimal at x = Use t-quantiles with n–2 DOF for interval

© 1998, Geoff Kuenning Example of Confidence of Predictions Using previous equation, what is predicted time for a single run of 8 loops? Time = 0.35 + 0.29(8) = 2.67 Standard deviation of errors s e = 0.13 90% interval is then !

© 1998, Geoff Kuenning Verifying Assumptions Visually Regressions are based on assumptions: –Linear relationship between response y and predictor x Or nonlinear relationship used in fitting –Predictor x nonstochastic and error-free –Model errors statistically independent With distribution N(0,c) for constant c If assumptions violated, model misleading or invalid

© 1998, Geoff Kuenning Linear Regression Can Be Misleading Regression throws away some information about the data –To allow more compact summarization Sometimes vital characteristics are thrown away –Often, looking at data plots can tell you whether you will have a problem

© 1998, Geoff Kuenning Example of Misleading Regression IIIIIIIV xyxyxyxy 108.04109.14107.4686.58 86.9588.1486.7785.76 137.58138.741312.7487.71 98.8198.7797.1188.84 118.33119.26117.8188.47 149.96148.10148.8487.04 67.2466.1366.0885.25 44.2643.1045.391912.50 1210.84129.13128.1585.56 74.8277.2676.4287.91 55.6854.7455.7386.89

© 1998, Geoff Kuenning What Does Regression Tell Us About These Data Sets? Exactly the same thing for each! N = 11 Mean of y = 7.5 Y = 3 +.5 X Standard error of regression is 0.118 All the sums of squares are the same Correlation coefficient =.82 R 2 =.67

© 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions.

Similar presentations

Presentation on theme: "© 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

© 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions.

Similar presentations

Presentation on theme: "© 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions."— Presentation transcript:

Similar presentations

About project

Feedback