Regression

1 Regression

2 Correlation measures the strength of the linear relationship. Great! But what is that relationship? How do we describe it? – regression, regression line, regression equation. The regression line is used for prediction.

3 Predicting weights from heights. Independent variable: height. Dependent variable: weight. How can we predict one from the other? Regression is to a scatter plot as the mean is to a histogram.

4 Weights vs. Heights

5 Salary by years employed

6 Regression by local averages. Approximation of local averages by a regression line. Inappropriate use of a regression line (use other methods).
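The idea of regression by local averages can be sketched in a few lines of Python. This is only an illustration: the height/weight pairs are made up, and the helper name `local_averages` and the 5 cm bin width are assumptions, not part of the slides.

```python
from collections import defaultdict
from statistics import mean

# Made-up (height in cm, weight in kg) pairs, for illustration only.
data = [(160, 55), (160, 59), (165, 60), (165, 64),
        (170, 66), (170, 68), (175, 71), (175, 75)]

def local_averages(pairs, bin_width=5):
    """Group points into x-bins and average y inside each bin."""
    bins = defaultdict(list)
    for x, y in pairs:
        bin_start = (x // bin_width) * bin_width
        bins[bin_start].append(y)
    return {start: mean(ys) for start, ys in sorted(bins.items())}

averages = local_averages(data)
for start, avg in averages.items():
    print(f"heights from {start} cm: average weight {avg:.1f} kg")
```

When these local averages fall close to a straight line, a regression line is a reasonable summary of them; when they bend, other methods are called for.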

7 The equation of a line: y = a + bx. a represents the y-intercept – when x equals zero, y equals a – Is this always meaningful in the context of a problem? – Is it always useful in defining a line? b represents the slope of the line (rise/run) – for every unit change in x, y changes by b. – Does this mean that if we physically change x by one unit, y will change by b units? Say we gain another year of experience. Will our salary go up by 1107?

8 Regression equation. What is the predicted weight of somebody whose height is h cm? w = intercept + slope × h. This is known as the regression equation. How do we get this formula? We have a statistical model.
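As a rough sketch of how the intercept and slope might be obtained, the snippet below uses the standard least-squares formulas. The heights and weights are made-up numbers, and `fit_line` and `predict_weight` are hypothetical helper names chosen for this example.

```python
from statistics import mean

# Made-up data, for illustration only.
heights = [160, 165, 170, 175, 180]   # cm
weights = [62, 58, 70, 64, 76]        # kg

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = fit_line(heights, weights)

def predict_weight(h):
    """Regression equation: w = intercept + slope * h."""
    return intercept + slope * h

print(f"w = {intercept:.1f} + {slope:.2f} * h")
print(f"predicted weight at 172 cm: {predict_weight(172):.1f} kg")
```

Note that the fitted line always passes through the point of averages (average height, average weight).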

9 A residual. Regression line by minimising residual errors: ε_i = error of the i-th observation from the regression line. The best candidate line will minimise these errors (least squares minimises the sum of their squares). No line can make all errors vanish (some +ve, some –ve).
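A small numerical check can make this concrete. The sketch below assumes the usual least-squares criterion (sum of squared residuals) and made-up data. It compares the fitted line against a few nearby lines, and shows that the residuals are a mix of positive and negative values that sum to zero.

```python
from statistics import mean

# Made-up data, for illustration only.
xs = [160, 165, 170, 175, 180]
ys = [62, 58, 70, 64, 76]

def sum_squared_residuals(intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Least-squares fit.
x_bar, y_bar = mean(xs), mean(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
best = sum_squared_residuals(intercept, slope)

# Nearby candidate lines: shift the intercept and/or tilt the slope a little.
candidates = [(intercept + da, slope + db)
              for da in (-1, 0, 1)
              for db in (-0.05, 0, 0.05)
              if (da, db) != (0, 0)]

print("residuals:", [round(e, 2) for e in residuals])
print("every nearby line is worse:",
      all(sum_squared_residuals(a, b) > best for a, b in candidates))
```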

10 Regression and correlation. Want to predict weight for those people who are 1 SD above avg. height. SD line says: predicted wt. = overall avg. wt. + SD of wt. Regression line says: predicted wt. = overall avg. wt. + r × SD of wt. For people who are k SDs away from avg. height: predicted wt. = overall avg. wt. + r × k × SD of wt. Clearly valid for r ≈ 0 or r ≈ 1.
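The two rules can be compared side by side. This is a sketch on made-up data, using population SDs (dividing by n); the function names are illustrative only.

```python
from statistics import mean, pstdev

# Made-up data, for illustration only.
heights = [160, 165, 170, 175, 180]   # cm
weights = [62, 58, 70, 64, 76]        # kg

n = len(heights)
h_avg, w_avg = mean(heights), mean(weights)
h_sd, w_sd = pstdev(heights), pstdev(weights)   # population SDs

# Correlation coefficient: average product of deviations in SD units.
r = sum((h - h_avg) * (w - w_avg)
        for h, w in zip(heights, weights)) / (n * h_sd * w_sd)

def regression_prediction(k):
    """Predicted weight for someone k SDs above average height."""
    return w_avg + r * k * w_sd

def sd_line_prediction(k):
    """What the SD line would give instead (no factor of r)."""
    return w_avg + k * w_sd

print(f"r = {r:.3f}")
print(f"1 SD above avg. height: regression {regression_prediction(1):.1f} kg, "
      f"SD line {sd_line_prediction(1):.1f} kg")
```

Because r is below 1 here, the regression prediction sits closer to the overall average weight than the SD line does.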

11 RMS error of regression. RMS error = √(1 − r²) × SD of y. RMS error is inversely related to correlation: the closer r is to ±1, the smaller the RMS error. RMS error is to regression what SD is to average.
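The relationship between the RMS error, r and the SD of y can be verified numerically. The sketch below uses made-up data and population SDs. It computes the RMS error directly from the residuals and again from the shortcut formula √(1 − r²) × SD of y.

```python
from math import sqrt
from statistics import mean, pstdev

# Made-up data, for illustration only.
xs = [160, 165, 170, 175, 180]
ys = [62, 58, 70, 64, 76]

n = len(xs)
x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)

slope = sxy / sxx
intercept = y_bar - slope * x_bar
r = sxy / (n * pstdev(xs) * pstdev(ys))

# Direct route: root mean square of the residuals.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
rms_error = sqrt(sum(e ** 2 for e in residuals) / n)

# Shortcut route: sqrt(1 - r^2) times the SD of y.
shortcut = sqrt(1 - r ** 2) * pstdev(ys)

print(f"RMS error from residuals: {rms_error:.4f}")
print(f"sqrt(1 - r^2) * SD of y:  {shortcut:.4f}")
```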

12 Residuals: residual = observed − predicted

13 Example: ozone vs. temperature
> air[,c(1,3)]
  ozone temperature
   3.45          67
   3.30          72
   2.29          74
   2.62          62
   2.84          65
  ...
> cor(ozone,temperature)
[1] 0.7531038

14 Fitting a regression model in S
> ozone.lm <- lm(ozone ~ temperature, data = air)
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept)   -2.23       0.46   -4.82   0.0000
temperature    0.07       0.01   11.95   0.0000
Multiple R-Squared: 0.5672
> var(ozone)
[1] 0.7928069
> var(resid(ozone.lm))
[1] 0.3431544
> cor(ozone,temperature)
[1] 0.7531038
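The three numbers printed on this slide are tied together: the multiple R-squared equals one minus the ratio of residual variance to total variance, and for a single predictor it also equals the squared correlation. A quick arithmetic check in Python, using only the values shown in the output (the variable names are ours):

```python
# Values copied from the S output on the slide.
var_ozone = 0.7928069        # var(ozone)
var_resid = 0.3431544        # var(resid(ozone.lm))
r = 0.7531038                # cor(ozone, temperature)

# Share of the variance in ozone explained by temperature.
r_squared_from_variances = 1 - var_resid / var_ozone

# Squared correlation coefficient.
r_squared_from_correlation = r ** 2

print(round(r_squared_from_variances, 4))
print(round(r_squared_from_correlation, 4))
```

Both routes round to 0.5672, which matches the Multiple R-Squared line in the output.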

15 Checking model appropriateness. What assumptions have we made in the regression model? Checking model assumptions in S-PLUS:
> par(mfrow=c(2,3))
> plot(ozone.lm)

16 Residual diagnostics for ozone data

17 Pizza party at the frat. How many laps would you predict a pledge could run if he ate 6 slices of pizza? How many laps if he ate 9 slices of pizza? A pledge shows off and eats 35 slices of pizza. How many laps would you predict he would run? Beware of extrapolation.
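One way to guard against extrapolation is to check whether a new x value lies inside the range of x values the line was fitted on. The sketch below is purely hypothetical: the slide's pizza data is not reproduced in the transcript, so the observed slice counts, the intercept of 20 laps and the slope of -1.5 laps per slice are invented for illustration.

```python
# Hypothetical observed range and fitted line (NOT taken from the slides).
observed_slices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
intercept = 20.0     # assumed laps at zero slices
slope = -1.5         # assumed change in laps per extra slice

def predict_laps(slices):
    """Return (predicted laps, whether slices lies within the observed range)."""
    within_range = min(observed_slices) <= slices <= max(observed_slices)
    return intercept + slope * slices, within_range

for slices in (6, 9, 35):
    laps, ok = predict_laps(slices)
    note = "" if ok else "  <- extrapolation, do not trust"
    print(f"{slices} slices: {laps:.1f} laps{note}")
```

With these invented numbers, 35 slices gives a negative number of laps, which is exactly the kind of nonsense a line produces far outside the data.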

