STA291 Statistical Methods Lecture 27. Inference for Regression.


1 STA291 Statistical Methods Lecture 27

2 Inference for Regression

3 The Population and the Sample The movie budget sample is based on 120 observations, but we know that observations vary from sample to sample. So we imagine a true line that summarizes the relationship between x and y for the entire population: $\mu_y = \beta_0 + \beta_1 x$, where $\mu_y$ is the population mean of y at a given value of x. We write $\mu_y$ instead of y because the regression line assumes that the means of the y values for each value of x fall exactly on the line.

4 The Population and the Sample
For a given value x:
o Most, if not all, of the y values obtained from a particular sample will not lie on the line.
o The sampled y values will be distributed about $\mu_y$.
o We can account for the difference between each y and $\mu_y$ by adding the error, or ε: $y = \beta_0 + \beta_1 x + \varepsilon$

5 The Population and the Sample
Regression Inference
o Collect a sample and estimate the population β's by finding a regression line (Chapter 6): $\hat{y} = b_0 + b_1 x$
o The residuals e = y – ŷ are the sample-based versions of ε.
o Account for the uncertainties in $\beta_0$ and $\beta_1$ by making confidence intervals, as we've done for means and proportions.
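The idea that the fitted line varies from sample to sample can be sketched with a small simulation. This is only an illustration: the population values `BETA0`, `BETA1`, and `SIGMA`, the uniform range for x, and the helper names are all hypothetical, and the sample size of 120 simply mirrors the movie budget sample.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def draw_sample(beta0, beta1, sigma, n, rng):
    """Draw n points from y = beta0 + beta1 * x + eps, eps ~ Normal(0, sigma)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys

rng = random.Random(291)             # fixed seed so the run is repeatable
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0  # hypothetical population values

slopes = []
for _ in range(5):
    xs, ys = draw_sample(BETA0, BETA1, SIGMA, 120, rng)
    b0, b1 = fit_line(xs, ys)
    slopes.append(b1)
    print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

Each pass prints a slightly different b0 and b1 even though the population line never changes, which is the sampling variability that the confidence intervals are meant to capture.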

6 Assumptions and Conditions
In this order:
1. Linearity Assumption
2. Independence Assumption
3. Equal Variance Assumption
4. Normal Population Assumption

7 Summary of Assumptions and Conditions Assumptions and Conditions

8 Assumptions and Conditions
Summary of Assumptions and Conditions
1. Make a scatterplot of the data to check for linearity. (Linearity Assumption)
2. Fit a regression and find the residuals, e, and predicted values, ŷ.
3. Plot the residuals against time (if appropriate) and check for evidence of patterns. (Independence Assumption)
4. Make a scatterplot of the residuals against x or the predicted values. This plot should not exhibit a "fan" or "cone" shape. (Equal Variance Assumption)
5. Make a histogram and Normal probability plot of the residuals. (Normal Population Assumption)
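Step 2 of the checklist (fit a regression, then find residuals and predicted values) can be sketched as follows. The data values are made up for illustration, and the low-versus-high spread comparison at the end is only a crude numeric stand-in for looking at the residual plot, not a formal test of the Equal Variance Assumption.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for paired data."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b1 * x_bar, b1

def residuals_and_fits(xs, ys):
    """Return predicted values y-hat and residuals e = y - y-hat."""
    b0, b1 = fit_line(xs, ys)
    fitted = [b0 + b1 * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    return fitted, resid

# Hypothetical data, invented for this sketch.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

fitted, resid = residuals_and_fits(xs, ys)

# Crude spread comparison: residual SD for the lower vs. upper half of y-hat.
pairs = sorted(zip(fitted, resid))
half = len(pairs) // 2
low_sd = statistics.stdev([r for _, r in pairs[:half]])
high_sd = statistics.stdev([r for _, r in pairs[half:]])

print(f"residual SD, low fitted values:  {low_sd:.3f}")
print(f"residual SD, high fitted values: {high_sd:.3f}")
```

The `fitted` and `resid` lists are what would go on the axes of the residual plot in step 4 and into the histogram and Normal probability plot in step 5.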

9 The Standard Error of the Slope

10 The Standard Error of the Slope Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population? Hint: Compare the $s_e$'s.

11 The Standard Error of the Slope Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population? Hint: Compare the $s_x$'s.

12 The Standard Error of the Slope Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population? Hint: Compare the n's.
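The three hints point at the three ingredients of the usual textbook formula for the standard error of the slope, $SE(b_1) = s_e / (s_x \sqrt{n-1})$. A minimal sketch of how each ingredient moves the standard error, using made-up input values and a hypothetical helper name `se_slope`:

```python
import math

def se_slope(s_e, s_x, n):
    """Standard error of the slope: s_e / (s_x * sqrt(n - 1))."""
    return s_e / (s_x * math.sqrt(n - 1))

# Baseline values are invented purely to show the direction of each effect.
base = se_slope(s_e=2.0, s_x=3.0, n=25)
less_scatter = se_slope(s_e=1.0, s_x=3.0, n=25)  # residual SD halved
wider_x = se_slope(s_e=2.0, s_x=6.0, n=25)       # spread of x doubled
more_data = se_slope(s_e=2.0, s_x=3.0, n=97)     # n - 1 quadrupled

for label, value in [("baseline", base), ("less scatter", less_scatter),
                     ("wider x", wider_x), ("more data", more_data)]:
    print(f"{label:>12}: SE(b1) = {value:.4f}")
```

Less scatter about the line, a wider spread of x values, and a larger sample each make the slope estimate more consistent from sample to sample.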

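13 A Test for the Regression Slope When the conditions are met, the standardized estimated regression slope, $t = \dfrac{b_1 - \beta_1}{SE(b_1)}$, follows a t-distribution with df = n – 2. We estimate $SE(b_1)$ with $SE(b_1) = \dfrac{s_e}{s_x \sqrt{n-1}}$, where $s_x$ is the ordinary standard deviation of the x's and $s_e = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n-2}}$.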
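As a rough sketch of how these pieces fit together on raw data, the function below computes $b_1$, $s_e$, $s_x$, $SE(b_1)$, and the t-ratio for the null value $\beta_1 = 0$. The function name and the data are hypothetical, chosen only for illustration.

```python
import math
import statistics

def slope_inference(xs, ys):
    """Slope, its standard error, and the t-ratio for H0: beta1 = 0."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar

    # s_e: standard deviation of the residuals, with n - 2 degrees of freedom.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))

    # s_x: ordinary sample standard deviation of the x values.
    s_x = statistics.stdev(xs)

    se_b1 = s_e / (s_x * math.sqrt(n - 1))
    t_ratio = (b1 - 0) / se_b1
    return {"b1": b1, "s_e": s_e, "s_x": s_x,
            "se_b1": se_b1, "t": t_ratio, "df": n - 2}

# Hypothetical data, invented for this sketch.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8]

result = slope_inference(xs, ys)
for name, value in result.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

The t-ratio would then be compared with a t-distribution on `df` degrees of freedom, using a table or statistical software.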

14 A Test for the Regression Slope The usual null hypothesis about the slope is that it's equal to 0. Why? A slope of zero says that y doesn't tend to change linearly when x changes. In other words, if the slope equals zero, there is no linear association between the two variables. $H_0: \beta_1 = 0$. This would mean that x and y are not linearly related. $H_A: \beta_1 \ne 0$. This would mean...

15 CI for the Regression Slope When the assumptions and conditions are met, we can find a confidence interval for $\beta_1$ from $b_1 \pm t^*_{n-2} \times SE(b_1)$, where the critical value t* depends on the confidence level and has df = n – 2.

16 16.4 A Test for the Regression Slope
Example: Soap
A soap manufacturer tested a standard bar of soap to see how long it would last. A test subject showered with the soap each day for 15 days and recorded the weight (in grams) remaining. Conditions were met, so a linear regression gave the following:

Dependent variable is: Weight
R squared = 99.5%    s = 2.949

Variable     Coefficient   SE(Coeff)   t-ratio   P-value
Intercept    123.141       1.382       89.1      <0.0001
Day          -5.57476      0.1068      -52.2     <0.0001

What is the standard deviation of the residuals?
What is the standard error of $b_1$?
What are the hypotheses for the regression slope?
At α = 0.05, what is the conclusion?

17 16.4 A Test for the Regression Slope
Example: Soap (continued)
What is the standard deviation of the residuals? $s_e$ = 2.949
What is the standard error of $b_1$? $SE(b_1)$ = 0.1068

18 16.4 A Test for the Regression Slope
Example: Soap (continued)
What are the hypotheses for the regression slope? $H_0: \beta_1 = 0$ vs. $H_A: \beta_1 \ne 0$
At α = 0.05, what is the conclusion? Since the P-value is small (<0.0001), reject the null hypothesis. There is strong evidence of a linear relationship between Weight and Day.
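The t-ratio in the Day row of the regression output can be checked by hand. The sketch below assumes n = 15 (one weight recorded per day for 15 days), so df = 13, and hard-codes the two-sided 5% critical value for 13 degrees of freedom from a standard t table rather than computing it.

```python
# Values copied from the regression output for the Day coefficient.
b1 = -5.57476
se_b1 = 0.1068

n = 15       # assumption: one observation per day over 15 days
df = n - 2

# t-ratio for H0: beta1 = 0
t_ratio = (b1 - 0) / se_b1

# Two-sided critical value at alpha = 0.05 for 13 df, from a standard t table.
T_CRIT_13_DF = 2.160

reject = abs(t_ratio) > T_CRIT_13_DF
print(f"t = {t_ratio:.1f} on {df} df; reject H0: {reject}")
# prints: t = -52.2 on 13 df; reject H0: True
```

The computed ratio agrees with the t-ratio of -52.2 reported in the table, and it lies far beyond the critical value, which matches the very small P-value.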

19 16.4 A Test for the Regression Slope
Example: Soap (continued)
Find a 95% confidence interval for the slope.
Interpret the 95% confidence interval for the slope.
At α = 0.05, is the confidence interval consistent with the hypothesis test conclusion?

20 16.4 A Test for the Regression Slope
Example: Soap (continued)
Find a 95% confidence interval for the slope. With df = 15 – 2 = 13, $b_1 \pm t^*_{13} \times SE(b_1) = -5.57476 \pm 2.160 \times 0.1068$, or about (-5.81, -5.34).
Interpret the 95% confidence interval for the slope. We can be 95% confident that the weight of the soap decreases by between 5.34 and 5.81 grams per day.
At α = 0.05, is the confidence interval consistent with the hypothesis test conclusion? Yes, the interval does not contain zero, so reject the null hypothesis.
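The interval arithmetic for the soap example can be reproduced in a few lines. As a sketch, it assumes n = 15 observations (so df = 13) and takes t* = 2.160 from a standard t table for 95% confidence.

```python
# Values copied from the regression output for the Day coefficient.
b1 = -5.57476
se_b1 = 0.1068

# Critical value for 95% confidence with 13 df (assumes n = 15), from a t table.
T_STAR = 2.160

margin = T_STAR * se_b1
lower = b1 - margin
upper = b1 + margin

print(f"margin of error = {margin:.4f}")
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
# prints: 95% CI for the slope: (-5.81, -5.34)

# The interval excludes 0, in line with rejecting H0: beta1 = 0 at alpha = 0.05.
print("contains zero:", lower <= 0 <= upper)
```

Both endpoints are negative, so the data support a daily weight loss of roughly 5.3 to 5.8 grams.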

21
o Don't fit a linear regression to data that aren't straight.
o Watch out for changing spread.
o Watch out for non-Normal errors. Check the histogram and the Normal probability plot.
o Watch out for extrapolation. It is always dangerous to predict for x-values that lie far away from the center of the data.

22
o Watch out for high-influence points and unusual observations.
o Watch out for one-tailed tests. Most software packages perform only two-tailed tests. Adjust your P-values accordingly.

23 Looking back
o Know the assumptions and conditions for inference about regression coefficients and how to check them, in this order: LIEN (Linearity, Independence, Equal variance, Normal population)
o Know the components of the standard error of the slope coefficient
o Test statistic
o CI interpretation

