Presentation is loading. Please wait.

Presentation is loading. Please wait.

Inference for Regression

Similar presentations


Presentation on theme: "Inference for Regression"β€” Presentation transcript:

1 Inference for Regression
Chapter 27 Inference for Regression

2 Objectives Review of Regression coefficients Hypothesis Tests
Confidence Intervals

3 Inference for Regression
In the last few topics we explored inference for proportions and means. The same principles used for these topics will be applied in our study of inference for regression. Our inference for regression will focus on the slope (𝛽) of the least squares regression equation. There are two forms of inference dealing with the slope, 𝛽. A t-Test for slope 𝛽 A Confidence Interval for the slope 𝛽

4 Two Continuous Variables
Visually summarize the relationship between two continuous variables with a scatterplot Numerically, we focus on LSRL (regression) Education and Mortality Draft Order and Birthday Mortality = Β· Education Draft Order = Β· Birthday

5 Regression Parameters
When we fit a LSRL, 𝑦 =π‘Ž+𝑏π‘₯, to a data set of two quantitative variables, we are hoping to be able to predict values of the response variable 𝑦 for given values of the explanatory variable π‘₯ . Different samples produce different estimates. The mean, πœ‡ 𝑦 , of all of the possible responses has a linear relationship that represents the true regression line where πœ‡ 𝑦 =𝛼+𝛽π‘₯.

6 Regression Parameters
The parameters 𝛽 and 𝛼 are estimated by a and b of the regression equation. 𝑏=π‘Ÿ 𝑠 𝑦 𝑠 π‘₯ Estimate of slope  Estimate of intercept  π‘Ž= 𝑦 βˆ’π‘ π‘₯

7 Degrees of Freedom True regression line: πœ‡ 𝑦 =𝛼+𝛽π‘₯
The parameters 𝛼 and 𝛽 are estimated by a and b of the regression equation, 𝑦 =π‘Ž+𝑏π‘₯, this equation having been derived from sample data. The appropriate test is a t-test for slope. Since there are two estimates in our equation, the degrees of freedom = n – 2.

8 Hypothesis Test for the Slope 𝛽 of a LSRL

9 Significance of Regression Line
Does the regression line show a significant linear relationship between the two variables? If there is not a linear relationship, then we would expect zero correlation (r = 0) So the slope b should also be zero Therefore, our test for a significant relationship will focus on testing whether our slope  is significantly different from zero H0 :  = versus Ha :  ο‚Ή 0  > 0  < 0

10 Standard Error The slope, b, varies with the sample taken.
The sampling distribution model for the regression slopes is centered at the slope of the true regression line, 𝛽. We can estimate the standard error of the regression slope b: You won’t have to calculate SEb. Usually, it will be provided as part of a computer printout. s is the standard error about the line.

11 Standard Error The standard error about the line (s) tells us how our sample data vary from our predicted values. The standard error of the slope (SEb) tells us how our regression slope might vary with different samples. We use s in the calculation of SEb. The variability of our data affects the reliability of our slope.

12 Test Statistic The test statistic, tdf, is calculated in the same way as the test statistic for means and proportions.

13 Example of Computer Output for Regression Analysis
From the printout:

14 Assumptions and Conditions
In Chapter 8 when we fit lines to data, we needed to check only the Straight Enough Condition. Now, when we want to make inferences about the coefficients of the line, we’ll have to make more assumptions (and thus check more conditions). We need to be careful about the order in which we check conditions. If an initial assumption is not true, it makes no sense to check the later ones.

15 Assumptions and Conditions (cont.)
Linearity Assumption: Straight Enough Condition: Check the scatterplotβ€”the shape must be linear or we can’t use regression at all.

16 Assumptions and Conditions (cont.)
Linearity Assumption: If the scatterplot is straight enough, we can go on to some assumptions about the errors. If not, stop here, or consider re-expressing the data to make the scatterplot more nearly linear. Check the Quantitative Data Condition. The data must be quantitative for this to make sense.

17 Assumptions and Conditions (cont.)
Independence Assumption: Randomization Condition: the individuals are a representative sample from the population. Check the residual plot (part 1)β€”the residuals should appear to be randomly scattered.

18 Assumptions and Conditions (cont.)
Equal Variance Assumption: Does The Plot Thicken? Condition: Check the residual plot (part 2)β€”the spread of the residuals should be uniform.

19 Assumptions and Conditions (cont.)
Normal Population Assumption: Nearly Normal Condition: Check a histogram of the residuals. The distribution of the residuals should be unimodal and symmetric. Outlier Condition: Check for outliers.

20 Conditions Straight Enough Condition Quantitative Data Condition
Randomization Condition Does The Plot Thicken? Condition Nearly Normal Condition Outlier Condition

21 Assumptions and Conditions (cont.)
If all four assumptions are true, the idealized regression model would look like this: At each value of x there is a distribution of y-values that follows a Normal model, and each of these Normal models is centered on the line and has the same standard deviation.

22 Example: Draft Lottery
Is the negative linear association we see between birthday and draft order statistically significant? H0: 𝛽 = 0 There is no linear relationship between birthday and draft order. Ha: 𝛽 β‰  0 There is a linear relationship between birthday and draft order. p-value

23 Example: Draft Lottery
Conclusion: P-value = , so we reject null hypothesis and conclude that there is a statistically significant linear relationship between birthday and draft order Statistical evidence that the randomization was not done properly!

24 Example: Education Dataset of 78 seventh-graders: relationship between IQ and GPA Clear positive association between IQ and grade point average

25 Example: Education Is the positive linear association we see between GPA and IQ statistically significant? H0: 𝛽 = 0 There is no linear relationship between GPA and IQ. Ha: 𝛽 β‰  0 There is a linear relationship between GPA and IQ. p-value

26 Example: Education Conclusion: P-value = so we reject null hypothesis and conclude that there is a statistically significant positive relationship between IQ and GPA.

27 Ti - 84 Xlist: Stat/Tests/LinRegTTest Ylist: Freq:
𝛽 & Ξ±: β‰ 0 <0 >0 RegEQ: Calculate df = n – 2

28 Confidence Interval for the Slope 𝛽

29 Confidence Interval To construct a confidence interval 𝛽, the true slope of the LSRL, we use the usual format for a confidence interval: statistic Β± (critical)(standard deviation of statistic) CI for : b Β± t* (SEb) The values for b and SEb are obtained from a computer printout. The t* critical value is obtained from the t-distribution with n-2 degrees of freedom.

30 Example: Education and Mortality

31 Confidence Interval for Example
We have n = 60, so our critical value t* comes from a t distribution with d.f. = 58. For a 95% C.I., t* = 2.00 95 % confidence interval for slope  : -37.6 ± (2.0)(8.307) = (-54.2,-21.0) Conclusion: We are 95% confident that the true slope is between and Note that this interval does not contain zero!


Download ppt "Inference for Regression"

Similar presentations


Ads by Google