
1 Objectives
10.1 Simple linear regression
• Statistical model for linear regression
• Estimating the regression parameters
• Confidence interval for regression parameters
• Significance test for the slope
• Confidence interval for µ_y
• Prediction intervals

2 Statistical model for linear regression
• In the population, the linear regression equation is y = β₀ + β₁x + e, where e is the random deviation (or error) of the response variable from the prediction formula.
• Usually, we assume that e has a Normal(0, σ) distribution.
• β₀ (y-intercept) and β₁ (slope) are the parameters.
• Statistical inference is conducted to draw conclusions about the parameters:
  - Confidence interval and hypothesis test for β₁. We especially want to test whether the slope equals zero.
  - Confidence interval for β₀ + β₁x, given a value for x.
  - Prediction interval for a random y, given a value for x.
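The short Python sketch below (not part of the original slides) simulates data from this model; the parameter values β₀ = 2, β₁ = 0.5, σ = 1 and the range of x are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters, chosen only for illustration
beta0, beta1, sigma = 2.0, 0.5, 1.0

n = 50
x = rng.uniform(0, 10, size=n)      # explanatory variable
e = rng.normal(0, sigma, size=n)    # random deviations e ~ Normal(0, sigma)
y = beta0 + beta1 * x + e           # responses from y = beta0 + beta1*x + e
```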

3 Estimating the parameters
• The population linear regression equation is y = β₀ + β₁x + e.
• The sample fitted regression line is ŷ = b₀ + b₁x.
• b₀ is the estimate of the intercept β₀, and b₁ is the estimate of the slope β₁.
• We also estimate σ (the standard deviation of e) using
  s_e = √( Σ(y − ŷ)² / (n − 2) ).
• s_e is a measure of the typical size of a residual y − ŷ.
• We will use s_e to compute the standard errors we need.
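As a rough illustration of how these estimates can be computed directly, here is a minimal NumPy sketch; the function and variable names are ours, not from the slides, and it assumes x and y are numeric arrays of equal length.

```python
import numpy as np

def fit_simple_regression(x, y):
    """Least-squares estimates b0, b1 and the residual standard deviation s_e."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()

    y_hat = b0 + b1 * x                                  # fitted values on the line
    s_e = np.sqrt(np.sum((y - y_hat) ** 2) / (n - 2))    # typical residual size
    return b0, b1, s_e
```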

4 Confidence interval for the slope parameter
• Before we do inference for the slope parameter β₁, we need the standard error of the estimate b₁:
  SE(b₁) = s_e / √( Σ(x − x̄)² ).
• We use the t distribution, now with n − 2 degrees of freedom.
• A level C confidence interval for the slope β₁ is
  b₁ ± t* SE(b₁),
  where t* is the table value for the t(n − 2) distribution with area C between −t* and t*.
• “Confidence” has the same interpretation as always.
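Here is a minimal, self-contained Python sketch of this interval, assuming SciPy is available; the function name and the level argument are illustrative choices, not from the slides.

```python
import numpy as np
from scipy import stats

def slope_confidence_interval(x, y, level=0.95):
    """Level-C confidence interval b1 +/- t* SE(b1) for the slope."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)

    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    s_e = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

    se_b1 = s_e / np.sqrt(np.sum((x - x.mean()) ** 2))     # standard error of b1
    t_star = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)    # critical value t*
    return b1 - t_star * se_b1, b1 + t_star * se_b1
```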

5 Significance test for the slope parameter
We can test the hypothesis H₀: β₁ = m versus either a 1-sided or a 2-sided alternative, using a t statistic. (The primary case is m = 0.)
We calculate
  t = (b₁ − m) / SE(b₁)
and use the t(n − 2) distribution to find the P-value of the test.
Note: Software typically provides two-sided P-values.
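A matching sketch of this test, again self-contained and with illustrative names; it returns the t statistic and the two-sided P-value that software typically reports.

```python
import numpy as np
from scipy import stats

def slope_t_test(x, y, m=0.0):
    """t statistic and two-sided P-value for H0: beta1 = m."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)

    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    s_e = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
    se_b1 = s_e / np.sqrt(np.sum((x - x.mean()) ** 2))

    t = (b1 - m) / se_b1
    p_two_sided = 2 * stats.t.sf(abs(t), df=n - 2)   # two-sided P-value
    return t, p_two_sided
```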

6 Relationship between ozone and carbon pollutants
In StatCrunch: Stat-Regression-Simple Linear; choose Hypothesis Test.
To test H₀: β₁ = 0 with α = 0.05, we compute the t statistic t = b₁ / SE(b₁), with df = n − 2 = 28 − 2 = 26.
From the t table with df = 26, we can see that the P-value is very small, so we reject H₀ and conclude that the slope is not zero.

7 Relationship between ozone and carbon pollutants
In StatCrunch: Stat-Regression-Simple Linear; choose Confidence Interval.
Having decided that the slope is not zero, we next estimate it with a 95% confidence interval, b₁ ± t* SE(b₁).

8 Confidence interval for β₀ + β₁x
We can also calculate a confidence interval for the regression line itself, at any choice of x. Generally this is sensible as long as x is within the range of the observed data (interpolation); extrapolation should only be done with a great deal of caution.
The interval is centered on ŷ = b₀ + b₁x, but we need a standard error for this particular estimate:
  SE(µ̂_y) = s_e √( 1/n + (x − x̄)² / Σ(x − x̄)² ).
The confidence interval is then calculated in the usual fashion: ŷ ± t* SE(µ̂_y).
This is an estimate of the point on the line (the expected value of y) for the given value of x.
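A minimal Python sketch of this interval for the mean response at a chosen value x_star (our name, not the slides'); it assumes SciPy is available.

```python
import numpy as np
from scipy import stats

def mean_response_ci(x, y, x_star, level=0.95):
    """Confidence interval for the mean response b0 + b1*x_star."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)

    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    s_e = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

    y_hat = b0 + b1 * x_star
    se_mu = s_e * np.sqrt(1.0 / n + (x_star - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
    t_star = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)
    return y_hat - t_star * se_mu, y_hat + t_star * se_mu
```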

9 Prediction interval for a new observation y
It is often of greater interest to predict what the actual y value might be (not just what it is expected to be). Such a prediction interval for an actual (new) observation y must account for both the estimation of the line and the random deviation e away from that line.
The interval is again centered on ŷ = b₀ + b₁x, but now we also account for the random deviation:
  SE(ŷ) = s_e √( 1 + 1/n + (x − x̄)² / Σ(x − x̄)² ).
The prediction interval for the actual y, at the given value of x, is ŷ ± t* SE(ŷ).
The distinction between a confidence interval and a prediction interval is whether you want to capture the expected value of y or the actual value of y.
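A matching sketch for the prediction interval, under the same assumptions and naming conventions as above; note the extra 1 under the square root, which carries the new observation's own random deviation.

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, x_star, level=0.95):
    """Prediction interval for a single new observation y at x = x_star."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)

    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    s_e = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

    y_hat = b0 + b1 * x_star
    # Extra "1 +" term: the new observation's own deviation never averages away
    se_pred = s_e * np.sqrt(1.0 + 1.0 / n + (x_star - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
    t_star = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)
    return y_hat - t_star * se_pred, y_hat + t_star * se_pred
```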

10 Prediction intervals
• Unlike confidence intervals, the prediction interval does not get narrower as you increase the sample size. This is because:
• The confidence interval estimates a parameter, such as the mean, the slope, or the regression line. For example, if I am interested in the mean grade of all people taking midterm 3 who scored 10 on midterm 2, the CI will get narrower as the sample size grows (because the estimators tend to get better for large sample sizes).
• The prediction interval is completely different. Here we are trying to predict the grade of a randomly selected person who scored 10 on midterm 2. There will be a lot of variability, and it does not improve as we increase the sample size: every individual is different. (It is like predicting the weight of someone who is 6 feet tall: even if we know the average weight of 6-footers, there is a huge variation in this group, so the prediction interval must be wide to capture an individual's weight.)
• This is a fundamental difference between predicting the measurement of an individual and estimating the mean. The estimate of the mean will get better with sample size; the prediction for an individual won't. The simulation sketch after this list illustrates the contrast.
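To make this contrast concrete, here is a small simulation (not from the slides) under assumed parameter values β₀ = 2, β₁ = 0.5, σ = 1: as n grows, the confidence-interval width at x = 10 shrinks toward zero, while the prediction-interval width levels off near 2 × 1.96σ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
beta0, beta1, sigma, x_star = 2.0, 0.5, 1.0, 10.0   # assumed values, illustration only

for n in (20, 200, 2000):
    x = rng.uniform(0, 20, size=n)
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)

    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    s_e = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
    t_star = stats.t.ppf(0.975, df=n - 2)

    leverage = 1.0 / n + (x_star - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
    ci_width = 2 * t_star * s_e * np.sqrt(leverage)        # shrinks as n grows
    pi_width = 2 * t_star * s_e * np.sqrt(1.0 + leverage)  # levels off near 2*1.96*sigma
    print(f"n={n:5d}  CI width={ci_width:.3f}  PI width={pi_width:.3f}")
```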

11 Efficiency of a biofilter, by temperature
For a 95% confidence interval for the expected ozone level at temperature = 16, we compute ŷ ± t* SE(µ̂_y).
For a 95% prediction interval for the actual ozone level at temperature = 16, we compute ŷ ± t* SE(ŷ).
In StatCrunch: Stat-Regression-Simple Linear; choose Predict Y for X.
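For readers working in Python rather than StatCrunch, here is a hedged sketch using statsmodels, which reports both intervals at once. The x and y arrays below are placeholder numbers standing in for the slide's data, and the value 16.0 plays the role of the given temperature; the mean_ci_* columns of the output give the confidence interval for the mean response, while the obs_ci_* columns give the prediction interval for a new observation.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data standing in for the slide's temperature/ozone measurements
x = np.array([10.0, 12.0, 14.0, 15.0, 16.0, 18.0, 20.0, 22.0])
y = np.array([3.1, 3.8, 4.0, 4.6, 4.9, 5.7, 6.2, 6.8])

model = sm.OLS(y, sm.add_constant(x)).fit()

x_new = sm.add_constant(np.array([16.0]), has_constant="add")  # predict at x = 16
pred = model.get_prediction(x_new)

# One row per new x: fitted mean, its CI, and the prediction interval
print(pred.summary_frame(alpha=0.05))
```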

