STA 286 week 13: Inference for the Regression Coefficient


1 Inference for the Regression Coefficient
Recall that $b_0$ and $b_1$ are the estimates of the intercept $\beta_0$ and slope $\beta_1$ of the population regression line. We can show that $b_0$ and $b_1$ are unbiased estimates of $\beta_0$ and $\beta_1$ and, furthermore, that they are Normally distributed with means $\beta_0$ and $\beta_1$ and standard deviations that can be estimated from the data. We use these facts to obtain confidence intervals and conduct hypothesis tests about $\beta_0$ and $\beta_1$.

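2 CI for Regression Slope and Intercept
A level $100(1-\alpha)\%$ confidence interval for the intercept $\beta_0$ is $b_0 \pm t^{*}\,SE_{b_0}$, where the standard error of the intercept is $SE_{b_0} = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2}}$. A level $100(1-\alpha)\%$ confidence interval for the slope $\beta_1$ is $b_1 \pm t^{*}\,SE_{b_1}$, where the standard error of the slope is $SE_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$. Here $t^{*}$ is the upper $\alpha/2$ critical value of the $t(n-2)$ distribution and $s = \sqrt{\sum (y_i - \hat{y}_i)^2 / (n-2)}$ is the regression standard error. Example….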
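The two intervals on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the course's Minitab workflow: the function name `slope_intercept_ci`, the five data points, and the choice to pass the critical value in by hand (read from a t table) are all assumptions made for the example.

```python
import math

def slope_intercept_ci(x, y, t_crit):
    """Least-squares fit plus confidence intervals for intercept and slope.

    t_crit is the upper alpha/2 critical value of t(n-2), supplied by the
    caller (for example, from a t table).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))        # regression standard error
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
    se_b1 = s / math.sqrt(sxx)
    ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
    ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b0, b1, ci_b0, ci_b1

# Made-up data, for illustration only (not from the course examples).
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
T_CRIT_95_DF3 = 3.182  # t table value for 95% confidence, n - 2 = 3 df

b0, b1, ci_b0, ci_b1 = slope_intercept_ci(x, y, T_CRIT_95_DF3)
print(f"b0 = {b0:.3f}, 95% CI = ({ci_b0[0]:.3f}, {ci_b0[1]:.3f})")
print(f"b1 = {b1:.3f}, 95% CI = ({ci_b1[0]:.3f}, {ci_b1[1]:.3f})")
```

Passing the critical value as an argument keeps the sketch free of third-party libraries; a statistics package would normally look it up from the degrees of freedom.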

3 Significance Tests for Regression Slope
To test the null hypothesis $H_0: \beta_1 = 0$ we compute the test statistic $t = b_1 / SE_{b_1}$, where $SE_{b_1}$ is the standard error of the slope. Under $H_0$, this statistic has a $t$ distribution with $n-2$ degrees of freedom. We can use this distribution to obtain the P-value for each of the possible alternative hypotheses. Note: testing the null hypothesis $H_0: \beta_1 = 0$ is equivalent to testing the null hypothesis $H_0: \rho = 0$, where $\rho$ is the population correlation.
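The equivalence noted on this slide can be checked numerically. The sketch below computes the slope statistic and, separately, the usual correlation statistic $r\sqrt{n-2}/\sqrt{1-r^2}$, and shows that they agree. The function names and the data are hypothetical, and the comparison against a table critical value stands in for a P-value calculation.

```python
import math

def slope_t_statistic(x, y):
    """t = b1 / SE_b1 for testing H0: beta_1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b1 / se_b1

def correlation_t_statistic(x, y):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2) for testing H0: rho = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)      # sample correlation
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

t_slope = slope_t_statistic(x, y)
t_corr = correlation_t_statistic(x, y)
T_CRIT_95_DF3 = 3.182  # two-sided 5% critical value of t(3), from a t table

print(f"t from slope = {t_slope:.3f}, t from correlation = {t_corr:.3f}")
print("reject H0 at the 5% level:", abs(t_slope) > T_CRIT_95_DF3)
```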

4 Example
Refer to the heart rate and oxygen example….

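5 Confidence Interval for the Mean Response
For any specific value of $x$, say $x_0$, the mean of the response $y$ in this subpopulation is given by $\mu_y = \beta_0 + \beta_1 x_0$. We can estimate this mean from the sample by substituting the least-squares estimates of $\beta_0$ and $\beta_1$: $\hat{\mu}_y = b_0 + b_1 x_0$. A level $100(1-\alpha)\%$ confidence interval for the mean response $\mu_y$ when $x$ takes the value $x_0$ is $\hat{\mu}_y \pm t^{*}\,SE_{\hat{\mu}}$, where the standard error of $\hat{\mu}_y$ is $SE_{\hat{\mu}} = s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$, with $s$ the regression standard error and $t^{*}$ the upper $\alpha/2$ critical value of the $t(n-2)$ distribution.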
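As a rough illustration of the interval for the mean response, the following sketch evaluates it at a chosen $x_0$. The function `mean_response_ci`, the data, and the hand-supplied critical value are assumptions for the example, not part of the course material.

```python
import math

def mean_response_ci(x, y, x0, t_crit):
    """Estimated mean response at x0, its standard error, and a CI.

    t_crit is the upper alpha/2 critical value of t(n-2), from a t table.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    mu_hat = b0 + b1 * x0
    # The standard error grows as x0 moves away from the mean of x.
    se_mu = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    return mu_hat, se_mu, (mu_hat - t_crit * se_mu, mu_hat + t_crit * se_mu)

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

mu_hat, se_mu, ci = mean_response_ci(x, y, x0=4, t_crit=3.182)
print(f"estimated mean at x0 = 4: {mu_hat:.3f} (SE {se_mu:.4f})")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```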

6 Example
The data give the wages and length of service (LOS), in months, for 60 women who work in Indiana banks. We are interested in how LOS relates to wages. The Minitab output and commands are given in a separate file.

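7 Prediction Interval
The predicted response $y$ for an individual case with a specific value $x_0$ of the explanatory variable $x$ is $\hat{y} = b_0 + b_1 x_0$. A useful prediction should include a margin of error to indicate its accuracy. The interval used to predict a future observation is called a prediction interval. A level $100(1-\alpha)\%$ prediction interval for a future observation on the response variable $y$ from the subpopulation corresponding to $x_0$ is $\hat{y} \pm t^{*}\,SE_{\hat{y}}$, where the standard error of $\hat{y}$ is $SE_{\hat{y}} = s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$, with $s$ the regression standard error and $t^{*}$ the upper $\alpha/2$ critical value of the $t(n-2)$ distribution.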
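A short sketch may help show how the prediction interval differs from the interval for the mean response: the extra 1 under the square root accounts for the variability of a single new observation. The function `prediction_interval`, the data, and the table critical value are hypothetical choices for this example.

```python
import math

def prediction_interval(x, y, x0, t_crit):
    """Predicted response at x0 with a prediction interval.

    Returns (y_hat, se_pred, se_mean, interval). se_mean is the standard
    error for the mean response, returned only for comparison.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(leverage)       # for the mean response
    se_pred = s * math.sqrt(1 + leverage)   # for one future observation
    return y_hat, se_pred, se_mean, (y_hat - t_crit * se_pred,
                                     y_hat + t_crit * se_pred)

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

y_hat, se_pred, se_mean, pi = prediction_interval(x, y, x0=4, t_crit=3.182)
print(f"prediction at x0 = 4: {y_hat:.3f}")
print(f"SE for prediction {se_pred:.4f} vs SE for mean {se_mean:.4f}")
print(f"95% PI: ({pi[0]:.3f}, {pi[1]:.3f})")
```

Changing only `t_crit` changes the level; for instance, with 3 degrees of freedom a 90% interval would use the table value 2.353 in place of 3.182.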

8 Example
Calculate a 95% PI for the wage of an employee with 3 years of experience (i.e., LOS = 36). Calculate a 90% PI for the wage of an employee with 3 years of experience (i.e., LOS = 36).

9 Analysis of Variance for Regression
Analysis of variance (ANOVA) is essential for multiple regression and for comparing several means. ANOVA summarizes information about the sources of variation in the data. It is based on the framework DATA = FIT + RESIDUAL. The total variation in the response $y$ is expressed by the deviations $y_i - \bar{y}$. The overall deviation of any $y$ observation from the mean of the $y$'s can be split into two main sources of variation and expressed as $(y_i - \bar{y}) = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$.

10 Sum of Squares
Sums of squares (SS) represent the variation present in the responses. They are calculated by summing squared deviations. Analysis of variance partitions the total variation between two sources. The total variation in the data is expressed as SST = SSM + SSE. SST stands for the sum of squares for total; it is given by... SSM stands for the sum of squares for model; it is given by... SSE stands for the sum of squares for error; it is given by... Each of the above SS has degrees of freedom associated with it. The degrees of freedom are…
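The partition SST = SSM + SSE can be verified on a small data set. The sketch below assumes the usual textbook definitions for simple linear regression (deviations of $y$ from its mean, of fitted values from the mean, and of $y$ from fitted values) and the usual degrees of freedom $n-1$, $1$, and $n-2$; the slide's own formulas are not reproduced in the transcript, so treat these as assumptions. The function name and data are hypothetical.

```python
def anova_sums_of_squares(x, y):
    """Return {name: (sum of squares, degrees of freedom)} for a line fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total
    ssm = sum((fi - y_bar) ** 2 for fi in fitted)            # model (FIT)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error (RESIDUAL)
    return {"SST": (sst, n - 1), "SSM": (ssm, 1), "SSE": (sse, n - 2)}

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

table = anova_sums_of_squares(x, y)
for name, (ss, df) in table.items():
    print(f"{name}: SS = {ss:.4f}, df = {df}")
```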

11 Coefficient of Determination $R^2$
The coefficient of determination $R^2$ is the fraction of variation in the values of $y$ that is explained by the least-squares regression. The SS make this interpretation precise. We can show that $R^2 = \frac{SSM}{SST}$. This equation is the precise statement of the fact that $R^2$ is the fraction of variation in $y$ explained by $x$.
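In simple linear regression the ratio SSM/SST coincides with the square of the sample correlation between $x$ and $y$. A minimal numerical check, with hypothetical function names and made-up data, might look like this:

```python
import math

def r_squared_from_ss(x, y):
    """R^2 computed as SSM / SST for a least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssm = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return ssm / sst

def squared_correlation(x, y):
    """Square of the sample correlation r between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return (sxy / math.sqrt(sxx * syy)) ** 2

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

r2_ss = r_squared_from_ss(x, y)
r2_corr = squared_correlation(x, y)
print(f"SSM/SST = {r2_ss:.6f}, r^2 = {r2_corr:.6f}")
```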

12 Mean Square
For each source, the ratio of the SS to the degrees of freedom is called the mean square (MS). To calculate mean squares, use the formula $MS = \frac{SS}{df}$, where $df$ is the degrees of freedom for that source.

13 ANOVA Table and F Test
In the simple linear regression model, the hypotheses $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$ are tested by the F statistic. The F statistic is given by $F = \frac{MSM}{MSE}$, the ratio of the mean square for model to the mean square for error. Under $H_0$, the F statistic has an $F(1, n-2)$ distribution, which we can use to find the P-value. Example…
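For a single explanatory variable, the F statistic equals the square of the t statistic for the slope, so the two tests agree. The sketch below computes both on made-up data; the function name `anova_f_and_t` and the table critical value are assumptions for illustration.

```python
import math

def anova_f_and_t(x, y):
    """Return (F, t): F = MSM / MSE and t = b1 / SE_b1 for a line fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ssm = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    msm = ssm / 1            # model has 1 degree of freedom
    mse = sse / (n - 2)      # error has n - 2 degrees of freedom
    f_stat = msm / mse
    t_stat = b1 / (math.sqrt(mse) / math.sqrt(sxx))
    return f_stat, t_stat

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

f_stat, t_stat = anova_f_and_t(x, y)
F_CRIT_05_DF_1_3 = 10.13  # upper 5% point of F(1, 3), from an F table

print(f"F = {f_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
print("reject H0 at the 5% level:", f_stat > F_CRIT_05_DF_1_3)
```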

14 Residual Analysis
We will use residuals to examine the following six types of departures from the model:
• The regression is nonlinear
• The error terms do not have constant variance
• The error terms are not independent
• The model fits all but one or a few outlier observations
• The error terms are not normally distributed
• One or more important variables have been omitted from the model

15 Residual plots
We will use residual plots to examine the aforementioned types of departures. The plots that we will use are:
• Residuals versus the fitted values
• Residuals versus time (when the data are obtained in a time sequence) or other variables
• Normal probability plot of the residuals
• Histograms, stemplots, and boxplots of the residuals
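The coordinates behind two of the plots listed above can be produced without any plotting library. The sketch below returns (fitted, residual) pairs and (normal quantile, ordered residual) pairs that could then be handed to any plotting tool. The function name and data are hypothetical, and the plotting positions $(i - 0.375)/(n + 0.25)$ are one common convention assumed here; statistical packages may use a different one.

```python
from statistics import NormalDist

def residual_plot_points(x, y):
    """Coordinates for a residuals-vs-fitted plot and a normal probability plot."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    vs_fitted = list(zip(fitted, residuals))
    # Normal probability plot: ordered residuals against standard normal
    # quantiles at the assumed plotting positions.
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i - 0.375) / (n + 0.25))
                 for i in range(1, n + 1)]
    normal_plot = list(zip(quantiles, sorted(residuals)))
    return vs_fitted, normal_plot

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

vs_fitted, normal_plot = residual_plot_points(x, y)
for fit, res in vs_fitted:
    print(f"fitted {fit:6.2f}  residual {res:+.3f}")
for q, res in normal_plot:
    print(f"normal quantile {q:+.3f}  ordered residual {res:+.3f}")
```

A roughly horizontal band in the first set of points and a roughly straight line in the second would be consistent with the model assumptions; curvature or fanning would suggest one of the departures listed earlier.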

16 Example
Below are the residual plots from the model predicting GPA based on SAT scores….

