1 Chapter 14 Inference on the Least-Squares Regression Model and Multiple Regression

2 Section 14.1 Testing the Significance of the Least-Squares Regression Model

3 Objectives
1. State the requirements of the least-squares regression model
2. Compute the standard error of the estimate
3. Verify that the residuals are normally distributed
4. Conduct inference on the slope
5. Construct a confidence interval about the slope of the least-squares regression model

4 Objective 1 State the Requirements of the Least-Squares Regression Model

5 Requirement 1 for Inference on the Least-Squares Regression Model
For any particular value of the explanatory variable x, the mean of the corresponding responses in the population depends linearly on x. That is, μy|x = β0 + β1x for some numbers β0 and β1, where μy|x represents the population mean response when the value of the explanatory variable is x.

6 Requirement 2 for Inference on the Least-Squares Regression Model
The response variables are normally distributed with mean μy|x = β0 + β1x and standard deviation σ.

7 “In Other Words” When doing inference on the least-squares regression model, we require (1) that for any explanatory variable, x, the mean of the response variable, y, depends on the value of x through a linear equation, and (2) that the response variable, y, is normally distributed with a constant standard deviation, σ. The mean increases or decreases at a constant rate depending on the slope, while the variance remains constant.


9 The least-squares regression model is given by
yi = β0 + β1xi + εi
where
yi is the value of the response variable for the ith individual
β0 and β1 are the parameters to be estimated based on sample data
xi is the value of the explanatory variable for the ith individual
εi is a random error term with mean 0 and variance σ²; the error terms are independent
i = 1, …, n, where n is the sample size (number of ordered pairs in the data set)

10 Objective 2 Compute the Standard Error of the Estimate

11 The standard error of the estimate, se, is found using the formula
se = √[Σ(yi − ŷi)² / (n − 2)]
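As a quick illustration of this formula, here is a minimal Python sketch; the function names and the data are made up for illustration and are not the textbook's drilling example:

```python
import math

def least_squares_fit(x, y):
    """Return (b0, b1), the least-squares intercept and slope."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def standard_error_of_estimate(x, y):
    """se = sqrt( sum of squared residuals / (n - 2) )."""
    b0, b1 = least_squares_fit(x, y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))

# Points that fall exactly on a line leave no residuals, so se is 0.
print(standard_error_of_estimate([1, 2, 3, 4], [3, 5, 7, 9]))  # 0.0
```

Note the divisor n − 2 rather than n − 1: two parameters (slope and intercept) were estimated from the data.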

12 Parallel Example 2: Compute the Standard Error
Compute the standard error of the estimate for the drilling data, which is presented on the next slide.

13 (drilling data table not reproduced)

14 Solution Step 1: Using technology, we find the least-squares regression line. Steps 2 and 3: The predicted values and the residuals for the 12 observations are given in the table on the next slide.

15 (table of predicted values and residuals not reproduced)

16 Solution Step 4: We find the sum of the squared residuals by summing the last column of the table: Σ(yi − ŷi)² = 2.701. Step 5: The standard error of the estimate is then se = √[2.701/(12 − 2)] = 0.5197.

17 CAUTION! Be sure to divide by n-2 when computing the standard error of the estimate.

18 Objective 3 Verify That the Residuals Are Normally Distributed

19 Parallel Example 4: Verifying That the Residuals Are Normally Distributed
Verify that the residuals from the drilling example are normally distributed.

20 (normal probability plot of the residuals not reproduced)

21 Objective 4 Conduct Inference on the Slope

22 Hypothesis Test Regarding the Slope Coefficient, β1
To test whether two quantitative variables are linearly related, we use the following steps, provided that
1. the sample is obtained using random sampling, and
2. the residuals are normally distributed with constant error variance.

23 Step 1: Determine the null and alternative hypotheses. The hypotheses can be structured in one of three ways:
Two-Tailed: H0: β1 = 0 versus H1: β1 ≠ 0
Left-Tailed: H0: β1 = 0 versus H1: β1 < 0
Right-Tailed: H0: β1 = 0 versus H1: β1 > 0
Step 2: Select a level of significance, α, depending on the seriousness of making a Type I error.

24 Step 3: Compute the test statistic
t0 = b1/sb1, where sb1 = se/√Σ(xi − x̄)²,
which follows Student’s t-distribution with n − 2 degrees of freedom. Remember, when computing the test statistic, we assume the null hypothesis to be true. So, we assume that β1 = 0.
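This step can be sketched in Python; the helper below recomputes the fit from scratch so the block stands alone, and the data are made up:

```python
import math

def slope_test_statistic(x, y):
    """t0 = b1 / sb1, where sb1 = se / sqrt(Sxx); compare against t with n - 2 df."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2))   # standard error of the estimate
    sb1 = se / math.sqrt(sxx)       # standard deviation of the slope estimate b1
    return b1 / sb1
```

Comparing t0 against the critical value from Table VI (or converting it to a P-value) completes Steps 4 and 5.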

25 Classical Approach Step 4: Use Table VI to determine the critical value using n-2 degrees of freedom.

26 Classical Approach Two-Tailed

27 Classical Approach Left-Tailed

28 Classical Approach Right-Tailed

29 Classical Approach Step 5: Compare the critical value with the test statistic.

30 P-Value Approach Step 4: Use Table VI to estimate the P-value using n-2 degrees of freedom.

31 P-Value Approach Two-Tailed

32 P-Value Approach Left-Tailed

33 P-Value Approach Right-Tailed

34 P-Value Approach Step 5: If the P-value < α, reject the null hypothesis.

35 Step 6: State the conclusion.

36 CAUTION! Before testing H0: β1 = 0, be sure to draw a residual plot to verify that a linear model is appropriate.

37 Parallel Example 5: Testing for a Linear Relation
Test the claim that there is a linear relation between drill depth and drill time at the α = 0.05 level of significance using the drilling data.

38 Solution Verify the requirements:
We assume that the experiment was randomized so that the data can be assumed to represent a random sample. In Parallel Example 4 we confirmed that the residuals were normally distributed by constructing a normal probability plot. To verify the requirement of constant error variance, we plot the residuals against the explanatory variable, drill depth.

39 There is no discernible pattern.

40 Solution Step 1: We want to determine whether a linear relation exists between drill depth and drill time without regard to the sign of the slope. This is a two-tailed test with H0: β1 = 0 versus H1: β1 ≠ 0. Step 2: The level of significance is α = 0.05. Step 3: Using technology, we obtained an estimate of β1 in Parallel Example 2, b1 = 0.0116. To determine the standard deviation of b1, we compute sb1 = se/√Σ(xi − x̄)². The calculations are on the next slide.

41 (calculations not reproduced)

42 Solution Step 3, cont’d: We have sb1 = 0.003. The test statistic is
t0 = b1/sb1 = 0.0116/0.003 ≈ 3.867

43 Solution: Classical Approach
Step 4: Since this is a two-tailed test, we determine the critical t-values at the α = 0.05 level of significance with n − 2 = 12 − 2 = 10 degrees of freedom to be −t0.025 = −2.228 and t0.025 = 2.228. Step 5: Since the value of the test statistic, 3.867, is greater than 2.228, we reject the null hypothesis.

44 Solution: P-Value Approach
Step 4: Since this is a two-tailed test, the P-value is the sum of the areas under the t-distribution with 12 − 2 = 10 degrees of freedom to the left of −t0 = −3.867 and to the right of t0 = 3.867. Using Table VI, we find that with 10 degrees of freedom, 3.867 falls between 3.581 and 4.144, corresponding to right-tail areas of 0.0025 and 0.001, respectively. Thus, the P-value is between 0.002 and 0.005. Step 5: Since the P-value is less than the level of significance, 0.05, we reject the null hypothesis.

45 Solution Step 6: There is sufficient evidence at the α = 0.05 level of significance to conclude that a linear relation exists between drill depth and drill time.

46 Objective 5 Construct a Confidence Interval about the Slope of the Least-Squares Regression Model

47 Confidence Intervals for the Slope of the Regression Line
A (1- )100% confidence interval for the slope of the true regression line, 1, is given by the following formulas: Lower bound: Upper bound: Here, t/2 is computed using n-2 degrees of freedom.

48 Note: The confidence interval formula for β1 can be computed only if the data are randomly obtained, the residuals are normally distributed, and there is constant error variance.

49 Parallel Example 7: Constructing a Confidence Interval for the Slope of the True Regression Line
Construct a 95% confidence interval for the slope of the least-squares regression line for the drilling example.

50 Solution The requirements for the use of the confidence interval formula were verified in previous examples. We also determined b1 = 0.0116 in previous examples.

51 Solution Since t0.025 = 2.228 for 10 degrees of freedom, we have
Lower bound = 0.0116 − 2.228(0.003) = 0.0049
Upper bound = 0.0116 + 2.228(0.003) = 0.0183
We are 95% confident that the mean increase in the time it takes to drill 5 feet for each additional foot of depth at which the drilling begins is between 0.0049 and 0.0183 minutes.

52 Section 14.2 Confidence and Prediction Intervals

53 Objectives
1. Construct confidence intervals for a mean response
2. Construct prediction intervals for an individual response

54 Confidence intervals for a mean response are intervals constructed about the predicted value of y, at a given level of x, that are used to measure the accuracy of the mean response of all the individuals in the population. Prediction intervals for an individual response are intervals constructed about the predicted value of y that are used to measure the accuracy of a single individual’s predicted value.

55 Objective 1 Construct Confidence Intervals for a Mean Response

56 Confidence Interval for the Mean Response of y, ŷ
A (1 − α)·100% confidence interval for ŷ, the mean response of y for a specified value of x, is given by
Lower bound: ŷ − tα/2 · se · √[1/n + (x* − x̄)²/Σ(xi − x̄)²]
Upper bound: ŷ + tα/2 · se · √[1/n + (x* − x̄)²/Σ(xi − x̄)²]
where x* is the given value of the explanatory variable, n is the number of observations, and tα/2 is the critical value with n − 2 degrees of freedom.

57 Parallel Example 1: Constructing a Confidence Interval for a Mean Response
Construct a 95% confidence interval about the predicted mean time to drill 5 feet for all drillings started at a depth of 110 feet.

58 Solution To find the predicted mean time to drill 5 feet for all drillings started at 110 feet, let x* = 110 in the least-squares regression equation; this gives ŷ = 6.8 minutes.
Recall: se = 0.5197 and t0.025 = 2.228 for 10 degrees of freedom.

59 Solution Therefore,
Lower bound: 6.45
Upper bound: 7.15

60 Solution We are 95% confident that the mean time to drill 5 feet for all drillings started at a depth of 110 feet is between 6.45 and 7.15 minutes.

61 Objective 2 Construct Prediction Intervals for an Individual Response

62 Prediction Interval for an Individual Response about ŷ
A (1 − α)·100% prediction interval for ŷ, the individual response of y, is given by
Lower bound: ŷ − tα/2 · se · √[1 + 1/n + (x* − x̄)²/Σ(xi − x̄)²]
Upper bound: ŷ + tα/2 · se · √[1 + 1/n + (x* − x̄)²/Σ(xi − x̄)²]
where x* is the given value of the explanatory variable, n is the number of observations, and tα/2 is the critical value with n − 2 degrees of freedom.

63 Parallel Example 2: Constructing a Prediction Interval for an Individual Response
Construct a 95% prediction interval about the predicted time to drill 5 feet for a single drilling started at a depth of 110 feet.

64 Solution To find the predicted time to drill 5 feet for a single drilling started at 110 feet, let x* = 110 in the least-squares regression equation; this gives ŷ = 6.8 minutes.
Recall: se = 0.5197 and t0.025 = 2.228 for 10 degrees of freedom.

65 Solution Therefore,
Lower bound: 5.59
Upper bound: 8.01

66 Solution We are 95% confident that the time to drill 5 feet for a randomly selected drilling started at a depth of 110 feet is between 5.59 and 8.01 minutes.

67 Section 14.3 Multiple Regression

68 Objectives
1. Obtain the correlation matrix
2. Use technology to find a multiple regression equation
3. Interpret the coefficients of a multiple regression equation
4. Determine R² and adjusted R²

69 Objectives (continued)
5. Perform an F-test for lack of fit
6. Test individual regression coefficients for significance
7. Construct confidence and prediction intervals
8. Build a regression model

70 Objective 1 Obtain the Correlation Matrix

71 A multiple regression model is given by
yi = β0 + β1x1i + β2x2i + ··· + βkxki + εi
where
yi is the value of the response variable for the ith individual
β0, β1, …, βk are the parameters to be estimated based on sample data
x1i is the ith observation for the first explanatory variable, x2i is the ith observation for the second explanatory variable, and so on
εi is a random error term that is normally distributed with mean 0 and variance σ²
The error terms are independent, and i = 1, …, n, where n is the sample size.
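Statistical software estimates β0, …, βk by solving the least-squares normal equations (XᵀX)b = Xᵀy. As a sketch of what happens under the hood, here is a pure-Python version with hypothetical data; real packages use more numerically stable methods than this:

```python
def fit_multiple_regression(rows, y):
    """Least-squares estimates (b0, b1, ..., bk) via the normal equations.
    rows[i] holds (x1i, ..., xki); an intercept column of 1s is added."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Form A = X'X and v = X'y.
    A = [[sum(X[i][a] * X[i][b] for i in range(len(X))) for b in range(p)]
         for a in range(p)]
    v = [sum(X[i][a] * y[i] for i in range(len(X))) for a in range(p)]
    # Solve A b = v by Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        b[r] = (v[r] - sum(A[r][c] * b[c] for c in range(r + 1, p))) / A[r][r]
    return b
```

For example, points lying exactly on the plane y = 1 + 2x1 + 3x2 recover the coefficients (1, 2, 3) up to rounding.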

72 A correlation matrix shows the linear correlation between each pair of variables under consideration in a multiple regression model.
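A correlation matrix can be computed directly from the data; a minimal sketch (variable names are placeholders, not the cheese data):

```python
import math

def pearson_r(u, v):
    """Pearson linear correlation coefficient between two lists."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    num = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    den = math.sqrt(sum((a - ubar) ** 2 for a in u)
                    * sum((b - vbar) ** 2 for b in v))
    return num / den

def correlation_matrix(data):
    """data maps variable name -> list of observations; returns a nested dict."""
    names = list(data)
    return {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}
```

Each diagonal entry is 1, and the matrix is symmetric, so software such as MINITAB typically prints only the lower triangle.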

73 Multicollinearity exists between two explanatory variables if they have a high linear correlation.

74 CAUTION! If two explanatory variables in the regression model are highly correlated with each other, watch out for strange results in the regression output.

75 Parallel Example 1: Constructing a Correlation Matrix
As cheese ages, various chemical processes take place that determine the taste of the final product. The next two slides give concentrations of various chemicals in 30 samples of mature cheddar cheese and a subjective measure of taste for each sample. Source: Moore, David S., and George P. McCabe (1989)

76 (cheese data table, part 1, not reproduced)

77 (cheese data table, part 2, not reproduced)

78 Solution The following correlation matrix is from MINITAB:
Correlations: taste, Acetic, H2S, Lactic
Cell Contents: Pearson correlation (values not reproduced)

79 Objective 2 Use Technology to Find a Multiple Regression Equation

80 Parallel Example 2: Multiple Regression
Use technology to obtain the least-squares regression equation where x1 represents the natural logarithm of the cheese’s acetic acid concentration, x2 represents the natural logarithm of the cheese’s hydrogen sulfide concentration, x3 represents the cheese’s lactic acid concentration and y represents the subjective taste score (combined score of several tasters). Draw residual plots and a boxplot of the residuals to assess the adequacy of the model.

81 Solution (MINITAB regression output not reproduced)

82 Solution (residual plots not reproduced)

83 Solution (boxplot of residuals not reproduced)

84 Objective 3 Interpret the Coefficients of a Multiple Regression Equation

85 Parallel Example 3: Interpreting Regression Coefficients
Interpret the regression coefficients for the least-squares regression equation found in Parallel Example 2.

86 Solution In the least-squares regression equation found in Parallel Example 2, b1 = 0.328. Thus, for every 1-unit increase in the natural logarithm of acetic acid concentration, the cheese’s taste score will increase by 0.328, assuming that the hydrogen sulfide and lactic acid concentrations remain unchanged.

87 Solution Since b2 = 3.912, for every 1-unit increase in the natural logarithm of hydrogen sulfide concentration, the cheese’s taste score will increase by 3.912, assuming that the acetic acid and lactic acid concentrations remain unchanged. Since b3 = 19.671, for every 1-unit increase in lactic acid concentration, the cheese’s taste score will increase by 19.671, assuming that the hydrogen sulfide and acetic acid concentrations remain unchanged.

88 Objective 4 Determine R2 and Adjusted R2

89 The adjusted R², denoted R²adj, is the adjusted coefficient of determination. It modifies the value of R² based on the sample size, n, and the number of explanatory variables, k. The formula for the adjusted R² is
R²adj = 1 − [(1 − R²)(n − 1)/(n − k − 1)]
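The formula translates directly into code. Applying it to the values reported later in this section for the cheese model (R² = 65.2%, n = 30, k = 3) gives roughly 61.2%; treat that as a back-of-the-envelope check, since printed software output may round differently:

```python
def adjusted_r2(r2, n, k):
    """R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Cheese model values as reported in this section.
print(round(adjusted_r2(0.652, 30, 3), 3))  # 0.612
```

Unlike R², the adjusted R² can decrease when a weak explanatory variable is added, which is what makes it suitable for comparing models with different numbers of predictors.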

90 “In Other Words” Adding one more explanatory variable to the model always increases the value of R², even when the new variable has no real explanatory power.

91 CAUTION! Never use R2 to compare regression models with a different number of explanatory variables. Rather, use the adjusted R2.

92 Parallel Example 4: Coefficient of Determination
For the regression model obtained in Parallel Example 2, determine the coefficient of determination and the adjusted R2.

93 Solution (MINITAB regression output not reproduced)

94 Solution From the MINITAB output, we see that R² = 65.2%. This means that 65.2% of the variation in the taste scores is explained by the least-squares regression model. We also find that the adjusted R² is 61.2%.

95 Objective 5 Perform an F-Test for Lack of Fit

96 Test Statistic for Multiple Regression
F0 = MSR/MSE = [SSR/k] / [SSE/(n − k − 1)]
with k degrees of freedom in the numerator and n − k − 1 degrees of freedom in the denominator, where k is the number of explanatory variables and n is the sample size.

97 F-Test Statistic for Multiple Regression Using R²
F0 = [R²/k] / [(1 − R²)/(n − k − 1)]
where R² is the coefficient of determination, k is the number of explanatory variables, and n is the sample size.
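This version of the statistic is convenient because it needs only R², n, and k. Using the cheese model's reported R² = 65.2% with n = 30 and k = 3 gives about 16.24; the F value in the actual MINITAB output may differ slightly because R² itself is rounded:

```python
def f_statistic_from_r2(r2, n, k):
    """F0 = (R2 / k) / ((1 - R2) / (n - k - 1)), df = (k, n - k - 1)."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Cheese model values as reported in this section.
print(round(f_statistic_from_r2(0.652, 30, 3), 2))  # 16.24
```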

98 Decision Rule for Testing H0: β1 = β2 = ··· = βk = 0
If the P-value is less than the level of significance, α, reject the null hypothesis. Otherwise, do not reject the null hypothesis.

99 “In Other Words” The null hypothesis states that there is no linear relation between the explanatory variables and the response variable. The alternative hypothesis states that there is a linear relation between at least one explanatory variable and the response variable.

100 Parallel Example 5: Inference on the Regression Model
Test H0: β1 = β2 = β3 = 0 versus H1: at least one βi ≠ 0 for the multiple regression model for the cheese taste data.

101 Solution We must first determine whether it is reasonable to believe that the residuals are normally distributed with no outliers.

102 Solution Although there appears to be one outlier, the sample size is large enough for this not to be of great concern. We look at the P-value associated with the F-test statistic in the MINITAB output. Since the P-value < 0.001, we reject H0 and conclude that at least one of the regression coefficients is different from zero.

103 CAUTION! If we reject the null hypothesis that the slope coefficients are zero, then we are saying that at least one of the slopes is different from zero, not that they all are different from zero.

104 Objective 6 Test Individual Regression Coefficients for Significance

105 Parallel Example 6: Testing the Significance of Individual Predictor Variables
Test the following hypotheses for the cheese taste data:
a) H0: β1 = 0 versus H1: β1 ≠ 0
b) H0: β2 = 0 versus H1: β2 ≠ 0
c) H0: β3 = 0 versus H1: β3 ≠ 0

106 Solution We will again use the MINITAB output.
The test statistic for acetic acid is 0.07 with a P-value well above 0.05, so we fail to reject H0. The test statistic for hydrogen sulfide is 3.13 with a P-value below 0.05, so we reject H0. The test statistic for lactic acid is 2.28 with a P-value below 0.05, so we reject H0. We conclude that the natural logarithm of hydrogen sulfide concentration and lactic acid concentration are useful predictors of taste, but the natural logarithm of acetic acid concentration is not.

107 We refit the model using only the natural logarithm of hydrogen sulfide concentration and lactic acid concentration. The adjusted R² for this refit model is 62.6%, which is actually higher than it was for the model with three predictors (61.2%).

108 Objective 7 Construct Confidence and Prediction Intervals

109 Parallel Example 7: Constructing Confidence and Prediction Intervals
a) Construct a 95% confidence interval for the mean taste score of all cheddar cheeses whose natural logarithm of hydrogen sulfide concentration is 5.5 and whose lactic acid concentration is 1.75.
b) Construct a 95% prediction interval for the taste score of an individual cheddar cheese whose natural logarithm of hydrogen sulfide concentration is 5.5 and whose lactic acid concentration is 1.75.

110 Solution
Predicted Values for New Observations
New Obs  Fit    95% CI          95% PI
1        28.92  (22.07, 35.76)  (7.40, 50.43)
Values of Predictors for New Observations
Obs  H2S  Lactic
1    5.5  1.75

111 Solution Based on the MINITAB output, we are 95% confident that the mean taste score of all cheddar cheeses with ln(hydrogen sulfide) = 5.5 and a lactic acid concentration of 1.75 is between 22.07 and 35.76. We are 95% confident that the taste score of an individual cheddar cheese with ln(hydrogen sulfide) = 5.5 and a lactic acid concentration of 1.75 will be between 7.40 and 50.43.

112 Objective 8 Build a Regression Model

113 Guidelines in Developing a Multiple Regression Model
Construct a correlation matrix to help identify the explanatory variables that have a high correlation with the response variable. In addition, look at the correlation matrix for any indication that the explanatory variables are correlated with each other. Remember, just because two explanatory variables have a high correlation does not mean that multicollinearity is a problem, but it is a tip-off to watch for strange results from the regression model.

114 Guidelines in Developing a Multiple Regression Model
Determine the multiple regression model using all the explanatory variables that have been identified by the researcher. If the null hypothesis that all the slope coefficients are zero has been rejected, we proceed to look at the individual slope coefficients. Identify those slope coefficients that have small t-test statistics (and therefore large P-values). These are candidates for explanatory variables that may be removed from the model. We should remove only one explanatory variable at a time before recomputing the regression model.

115 Guidelines in Developing a Multiple Regression Model
Repeat Step 3 until all slope coefficients are significantly different from zero. Be sure that the model is appropriate by drawing residual plots.

