Multiple Regression continued… STAT E-150 Statistical Methods.

1 Multiple Regression continued… STAT E-150 Statistical Methods

2 2 When we discussed simple linear regression, we briefly introduced prediction intervals and confidence intervals. Confidence Intervals and Prediction Intervals: Let x* be a specific value of x. The predicted value of y is ŷ = b0 + b1·x*, where b0 and b1 are the fitted intercept and slope. We can create two different intervals: a prediction interval for an individual value of y at x*, and a confidence interval for the mean value of y at x*.

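3 3 The basic format for an interval is ŷ ± t*·SE, where t* is the critical t-value and SE is the standard error appropriate to the quantity being estimated. When we want to find a mean predicted value, SE = s_e·√(1/n + (x* − x̄)²/Σ(x − x̄)²). When we want to find an individual predicted value, SE = s_e·√(1 + 1/n + (x* − x̄)²/Σ(x − x̄)²). Here s_e is the standard error of the regression.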
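
The interval format on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the usual textbook standard-error formulas for simple linear regression, the data values are made up (they are not the course data), and the critical value t* is passed in by hand rather than looked up.

```python
import math

def regression_intervals(xs, ys, x_star, t_star):
    """Return (y_hat, confidence_interval, prediction_interval) at x_star
    for a simple linear regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Least-squares slope and intercept
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Standard error of the regression, with n - 2 degrees of freedom
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))

    y_hat = b0 + b1 * x_star
    spread = 1 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = s_e * math.sqrt(spread)        # for the mean response
    se_indiv = s_e * math.sqrt(1 + spread)   # for an individual response

    ci = (y_hat - t_star * se_mean, y_hat + t_star * se_mean)
    pi = (y_hat - t_star * se_indiv, y_hat + t_star * se_indiv)
    return y_hat, ci, pi

# Hypothetical data, for illustration only
ages = [14, 15, 16, 17, 18, 19]
weights = [2300, 2500, 2700, 3100, 3200, 3500]
T_STAR = 2.776  # 95% two-sided critical value of t with 4 degrees of freedom

y_hat, ci, pi = regression_intervals(ages, weights, 16, T_STAR)
print(f"predicted: {y_hat:.1f}")
print(f"95% CI:    ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"95% PI:    ({pi[0]:.1f}, {pi[1]:.1f})")
```

Both intervals share the same center; only the standard error differs, which is why the prediction interval always comes out wider.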

4 4 Let us return to our earlier discussion of the age of adolescent mothers and the weight of their babies. We found that there was a linear relationship between these variables: weight = 245.15 age – 1163.45 How can we use this model to make predictions?

5 5 Suppose we want to predict the weight of a baby born to a mother who is 16 years old. When we analyze the data, we can choose to save the predicted values, the confidence interval and the prediction interval for each predictor value. The results will appear in the datasheet, in these columns: x-value | predicted y-value | 95% confidence interval | 95% prediction interval

6 6 What weight is expected for a baby of a 16 year old mother?

7 7 What weight is expected for a baby of a 16 year old mother? 2759 g
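
As a quick check, the point prediction can be reproduced from the fitted equation given earlier (weight = 245.15 age – 1163.45). The function name below is made up for illustration.

```python
def predict_weight(age_years):
    """Predicted birthweight in grams from the fitted line in the slides."""
    return 245.15 * age_years - 1163.45

prediction = predict_weight(16)
print(f"{prediction:.2f} g")
```

Rounded to the nearest gram, this agrees with the 2759 g shown on the slide.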

8 8 What is the prediction interval estimate for the weight of a baby of a 16 year old mother?

9 9 What is the prediction interval estimate for the weight of a baby of a 16 year old mother? 2251.24 to 3266.66 g What does it tell you?

10 10 What is the prediction interval estimate for the weight of a baby of a 16 year old mother? 2251.24 to 3266.66 g What does it tell you? We are 95% confident that the birthweight of a baby born to a 16 year old mother is between 2251.24 and 3266.66 g.

11 11 What is the confidence interval estimate for the mean weight of babies of 16 year old mothers?

12 12 What is the confidence interval estimate for the mean weight of babies of 16 year old mothers? 2575.59 to 2942.31 g What does it tell you?

13 13 What is the confidence interval estimate for the mean weight of babies of 16 year old mothers? 2575.59 to 2942.31 g What does it tell you? We are 95% confident that the mean birthweight of babies born to 16 year old mothers is between 2575.59 and 2942.31 g.

14 14 The 95% confidence interval is (2575.59, 2942.31) The 95% prediction interval is (2251.24, 3266.66) Which interval is wider? Why?

15 15 The 95% confidence interval is (2575.59, 2942.31) The 95% prediction interval is (2251.24, 3266.66) Which interval is wider? Why? The prediction interval is wider, because individual values vary more than means: it must allow for the scatter of individual babies around the mean as well as the uncertainty in the estimated mean.
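
The comparison can be made concrete with the two intervals quoted above. The short sketch below (helper names are illustrative) computes each interval's midpoint and width; it assumes nothing beyond the four endpoints on the slide.

```python
ci = (2575.59, 2942.31)  # 95% confidence interval for the mean weight
pi = (2251.24, 3266.66)  # 95% prediction interval for an individual weight

def midpoint(interval):
    return (interval[0] + interval[1]) / 2

def width(interval):
    return interval[1] - interval[0]

print(f"CI midpoint {midpoint(ci):.2f}, width {width(ci):.2f}")
print(f"PI midpoint {midpoint(pi):.2f}, width {width(pi):.2f}")
```

Both intervals are centered on the same predicted value (about 2758.95 g); the prediction interval is simply much wider.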

16 16 In the data concerning body fat percentages in men, the predictor variables were waist and height, and we found a regression equation which we can now use to make predictions: %BodyFat = 1.773 waist – 0.601 height – 3.110 We can find prediction intervals and confidence intervals as we did when we used a single predictor.

17 17 Suppose we want to predict the body fat percentage associated with a waist size of 34 inches and a height of 6 feet. We can proceed as we did with a single predictor, by entering these values in the data window, and then saving the results of the linear regression analysis.

18 18 When you scroll to the right, you will see these results: What is the predicted body fat %?

19 19 When you scroll to the right, you will see these results: What is the predicted body fat %? 13.874%
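
The predicted value can be approximated directly from the regression equation. This sketch assumes that height is measured in inches (so 6 feet is entered as 72), which the slides do not state explicitly; the function name is illustrative.

```python
def predict_body_fat(waist_in, height_in):
    """Predicted body fat percentage from the rounded coefficients in the slides."""
    return 1.773 * waist_in - 0.601 * height_in - 3.110

# Assumption: height is recorded in inches, so 6 feet = 72 inches
estimate = predict_body_fat(34, 72)
print(f"{estimate:.3f}")
```

The hand calculation gives roughly 13.90 rather than 13.874. A small gap like this is what one would expect if the software used unrounded coefficients while the equation on the slide shows them rounded to three decimals.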

20 20 When you scroll to the right, you will see these results: What is the prediction interval? What does it tell you?

21 21 When you scroll to the right, you will see these results: What is the prediction interval? What does it tell you? The 95% prediction interval is (5.05, 22.69)

22 22 When you scroll to the right, you will see these results: What is the prediction interval? What does it tell you? We are 95% confident that a man who is 6 feet tall and has a 34 inch waist will have a body fat percentage between 5.05 and 22.69.

23 23 When you scroll to the right, you will see these results: What is the confidence interval? What does it tell you?

24 24 When you scroll to the right, you will see these results: What is the confidence interval? What does it tell you? The 95% confidence interval is (13.10, 14.65)

25 25 When you scroll to the right, you will see these results: What is the confidence interval? What does it tell you? We are 95% confident that the mean body fat percentage for men who are 6 feet tall and have a 34 inch waist is between 13.10 and 14.65.

26 26 Models with Categorical Predictors Categorical (or qualitative) variables can also be included in multiple regression models. These variables are coded as numbers so that we can employ the methods we have discussed. These coded values are called indicator variables or dummy variables. They are often coded using 0 and 1, where 0 = absence ("no") and 1 = presence ("yes").
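
A minimal sketch of 0/1 coding in Python follows. The category labels and the helper name are hypothetical; in SPSS the same recoding would be done in the data editor.

```python
def to_indicator(values, present="yes"):
    """Code a categorical column as 1 (presence) or 0 (absence)."""
    return [1 if value == present else 0 for value in values]

# Hypothetical answers to "Is the college single-sex?"
single_sex = ["no", "yes", "no", "no", "yes"]
codes = to_indicator(single_sex)
print(codes)
```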

27 27 Example: One way colleges measure success is by graduation rates. The Education Trust publishes 6-year graduation rates along with other college characteristics on its website, www.collegeresults.org.

28 28 Here is a sample of the data, which represents a random sample of 22 colleges selected from the 1037 colleges in the United States with enrollments under 5000 students:

29 29 We define these variables: y = 6-year graduation rate; x1 = median SAT score of students accepted to the college; x2 = student-related expense per full-time student (in dollars); x3 = indicator variable for the type of college (1 = single-sex, 0 = coeducational)

30 30 The regression model is y = β0 + β1 x1 + β2 x2 + β3 x3 + ε. For single-sex colleges: Rate = β0 + β1 SAT + β2 Expense + β3 (1) + ε = β0 + β1 SAT + β2 Expense + β3 + ε. For coeducational colleges: Rate = β0 + β1 SAT + β2 Expense + β3 (0) + ε = β0 + β1 SAT + β2 Expense + ε. In either case, the slopes are determined using data from both types of colleges.

31 31 For single-sex colleges, the intercept is β0 + β3: Rate = β0 + β1 SAT + β2 Expense + β3 (1) + ε = β0 + β1 SAT + β2 Expense + β3 + ε = (β0 + β3) + β1 SAT + β2 Expense + ε. For coeducational colleges: Rate = β0 + β1 SAT + β2 Expense + β3 (0) + ε = β0 + β1 SAT + β2 Expense + ε. In other words, the coefficient of the indicator variable represents the difference in intercepts for the regression lines for the two types of colleges.

32 32 What are the hypotheses? H0: β1 = β2 = β3 = 0 Ha: The coefficients are not all zero

34 34 Here is part of the SPSS analysis: What is your conclusion?

35 35 What is your conclusion? Since F is large and p is close to 0, the null hypothesis is rejected. We can conclude that the 6-year graduation rate is linearly related to at least one of the predictors: the median SAT score, the student-related expense per full-time student, and whether the college is single-sex or coeducational.

36 36 What is the regression equation?

37 37 What is the regression equation? ŷ = 0.001 x1 + 0.00000697 x2 + 0.125 x3 – 0.391

38 38 For single-sex colleges: ŷ = 0.001 x1 + 0.00000697 x2 + 0.125(1) – 0.391, so ŷ = 0.001 x1 + 0.00000697 x2 – 0.266

39 39 For coed colleges: ŷ = 0.001 x1 + 0.00000697 x2 – 0.391
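
The effect of the indicator variable can be seen by evaluating the fitted equation for both types of college at the same SAT score and expense. The SAT and expense values below are hypothetical inputs chosen for illustration, and the function name is made up.

```python
def predicted_rate(sat, expense, single_sex):
    """Fitted 6-year graduation rate; single_sex is the 0/1 indicator."""
    return 0.001 * sat + 0.00000697 * expense + 0.125 * single_sex - 0.391

# Hypothetical college characteristics
sat, expense = 1100, 20000

coed = predicted_rate(sat, expense, single_sex=0)
single = predicted_rate(sat, expense, single_sex=1)
gap = single - coed
print(f"coed: {coed:.4f}  single-sex: {single:.4f}  difference: {gap:.3f}")
```

Whatever SAT score and expense are used, the difference between the two predictions is the indicator's coefficient, because only the intercept changes.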

40 40 What is the meaning of the coefficient β3? We can interpret the value 0.125 as the “correction” we would make to the predicted graduation rate to incorporate the difference associated with having only male or only female students.

41 41 What is the meaning of the coefficient β3? We can interpret the value 0.125 as the difference in intercepts for the two different types of colleges.

42 42 Interaction and Collinearity If the change in the mean y-value associated with a 1-unit increase in one predictor variable depends on the value of a second predictor variable, there is interaction between the two predictor variables. If we represent the variables as x1 and x2, the interaction can be modeled by including their product, x1 x2, as a predictor variable.

43 43 Interaction and Collinearity The regression model for two predictor variables would now include a cross-product term: Y = β0 + β1 x1 + β2 x2 + β3 x1 x2 + ε, where β1 + β3 x2 represents the change in the mean of Y for every one-unit increase in x1, keeping x2 fixed, and β2 + β3 x1 represents the change in the mean of Y for every one-unit increase in x2, keeping x1 fixed. If you find that there is a linear association, be sure to check the coefficient of the interaction term.
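
To see why the effect of x1 depends on x2 in this model, the sketch below evaluates the mean response with hypothetical coefficients (they do not come from any data set in the course). It compares the change produced by a one-unit increase in x1 with the expression β1 + β3 x2.

```python
def mean_response(x1, x2, b0, b1, b2, b3):
    """Mean of Y under the interaction model b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# Hypothetical coefficients, for illustration only
B = {"b0": 2.0, "b1": 1.5, "b2": -0.5, "b3": 0.25}

changes = {}
for x2 in (0, 2, 4):
    # Change in the mean of Y when x1 goes from 0 to 1, with x2 held fixed
    changes[x2] = mean_response(1, x2, **B) - mean_response(0, x2, **B)
    slope = B["b1"] + B["b3"] * x2
    print(f"x2 = {x2}: change = {changes[x2]:.2f}, b1 + b3*x2 = {slope:.2f}")
```

The two columns agree at every value of x2, and the slope in x1 changes as x2 changes, which is exactly what interaction means.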

44 44 We determine collinearity by examining a correlation matrix:
What is the correlation between Pct BF and Height? -.029 Is this value significant? No; p = .322
Pct BF and Waist? Is this value significant?
Height and Waist? Is this value significant?

Correlations
                                Height    Waist
Pearson Correlation   Pct BF     -.029     .824
                      Height     1.000     .187
                      Waist       .187    1.000
Sig. (1-tailed)       Pct BF      .322     .000
                      Height         .     .002
                      Waist       .002        .
N                     Pct BF       250
                      Height       250
                      Waist        250

45 45 We determine collinearity by examining a correlation matrix:
What is the correlation between Pct BF and Height? -.029 Is this value significant? No; p = .322
Pct BF and Waist? .824 Is this value significant? Yes; p < .001 (reported as .000)
Height and Waist? .187 Is this value significant? Yes; p = .002
It is important to note that this information only refers to the pair of variables in question, without regard to the influences of other variables.

Correlations
                                Height    Waist
Pearson Correlation   Pct BF     -.029     .824
                      Height     1.000     .187
                      Waist       .187    1.000
Sig. (1-tailed)       Pct BF      .322     .000
                      Height         .     .002
                      Waist       .002        .
N                     Pct BF       250
                      Height       250
                      Waist        250
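
For readers who want to see what each entry of a correlation matrix measures, here is a small sketch of the Pearson correlation computed from its definition. The height and waist values are invented for illustration and are not the 250 observations analyzed in the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements in inches
heights = [66, 68, 70, 71, 72, 74]
waists = [33, 31, 36, 34, 38, 35]

r = pearson_r(heights, waists)
print(f"r(height, waist) = {r:.3f}")
```

A value near 0 between two predictors suggests little pairwise collinearity, while a value near 1 or -1 is a warning sign.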

46 46 Another way to assess collinearity: VIF is the Variance Inflation Factor, which indicates whether a predictor has a strong linear relationship with the other predictors. There is reason for concern if the largest VIF is greater than 5. The Tolerance statistic is the reciprocal of the VIF. There is a serious problem if this value is less than 0.2.

Coefficients (Dependent Variable: Pct BF)
              Unstandardized        Standardized                       Collinearity Statistics
Model 1       B        Std. Error   Beta           t         Sig.     Tolerance    VIF
(Constant)    -3.110   7.687                       -.405     .686
Waist          1.773    .072         .859          24.768    .000     .965         1.036
Height         -.601    .110        -.190          -5.470    .000     .965         1.036
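
With only two predictors, the tolerance and VIF can be recovered from the correlation between them, since tolerance = 1 − r² and VIF = 1 / tolerance in that case. The sketch below applies this to the Height and Waist correlation of .187 reported in the correlation matrix; the thresholds are the ones stated on the slide, and the variable names are illustrative.

```python
r_height_waist = 0.187  # correlation between the two predictors

# With exactly two predictors, R-squared of one on the other equals r squared
tolerance = 1 - r_height_waist ** 2
vif = 1 / tolerance

print(f"Tolerance = {tolerance:.3f}")
print(f"VIF       = {vif:.3f}")

# Rules of thumb from the slide
concern = vif > 5 or tolerance < 0.2
print("Collinearity concern:", concern)
```

The results round to the .965 and 1.036 shown in the SPSS output, well inside the safe range.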

