Overview of our study of the multiple linear regression model: regression models with more than one slope parameter

1 Overview of our study of the multiple linear regression model: regression models with more than one slope parameter

2 Are brain and body size predictive of intelligence? Sample of n = 38 college students. Response ($y$): intelligence based on PIQ (performance) scores from the (revised) Wechsler Adult Intelligence Scale. Potential predictor ($x_1$): brain size based on MRI scans (given as count/10,000). Potential predictor ($x_2$): height in inches. Potential predictor ($x_3$): weight in pounds. Example 1

3 Scatter matrix plot Example 1

4 Scatter matrix plot Example 1

5 Scatter matrix plot Illustrates the marginal relationships between each pair of variables without regard to the other variables. The challenge is how the response y relates to all three predictors simultaneously.

6 A multiple linear regression model with three quantitative predictors: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$, where $y_i$ is the intelligence (PIQ) of student i, $x_{i1}$ is the brain size (MRI) of student i, $x_{i2}$ is the height (Height) of student i, and $x_{i3}$ is the weight (Weight) of student i, and the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$. Example 1

7 Some research questions Which predictors – brain size, height, or weight – explain some variation in PIQ? What is the effect of brain size on PIQ, after taking into account height and weight? What is the PIQ of an individual with a given brain size, height, and weight? Example 1

8 The regression equation is
PIQ = 111 + 2.06 Brain - 2.73 Height + 0.001 Weight

Predictor    Coef      SE Coef   T       P
Constant     111.35    62.97     1.77    0.086
Brain        2.0604    0.5634    3.66    0.001
Height       -2.732    1.229     -2.22   0.033
Weight       0.0006    0.1971    0.00    0.998

S = 19.79   R-Sq = 29.5%   R-Sq(adj) = 23.3%

Analysis of Variance
Source          DF   SS        MS       F      P
Regression      3    5572.7    1857.6   4.74   0.007
Residual Error  34   13321.8   391.8
Total           37   18894.6

Source   DF   Seq SS
Brain    1    2697.1
Height   1    2875.6
Weight   1    0.0
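The summary statistics in this output follow from the analysis of variance table by the usual formulas. The following is a minimal Python sketch, not part of the original slides, that recomputes them from the printed sums of squares; the variable names are illustrative only.

```python
import math

# Values copied from the Minitab output for Example 1 (n = 38, three predictors).
ss_regression = 5572.7
ss_error = 13321.8
ss_total = 18894.6
df_regression, df_error, df_total = 3, 34, 37

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

r_sq = ss_regression / ss_total                  # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / df_total)  # R-Sq(adj)
f_stat = ms_regression / ms_error                # overall F-statistic
s = math.sqrt(ms_error)                          # S, the residual standard error

# t-statistic for a single slope: coefficient divided by its standard error.
t_brain = 2.0604 / 0.5634

print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
print(f"F = {f_stat:.2f}, S = {s:.2f}, T(Brain) = {t_brain:.2f}")
```

The recomputed values agree with the printed output up to rounding.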

9 Baby bird breathing habits in burrows? Experiment with n = 120 nestling bank swallows. Response ($y$): % increase in “minute ventilation”, Vent, i.e., total volume of air breathed per minute. Potential predictor ($x_1$): percentage of oxygen, O2, in the air the baby birds breathe. Potential predictor ($x_2$): percentage of carbon dioxide, CO2, in the air the baby birds breathe. Example 2

10 Scatter matrix plot Example 2

11 Three-dimensional scatter plot Example 2

12 A first order model with two quantitative predictors: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$, where $y_i$ is the percentage increase in minute ventilation, $x_{i1}$ is the percentage of oxygen, and $x_{i2}$ is the percentage of carbon dioxide, and the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$. Example 2

13 Some research questions Is oxygen related to minute ventilation, after taking into account carbon dioxide? Is carbon dioxide related to minute ventilation, after taking into account oxygen? What is the mean minute ventilation of all nestling bank swallows whose breathing air is comprised of 15% oxygen and 5% carbon dioxide? Example 2

14 The regression equation is
Vent = 86 - 5.33 O2 + 31.1 CO2

Predictor   Coef      SE Coef   T       P
Constant    85.9      106.0     0.81    0.419
O2          -5.330    6.425     -0.83   0.408
CO2         31.103    4.789     6.50    0.000

S = 157.4   R-Sq = 26.8%   R-Sq(adj) = 25.6%

Analysis of Variance
Source          DF    SS        MS       F       P
Regression      2     1061819   530909   21.44   0.000
Residual Error  117   2897566   24766
Total           119   3959385

Source   DF   Seq SS
O2       1    17045
CO2      1    1044773
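The research question about swallows breathing air with 15% oxygen and 5% carbon dioxide can be given a point estimate by plugging those values into the fitted equation. The sketch below is illustrative and not from the slides; it gives only the estimated mean, since a confidence interval would need quantities that this output does not print. The function name `predicted_vent` is hypothetical.

```python
# Coefficients copied from the Minitab output for Example 2.
b0, b_o2, b_co2 = 85.9, -5.330, 31.103

def predicted_vent(o2_pct, co2_pct):
    """Estimated mean % increase in minute ventilation from the fitted equation."""
    return b0 + b_o2 * o2_pct + b_co2 * co2_pct

estimate = predicted_vent(15, 5)
print(f"Estimated mean Vent at 15% O2 and 5% CO2: {estimate:.1f}")
```

Under these coefficients the estimate comes out at roughly 161.5.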

15 Is a baby’s birth weight related to smoking during pregnancy? Sample of n = 32 births. Response ($y$): birth weight of baby in grams. Potential predictor ($x_1$): length of gestation in weeks. Potential predictor ($x_2$): smoking status of mother (yes or no). Example 3

16 Scatter matrix plot Example 3

17 A first order model with one binary predictor: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$, where $y_i$ is the birth weight of baby i, $x_{i1}$ is the length of gestation of baby i, and $x_{i2} = 1$ if the mother smokes and $x_{i2} = 0$ if not, and the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$. Example 3

18 Estimated first order model with one binary predictor The regression equation is Weight = - 2390 + 143 Gest - 245 Smoking Example 3

19 Some research questions Is a baby’s birth weight related to smoking during pregnancy? How is birth weight related to gestation, after taking into account smoking status? Example 3

20 The regression equation is
Weight = - 2390 + 143 Gest - 245 Smoking

Predictor   Coef      SE Coef   T       P
Constant    -2389.6   349.2     -6.84   0.000
Gest        143.100   9.128     15.68   0.000
Smoking     -244.54   41.98     -5.83   0.000

S = 115.5   R-Sq = 89.6%   R-Sq(adj) = 88.9%

Analysis of Variance
Source          DF   SS        MS        F        P
Regression      2    3348720   1674360   125.45   0.000
Residual Error  29   387070    13347
Total           31   3735789

Source    DF   Seq SS
Gest      1    2895838
Smoking   1    452881
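With a 0/1 smoking indicator and no interaction term, the fitted equation describes two parallel lines: the same gestation slope for both groups and intercepts that differ by the Smoking coefficient. The sketch below, which is not from the slides, illustrates this with the printed coefficients; the function name and the 40-week and 36-week gestation values are chosen only as examples.

```python
# Coefficients copied from the Minitab output for Example 3.
b0, b_gest, b_smoking = -2389.6, 143.100, -244.54

def predicted_weight(gest_weeks, smoker):
    """Estimated mean birth weight in grams; smoker is True or False."""
    indicator = 1 if smoker else 0
    return b0 + b_gest * gest_weeks + b_smoking * indicator

# Illustrative gestation lengths (not taken from the slides).
for weeks in (36, 40):
    non_smoker = predicted_weight(weeks, smoker=False)
    smoker = predicted_weight(weeks, smoker=True)
    print(f"{weeks} weeks: non-smoker {non_smoker:.1f} g, "
          f"smoker {smoker:.1f} g, difference {smoker - non_smoker:.2f} g")
```

The estimated difference between the groups is the same at every gestation length, which is what "after taking into account" gestation means in this model.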

21 Compare three treatments (A, B, C) for severe depression. Random sample of n = 36 severely depressed individuals. $y$ = measure of treatment effectiveness. $x_1$ = age (in years). $x_2$ = 1 if the patient received A and 0 if not. $x_3$ = 1 if the patient received B and 0 if not. Example 4

22 Compare three treatments (A, B, C) for severe depression Example 4

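23 A second order model with one quantitative predictor, a three-group qualitative variable, and interactions: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_{12} x_{i1} x_{i2} + \beta_{13} x_{i1} x_{i3} + \epsilon_i$, where $y_i$ is the treatment effectiveness for patient i, $x_{i1}$ is the age of patient i, $x_{i2} = 1$ if treatment A and $x_{i2} = 0$ if not, and $x_{i3} = 1$ if treatment B and $x_{i3} = 0$ if not. Example 4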

24 The estimated regression function Example 4 Regression equation is y = 6.21 + 1.03 age + 41.3 x2 + 22.7 x3 - 0.703 agex2 - 0.510 agex3

25 Potential research questions Does the effectiveness of the treatment depend on age? Is one treatment superior to the other treatments for all ages? What is the effect of age on the effectiveness of the treatment? Example 4

26 Regression equation is
y = 6.21 + 1.03 age + 41.3 x2 + 22.7 x3 - 0.703 agex2 - 0.510 agex3

Predictor   Coef      SE Coef   T       P
Constant    6.211     3.350     1.85    0.074
age         1.03339   0.07233   14.29   0.000
x2          41.304    5.085     8.12    0.000
x3          22.707    5.091     4.46    0.000
agex2       -0.7029   0.1090    -6.45   0.000
agex3       -0.5097   0.1104    -4.62   0.000

S = 3.925   R-Sq = 91.4%   R-Sq(adj) = 90.0%

Analysis of Variance
Source          DF   SS        MS       F       P
Regression      5    4932.85   986.57   64.04   0.000
Residual Error  30   462.15    15.40
Total           35   5395.00

Source   DF   Seq SS
age      1    3424.43
x2       1    803.80
x3       1    1.19
agex2    1    375.00
agex3    1    328.42

Example 4
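Because the model includes age-by-treatment interactions, each treatment gets its own intercept and its own age slope, obtained by setting the indicators x2 and x3 to 0 or 1. The following sketch, which is not from the slides, derives the three fitted lines from the printed coefficients; the dictionary layout and names are illustrative.

```python
# Coefficients copied from the Minitab output for Example 4.
coef = {
    "const": 6.211, "age": 1.03339,
    "x2": 41.304, "x3": 22.707,
    "agex2": -0.7029, "agex3": -0.5097,
}

# Indicator coding from the slides: A -> x2 = 1, B -> x3 = 1, C -> both 0.
indicators = {"A": (1, 0), "B": (0, 1), "C": (0, 0)}

lines = {}
for treatment, (x2, x3) in indicators.items():
    intercept = coef["const"] + coef["x2"] * x2 + coef["x3"] * x3
    slope = coef["age"] + coef["agex2"] * x2 + coef["agex3"] * x3
    lines[treatment] = (intercept, slope)
    print(f"Treatment {treatment}: y = {intercept:.3f} + {slope:.4f} * age")
```

The differing slopes are why the question of whether one treatment is superior can have a different answer at different ages.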

27 How is the length of a bluegill fish related to its age? In 1981, n = 78 bluegills were randomly sampled from Lake Mary in Minnesota. $y$ = length (in mm). $x_1$ = age (in years). Example 5

28 Scatter plot Example 5

29 A second order polynomial model with one quantitative predictor: $y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \epsilon_i$, where $y_i$ is the length of bluegill (fish) i (in mm) and $x_i$ is the age of bluegill (fish) i (in years), and the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$. Example 5

30 Estimated regression function Example 5

31 Potential research questions How is the length of a bluegill fish related to its age? What is the length of a randomly selected five-year-old bluegill fish? Example 5

32 The regression equation is
length = 148 + 19.8 c_age - 4.72 c_agesq

Predictor   Coef      SE Coef   T        P
Constant    147.604   1.472     100.26   0.000
c_age       19.811    1.431     13.85    0.000
c_agesq     -4.7187   0.9440    -5.00    0.000

S = 10.91   R-Sq = 80.1%   R-Sq(adj) = 79.6%

Analysis of Variance
Source          DF   SS      MS      F        P
Regression      2    35938   17969   151.07   0.000
Residual Error  75   8921    119
Total           77   44859
...

Predicted Values for New Observations
New   Fit      SE Fit   95.0% CI            95.0% PI
1     165.90   2.77     (160.39, 171.42)    (143.49, 188.32)

Values of Predictors for New Observations
New   c_age   c_agesq
1     1.37    1.88

Example 5
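The fit and the two intervals in this output can be reproduced approximately from the printed coefficients, S, and SE Fit. The sketch below is not from the slides and rests on two assumptions: the t multiplier 1.992 is an approximate table value for the 0.975 quantile with 75 degrees of freedom, and the predictor values are the rounded ones printed in the output, so small rounding differences are expected.

```python
import math

# Coefficients and summary values copied from the Minitab output for Example 5.
b0, b_c_age, b_c_agesq = 147.604, 19.811, -4.7187
s = 10.91        # residual standard error S
se_fit = 2.77    # printed SE Fit for the new observation
fit_printed = 165.90

# Centered age for the new observation, as printed in the output.
c_age = 1.37
fit = b0 + b_c_age * c_age + b_c_agesq * c_age ** 2

# ASSUMPTION: approximate t quantile (0.975, 75 df) taken from a t table.
t_mult = 1.992

# Confidence interval for the mean response uses SE Fit alone;
# the prediction interval also includes the residual variance S^2.
ci_half = t_mult * se_fit
pi_half = t_mult * math.sqrt(s ** 2 + se_fit ** 2)

ci = (fit_printed - ci_half, fit_printed + ci_half)
pi = (fit_printed - pi_half, fit_printed + pi_half)

print(f"fit = {fit:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI = ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is much wider than the confidence interval because it describes a single fish rather than the mean length of all fish of that age.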

33 The good news! Everything you learned about the simple linear regression model extends, with at most minor modification, to the multiple linear regression model: – same assumptions, same model checking – (adjusted) $R^2$ – t-tests and t-intervals for one slope – prediction (confidence) intervals for (mean) response

34 New things we need to learn! The above research scenarios (models) and a few more The “general linear test” which helps to answer many research questions F-tests for more than one slope Interactions between two or more predictor variables Identifying influential data points

35 New things we need to learn! Detection of correlated predictors (“multicollinearity”) using “variance inflation factors”, and the limitations they cause. Selection of variables from a large set of variables for inclusion in a model (“stepwise regression” and “best subsets regression”)
