1 Experimental Statistics - week 12 Chapter 11: Linear Regression and Correlation Chapter 12: Multiple Regression.

1 1 Experimental Statistics - week 12 Chapter 11: Linear Regression and Correlation Chapter 12: Multiple Regression

2 2 April 5 -- Lab

3 3 Analysis of Variance Approach
Mathematical Fact (p. 649): SS(Total) = SS(Regression) + SS(Residuals)
- SS(Total) = S_yy
- SS(Regression) = SS “explained” by the model
- SS(Residuals) = SS “unexplained” by the model
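The identity can be written out in terms of the individual observations. The block below is the standard least squares decomposition, offered as a sketch: the symbols \hat{y}_i (fitted value) and \bar{y} (sample mean of Y) are assumed notation, since the slide's own formula is not preserved in this transcript.

```latex
\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{\text{SS(Total)} \,=\, S_{yy}}
\;=\;
\underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{\text{SS(Regression)}}
\;+\;
\underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{\text{SS(Residuals)}}
```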

4 4 Plot of Production vs Cost

5 5 SS(???)

8 8 R² = SS(Regression)/SS(Total) measures the proportion of the variability in Y that is explained by the regression on X

9 9 Data with columns Y, X, X (column alignment lost in extraction): 12 8 7 12 4 15 11 10 15 12 20 8 17 14 24 7 8 12 4 12 11 15

10 10 The GLM Procedure
Dependent Variable: y
                          Sum of
Source             DF    Squares
Model               1     19.575
Error               6    174.425
Corrected Total     7    194.000
The GLM Procedure
Dependent Variable: y
                          Sum of
Source             DF    Squares
Model               1    170.492    = SS(Reg)
Error               6     23.508    = SS(Res)
Corrected Total     7    194.000    = SS(Total)
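To compare the two fits, R-square = SS(Model)/SS(Corrected Total) can be computed directly from the printed sums of squares. The sketch below uses Python rather than SAS, and the names (r_square, ss_model_first, ss_model_second) are illustrative only.

```python
# R-square = SS(Model) / SS(Corrected Total), using the two GLM tables above.
ss_total = 194.000

# Model sums of squares printed in the first and second GLM outputs.
ss_model_first = 19.575
ss_model_second = 170.492


def r_square(ss_model, ss_total):
    """Proportion of the total variability in y explained by the regression."""
    return ss_model / ss_total


r2_first = r_square(ss_model_first, ss_total)
r2_second = r_square(ss_model_second, ss_total)

print(f"first fit:  R-square = {r2_first:.4f}")
print(f"second fit: R-square = {r2_second:.4f}")
```

Both fits share the same SS(Total), so the fit with the larger Model sum of squares explains the larger share of the variability in y.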

11 11 RECALL: Theoretical Model, Regression line, residuals
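The formulas for this slide are not preserved in the transcript. A sketch of the standard simple linear regression quantities follows; the notation is an assumption (β0, β1 for the unknown parameters, b_0, b_1 for their least squares estimates, e_i for the residuals) and may differ from the original slide.

```latex
\begin{aligned}
\text{Theoretical model:} \quad & y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \\
\text{Regression line:}   \quad & \hat{y}_i = b_0 + b_1 x_i \\
\text{Residuals:}         \quad & e_i = y_i - \hat{y}_i
\end{aligned}
```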

12 12 Residual Analysis
Examination of residuals to help determine if:
- assumptions are met
- regression model is appropriate
Residual Plot: plot of x vs residuals

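15 15 Study Time Data
/* Fit score on time and save the residuals in data set NEW */
PROC GLM;
  MODEL score=time;
  OUTPUT out=new r=resid;
RUN;
/* Residual plot: residuals against the predictor */
PROC GPLOT;
  TITLE 'Plot of Residuals';
  PLOT resid*time;
RUN;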

16 16 Average Height of Girls by Age

17 17 Average Height of Girls by Age

18 18 Residual Plot

19 19 Residual Analysis
Examination of residuals to help determine if:
- assumptions are met
- regression model is appropriate
Residual Plot:
- plot of x vs residuals
Normality of Residuals:
- probability plot
- histogram

20 20 Residuals from Car Dataset fit using √ hp

21 21 Residuals from Car Dataset fit using log(hp)

22 22 Data – Page 572
Weight loss in a chemical compound as a function of how long it is exposed to air
Y = weight loss (wtloss)
X = exposure time (exptime)
  Y     X
 4.3    4
 5.5    5
 6.8    6
 8.0    7
 4.0    4
 5.2    5
 6.6    6
 7.5    7
 2.0    4
 4.0    5
 5.7    6
 6.5    7
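The least squares line for these twelve observations can be computed by hand from the corrected sums S_xx and S_xy. The sketch below does this in plain Python rather than SAS; the variable names (s_xx, s_xy, b0, b1) are illustrative, with b0 and b1 standing for the estimated intercept and slope.

```python
# Weight loss data from the slide above: y = wtloss, x = exptime.
y = [4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5]
x = [4, 5, 6, 7] * 3

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sum of squares for x and corrected cross product of x and y.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates for the line y-hat = b0 + b1 * x.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

print(f"slope b1     = {b1:.5f}")
print(f"intercept b0 = {b0:.5f}")
```

The two estimates can be compared with the Parameter Estimates that PROC REG prints for this data set.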

23 23 PROC REG;
  MODEL wtloss=exptime/r cli clm;
  OUTPUT out=new r=resid;
RUN;
The REG Procedure
Dependent Variable: wtloss
Analysis of Variance
                             Sum of        Mean
Source             DF      Squares      Square    F Value    Pr > F
Model               1     26.00417    26.00417      40.22    <.0001
Error              10      6.46500     0.64650
Corrected Total    11     32.46917
Root MSE           0.80405    R-Square    0.8009
Dependent Mean     5.50833    Adj R-Sq    0.7810
Coeff Var         14.59701
Parameter Estimates
                    Parameter    Standard
Variable     DF     Estimate       Error    t Value    Pr > |t|
Intercept     1     -1.73333     1.16518      -1.49      0.1677
exptime       1      1.31667     0.20761       6.34      <.0001

24 24 Plot of Residuals - MLR Model
The REG Procedure
Dependent Variable: wtloss
Output Statistics
     Dependent Predicted    Std Error
 Obs  Variable     Value Mean Predict       95% CL Mean    95% CL Predict  Residual
   1    4.3000    3.5333       0.3884   2.6679   4.3987   1.5437   5.5229    0.7667
   2    5.5000    4.8500       0.2543   4.2835   5.4165   2.9710   6.7290    0.6500
   3    6.8000    6.1667       0.2543   5.6001   6.7332   4.2877   8.0456    0.6333
   4    8.0000    7.4833       0.3884   6.6179   8.3487   5.4937   9.4729    0.5167
   5    4.0000    3.5333       0.3884   2.6679   4.3987   1.5437   5.5229    0.4667
   6    5.2000    4.8500       0.2543   4.2835   5.4165   2.9710   6.7290    0.3500
   7    6.6000    6.1667       0.2543   5.6001   6.7332   4.2877   8.0456    0.4333
   8    7.5000    7.4833       0.3884   6.6179   8.3487   5.4937   9.4729    0.0167
   9    2.0000    3.5333       0.3884   2.6679   4.3987   1.5437   5.5229   -1.5333
  10    4.0000    4.8500       0.2543   4.2835   5.4165   2.9710   6.7290   -0.8500
  11    5.7000    6.1667       0.2543   5.6001   6.7332   4.2877   8.0456   -0.4667
  12    6.5000    7.4833       0.3884   6.6179   8.3487   5.4937   9.4729   -0.9833
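The Predicted Value, Std Error Mean Predict and Residual columns can be reproduced from the fitted line. The following is a minimal Python sketch (not SAS) under the usual simple linear regression formulas; the names fitted, residuals and se_mean are illustrative, and the standard error of the estimated mean at x is taken as Root MSE * sqrt(1/n + (x - x_bar)^2 / S_xx).

```python
import math

# Weight loss data: y = wtloss, x = exptime.
y = [4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5]
x = [4, 5, 6, 7] * 3

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares line y-hat = b0 + b1 * x.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted values and residuals (observed minus fitted).
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Root MSE uses n - 2 degrees of freedom in simple linear regression.
sse = sum(e ** 2 for e in residuals)
root_mse = math.sqrt(sse / (n - 2))

# Standard error of the estimated mean response at each observed x.
se_mean = [root_mse * math.sqrt(1 / n + (xi - x_bar) ** 2 / s_xx) for xi in x]

for obs, (yi, fi, se, e) in enumerate(zip(y, fitted, se_mean, residuals), start=1):
    print(f"{obs:>3} {yi:8.4f} {fi:8.4f} {se:8.4f} {e:8.4f}")
```

Plotting residuals against x (or against fitted) gives the residual plot; with an intercept in the model the residuals sum to zero up to rounding.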

26 26 The REG Procedure
Dependent Variable: wtloss
Analysis of Variance
                             Sum of        Mean
Source             DF      Squares      Square    F Value    Pr > F
Model               1     26.00417    26.00417      40.22    <.0001
Error              10      6.46500     0.64650
Corrected Total    11     32.46917
Root MSE           0.80405    R-Square    0.8009
Dependent Mean     5.50833    Adj R-Sq    0.7810
Coeff Var         14.59701
Parameter Estimates
                    Parameter    Standard
Variable     DF     Estimate       Error    t Value    Pr > |t|
Intercept     1     -1.73333     1.16518      -1.49      0.1677
exptime       1      1.31667     0.20761       6.34      <.0001
???
For testing H0: β0 = 0 (Intercept row)
For testing H0: β1 = 0 (exptime row)

27 27 Analysis of Variance
                             Sum of        Mean
Source             DF      Squares      Square    F Value    Pr > F
Model               1     26.00417    26.00417      40.22    <.0001
Error              10      6.46500     0.64650
Corrected Total    11     32.46917
Recall:
SS(Regression) = “Model SS”
SS(Residual) = “Error SS”

28 28 H0: there is no linear relationship between X and Y
H1: there is a linear relationship between X and Y
Reject H0 if F > F_α(1, n – 2), where F = MS(Regression)/MSE

29 29 H0: there is no linear relationship between weight loss and exposure time
H1: there is a linear relationship between weight loss and exposure time
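The F test for the weight loss data can be traced step by step from the raw observations. The sketch below is plain Python, not SAS. It ASSUMES a significance level of 0.05 (the slides do not state one) and hard-codes 4.96 as the tabled critical value F_0.05(1, 10); all variable names are illustrative.

```python
# Weight loss data: y = wtloss, x = exptime.
y = [4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5]
x = [4, 5, 6, 7] * 3

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# The three sums of squares in the ANOVA table.
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Mean squares: 1 df for the model, n - 2 df for error.
ms_reg = ss_reg / 1
mse = ss_res / (n - 2)
f_value = ms_reg / mse
r_square = ss_reg / ss_total

# ASSUMPTION: alpha = 0.05; 4.96 is the tabled value of F_0.05(1, 10).
f_critical = 4.96
reject_h0 = f_value > f_critical

print(f"SS(Reg) = {ss_reg:.5f}, SS(Res) = {ss_res:.5f}, SS(Total) = {ss_total:.5f}")
print(f"F = {f_value:.2f}, R-square = {r_square:.4f}, reject H0: {reject_h0}")
```

Under that assumed level, the computed F exceeds the critical value, which agrees with the small Pr > F in the SAS output.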

30 30 Note: In simple linear regression, the hypotheses
H0: there is no linear relationship between X and Y
H1: there is a linear relationship between X and Y
and
H0: β1 = 0
H1: β1 ≠ 0
are equivalent, and F = t²
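The relation F = t² can be checked numerically with the values printed in the PROC REG output for the weight loss data. This is a small Python sketch; the inputs are the rounded numbers from that output, so agreement is only up to rounding.

```python
# Slope estimate and its standard error from the Parameter Estimates table.
slope = 1.31667
se_slope = 0.20761
t_value = slope / se_slope

# Mean squares from the Analysis of Variance table.
ms_model = 26.00417
ms_error = 0.64650
f_value = ms_model / ms_error

print(f"t   = {t_value:.2f}")
print(f"t^2 = {t_value ** 2:.2f}")
print(f"F   = {f_value:.2f}")
```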

31 31 Multiple Regression
Use of more than one independent variable to predict Y
Assumptions:
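The assumptions themselves are not preserved in this transcript. For reference, the usual multiple regression model with k independent variables is sketched below; this is the standard textbook formulation and an assumption about what the slide showed, with x_{ij} denoting the value of the j-th independent variable for the i-th observation.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i,
\qquad i = 1, \dots, n,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)
```

That is, the errors are taken to be independent, normally distributed, with mean zero and a common variance.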

32 32 Data: and so we have x_ij = value for the i-th observation on the j-th independent variable

33 33 Goal: Find the “best” prediction equation of the form ŷ = b0 + b1 x1 + ... + bk xk
As before: “best” means the coefficients minimize SS(Residuals)

34 34 Again: the solution involves calculus -- solving the Normal Equations on page 627
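In outline, the calculus step sets each partial derivative of the residual sum of squares to zero. The sketch below uses matrix notation, which is an assumption and may differ from the presentation on page 627: X is the n × (k+1) matrix whose first column is all ones and whose remaining columns hold the independent variables, y is the vector of responses, and b = (b_0, ..., b_k) is the vector of estimates.

```latex
\begin{aligned}
\text{Minimize over } b_0, \dots, b_k: \quad
  & \text{SSE} = \sum_{i=1}^{n} \bigl( y_i - b_0 - b_1 x_{i1} - \cdots - b_k x_{ik} \bigr)^2 \\
\text{Normal equations:} \quad
  & \frac{\partial\, \text{SSE}}{\partial b_j} = 0, \;\; j = 0, \dots, k
  \quad \Longleftrightarrow \quad
  X^{\mathsf{T}} X \, \mathbf{b} = X^{\mathsf{T}} \mathbf{y}
\end{aligned}
```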

35 35 Analysis of Variance
                       Sum of
Source         DF      Squares      Mean Square               F Value
Model          k       SS(Reg.)     MS(Reg.) = SS(Reg.)/k     MS(Reg.)/MSE
Error          n-k-1   SSE          MSE = SSE/(n-k-1)
Corr. Total    n-1     SS(Total)
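The table can be filled in mechanically once SS(Reg.), SSE, n and k are known. The helper below is a hypothetical Python function (anova_table is not a SAS or library routine), applied to the sums of squares from the two-variable chemical weight loss fit in this deck (n = 12, k = 2).

```python
def anova_table(ss_reg, ss_error, n, k):
    """Fill in the regression ANOVA table for n observations and k predictors."""
    df_model = k
    df_error = n - k - 1
    ms_reg = ss_reg / df_model
    mse = ss_error / df_error
    return {
        "df_model": df_model,
        "df_error": df_error,
        "df_total": n - 1,
        "ss_total": ss_reg + ss_error,
        "ms_reg": ms_reg,
        "mse": mse,
        "f_value": ms_reg / mse,
    }


# Sums of squares from the weight loss fit with exptime and humidity.
table = anova_table(ss_reg=31.12417, ss_error=1.34500, n=12, k=2)

for name, value in table.items():
    print(f"{name:>9}: {value:.5f}")
```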

36 36 Multiple Regression Setting
H0: there is no linear relationship between Y and the independent variables
H1: there is a linear relationship between Y and the independent variables
Reject H0 if F > F_α(k, n-k-1), where F = MS(Reg.)/MSE

37 37 R² = SS(Reg.)/SS(Total) measures the proportion of the variability in Y that is explained by the regression
- in the MLR setting, R² has the same interpretation as before

38 38 Data – Page 628
Weight loss in a chemical compound as a function of exposure time and humidity
Y = weight loss (wtloss)
X1 = exposure time (exptime)
X2 = relative humidity (humidity)
  Y     X1    X2
 4.3    4    0.2
 5.5    5    0.2
 6.8    6    0.2
 8.0    7    0.2
 4.0    4    0.3
 5.2    5    0.3
 6.6    6    0.3
 7.5    7    0.3
 2.0    4    0.4
 4.0    5    0.4
 5.7    6    0.4
 6.5    7    0.4

39 39 Chemical Weight Loss – MLR Output
The REG Procedure
Dependent Variable: wtloss
Number of Observations Read    12
Number of Observations Used    12
Analysis of Variance
                             Sum of        Mean
Source             DF      Squares      Square    F Value    Pr > F
Model               2     31.12417    15.56208     104.13    <.0001
Error               9      1.34500     0.14944
Corrected Total    11     32.46917
Root MSE           0.38658    R-Square    0.9586
Dependent Mean     5.50833    Adj R-Sq    0.9494
Coeff Var          7.01810
Parameter Estimates
                    Parameter    Standard
Variable     DF     Estimate       Error    t Value    Pr > |t|
Intercept     1      0.66667     0.69423       0.96      0.3620
exptime       1      1.31667     0.09981      13.19      <.0001
humidity      1     -8.00000     1.36677      -5.85      0.0002
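The parameter estimates can be reproduced by solving the normal equations directly. The sketch below is plain Python, not SAS. It ASSUMES the data layout exptime = 4, 5, 6, 7 within each humidity level 0.2, 0.3, 0.4 (the transcript fuses the two columns, e.g. "4.2" read as 4 and .2); the helper solve is a hypothetical Gaussian elimination routine written for this example.

```python
# Weight loss data with two predictors.
# ASSUMPTION: "4.2" in the transcript is read as exptime = 4, humidity = 0.2, etc.
y = [4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5]
exptime = [4, 5, 6, 7] * 3
humidity = [0.2] * 4 + [0.3] * 4 + [0.4] * 4

# Design matrix with a leading column of ones for the intercept.
X = [[1.0, float(t), h] for t, h in zip(exptime, humidity)]


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, size))
        z[r] = (m[r][size] - tail) / m[r][r]
    return z


# Normal equations: (X'X) b = X'y.
p = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
b0, b1, b2 = solve(xtx, xty)

# Error and total sums of squares, then R-square.
fitted = [b0 + b1 * t + b2 * h for t, h in zip(exptime, humidity)]
y_bar = sum(y) / len(y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_total = sum((yi - y_bar) ** 2 for yi in y)
r_square = 1 - sse / ss_total

print(f"Intercept = {b0:.5f}, exptime = {b1:.5f}, humidity = {b2:.5f}")
print(f"SSE = {sse:.5f}, R-square = {r_square:.4f}")
```

Agreement between these values and the SAS output is what supports the assumed reading of the humidity column.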

40 40 H0: there is no linear relationship between weight loss and the variables exposure time and humidity
H1: there is a linear relationship between weight loss and the variables exposure time and humidity

41 41 Examining Contributions of Individual X Variables
Use the t-test for the X variable in question.
- this tests the effect of that particular independent variable while all other independent variables stay constant.
Parameter Estimates
                    Parameter    Standard
Variable     DF     Estimate       Error    t Value    Pr > |t|
Intercept     1      0.66667     0.69423       0.96      0.3620
exptime       1      1.31667     0.09981      13.19      <.0001
humidity      1     -8.00000     1.36677      -5.85      0.0002
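Each t Value in the table is the parameter estimate divided by its standard error, with n - k - 1 = 9 error degrees of freedom here. The Python sketch below recomputes them from the printed (rounded) values. It ASSUMES a two-sided test at level 0.05 (not stated on the slide) and hard-codes 2.262 as the tabled critical value t_0.025 with 9 df.

```python
# (estimate, standard error) pairs from the Parameter Estimates table.
estimates = {
    "Intercept": (0.66667, 0.69423),
    "exptime": (1.31667, 0.09981),
    "humidity": (-8.00000, 1.36677),
}

# t statistic for H0: coefficient = 0, other variables held in the model.
t_values = {name: est / se for name, (est, se) in estimates.items()}

# ASSUMPTION: two-sided test at alpha = 0.05; 2.262 is the tabled t_0.025 with 9 df.
t_critical = 2.262
significant = {name: abs(t) > t_critical for name, t in t_values.items()}

for name, t in t_values.items():
    print(f"{name:>9}: t = {t:6.2f}, |t| > {t_critical}: {significant[name]}")
```

Under that assumed level, exptime and humidity each contribute given the other is in the model, while the intercept is not significantly different from zero, in line with the Pr > |t| column.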

