Published by Evelin Starbuck. Modified over 3 years ago.

1
**Multiple regression**

Xuhua Xia, xxia@uottawa.ca, http://dambe.bio.uottawa.ca

2
**Fisher on Experimental Design**

No aphorism is more frequently repeated in connection with field trials, than that we must ask Nature few questions, or ideally, one question at a time. The writer is convinced that this view is wholly mistaken. Nature, he suggests, will respond to a logical and carefully thought-out questionnaire; indeed, if we ask her a single question, she will often refuse to answer until some other topic has been discussed. --Ronald A. Fisher Slide 2 Xuhua Xia

3
**Advantages of multiple regression**

Data columns: X1, X2, Y

1) Create a data file and read the data into R.
2) Regress Y on X1. Statistically significant?
3) Regress Y on X2. Statistically significant? What is your conclusion at this point?
4) Regress Y on both X1 and X2. Statistically significant?

3D graphs:

    library(scatterplot3d)
    scatterplot3d(X1, X2, Y, pch=16, highlight.3d=TRUE, type="h", main="3D Scatterplot")
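The exercise above can be sketched with made-up numbers. The data below are hypothetical (they are not the data set from the slide) and were constructed so that neither predictor looks useful on its own while the two together explain Y almost completely. The helper `overall_p` is also an assumption introduced for this illustration.

```r
# Hypothetical data (not the slide's data set): X2 tracks X1 closely,
# and Y depends on the small difference between the two.
d  <- c(1, -1, -1, 1, 1, -1, -1, 1)              # offset, uncorrelated with X1
e  <- c(0.1, -0.2, 0.1, 0, -0.1, 0.2, -0.1, 0)   # small fixed "noise"
X1 <- 1:8
X2 <- X1 + d
Y  <- 2 + 3 * X1 - 3 * X2 + e

# P value of the overall F test of a fitted lm object (illustrative helper)
overall_p <- function(fit) {
  f <- summary(fit)$fstatistic
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

fit1  <- lm(Y ~ X1)        # Y on X1 alone
fit2  <- lm(Y ~ X2)        # Y on X2 alone
fit12 <- lm(Y ~ X1 + X2)   # Y on both predictors

p1  <- overall_p(fit1)
p2  <- overall_p(fit2)
p12 <- overall_p(fit12)

print(c(X1_only = p1, X2_only = p2, both = p12))
print(coef(fit12))
```

With these particular numbers, each simple regression is non-significant at the 0.05 level, while the regression on both predictors is highly significant. This is one way a single-question approach can miss a relationship that a joint model reveals.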

4
**Regress Y on X1**

[Regression output. Analysis of variance: Source (Model, Error, Corrected Total), DF, Sum of Squares, Mean Square, F Value, Pr > F. Fit statistics: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var. Parameter estimates: Variable (Intercept, X1), DF, Estimate, Standard Error, t Value, Pr > |t|. Pr > |t| for Intercept: <.0001.]

5
**Regress Y on X2**

[Regression output. Analysis of variance: Source (Model, Error, Corrected Total), DF, Sum of Squares, Mean Square, F Value, Pr > F. Fit statistics: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var. Parameter estimates: Variable (Intercept, X2), DF, Estimate, Standard Error, t Value, Pr > |t|. Pr > |t| for Intercept: <.0001.]

6
**Regress Y on both X1 and X2**

[Regression output. Analysis of variance: Source (Model, Error, Corrected Total), DF, Sum of Squares, Mean Square, F Value, Pr > F. Pr > F for Model: <.0001. Fit statistics: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var. Parameter estimates: Variable (Intercept, X1, X2), DF, Estimate, Standard Error, t Value, Pr > |t|. Pr > |t| for X1: <.0001; for X2: <.0001.]

7
3D Scatter plot

8
**Multiple Regression**

Various types of regression:

- Simple linear regression
- Multiple linear regression
- Nonlinear regression (for functions that cannot be linearized by transformation)

In this module we deal only with multiple linear regression.

Multiple regression is an extension of simple regression:

$Y_i = \alpha + \beta X_i + \varepsilon_i$

$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

Partial regression coefficients:

- $\beta_1$ expresses how much Y would change for each unit change in X1 when X2 is held constant.
- $\beta_2$ expresses how much Y would change for each unit change in X2 when X1 is held constant.
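One way to see what "held constant" means is to remove the linear effect of X2 from both Y and X1, and then regress what is left of Y on what is left of X1. The sketch below uses hypothetical data (not from the slides) and shows that this residual-on-residual slope matches the partial regression coefficient from the multiple regression, while the simple regression slope differs.

```r
# Hypothetical data, for illustration only
d  <- c(1, -1, -1, 1, 1, -1, -1, 1)
e  <- c(0.1, -0.2, 0.1, 0, -0.1, 0.2, -0.1, 0)
X1 <- 1:8
X2 <- X1 + d
Y  <- 2 + 3 * X1 - 2 * X2 + e

# Partial regression coefficient of X1 from the multiple regression
b1_multiple <- unname(coef(lm(Y ~ X1 + X2))["X1"])

# Same quantity from residuals: remove the linear effect of X2 from both
# Y and X1, then regress the Y residuals on the X1 residuals
rY  <- resid(lm(Y ~ X2))
rX1 <- resid(lm(X1 ~ X2))
b1_partial <- unname(coef(lm(rY ~ rX1))["rX1"])

# Simple regression slope, which ignores X2 altogether
b1_simple <- unname(coef(lm(Y ~ X1))["X1"])

print(c(multiple = b1_multiple, partial = b1_partial, simple = b1_simple))
```

Because X1 and X2 are correlated in this example, the simple slope absorbs part of the effect of X2, which is why it differs from the partial coefficient.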

9
**Path Diagrams**

[Figure: six path diagrams, panels A to F, relating Y1 and Y2 to each other and to other variables (Y3, Y4, Y5).]

Different structural explanations for the observed correlation $r_{12}$ between Y1 and Y2 (assuming only linear relations between variables).

- A. Y2 is the entire cause of the variation of Y1. In such a case the true $r_{12}^2$ equals 1.
- B. The common cause Y3 totally determines variables Y1 and Y2. Again, the true $r_{12}^2$ should be 1.
- C. In this case, Y2 is one of several causes of Y1, and $r_{12}^2$ will be less than 1.
- D. The correlation between variables Y1 and Y2 is due to a common cause, Y4. Since other causes, Y3 and Y5, also determine Y1 and Y2, respectively, $r_{12}^2$ will be less than 1.
- E. The correlation between variables Y1 and Y2 is due to two common causes, Y4 and Y5.
- F. The correlation between variables Y1 and Y2 is due to the direct effect of Y2 on Y1, as well as to a common cause, Y4.
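Cases A, B, and C can be checked numerically with a few made-up vectors. Everything in this sketch is hypothetical: the values, the coefficients, and the name `Z` for the common cause in case B are assumptions chosen only to make the arithmetic easy to follow.

```r
# Hypothetical variables, for illustration only
Y2 <- 1:8
Y3 <- c(1, -1, -1, 1, 1, -1, -1, 1)   # constructed to be uncorrelated with Y2

# Case A: Y2 is the entire cause of the variation of Y1
Y1_A <- 5 + 2 * Y2
r2_A <- cor(Y1_A, Y2)^2

# Case B: a common cause (called Z here) totally determines both variables
Z    <- c(3, 1, 4, 1, 5, 9, 2, 6)
r2_B <- cor(10 - 2 * Z, 1 + 0.5 * Z)^2

# Case C: Y2 is only one of several causes of Y1 (Y3 is another)
Y1_C <- Y2 + Y3
r2_C <- cor(Y1_C, Y2)^2

print(c(A = r2_A, B = r2_B, C = r2_C))
```

Cases A and B give the same squared correlation, so the correlation alone cannot distinguish a direct cause from a shared one.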

10
**Cost of Waste Processing**

How to budget a waste-processing facility? The amounts of solid waste (SOLID), liquid waste (LIQUID), household waste (HOUSHOLD), and radioactive waste (RADIOACT), and the cost of processing these wastes (COST), were recorded for 19 pollution control centers.

11
**The Data Set**

Columns: OBS, SOLID, LIQUID, HOUSHOLD, RADIOACT, COST

12
**The Objectives**

- Construct a regression equation to estimate the cost per ton attributable to each waste category.
- Predict the cost of operating a center which manages a given amount of waste in each category.

The coefficient $\beta_i$ (estimated by $b_i$) is the expected change in COST due to a one-unit increase in $X_i$, holding all other Xs fixed.

COST = b0 + b1 SOLID + b2 LIQUID + b3 HOUSHOLD + b4 RADIOACT

13
**Relevant R functions**

    cor(myD, method="pearson")   # or method="spearman"
    pairs(~SOLID+LIQUID+HOUSHOLD+RADIOACT+COST, data=myD)
    fit <- lm(COST~SOLID+LIQUID+HOUSHOLD+RADIOACT, data=myD)
    anova(fit)
    summary(fit)

[Output: two matrices with columns OBS, SOLID, LIQUID, HOUSHOLD, RADIOACT, COST.]

- Which IV is likely to be the most important determinant of COST?
- Is it possible for the correlation coefficients to be negative in this case?
- Relevance to one-tailed and two-tailed tests.

14
Scatter plot matrices

15
**Regression output**

Type I and Type III SS

[Analysis of Variance Table. Response: COST. Columns: Df, Sum Sq, Mean Sq, F value, Pr(>F). Rows: SOLID, LIQUID, HOUSHOLD, RADIOACT, Residuals. Surviving significance codes: SOLID *** (Pr(>F) with exponent e-09), LIQUID *, HOUSHOLD **, RADIOACT ***.]

[Coefficients. Columns: Estimate, Std. Error, t value, Pr(>|t|). Rows: (Intercept), SOLID, LIQUID, HOUSHOLD, RADIOACT. Surviving significance codes: SOLID *** (Pr(>|t|) with exponent e-06), RADIOACT ***. Also reported: Multiple R-squared, Adjusted R-squared.]

The dependent variable is COST, which will necessarily increase with the amount of waste processed. The correlation coefficients are therefore not expected to be negative, and neither are the partial regression coefficients. Should the test of the significance of the correlation coefficients be one-tailed or two-tailed?
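The one-tailed versus two-tailed question can be explored with `cor.test`, whose `alternative` argument selects the direction of the test. The sketch below uses hypothetical numbers (the variable names `tons` and `cost` and their values are assumptions, not the slide's data) and compares the two P values when a positive correlation is expected in advance.

```r
# Hypothetical amounts of waste and processing costs
tons <- c(1, 2, 3, 4, 5, 6)
cost <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)

two_tailed <- cor.test(tons, cost, alternative = "two.sided")
one_tailed <- cor.test(tons, cost, alternative = "greater")   # H1: correlation > 0

print(c(r     = unname(two_tailed$estimate),
        p_two = two_tailed$p.value,
        p_one = one_tailed$p.value))
```

When the observed correlation is positive, the one-tailed P value is half the two-tailed one, so the choice matters most for results near the significance threshold.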

16
**Sums of Squares**

SS(MODEL) = SS(SOLID, LIQUID, HOUSHOLD, RADIOACT | INT)

- Type I SS(SOLID) = SS(SOLID | INT)
- Type I SS(LIQUID) = SS(LIQUID | INT, SOLID)
- Type I SS(HOUSHOLD) = SS(HOUSHOLD | INT, SOLID, LIQUID)
- Type I SS(RADIOACT) = SS(RADIOACT | INT, SOLID, LIQUID, HOUSHOLD)

- Type III SS(SOLID) = SS(SOLID | INT, LIQUID, HOUSHOLD, RADIOACT)
- Type III SS(LIQUID) = SS(LIQUID | INT, SOLID, HOUSHOLD, RADIOACT)
- Type III SS(HOUSHOLD) = SS(HOUSHOLD | INT, SOLID, LIQUID, RADIOACT)
- Type III SS(RADIOACT) = SS(RADIOACT | INT, SOLID, LIQUID, HOUSHOLD)

Type I SSs are called sequential because they cumulatively account for the variation related to the X variables. The Type I SSs add up to SS(MODEL).

Type III SSs are called partial because they account for the variation related to an X variable apart from the variation related to all other variables in the model.
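The two kinds of sums of squares can be compared in R on a small example. In this sketch the data are hypothetical, `anova()` supplies the sequential (Type I) SS, and `drop1()` is used as one way to obtain partial SS, which coincide with Type III SS for a model without interaction terms.

```r
# Hypothetical data with two correlated predictors
d  <- c(1, -1, -1, 1, 1, -1, -1, 1)
e  <- c(0.1, -0.2, 0.1, 0, -0.1, 0.2, -0.1, 0)
X1 <- 1:8
X2 <- X1 + d
Y  <- 2 + 3 * X1 - 2 * X2 + e

fit <- lm(Y ~ X1 + X2)

# Type I (sequential) SS: each term adjusted only for the terms before it
a     <- anova(fit)
type1 <- setNames(a[["Sum Sq"]], rownames(a))[c("X1", "X2")]

# Partial SS: each term adjusted for all other terms in the model
dr    <- drop1(fit, test = "F")
type3 <- setNames(dr[["Sum of Sq"]], rownames(dr))[c("X1", "X2")]

# Model sum of squares computed directly from the fitted values
ss_model <- sum((fitted(fit) - mean(Y))^2)

print(rbind(type1, type3))
print(ss_model)
```

The Type I values sum to the model SS, and the last term entered has identical Type I and Type III SS because in both cases it is adjusted for every other term. The first term differs between the two whenever the predictors are correlated.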

17
**Prediction**

Suppose we wish to know how much it would cost to operate a waste-processing center that can handle 5.0 tons of SOLID, 3.5 tons of LIQUID, 0.5 tons of HOUSHOLD, and 0.4 tons of RADIOACT.

    new <- data.frame(SOLID=5, LIQUID=3.5, HOUSHOLD=0.5, RADIOACT=0.4)
    predict(fit, new, interval="confidence")   # output columns: fit, lwr, upr
    predict(fit, new, interval="prediction")
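The difference between the two `interval` options can be seen on a small example. The data frame below is hypothetical (only two waste categories, with invented values), so the numbers say nothing about the real centers; the point is how the confidence and prediction intervals relate.

```r
# Hypothetical data: two waste categories and the processing cost
e   <- c(0.1, -0.2, 0.1, 0, -0.1, 0.2, -0.1, 0)
myD <- data.frame(
  SOLID  = c(1, 2, 3, 4, 5, 6, 7, 8),
  LIQUID = c(2, 1, 2, 5, 6, 5, 6, 9)
)
myD$COST <- 2 + 3 * myD$SOLID + 2 * myD$LIQUID + e

fit <- lm(COST ~ SOLID + LIQUID, data = myD)
new <- data.frame(SOLID = 5, LIQUID = 3.5)

# Interval for the mean cost of all centers with this waste profile
ci <- predict(fit, new, interval = "confidence")

# Interval for the cost of one new center with this waste profile
pi <- predict(fit, new, interval = "prediction")

print(ci)
print(pi)
```

Both calls return the same point estimate in the `fit` column. The prediction interval is wider because it also includes the residual variation of an individual center around the mean.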
