# Multiple regression


Xuhua Xia, xxia@uottawa.ca, http://dambe.bio.uottawa.ca

Fisher on Experimental Design
No aphorism is more frequently repeated in connection with field trials, than that we must ask Nature few questions, or ideally, one question at a time. The writer is convinced that this view is wholly mistaken. Nature, he suggests, will respond to a logical and carefully thought-out questionnaire; indeed, if we ask her a single question, she will often refuse to answer until some other topic has been discussed.
-- Ronald A. Fisher

Exercise (data columns: X1, X2, Y)
1) Create a data file and read the data into R.
2) Regress Y on X1. Statistically significant?
3) Regress Y on X2. Statistically significant? What is your conclusion at this point?
4) Regress Y on both X1 and X2. Statistically significant?
3D graphs:
    library(scatterplot3d)
    scatterplot3d(X1, X2, Y, pch=16, highlight.3d=TRUE, type="h", main="3D Scatterplot")
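The exercise can be tried on made-up numbers. The sketch below uses a small invented data set (not the data from the slide), built so that Y shows no significant relation to X1 or X2 alone yet is well explained by the two together. The data frame `d` and the helper `model_p` are names chosen for this illustration only.

```r
# Invented data, NOT the data from the slide: X2 is nearly the mirror image of X1,
# and Y is (almost exactly) X1 + X2.
d <- data.frame(
  X1 = c(-3, -3, -1, -1, 1, 1, 3, 3),
  X2 = c(3.5, 2.5, 1.5, 0.5, -0.5, -1.5, -2.5, -3.5),
  Y  = c(0.6, -0.4, 0.4, -0.6, 0.4, -0.6, 0.6, -0.4)
)

fit1  <- lm(Y ~ X1, data = d)       # Y on X1 alone
fit2  <- lm(Y ~ X2, data = d)       # Y on X2 alone
fit12 <- lm(Y ~ X1 + X2, data = d)  # Y on both

# p-value of the overall F test of a fitted linear model (hypothetical helper)
model_p <- function(fit) {
  f <- summary(fit)$fstatistic
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

p1  <- model_p(fit1)
p2  <- model_p(fit2)
p12 <- model_p(fit12)
print(c(X1_only = p1, X2_only = p2, both = p12))
print(coef(fit12))
```

With these invented numbers, neither single-predictor regression is significant while the joint model is, which is one way a single "question" can fail to get an answer until another variable is accounted for.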

Regress Y on X1
Analysis of variance. Columns: Source, DF, Sum of Squares, Mean Square, F Value, Pr > F. Rows: Model, Error, Corrected Total.
Fit statistics: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var.
Parameter estimates. Columns: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|. Rows: Intercept (Pr > |t| <.0001), X1.

Regress Y on X2
Analysis of variance. Columns: Source, DF, Sum of Squares, Mean Square, F Value, Pr > F. Rows: Model, Error, Corrected Total.
Fit statistics: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var.
Parameter estimates. Columns: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|. Rows: Intercept (Pr > |t| <.0001), X2.

Regress Y on both X1 and X2
Analysis of variance. Columns: Source, DF, Sum of Squares, Mean Square, F Value, Pr > F. Rows: Model (Pr > F <.0001), Error, Corrected Total.
Fit statistics: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var.
Parameter estimates. Columns: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|. Rows: Intercept, X1 (Pr > |t| <.0001), X2 (Pr > |t| <.0001).

3D Scatter plot

Multiple Regression
Various types of regression:
- Simple linear regression
- Multiple linear regression
- Nonlinear regression (for functions that cannot be linearized by transformation)
In this module we deal only with multiple linear regression.
Multiple regression is an extension of simple regression:
$Y_i = \alpha + \beta X_i + \varepsilon_i$
$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
Partial regression coefficients:
- $\beta_1$ expresses how much Y would change for each unit change in X1 when X2 is held constant.
- $\beta_2$ expresses how much Y would change for each unit change in X2 when X1 is held constant.
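The "held constant" reading of a partial regression coefficient can be checked numerically. The following sketch uses invented data (the data frame `d` and its values are assumptions for illustration): it fits Y on X1 and X2, then compares the change in predicted Y for a one-unit step in X1 at fixed X2 with the fitted coefficient, and with the slope from a simple regression that ignores X2.

```r
# Invented data for illustration only; X1 and X2 are strongly correlated.
d <- data.frame(
  X1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  X2 = c(2, 1, 4, 3, 6, 5, 8, 7),
  Y  = c(3.1, 2.9, 7.2, 6.8, 11.1, 10.9, 15.2, 14.8)
)

fit <- lm(Y ~ X1 + X2, data = d)
b <- coef(fit)

# Predicted Y at two points that differ by one unit in X1, with X2 held at 4
at <- data.frame(X1 = c(3, 4), X2 = c(4, 4))
delta <- unname(diff(predict(fit, at)))

# Slope from the simple regression, which does NOT hold X2 constant
b_simple <- unname(coef(lm(Y ~ X1, data = d))["X1"])

print(c(partial_b1 = unname(b["X1"]), change_in_prediction = delta, simple_slope = b_simple))
```

The change in the prediction equals the partial coefficient for X1, whereas the simple-regression slope also absorbs the effect of X2, because X2 rises together with X1 in these data.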

Path Diagrams
[Figure: six path diagrams, panels A to F, linking variables Y1 to Y5]
Different structural explanations for the observed correlation $r_{12}$ between Y1 and Y2 (assuming only linear relations between variables).
A. Y2 is the entire cause of the variation of Y1. In such a case the true $r_{12}^2$ equals 1.
B. The common cause Y3 totally determines variables Y1 and Y2. Again, the true $r_{12}^2$ should be 1.
C. In this case, Y2 is one of several causes of Y1, and $r_{12}^2$ will be less than 1.
D. The correlation between variables Y1 and Y2 is due to a common cause Y4. Since other causes, Y3 and Y5, also determine Y1 and Y2, respectively, $r_{12}^2$ will be less than 1.
E. The correlation between variables Y1 and Y2 is due to two common causes, Y4 and Y5.
F. The correlation between variables Y1 and Y2 is due to the direct effect of Y2 on Y1, as well as to a common cause, Y4.
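Cases B and D can be contrasted with a few invented numbers. In this sketch, `common`, `extra1`, and `extra2` are hypothetical cause variables chosen to be mutually uncorrelated; the coefficients in the linear relations are arbitrary.

```r
# Invented values for illustration; the three causes are mutually uncorrelated.
common <- c(1, 2, 3, 4, 5, 6, 7, 8)        # cause shared by Y1 and Y2
extra1 <- c(1, -1, -1, 1, 1, -1, -1, 1)    # additional cause acting on Y1 only
extra2 <- c(1, 1, -1, -1, -1, -1, 1, 1)    # additional cause acting on Y2 only

# Case B: the common cause totally determines both variables (linear relations)
y1_B <- 2 * common + 1
y2_B <- 0.5 * common - 3
r2_B <- cor(y1_B, y2_B)^2

# Case D: each variable also has its own additional cause
y1_D <- common + extra1
y2_D <- common + extra2
r2_D <- cor(y1_D, y2_D)^2

print(c(case_B = r2_B, case_D = r2_D))
```

With a single shared cause and nothing else, the squared correlation is 1; once each variable has a cause of its own, the squared correlation drops below 1 even though the shared cause is unchanged.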

Cost of Waste Processing
How to budget a waste-processing facility? The amounts of solid waste (SOLID), liquid waste (LIQUID), household waste (HOUSHOLD), and radioactive waste (RADIOACT), and the cost of processing these wastes (COST), were recorded for 19 pollution control centers.

The Data Set
Columns: OBS, SOLID, LIQUID, HOUSHOLD, RADIOACT, COST

The Objectives
- Construct a regression equation to estimate the cost per ton attributable to each waste category.
- Predict the cost of operating a center which manages a given amount of waste in each category.
The coefficient $\beta_i$ (estimated by $b_i$) is the expected change in COST due to a one-unit increase in $X_i$, holding all other Xs fixed.
$\mathrm{COST} = b_0 + b_1\,\mathrm{SOLID} + b_2\,\mathrm{LIQUID} + b_3\,\mathrm{HOUSHOLD} + b_4\,\mathrm{RADIOACT}$

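Relevant R functions
    # myD is the data frame holding the waste-processing data
    cor(myD, method="pearson")   # or method="spearman"
    pairs(~SOLID+LIQUID+HOUSHOLD+RADIOACT+COST, data=myD)   # scatter plot matrix
    fit <- lm(COST~SOLID+LIQUID+HOUSHOLD+RADIOACT, data=myD)
    anova(fit)     # sequential (Type I) sums of squares
    summary(fit)   # coefficient estimates and t tests
Correlation matrix (rows and columns: OBS, SOLID, LIQUID, HOUSHOLD, RADIOACT, COST)
- Which independent variable (IV) is likely to be the most important determinant of COST?
- Is it possible for the correlation coefficients to be negative in this case?
- Relevance to one-tailed and two-tailed tests.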

Scatter plot matrices

Regression output
Type I and Type III SS

    Analysis of Variance Table
    Response: COST
              Df  Sum Sq  Mean Sq  F value  Pr(>F)
    SOLID                                   e-09 ***
    LIQUID                                       *
    HOUSHOLD                                     **
    RADIOACT                                     ***
    Residuals

    Coefficients:
                 Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)
    SOLID                                       e-06 ***
    LIQUID
    HOUSHOLD
    RADIOACT                                         ***

    Multiple R-squared: , Adjusted R-squared:

The dependent variable is COST, which will necessarily increase with the amount of waste processed. So the correlation coefficients will not be negative, and neither will the partial regression coefficients. Should the test of the significance of the correlation coefficients be one-tailed or two-tailed?
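The one-tailed versus two-tailed question can be explored with `cor.test()`, whose `alternative` argument selects the hypothesis. The sketch below uses invented numbers (`amount` and `cost` are hypothetical variables, not the waste-processing data) and assumes the only plausible alternative is a positive correlation.

```r
# Invented data for illustration (not the waste-processing data)
amount <- c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)
cost   <- c(2.1, 2.9, 4.2, 4.8, 6.5, 6.9, 8.4, 8.8)

# Two-tailed test: H1 is "correlation differs from 0"
two_sided <- cor.test(amount, cost, alternative = "two.sided")

# One-tailed test: H1 is "correlation is greater than 0"
one_sided <- cor.test(amount, cost, alternative = "greater")

print(c(r = unname(two_sided$estimate),
        p_two_sided = two_sided$p.value,
        p_one_sided = one_sided$p.value))
```

When the observed correlation is positive, the one-tailed p-value is half the two-tailed one, so the choice matters most for correlations near the significance threshold.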

Sums of Squares
SS(MODEL) = SS(SOLID, LIQUID, HOUSHOLD, RADIOACT | INT)
Type I SS(SOLID) = SS(SOLID | INT)
Type I SS(LIQUID) = SS(LIQUID | INT, SOLID)
Type I SS(HOUSHOLD) = SS(HOUSHOLD | INT, SOLID, LIQUID)
Type I SS(RADIOACT) = SS(RADIOACT | INT, SOLID, LIQUID, HOUSHOLD)
Type III SS(SOLID) = SS(SOLID | INT, LIQUID, HOUSHOLD, RADIOACT)
Type III SS(LIQUID) = SS(LIQUID | INT, SOLID, HOUSHOLD, RADIOACT)
Type III SS(HOUSHOLD) = SS(HOUSHOLD | INT, SOLID, LIQUID, RADIOACT)
Type III SS(RADIOACT) = SS(RADIOACT | INT, SOLID, LIQUID, HOUSHOLD)
Type I SSs are called sequential because they cumulatively account for the variation related to the X variables. The Type I SSs add up to SS(MODEL).
Type III SSs are called partial because they account for the variation related to an X variable apart from the variation related to all other variables in the model.
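Both kinds of sums of squares can be obtained in base R. In this sketch the data are invented and the predictors are given generic names (X1, X2, X3); `anova()` returns the sequential (Type I) SS, and for a model without interaction terms `drop1()` returns the SS for each variable adjusted for all the others, which corresponds to the partial (Type III) SS described above.

```r
# Invented data for illustration only (three generic predictors)
d <- data.frame(
  X1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  X2 = c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9),
  X3 = c(1, 0, 0, 1, 1, 0, 1, 0, 0, 1),
  Y  = c(4.2, 3.1, 7.9, 8.4, 12.6, 10.8, 16.1, 13.9, 18.7, 20.3)
)
fit <- lm(Y ~ X1 + X2 + X3, data = d)
terms_x <- c("X1", "X2", "X3")

# Type I (sequential) SS: each term is adjusted only for the terms entered before it
a <- anova(fit)
type1 <- setNames(a[["Sum Sq"]], rownames(a))[terms_x]

# SS(MODEL), computed directly from the fitted values
ss_model <- sum((fitted(fit) - mean(d$Y))^2)

# Type III (partial) SS: each term is adjusted for all other terms in the model
d1 <- drop1(fit, test = "F")
type3 <- setNames(d1[["Sum of Sq"]], rownames(d1))[terms_x]

print(rbind(type1, type3))
print(c(sum_type1 = sum(type1), ss_model = ss_model))
```

The Type I values sum to SS(MODEL), and the two types agree for the last variable entered (X3 here), since in both cases it is adjusted for every other variable.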

Prediction
Suppose we wish to know how much it would cost to operate a waste processing center that can handle 5.0 tons of SOLID, 3.5 tons of LIQUID, 0.5 tons of HOUSHOLD, and 0.4 tons of RADIOACT.
    new <- data.frame(SOLID=c(5), LIQUID=c(3.5), HOUSHOLD=c(0.5), RADIOACT=c(0.4))
    predict(fit, new, interval="confidence")   # output columns: fit, lwr, upr
    predict(fit, new, interval="prediction")
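The difference between the two `interval` settings can be seen on a small invented data set (the data frame `d`, the predictors X1 and X2, and the new point are assumptions for illustration, not the waste-processing data). The confidence interval concerns the mean response at the new point, while the prediction interval concerns a single new observation and therefore also includes the residual variation.

```r
# Invented data for illustration only
d <- data.frame(
  X1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  X2 = c(2, 1, 4, 3, 6, 5, 8, 7),
  Y  = c(3.1, 2.9, 7.2, 6.8, 11.1, 10.9, 15.2, 14.8)
)
fit <- lm(Y ~ X1 + X2, data = d)

# A hypothetical new point at which to predict
new <- data.frame(X1 = 4.5, X2 = 4.5)

conf_int <- predict(fit, new, interval = "confidence")  # interval for the mean response
pred_int <- predict(fit, new, interval = "prediction")  # interval for one new observation

conf_width <- unname(conf_int[1, "upr"] - conf_int[1, "lwr"])
pred_width <- unname(pred_int[1, "upr"] - pred_int[1, "lwr"])

print(conf_int)
print(pred_int)
print(c(conf_width = conf_width, pred_width = pred_width))
```

Both calls return the same point estimate in the `fit` column; only the `lwr` and `upr` bounds differ, with the prediction interval being the wider of the two.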
