2 Fisher on Experimental Design

"No aphorism is more frequently repeated in connection with field trials, than that we must ask Nature few questions, or ideally, one question at a time. The writer is convinced that this view is wholly mistaken. Nature, he suggests, will respond to a logical and carefully thought-out questionnaire; indeed, if we ask her a single question, she will often refuse to answer until some other topic has been discussed."
-- Ronald A. Fisher

Slide 2, Xuhua Xia

3 Advantages of multiple regression

Data columns: X1, X2, Y

1) Create a data file and read the data into R.
2) Regress Y on X1. Statistically significant?
3) Regress Y on X2. Statistically significant?
   What is your conclusion at this point?
4) Regress Y on both X1 and X2. Statistically significant?

3D graphs:
    library(scatterplot3d)
    scatterplot3d(X1, X2, Y, pch = 16, highlight.3d = TRUE, type = "h", main = "3D Scatterplot")

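The exercise above can be rehearsed on simulated data before opening the real file. The sketch below is only an illustration under stated assumptions: the data are hypothetical (generated with `rnorm`, not the values from the slide's table), and Y is built to depend on the difference X1 - X2, so each predictor alone tends to explain little while the two together explain much more.

```r
# Hypothetical simulated data; NOT the X1, X2, Y values from the slide.
set.seed(1)
n  <- 30
X1 <- rnorm(n)
X2 <- X1 + rnorm(n, sd = 0.3)        # X2 is strongly correlated with X1
Y  <- X1 - X2 + rnorm(n, sd = 0.1)   # Y depends on the difference X1 - X2

fit1  <- lm(Y ~ X1)        # step 2: Y on X1 alone
fit2  <- lm(Y ~ X2)        # step 3: Y on X2 alone
fit12 <- lm(Y ~ X1 + X2)   # step 4: Y on both predictors

# Compare the proportion of variance explained by each model.
r2 <- c(X1   = summary(fit1)$r.squared,
        X2   = summary(fit2)$r.squared,
        both = summary(fit12)$r.squared)
print(round(r2, 3))
```

Calling `summary(fit1)`, `summary(fit2)`, and `summary(fit12)` prints the full tables with the significance tests asked for in steps 2 to 4.
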
4 Regress Y on X1

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model
Error
Corrected Total

Root MSE            R-Square
Dependent Mean      Adj R-Sq
Coeff Var

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept                                                        <.0001
X1

5 Regress Y on X2

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model
Error
Corrected Total

Root MSE            R-Square
Dependent Mean      Adj R-Sq
Coeff Var

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept                                                        <.0001
X2

6 Regress Y on both X1 and X2

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                                                            <.0001
Error
Corrected Total

Root MSE            R-Square
Dependent Mean      Adj R-Sq
Coeff Var

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept
X1                                                               <.0001
X2                                                               <.0001

8 Multiple Regression

Various types of regression:
- Simple linear regression
- Multiple linear regression
- Nonlinear regression (for functions that cannot be linearized by transformation)

In this module we deal only with multiple linear regression.

Multiple regression is an extension of simple regression:
    Simple:    $Y_i = \alpha + \beta X_i + \varepsilon_i$
    Multiple:  $Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

Partial regression coefficients:
- $\beta_1$ expresses how much Y would change for each unit change in X1 when X2 is held constant.
- $\beta_2$ expresses how much Y would change for each unit change in X2 when X1 is held constant.

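One way to see what "held constant" means is to remove the linear effect of X2 from both Y and X1 and then regress the leftovers on each other. The slope obtained this way equals the partial regression coefficient of X1 in the multiple regression. The sketch below uses hypothetical simulated data; the generating values (1, 2, -1.5, 0.6) are arbitrary assumptions chosen for illustration.

```r
# Hypothetical simulated data; the coefficients below are arbitrary.
set.seed(2)
n  <- 50
X1 <- rnorm(n)
X2 <- 0.6 * X1 + rnorm(n)                 # predictors are correlated
Y  <- 1 + 2 * X1 - 1.5 * X2 + rnorm(n)

# Partial regression coefficient of X1 from the multiple regression.
full    <- lm(Y ~ X1 + X2)
b1_full <- unname(coef(full)["X1"])

# Remove the linear effect of X2 from Y and from X1, then regress
# the residuals of Y on the residuals of X1.
rY  <- resid(lm(Y ~ X2))
rX1 <- resid(lm(X1 ~ X2))
b1_residualized <- unname(coef(lm(rY ~ rX1))["rX1"])

print(c(full = b1_full, residualized = b1_residualized))
```

The two printed slopes agree up to floating-point error, which is why the coefficient is read as the effect of X1 with X2 held constant.
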
9 Path Diagrams

[Figure: six path diagrams, panels A to F, relating Y1 and Y2 to the other variables described below]

Different structural explanations for the observed correlation $r_{12}$ between Y1 and Y2 (assuming only linear relations between variables).
A. Y2 is the entire cause of the variation of Y1. In such a case the true $r_{12}^2$ equals 1.
B. The common cause Y3 totally determines variables Y1 and Y2. Again, the true $r_{12}^2$ should be 1.
C. In this case, Y2 is one of several causes of Y1, and $r_{12}^2$ will be less than 1.
D. The correlation between variables Y1 and Y2 is due to a common cause Y4. Since other causes, Y3 and Y5, also determine Y1 and Y2, respectively, $r_{12}^2$ will be less than 1.
E. The correlation between variables Y1 and Y2 is due to two common causes, Y4 and Y5.
F. The correlation between variables Y1 and Y2 is due to the direct effect of Y2 on Y1, as well as to a common cause, Y4.

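Cases B and D can be mimicked with a short simulation. This is only a sketch under assumed linear structures: the intercepts, slopes, and unit variances are hypothetical choices, not values taken from the figure.

```r
# Hypothetical simulation of path-diagram cases B and D.
set.seed(3)
n <- 200

# Case B: a single common cause Y3 fully determines both Y1 and Y2.
Y3   <- rnorm(n)
Y1_B <- 2 + 1.5 * Y3
Y2_B <- -1 + 0.8 * Y3
r2_B <- cor(Y1_B, Y2_B)^2

# Case D: common cause Y4, plus a separate cause for each variable
# (Y3 for Y1, Y5 for Y2), all with equal variance by assumption.
Y4   <- rnorm(n)
Y3_D <- rnorm(n)
Y5   <- rnorm(n)
Y1_D <- Y4 + Y3_D
Y2_D <- Y4 + Y5
r2_D <- cor(Y1_D, Y2_D)^2

print(c(case_B = r2_B, case_D = r2_D))
```

Under these assumed equal variances the population correlation in case D is 0.5, so the sample $r_{12}^2$ should fall well below 1, whereas case B gives 1 up to rounding.
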
10 Cost of Waste Processing

How to budget a waste-processing facility?

The amounts of solid waste (SOLID), liquid waste (LIQUID), household waste (HOUSHOLD), and radioactive waste (RADIOACT), and the cost of processing these wastes (COST), were recorded for 19 pollution control centers.

11 The Data Set

Columns: OBS  SOLID  LIQUID  HOUSHOLD  RADIOACT  COST

12 The Objectives

- Construct a regression equation to estimate the cost per ton attributable to each waste category.
- Predict the cost of operating a center that manages a given amount of waste in each category.

The coefficient $\beta_i$, estimated by $b_i$, is the expected change in COST due to a one-unit increase in $X_i$, holding all other Xs fixed.

    COST = b0 + b1 SOLID + b2 LIQUID + b3 HOUSHOLD + b4 RADIOACT

13 Relevant R functions

    cor(myD, method = "pearson")    # method is either "pearson" or "spearman"
    pairs(~ SOLID + LIQUID + HOUSHOLD + RADIOACT + COST, data = myD)
    fit <- lm(COST ~ SOLID + LIQUID + HOUSHOLD + RADIOACT, data = myD)
    anova(fit)
    summary(fit)

[Correlation matrix with rows and columns OBS, SOLID, LIQUID, HOUSHOLD, RADIOACT, COST]

Which IV is likely to be the most important determinant of COST?
Is it possible for the correlation coefficients to be negative in this case? This is relevant to the choice between one-tailed and two-tailed tests.

15 Regression output: Type I and Type III SS

Analysis of Variance Table
Response: COST
             Df   Sum Sq   Mean Sq   F value   Pr(>F)
SOLID                                          e-09 ***
LIQUID                                         *
HOUSHOLD                                       **
RADIOACT                                       ***
Residuals

Coefficients:
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)
SOLID                                           e-06 ***
LIQUID
HOUSHOLD
RADIOACT                                        ***

Multiple R-squared: , Adjusted R-squared:

The dependent variable is COST, which necessarily increases with the amount of waste processed. The true correlation coefficients should therefore not be negative, and neither should the partial regression coefficients.

Should the test of the significance of the correlation coefficients be one-tailed or two-tailed?

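The one-tailed versus two-tailed question can be explored with `cor.test`, whose `alternative` argument selects the direction of the test. The sketch below uses hypothetical simulated SOLID and COST values (the real data for the 19 centers are not reproduced here); the slope and noise level are arbitrary assumptions.

```r
# Hypothetical data standing in for SOLID and COST at 19 centers.
set.seed(4)
n     <- 19
SOLID <- runif(n, 0, 10)
COST  <- 2 + 3 * SOLID + rnorm(n, sd = 5)

# Two-tailed test: H1 is that the correlation differs from zero.
two_sided <- cor.test(SOLID, COST, alternative = "two.sided")

# One-tailed test: H1 is that the correlation is greater than zero,
# which matches the expectation that cost rises with waste.
one_sided <- cor.test(SOLID, COST, alternative = "greater")

print(c(r     = unname(two_sided$estimate),
        p_two = two_sided$p.value,
        p_one = one_sided$p.value))
```

When the sample correlation is positive, the one-tailed p-value is half the two-tailed one, so a one-tailed test is more powerful if a negative correlation can be ruled out beforehand.
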
16 Sums of Squares

SS(MODEL) = SS(SOLID LIQUID HOUSHOLD RADIOACT | INT)

Type I SS(SOLID)    = SS(SOLID | INT)
Type I SS(LIQUID)   = SS(LIQUID | INT, SOLID)
Type I SS(HOUSHOLD) = SS(HOUSHOLD | INT, SOLID, LIQUID)
Type I SS(RADIOACT) = SS(RADIOACT | INT, SOLID, LIQUID, HOUSHOLD)

Type III SS(SOLID)    = SS(SOLID | INT, LIQUID, HOUSHOLD, RADIOACT)
Type III SS(LIQUID)   = SS(LIQUID | INT, SOLID, HOUSHOLD, RADIOACT)
Type III SS(HOUSHOLD) = SS(HOUSHOLD | INT, SOLID, LIQUID, RADIOACT)
Type III SS(RADIOACT) = SS(RADIOACT | INT, SOLID, LIQUID, HOUSHOLD)

Type I SSs are called sequential because they cumulatively account for the variation related to the X variables. The Type I SSs add up to SS(MODEL).

Type III SSs are called partial because they account for the variation related to an X variable apart from the variation related to all other variables in the model.

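Both kinds of sums of squares can be obtained in base R. In the sketch below, `anova()` gives the sequential (Type I) SS, and `drop1()` gives, for a model without interaction terms such as this one, the partial SS that correspond to Type III. The data frame `waste` is hypothetical simulated data with the same column names as the waste data set; its generating coefficients are arbitrary assumptions.

```r
# Hypothetical data with the same column names as the waste data set.
set.seed(5)
n <- 19
waste <- data.frame(SOLID    = runif(n, 0, 10),
                    LIQUID   = runif(n, 0, 5),
                    HOUSHOLD = runif(n, 0, 2),
                    RADIOACT = runif(n, 0, 1))
waste$COST <- with(waste, 2 + 3 * SOLID + 2 * LIQUID + 4 * HOUSHOLD +
                          10 * RADIOACT + rnorm(n))

fit <- lm(COST ~ SOLID + LIQUID + HOUSHOLD + RADIOACT, data = waste)

# Type I (sequential) SS: each term adjusted for the terms before it.
tab   <- anova(fit)
typeI <- setNames(tab[["Sum Sq"]][1:4], rownames(tab)[1:4])

# Type III (partial) SS: each term adjusted for all other terms,
# obtained by dropping one term at a time from the full model.
d1      <- drop1(fit, test = "F")
typeIII <- setNames(d1[["Sum of Sq"]][-1], rownames(d1)[-1])

# Model SS computed directly from the fitted values.
ss_model <- sum((fitted(fit) - mean(waste$COST))^2)

print(rbind(typeI, typeIII))
print(c(sum_typeI = sum(typeI), ss_model = ss_model))
```

The Type I SSs sum to SS(MODEL), and for the last term entered (RADIOACT) the Type I and Type III SS coincide, because both condition on the same set of other variables.
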
17 Prediction

Suppose we wish to know how much it would cost to operate a waste-processing center that can handle 5.0 tons of SOLID, 3.5 tons of LIQUID, 0.5 tons of HOUSHOLD, and 0.4 tons of RADIOACT.

    new <- data.frame(SOLID = 5, LIQUID = 3.5, HOUSHOLD = 0.5, RADIOACT = 0.4)
    predict(fit, new, interval = "confidence")
    predict(fit, new, interval = "prediction")

Each call returns the columns: fit  lwr  upr

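The point prediction is simply the fitted equation evaluated at the new values, and the two interval types differ in what they cover: the confidence interval is for the mean cost of all such centers, the prediction interval is for one new center. The sketch below checks both on hypothetical simulated data (the data frame `waste` and its generating coefficients are assumptions, not the recorded values).

```r
# Hypothetical data with the same column names as the waste data set.
set.seed(6)
n <- 19
waste <- data.frame(SOLID    = runif(n, 0, 10),
                    LIQUID   = runif(n, 0, 5),
                    HOUSHOLD = runif(n, 0, 2),
                    RADIOACT = runif(n, 0, 1))
waste$COST <- with(waste, 2 + 3 * SOLID + 2 * LIQUID + 4 * HOUSHOLD +
                          10 * RADIOACT + rnorm(n))

fit <- lm(COST ~ SOLID + LIQUID + HOUSHOLD + RADIOACT, data = waste)
new <- data.frame(SOLID = 5, LIQUID = 3.5, HOUSHOLD = 0.5, RADIOACT = 0.4)

# Point prediction by hand: b0 + b1*SOLID + b2*LIQUID + b3*HOUSHOLD + b4*RADIOACT.
by_hand <- sum(coef(fit) * c(1, 5, 3.5, 0.5, 0.4))

conf_int <- predict(fit, new, interval = "confidence")  # interval for the mean cost
pred_int <- predict(fit, new, interval = "prediction")  # interval for a single new center

print(by_hand)
print(rbind(confidence = conf_int[1, ], prediction = pred_int[1, ]))
```

The prediction interval is always wider than the confidence interval at the same point because it adds the residual variance of an individual observation to the uncertainty of the estimated mean.
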