Experimental design and analysis Multiple linear regression Gerry Quinn & Mick Keough, 1998 Do not copy or distribute without permission of authors.
Multiple regression One response (dependent) variable: –$Y$ More than one predictor (independent) variable: –$X_1$, $X_2$, $X_3$ etc. –number of predictors = p Number of observations = n
Example A sample of 51 mammal species (n = 51) Response variable: –total sleep time in hrs/day ($y$) Predictors: –body weight in kg ($x_1$) –brain weight in g ($x_2$) –maximum life span in years ($x_3$) –gestation time in days ($x_4$)
Regression models Population model (equation): $y_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon_i$ Sample equation: $\hat{y}_i = b_0 + b_1 x_1 + b_2 x_2 + \dots$
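The sample equation can be estimated by ordinary least squares. The following is a minimal sketch, assuming NumPy is available; the data are simulated purely for illustration (they are not the mammal sleep data), and `fit_ols` is a hypothetical helper name.

```python
import numpy as np

def fit_ols(X, y):
    """Return [b0, b1, ..., bp] for the sample equation y-hat = b0 + b1*x1 + ... + bp*xp."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

# Simulated (hypothetical) data: n = 30 observations, p = 2 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 2.0 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=30)

b = fit_ols(X, y)
y_hat = b[0] + X @ b[1:]   # predicted values from the sample equation
residuals = y - y_hat      # observed y minus predicted y
print("b0, b1, b2 =", b)
```

Each slope in `b` is a partial regression coefficient: the change in y per unit change in that predictor with the other predictors held constant.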
Multiple regression equation [Figure; axes: total sleep, log lifespan, log body weight]
Partial regression coefficients $H_0$: $\beta_1 = 0$ Partial population regression coefficient (slope) for y on $x_1$, holding all other x’s constant, equals zero Example: –slope of regression of sleep against body weight, holding brain weight, max. life span and gestation time constant, is 0.
Partial regression coefficients $H_0$: $\beta_2 = 0$ Partial population regression coefficient (slope) for y on $x_2$, holding all other x’s constant, equals zero Example: –slope of regression of sleep against brain weight, holding body weight, max. life span and gestation time constant, is 0.
Testing $H_0$: $\beta_i = 0$ Use partial t-tests: $t = b_i / SE_{b_i}$ Compare with t-distribution with n - (p + 1) df (this reduces to n - 2 only when there is a single predictor) Separate t-test for each partial regression coefficient in model Usual logic of t-tests: –reject $H_0$ if P < 0.05
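As a small arithmetic check of the partial t statistic, the sketch below recomputes t = estimate / SE from the rounded estimates and standard errors of the three-predictor sleep model reported at the end of these slides. Because the inputs are rounded to two decimals, the recomputed values should only approximately match the reported t values; `partial_t` is a hypothetical helper name.

```python
# (estimate, standard error, reported t) taken from the three-predictor sleep
# model at the end of these slides (predictors log transformed).
reported = {
    "Bodywt":   (-1.25, 0.59, -2.09),
    "Lifespan": (2.19, 1.78, 1.23),
    "Gestime":  (-5.39, 1.67, -3.23),
}

def partial_t(estimate, se):
    """Partial t statistic: the slope divided by its standard error."""
    return estimate / se

t_values = {name: partial_t(est, se) for name, (est, se, _) in reported.items()}

for name, t in t_values.items():
    print(f"{name}: recomputed t = {t:.2f}, reported t = {reported[name][2]:.2f}")
```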
Model comparison To test $H_0$: $\beta_1 = 0$ Fit full model: –$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$ Fit reduced model: –$y = \beta_0 + \beta_2 x_2 + \beta_3 x_3 + \dots$ Calculate $SS_{extra}$: –$SS_{Regression(full)} - SS_{Regression(reduced)}$ $F = MS_{extra} / MS_{Residual(full)}$
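The full-versus-reduced comparison can be sketched as follows, assuming NumPy and simulated data (not the sleep data). The helper names `ss_regression` and `extra_ss_f` are hypothetical. Dropping a single predictor gives SS extra one degree of freedom, so MS extra equals SS extra here.

```python
import numpy as np

def ss_regression(X, y):
    """Return (SS_Regression, SS_Residual) for an OLS fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coefs
    ss_reg = np.sum((fitted - y.mean()) ** 2)
    ss_res = np.sum((y - fitted) ** 2)
    return ss_reg, ss_res

def extra_ss_f(X, y, drop):
    """F statistic for H0: the slope of column `drop` is zero."""
    n, p = X.shape
    ss_reg_full, ss_res_full = ss_regression(X, y)
    ss_reg_reduced, _ = ss_regression(np.delete(X, drop, axis=1), y)
    ss_extra = ss_reg_full - ss_reg_reduced
    ms_extra = ss_extra / 1                    # one coefficient dropped
    ms_res_full = ss_res_full / (n - p - 1)    # residual df of the full model
    return ms_extra / ms_res_full

# Simulated (hypothetical) data: n = 40 observations, p = 3 predictors.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
y = 1.0 + 0.8 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=40)

F = extra_ss_f(X, y, drop=0)
print(f"F for dropping x1: {F:.2f}")
```

For a single dropped predictor, this F equals the square of the corresponding partial t statistic, so the two approaches test the same hypothesis.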
Overall regression model $H_0$: $\beta_1 = \beta_2 = \dots = 0$ (all population slopes equal zero). Test of whether overall regression equation is significant. Use ANOVA F-test: –Variation explained by regression –Unexplained (residual) variation
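The overall F ratio is MS Regression / MS Residual, with p and n - p - 1 degrees of freedom. Dividing both sums of squares by SS Total lets F be written in terms of R², which makes a quick consistency check possible. The sketch below applies it to the two-predictor collinearity example later in these slides (n = 20, R² = 0.787 and 0.780); since R² is rounded to three decimals, the result should only approximately match the reported F values of 31.38 and 30.05. `overall_f` is a hypothetical helper name.

```python
def overall_f(r_squared, n, p):
    """Overall regression F statistic computed from R^2, n observations, p predictors."""
    ms_regression = r_squared / p                # SS_Regression / p, scaled by SS_Total
    ms_residual = (1 - r_squared) / (n - p - 1)  # SS_Residual / (n - p - 1), same scaling
    return ms_regression / ms_residual

f_uncorrelated = overall_f(0.787, n=20, p=2)  # slide reports F = 31.38
f_correlated = overall_f(0.780, n=20, p=2)    # slide reports F = 30.05

print(f"uncorrelated predictors: F = {f_uncorrelated:.2f}")
print(f"correlated predictors:   F = {f_correlated:.2f}")
```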
Regression diagnostics Residual is still observed y - predicted y –Studentised residuals still work Other diagnostics still apply: –residual plots –Cook’s D statistics
Assumptions Normality and homogeneity of variance for response variable Independence of observations Linearity No collinearity
Collinearity Collinearity: –predictors correlated Assumption of no collinearity: –predictor variables are uncorrelated with (i.e. independent of) each other Collinearity makes estimates of $\beta_i$’s and their significance tests unreliable: –low power for individual tests on $\beta_i$’s
Collinearity
Response (y) and 2 predictors ($x_1$ and $x_2$); n = 20
1. $x_1$ and $x_2$ uncorrelated (r = -0.24)

            coeff    se      tol     t       P
intercept   -0.17    1.03            -0.16   0.873
x1          …        …       …       7.86    <0.001
x2          0.12     0.14    0.95    0.86    0.404

$R^2$ = 0.787, F = 31.38, P < 0.001
2. rearrange $x_2$ so $x_1$ and $x_2$ highly correlated (r = 0.99)

            coeff    se      tol     t       P
intercept   0.49     0.72            0.69    0.503
x1          1.55     1.21    0.01    1.28    0.219
x2          -0.45    1.21    0.01    -0.37   0.714

$R^2$ = 0.780, F = 30.05, P < 0.001
Checks for collinearity Correlation matrix between predictors Tolerance for each predictor: –$1 - R^2$ for regression of that predictor on all others –if tolerance is low (<0.1) then collinearity is a problem Variance inflation factor (VIF) for each predictor: –1/tolerance –if VIF > 10 then collinearity is a problem
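Tolerance and VIF can be computed directly from their definitions by regressing each predictor on all the others. The following is a minimal sketch, assuming NumPy and simulated predictors (not the sleep data); `tolerances` is a hypothetical helper name, and the 0.1 cutoff is the rule of thumb from the slide.

```python
import numpy as np

def tolerances(X):
    """Tolerance (1 - R^2) for each column of X regressed on all other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    tol = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Intercept plus every predictor except column j.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coefs, *_ = np.linalg.lstsq(others, target, rcond=None)
        ss_res = np.sum((target - others @ coefs) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        tol[j] = ss_res / ss_tot   # SS_Residual / SS_Total = 1 - R^2
    return tol

# Simulated (hypothetical) predictors: x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.1, size=50)
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])

tol = tolerances(X)
vif = 1.0 / tol
flagged = tol < 0.1   # rule of thumb: tolerance < 0.1 (VIF > 10)

for name, t, v, f in zip(["x1", "x2", "x3"], tol, vif, flagged):
    print(f"{name}: tolerance = {t:.3f}, VIF = {v:.1f}, collinearity flag = {f}")
```

With only two predictors, the tolerance of each is simply 1 - r², where r is the correlation between them.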
Explained variance $R^2$: proportion of variation in y explained by linear relationship with $x_1$, $x_2$ etc. $R^2 = SS_{Regression} / SS_{Total}$
Example

Species            Sleep   Bodywt     Brainwt   Lifespan   Gestime
African elephant   3.3     6654.000   5712.0    38.6       645
Arctic fox         12.5    3.385      44.5      14.0       60
etc.
Boxplots of variables
Collinearity problem for body weight and brain weight: low tolerance, highly correlated
Predictors log transformed

Parameter   Estimate   SE     Tol    t       P
Intercept   18.94      3.11          6.09    <0.001
Bodywt      -0.76      1.31   0.08   -0.58   0.565
Brainwt     -0.84      2.03   0.05   -0.42   0.680
Lifespan    2.60       2.05   0.33   1.27    0.211
Gestime     -5.11      1.81   0.36   -2.82   0.007

$R^2$ = 0.486
Omit brain weight because body weight and brain weight are so highly correlated.
No collinearity between any predictors: all tolerances OK; reduced SE and larger slope for body weight

Parameter   Estimate   SE     Tol    t       P
Intercept   19.06      3.07          6.21    <0.001
Bodywt      -1.25      0.59   0.36   -2.09   0.042
Lifespan    2.19       1.78   0.43   1.23    0.225
Gestime     -5.39      1.67   0.42   -3.23   0.002

$R^2$ = 0.484