# Experimental design and analysis: Multiple linear regression

Gerry Quinn & Mick Keough, 1998. Do not copy or distribute without permission of authors.


Multiple regression

- One response (dependent) variable: $Y$
- More than one predictor (independent) variable: $X_1$, $X_2$, $X_3$, etc.
- Number of predictors = $p$
- Number of observations = $n$

Example

A sample of 51 mammal species ($n = 51$).

- Response variable: total sleep time in hrs/day ($y$)
- Predictors:
  - body weight in kg ($x_1$)
  - brain weight in g ($x_2$)
  - maximum life span in years ($x_3$)
  - gestation time in days ($x_4$)

Regression models

- Population model (equation): $y_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon_i$
- Sample equation: $\hat{y}_i = b_0 + b_1 x_1 + b_2 x_2 + \dots$
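As a minimal sketch of how the sample coefficients $b_0, b_1, \dots$ can be estimated, the snippet below fits the sample equation by ordinary least squares with NumPy. The function name `fit_ols` and the demonstration data are assumptions made for illustration; they are not part of the original slides.

```python
import numpy as np

def fit_ols(X, y):
    """Return least-squares estimates [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept b0
    design = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return b

# Made-up data generated exactly from y = 2 + 3*x1 - 1*x2 (no error term)
X_demo = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]])
y_demo = 2.0 + 3.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1]
b_demo = fit_ols(X_demo, y_demo)  # recovers intercept 2 and slopes 3 and -1
```

Because the demonstration data contain no error term, the fitted coefficients reproduce the generating values; with real data they would only estimate the population $\beta$'s.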

Example

Regression model: $\text{sleep} = \text{intercept} + \beta_1\,\text{bodywt} + \beta_2\,\text{brainwt} + \beta_3\,\text{lifespan} + \beta_4\,\text{gestime}$

[Figure: multiple regression equation; axes: total sleep, log lifespan, log body weight]

Partial regression coefficients

- $H_0$: $\beta_1 = 0$
- The partial population regression coefficient (slope) for $y$ on $x_1$, holding all other $x$'s constant, equals zero.
- Example: the slope of the regression of sleep against body weight, holding brain weight, maximum life span and gestation time constant, is 0.

Partial regression coefficients

- $H_0$: $\beta_2 = 0$
- The partial population regression coefficient (slope) for $y$ on $x_2$, holding all other $x$'s constant, equals zero.
- Example: the slope of the regression of sleep against brain weight, holding body weight, maximum life span and gestation time constant, is 0.

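Testing $H_0$: $\beta_i = 0$

- Use partial t-tests: $t = b_i / SE_{b_i}$
- Compare with the t-distribution with $n - p - 1$ df (the residual df when $p$ slopes and an intercept are estimated; this reduces to $n - 2$ only when $p = 1$)
- Separate t-test for each partial regression coefficient in the model
- Usual logic of t-tests: reject $H_0$ if $P < 0.05$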
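The partial t statistics can be sketched as follows, assuming ordinary least squares and residual degrees of freedom equal to the number of observations minus the number of estimated coefficients ($n - p - 1$). The function name `partial_t_tests` and the simulated data are hypothetical and serve only to illustrate the calculation.

```python
import numpy as np

def partial_t_tests(X, y):
    """Return (b, se, t, df_resid) for a multiple regression of y on the columns of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])        # intercept column plus predictors
    b, *_ = np.linalg.lstsq(design, y, rcond=None)   # b[0] = intercept, b[1:] = partial slopes
    resid = y - design @ b
    df_resid = n - p - 1                             # n minus the p + 1 estimated coefficients
    ms_resid = resid @ resid / df_resid
    # Standard errors: square roots of the diagonal of MS_residual * (X'X)^-1
    se = np.sqrt(ms_resid * np.diag(np.linalg.inv(design.T @ design)))
    return b, se, b / se, df_resid

# Simulated data (not from the slides): x1 has a real effect, x2 has none
rng = np.random.default_rng(0)
X_sim = rng.normal(size=(20, 2))
y_sim = 1.0 + 2.0 * X_sim[:, 0] + rng.normal(scale=0.5, size=20)
b_sim, se_sim, t_sim, df_sim = partial_t_tests(X_sim, y_sim)
```

Each element of `t_sim` would then be compared with a t-distribution on `df_sim` degrees of freedom to obtain the P value for that coefficient.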

Model comparison

- To test $H_0$: $\beta_1 = 0$
- Fit full model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$
- Fit reduced model: $y = \beta_0 + \beta_2 x_2 + \beta_3 x_3 + \dots$
- Calculate $SS_{extra} = SS_{Regression(full)} - SS_{Regression(reduced)}$
- $F = MS_{extra} / MS_{Residual(full)}$
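The full-versus-reduced comparison can be sketched in a few lines, assuming both models include an intercept and are fitted by least squares. The helper names `regression_ss` and `extra_ss_f`, and the simulated data, are assumptions for illustration only.

```python
import numpy as np

def regression_ss(X, y):
    """Return (SS_regression, SS_residual) for y regressed on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ b
    return np.sum((fitted - y.mean()) ** 2), np.sum((y - fitted) ** 2)

def extra_ss_f(X_full, X_reduced, y):
    """Return (F, df_extra, df_resid) for comparing a full model with a reduced one."""
    n, p_full = X_full.shape
    df_extra = p_full - X_reduced.shape[1]   # number of predictors dropped
    df_resid = n - p_full - 1                # residual df of the full model
    ss_reg_full, ss_res_full = regression_ss(X_full, y)
    ss_reg_reduced, _ = regression_ss(X_reduced, y)
    ms_extra = (ss_reg_full - ss_reg_reduced) / df_extra
    ms_resid = ss_res_full / df_resid
    return ms_extra / ms_resid, df_extra, df_resid

# Simulated data (not from the slides): test H0: beta_1 = 0 by dropping x1
rng = np.random.default_rng(1)
X_full_demo = rng.normal(size=(25, 3))
y_demo = 0.5 + 1.5 * X_full_demo[:, 0] + 0.8 * X_full_demo[:, 2] + rng.normal(size=25)
X_reduced_demo = X_full_demo[:, 1:]
f_demo, df_extra_demo, df_resid_demo = extra_ss_f(X_full_demo, X_reduced_demo, y_demo)
```

When a single predictor is dropped, this F statistic equals the square of the partial t statistic for that predictor, so the two tests give the same P value.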

Overall regression model

- $H_0$: $\beta_1 = \beta_2 = \dots = 0$ (all population slopes equal zero)
- Test of whether the overall regression equation is significant
- Use ANOVA F-test:
  - variation explained by regression
  - unexplained (residual) variation
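A minimal sketch of the overall ANOVA F-test follows, assuming the usual partition of total variation into regression and residual sums of squares with $p$ and $n - p - 1$ degrees of freedom. The function name `overall_f_test` and the small data set are made up for illustration.

```python
import numpy as np

def overall_f_test(X, y):
    """Return (F, df_regression, df_residual, SS_regression, SS_residual)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ b
    ss_regression = np.sum((fitted - y.mean()) ** 2)   # variation explained by regression
    ss_residual = np.sum((y - fitted) ** 2)            # unexplained (residual) variation
    df_regression, df_residual = p, n - p - 1
    f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
    return f_stat, df_regression, df_residual, ss_regression, ss_residual

# Made-up data (not from the slides)
X_small = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]], dtype=float)
y_small = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
f_small, df_reg_small, df_res_small, ss_reg_small, ss_res_small = overall_f_test(X_small, y_small)
```

The F statistic would be compared with an F-distribution on `df_reg_small` and `df_res_small` degrees of freedom.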

Regression diagnostics

- Residual is still observed $y$ minus predicted $y$
  - studentised residuals still work
- Other diagnostics still apply:
  - residual plots
  - Cook's D statistics
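The diagnostics listed above can be computed from the hat matrix. The sketch below assumes internally studentised residuals and the standard Cook's D formula with $p + 1$ estimated coefficients; the function name `regression_diagnostics` and the data are hypothetical.

```python
import numpy as np

def regression_diagnostics(X, y):
    """Return (residuals, leverage, studentised residuals, Cook's D) for an OLS fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    # Hat matrix: maps observed y to predicted y
    hat = design @ np.linalg.inv(design.T @ design) @ design.T
    leverage = np.diag(hat)
    resid = y - hat @ y                          # observed y minus predicted y
    ms_resid = resid @ resid / (n - p - 1)
    studentised = resid / np.sqrt(ms_resid * (1.0 - leverage))
    # Cook's D combines the size of the residual with the leverage of the observation
    cooks_d = studentised ** 2 * leverage / ((p + 1) * (1.0 - leverage))
    return resid, leverage, studentised, cooks_d

# Made-up data (not from the slides) with one observation shifted upwards
X_diag = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6],
                   [6, 5], [7, 8], [8, 7], [9, 10], [10, 9]], dtype=float)
y_diag = 1.0 + X_diag[:, 0] + 0.5 * X_diag[:, 1]
y_diag[4] += 5.0
resid_d, lev_d, stud_d, cooks_d = regression_diagnostics(X_diag, y_diag)
```

Plotting `resid_d` against the predicted values gives the residual plot; unusually large values in `cooks_d` point to observations worth inspecting.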

Assumptions

- Normality and homogeneity of variance for the response variable
- Independence of observations
- Linearity
- No collinearity

Collinearity

- Collinearity: predictors are correlated
- Assumption of no collinearity: predictor variables are uncorrelated with (i.e. independent of) each other
- Collinearity makes estimates of the $\beta_i$'s and their significance tests unreliable:
  - low power for individual tests on the $\beta_i$'s

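Collinearity

Response ($y$) and 2 predictors ($x_1$ and $x_2$); $n = 20$.

1. $x_1$ and $x_2$ uncorrelated ($r = -0.24$)

|  | coeff | se | tol | t | P |
|---|---|---|---|---|---|
| intercept | -0.17 | 1.03 |  | -0.16 | 0.873 |
| $x_1$ | 1.13 | 0.14 | 0.95 | 7.86 | <0.001 |
| $x_2$ | 0.12 | 0.14 | 0.95 | 0.86 | 0.404 |

$R^2 = 0.787$, $F = 31.38$, $P < 0.001$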

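2. Rearrange $x_2$ so $x_1$ and $x_2$ are highly correlated ($r = 0.99$)

|  | coeff | se | tol | t | P |
|---|---|---|---|---|---|
| intercept | 0.49 | 0.72 |  | 0.69 | 0.503 |
| $x_1$ | 1.55 | 1.21 | 0.01 | 1.28 | 0.219 |
| $x_2$ | -0.45 | 1.21 | 0.01 | -0.37 | 0.714 |

$R^2 = 0.780$, $F = 30.05$, $P < 0.001$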
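The inflation of standard errors seen in the two cases above can be reproduced with a small simulation. This is only a sketch: it uses simulated data rather than the slide's data, and it builds the collinear predictor as a noisy copy of $x_1$ instead of rearranging $x_2$ as the slide describes. The function name `slope_standard_errors` is hypothetical.

```python
import numpy as np

def slope_standard_errors(X, y):
    """Return the standard errors of the partial slopes (intercept excluded)."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    ms_resid = resid @ resid / (n - p - 1)
    return np.sqrt(ms_resid * np.diag(np.linalg.inv(design.T @ design)))[1:]

rng = np.random.default_rng(42)
n_obs = 20
x1 = rng.normal(size=n_obs)
y_col = 1.0 + x1 + rng.normal(size=n_obs)            # y depends on x1 only
x2_independent = rng.normal(size=n_obs)              # unrelated to x1
x2_collinear = x1 + 0.05 * rng.normal(size=n_obs)    # almost a copy of x1

se_uncorrelated = slope_standard_errors(np.column_stack([x1, x2_independent]), y_col)
se_collinear = slope_standard_errors(np.column_stack([x1, x2_collinear]), y_col)
```

With the near-copy predictor, both slope standard errors are much larger than in the uncorrelated case, which is why the individual t-tests lose power even though the overall fit barely changes.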

Checks for collinearity

- Correlation matrix between predictors
- Tolerance for each predictor:
  - $1 - R^2$ for the regression of that predictor on all others
  - if tolerance is low (< 0.1) then collinearity is a problem
- Variance inflation factor (VIF) for each predictor:
  - 1/tolerance
  - if VIF > 10 then collinearity is a problem
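Tolerance and VIF follow directly from the definitions above: regress each predictor on all the others and take $1 - R^2$. The sketch below assumes at least two predictors and no perfectly collinear pair (tolerance of exactly zero would make VIF undefined); the function name `tolerance_and_vif` and the data are made up for illustration.

```python
import numpy as np

def tolerance_and_vif(X):
    """Return (tolerance, vif) arrays with one entry per predictor column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    tolerance = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on all other predictors (with an intercept)
        design = np.column_stack([np.ones(n), others])
        b, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ b
        ss_total = np.sum((target - target.mean()) ** 2)
        r_squared_j = 1.0 - (resid @ resid) / ss_total
        tolerance[j] = 1.0 - r_squared_j
    return tolerance, 1.0 / tolerance

# Made-up predictors whose correlation is r = 0.8, so tolerance = 1 - 0.8**2 = 0.36
X_pair = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
tol_pair, vif_pair = tolerance_and_vif(X_pair)
```

With only two predictors, tolerance is simply $1 - r^2$ for the correlation between them, so both predictors share the same value.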

Explained variance

- $R^2$: proportion of variation in $y$ explained by the linear relationship with $x_1$, $x_2$, etc.
- $R^2 = SS_{Regression} / SS_{Total}$
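As a small worked example, $R^2$ can be computed from the sums of squares, and the overall F statistic can be recovered from $R^2$, $n$ and $p$ through the standard identity $F = (R^2 / p) / ((1 - R^2) / (n - p - 1))$. The function names and the sums of squares passed to `r_squared` are assumptions for illustration; the values $R^2 = 0.787$, $n = 20$ and $p = 2$ come from the first collinearity example.

```python
def r_squared(ss_regression, ss_total):
    """Proportion of the total variation in y explained by the regression."""
    return ss_regression / ss_total

def f_from_r_squared(r2, n, p):
    """Overall F statistic implied by R^2 with p predictors and n observations."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

# Hypothetical sums of squares chosen to give R^2 = 0.787
r2_demo = r_squared(78.7, 100.0)

# First collinearity example: R^2 = 0.787, n = 20, p = 2
f_check = f_from_r_squared(0.787, 20, 2)
```

Here `f_check` comes out at about 31.4, close to the F = 31.38 reported for that example; the small gap presumably reflects rounding of $R^2$ to three decimals.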

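Example

| Species | Sleep (hrs/day) | Bodywt (kg) | Brainwt (g) | Lifespan (years) | Gestime (days) |
|---|---|---|---|---|---|
| African elephant | 3.3 | 6654.000 | 5712.0 | 38.6 | 645 |
| Arctic fox | 12.5 | 3.385 | 44.5 | 14.0 | 60 |
| etc. |  |  |  |  |  |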

[Figure: boxplots of variables]

Predictors log transformed.

| Parameter | Estimate | SE | Tol | t | P |
|---|---|---|---|---|---|
| Intercept | 18.94 | 3.11 |  | 6.09 | <0.001 |
| Bodywt | -0.76 | 1.31 | 0.08 | -0.58 | 0.565 |
| Brainwt | -0.84 | 2.03 | 0.05 | -0.42 | 0.680 |
| Lifespan | 2.60 | 2.05 | 0.33 | 1.27 | 0.211 |
| Gestime | -5.11 | 1.81 | 0.36 | -2.82 | 0.007 |

$R^2 = 0.486$

Collinearity problem for body weight and brain weight: low tolerance, highly correlated.

Omit brain weight because body weight and brain weight are so highly correlated.

| Parameter | Estimate | SE | Tol | t | P |
|---|---|---|---|---|---|
| Intercept | 19.06 | 3.07 |  | 6.21 | <0.001 |
| Bodywt | -1.25 | 0.59 | 0.36 | -2.09 | 0.042 |
| Lifespan | 2.19 | 1.78 | 0.43 | 1.23 | 0.225 |
| Gestime | -5.39 | 1.67 | 0.42 | -3.23 | 0.002 |

$R^2 = 0.484$

No collinearity between any predictors: all tolerances OK. Reduced SE and larger slope for body weight.

Examples from literature

Lampert (1993) Ecology 74:1455-1466

- Response variable: Daphnia (water flea) clutch size
- Predictors:
  - body size (mm)
  - particulate organic carbon (mg/L)
  - temperature (°C)

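Lampert (1993)

| Parameter | Coeff. | SE | t | P |
|---|---|---|---|---|
| Intercept | -42.34 | 27.52 | -1.54 | 0.168 |
| Body size | 14.76 | 7.10 | 2.08 | 0.076 |
| POC | 0.27 | 0.43 | 0.61 | 0.559 |
| Temp | 0.73 | 0.68 | 1.07 | 0.321 |

ANOVA $P = 0.052$, $R^2 = 0.684$, $n = 11$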

Williams et al. (1993) Ecology 74:904-918

- Response variable: Zostera (seagrass) growth
- Predictors:
  - epiphyte biomass
  - porewater ammonium

Williams et al. (1993)

| Parameter | Coeff. | P |
|---|---|---|
| Epiphyte biomass | 0.340 | >0.05 |
| Porewater ammonium | 0.919 | <0.05 |

$R^2 = 0.71$

Tolerance = 0.839 (so no collinearity)
