Multiple and complex regression

Extensions of simple linear regression
Multiple regression models: predictor variables are continuous.
Analysis of variance: predictor variables are categorical (grouping variables).
But general linear models can include both continuous and categorical predictors.

Relative abundance of C3 and C4 plants
Paruelo & Lauenroth (1996) examined the geographic distribution of, and the effects of climate variables on, the relative abundance of a number of plant functional types (PFTs): shrubs, forbs, succulents, C3 grasses and C4 grasses.

Data: 73 sites across temperate central North America
Response variable: relative abundance of PFTs (based on cover, biomass, and primary production) for each site
Predictor variables: longitude, latitude, mean annual temperature, mean annual precipitation, winter (%) precipitation, summer (%) precipitation, biome (grassland, shrubland)

Box 6.1: Relative abundance was transformed to ln(dat + 1) because its distribution was positively skewed.

Comparing log10 vs ln
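
As a minimal sketch of the comparison, the snippet below applies ln(x + 1) and log10(x + 1) to a handful of made-up abundance values (an assumption, not the study data). It checks that the two transforms differ only by the constant factor ln(10), so either one should reduce positive skew in the same way.

```python
import numpy as np

# Hypothetical relative abundances (proportions); not the study data.
abundance = np.array([0.0, 0.01, 0.02, 0.05, 0.10, 0.40, 0.85])

ln_abund = np.log1p(abundance)            # ln(x + 1); defined at x = 0
log10_abund = np.log10(abundance + 1.0)   # log10(x + 1)

# The two transforms differ only by the constant factor ln(10).
same_shape = np.allclose(ln_abund, log10_abund * np.log(10.0))
print(same_shape)
```

Because the ratio is constant, the choice of base rescales the regression coefficients but should not change r-squared or significance tests.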

Collinearity
Causes computational problems because it makes the determinant of the matrix of X-variables close to zero, and matrix inversion basically involves dividing by the determinant (so the result is very sensitive to small differences in the numbers).
Standard errors of the estimated regression slopes are inflated.
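
Both effects can be seen in a small simulation. The sketch below is illustrative only: the data are simulated, and the helper name det_and_se is hypothetical. It fits the same two-predictor model once with unrelated predictors and once with a second predictor that is almost a copy of the first, then compares det(X'X) and the slope standard errors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 73  # matches the number of sites; the values themselves are simulated


def det_and_se(x1, x2, y):
    """Return det(X'X) and the standard errors of the OLS coefficients."""
    X = np.column_stack([np.ones(len(y)), x1, x2])
    xtx = X.T @ X
    b = np.linalg.solve(xtx, X.T @ y)
    resid = y - X @ b
    ms_resid = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(ms_resid * np.diag(np.linalg.inv(xtx)))
    return np.linalg.det(xtx), se


x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)                # unrelated to x1
x2_collin = x1 + 0.01 * rng.normal(size=n)   # almost a copy of x1
error = rng.normal(size=n)

det_indep, se_indep = det_and_se(x1, x2_indep, 1 + x1 + x2_indep + error)
det_collin, se_collin = det_and_se(x1, x2_collin, 1 + x1 + x2_collin + error)

print("det(X'X):", det_indep, det_collin)
print("slope SEs:", se_indep[1:], se_collin[1:])
```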

Detecting collinearity
Check tolerance values.
Plot the variables.
Examine a matrix of correlation coefficients between predictor variables.
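
Two of these checks can be sketched numerically. The example assumes the usual definition of tolerance, 1 - r-squared from regressing each predictor on all the others. The predictors are simulated, with temperature made to fall with latitude purely for illustration, and the function name tolerance is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 73

# Simulated predictors (assumed values, not the study data).
lat = rng.uniform(30, 52, n)
lon = rng.uniform(93, 120, n)
temp = 30 - 0.5 * lat + rng.normal(scale=0.5, size=n)  # tied to latitude
X = np.column_stack([lat, lon, temp])
names = ["lat", "lon", "temp"]


def tolerance(X):
    """Tolerance of each column: 1 - r^2 from regressing it on the others."""
    n_obs, k = X.shape
    tol = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n_obs), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        centered = X[:, j] - X[:, j].mean()
        tol[j] = (resid @ resid) / (centered @ centered)  # SS_res / SS_total
    return tol


corr = np.corrcoef(X, rowvar=False)  # correlation matrix of the predictors
tol = tolerance(X)
print(np.round(corr, 2))
print(dict(zip(names, np.round(tol, 3))))
```

Tolerance values near zero flag predictors that are almost fully explained by the others; here that should be lat and temp, while lon stays near one.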

Dealing with collinearity
Omit predictor variables if they are highly correlated with other predictor variables that remain in the model.

$\ln(C_3) = \beta_0 + \beta_1(\text{lat}) + \beta_2(\text{long}) + \beta_3(\text{lat} \times \text{long})$, after centering both lat and long
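
One reason to center before forming the product term is that an uncentered interaction is usually strongly correlated with its component variables. The sketch below shows this with simulated coordinates (assumed ranges, not the actual site data), comparing the correlation between lat and lat x long before and after centering.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 73

# Simulated site coordinates (assumed ranges, not the actual sites).
lat = rng.uniform(30, 52, n)
lon = rng.uniform(93, 120, n)


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


# Interaction built from raw variables.
r_raw = corr(lat, lat * lon)

# Interaction built after centering each variable on its mean.
lat_c = lat - lat.mean()
lon_c = lon - lon.mean()
r_centered = corr(lat_c, lat_c * lon_c)

print(round(r_raw, 3), round(r_centered, 3))
```

Centering changes the interpretation of the main-effect slopes (they become effects at the mean of the other variable) but leaves the interaction coefficient itself unchanged.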

$R^2 = 0.514$

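Analysis of variance
Source of variation | SS | df | MS
Regression | $\sum(\hat{y} - \bar{y})^2$ | $p$ | $\sum(\hat{y} - \bar{y})^2 / p$
Residual | $\sum(y_{\text{obs}} - \hat{y})^2$ | $n - p - 1$ | $\sum(y_{\text{obs}} - \hat{y})^2 / (n - p - 1)$
Total | $\sum(y_{\text{obs}} - \bar{y})^2$ | $n - 1$ |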
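
The partition in this table can be checked numerically. The following is a sketch on simulated data (the coefficients and noise level are arbitrary assumptions). It computes each sum of squares and mean square for a model with p = 3 predictors and forms the F-ratio MS regression / MS residual.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 73, 3

# Simulated predictors and response (arbitrary coefficients).
X = rng.normal(size=(n, p))
y = 0.5 + X @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.5, size=n)

# Fit by least squares with an intercept column.
Xd = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
y_hat = Xd @ b

ss_reg = np.sum((y_hat - y.mean()) ** 2)   # regression SS, df = p
ss_res = np.sum((y - y_hat) ** 2)          # residual SS, df = n - p - 1
ss_tot = np.sum((y - y.mean()) ** 2)       # total SS, df = n - 1

ms_reg = ss_reg / p
ms_res = ss_res / (n - p - 1)
f_ratio = ms_reg / ms_res

print(ss_reg, ss_res, ss_tot)
print(ms_reg, ms_res, f_ratio)
```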

Matrix algebra approach to OLS estimation of multiple regression models
$Y = X\beta + \varepsilon$
$X'Xb = X'Y$
$b = (X'X)^{-1}(X'Y)$
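
A minimal NumPy sketch of these normal equations follows, using simulated data with arbitrary coefficients. It solves X'Xb = X'Y directly and compares the result with np.linalg.lstsq, which avoids forming X'X and is generally the safer choice when predictors are collinear.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 73

# Design matrix: a column of ones for the intercept plus two predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -1.0])      # assumed values for the simulation
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Normal equations: solve (X'X) b = X'Y rather than inverting X'X explicitly.
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Same estimate from a least-squares solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the OLS solution the residuals are orthogonal to every column of X.
residual_check = X.T @ (y - X @ b_normal)

print(b_normal)
print(b_lstsq)
```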

The forward selection is

The backward selection is
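
For orientation, forward selection is commonly described as starting from the intercept-only model and adding one predictor at a time, while backward selection starts from the full model and removes one at a time. The sketch below illustrates only the forward direction. It assumes AIC as the entry criterion, which may not be the rule used in the original slides, and the function names aic_ls and forward_select and the simulated data are hypothetical.

```python
import numpy as np


def aic_ls(y, X):
    """Least-squares AIC, n*ln(SS_res/n) + 2*(p + 1); X includes the intercept."""
    n = len(y)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ b) ** 2)
    return n * np.log(ss_res / n) + 2 * X.shape[1]


def forward_select(y, predictors):
    """Greedily add the predictor that lowers AIC most; stop when none does."""
    design = np.ones((len(y), 1))        # intercept-only starting model
    best = aic_ls(y, design)
    chosen, remaining = [], list(predictors)
    while remaining:
        scores = {
            name: aic_ls(y, np.column_stack([design, predictors[name]]))
            for name in remaining
        }
        name = min(scores, key=scores.get)
        if scores[name] >= best:         # no candidate improves the model
            break
        best = scores[name]
        design = np.column_stack([design, predictors[name]])
        chosen.append(name)
        remaining.remove(name)
    return chosen, best


rng = np.random.default_rng(5)
n = 73
predictors = {name: rng.normal(size=n) for name in ["a", "b", "c"]}
y = 2.0 * predictors["a"] + rng.normal(size=n)   # only "a" matters here

chosen, best_aic = forward_select(y, predictors)
print(chosen, best_aic)
```

Stepwise procedures can settle on different models depending on direction and criterion, so the result is best treated as one candidate rather than a definitive answer.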

Criteria for choosing the best-fitting multiple regression model with p predictors: $r^2$, adjusted $r^2$, and the Akaike Information Criterion (AIC).
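
As a sketch, the three criteria can be computed from the residual and total sums of squares. The formulas used here are the standard ones: r-squared = 1 - SS_res/SS_total, adjusted r-squared = 1 - [SS_res/(n - p - 1)] / [SS_total/(n - 1)], and the least-squares form of AIC, n ln(SS_res/n) + 2(p + 1). The AIC form is an assumption and may differ from other software by an additive constant. The function name and the input numbers are hypothetical.

```python
import math


def fit_criteria(ss_res, ss_tot, n, p):
    """Return (r2, adjusted r2, AIC) for a least-squares fit with p predictors."""
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))
    aic = n * math.log(ss_res / n) + 2 * (p + 1)
    return r2, adj_r2, aic


# Hypothetical sums of squares for a one-predictor model fitted to 73 sites.
r2, adj_r2, aic = fit_criteria(ss_res=5.0, ss_tot=10.0, n=73, p=1)
print(r2, adj_r2, aic)
```

Unlike r-squared, which cannot decrease when a predictor is added, adjusted r-squared and AIC both penalize extra predictors.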

Hierarchical partitioning and model selection
No. of predictors | Model | $r^2$ | Adj. $r^2$ | AIC (R) | AIC
1 | Lon | 0.00005 | -0.014 | 49.179 | -165.10
1 | Lat | 0.4619 | 0.454 | 3.942 | -204.44
2 | Lon + Lat | 0.4671 | 0.4519 | 5.220 | -201.20
3 | Lon + Lat + Lon x Lat | 0.5137 | 0.4926 | 0.437 | -209.69
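
One common way to read AIC values is as differences from the smallest value rather than as absolute numbers. The short sketch below does this for the last AIC column of the table above. The delta-AIC summary is an added illustration and not part of the original slide.

```python
# AIC values from the last column of the table above.
aic = {
    "Lon": -165.10,
    "Lat": -204.44,
    "Lon + Lat": -201.20,
    "Lon + Lat + Lon x Lat": -209.69,
}

best = min(aic, key=aic.get)                       # model with the lowest AIC
delta = {model: value - aic[best] for model, value in aic.items()}

for model, d in sorted(delta.items(), key=lambda kv: kv[1]):
    print(f"{model}: delta AIC = {d:.2f}")
```

On these numbers the model with the interaction has the lowest AIC, and latitude alone comes second, ahead of the additive two-predictor model.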
