Presentation on theme: "More on regression" (presentation transcript)
More on regression

[Figure: two path diagrams relating species richness to pH, humidity, light, temperature, and organic matter content.]

Is it possible to infer causal relationships between model drivers from regression analysis? Is it possible to compare the goodness of fit of different models? Is it possible to quantify the influence of different drivers?
Path analysis and linear structural models (structural equation modelling, SEM)

Multiple regression:

$$Y = a_0 + a_1 X_1 + a_2 X_2 + \dots + a_k X_k + e$$

Path analysis tries to do something that is logically impossible: to derive causal relationships from sets of observations. Path analysis defines a whole model and tries to separate correlations into direct and indirect effects.

The error term e contains the part of the variance in Y that is not explained by the model. These errors are called residuals. Regression analysis does not study the relationships between the predictor variables.
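A minimal sketch of a multiple regression and its residuals, assuming ordinary least squares and simulated (hypothetical) data. It shows that the residuals e carry the unexplained variance, that total variance splits into an explained and a residual part, and that the residuals are uncorrelated with the predictors by construction.

```python
import numpy as np

# Simulated (hypothetical) data: three predictors and one response.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + X @ np.array([0.5, -0.3, 0.8]) + rng.normal(scale=0.5, size=n)

# Design matrix with a column of ones for the intercept a0.
D = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)

# Residuals e: the part of y that the model does not explain.
fitted = D @ coef
e = y - fitted

# Variance decomposition: total = explained + residual (OLS with intercept).
ss_total = np.sum((y - y.mean()) ** 2)
ss_explained = np.sum((fitted - y.mean()) ** 2)
ss_resid = np.sum(e ** 2)
r_squared = 1 - ss_resid / ss_total

print("a0..a3:", np.round(coef, 3))
print("R^2:", round(r_squared, 3))
```

The regression estimates only how each predictor relates to y; the correlations among the columns of X are not modelled.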
Path analysis is largely based on the computation of partial correlation coefficients.

Path coefficients

Path analysis is a model-confirmatory tool. It should not be used to generate models, or even to search for models that fit the data set. We start from the regression functions.
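Using Z-transformed values, we get

$$\frac{1}{n}\sum_{i=1}^{n} e_i Z_{Y,i} = 0, \qquad \frac{1}{n}\sum_{i=1}^{n} Z_{Y,i} Z_{Y,i} = 1, \qquad \frac{1}{n}\sum_{i=1}^{n} Z_{X,i} Z_{Y,i} = r_{XY}$$

Path analysis is a useful tool to generate hypotheses. It fails at low correlation coefficients and with circular model structures.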
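A quick numerical check of the Z-value identities, assuming Z values computed with the population standard deviation (divisor n), so that averages of cross-products equal correlations exactly. It also shows that the slope of the regression of Z_Y on Z_X equals r_XY, which is why standardized coefficients can be read as path coefficients. The data are simulated and hypothetical.

```python
import numpy as np

def z(v):
    # Z-transformation with the population standard deviation (ddof=0),
    # so that averages of products equal correlations exactly.
    return (v - v.mean()) / v.std()

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 0.6 * x + rng.normal(size=100)

zx, zy = z(x), z(y)
n = len(x)

mean_zy_zy = np.sum(zy * zy) / n   # should equal 1
mean_zx_zy = np.sum(zx * zy) / n   # should equal r_XY
r_xy = np.corrcoef(x, y)[0, 1]

# Slope of the standardized bivariate regression Z_Y = p * Z_X + e.
slope = np.polyfit(zx, zy, 1)[0]
print(mean_zy_zy, mean_zx_zy, r_xy, slope)
```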
Species richness and soil characteristics of ground beetles

[Figure: path diagram linking light, temperature, humidity, pH, organic matter content, and species richness, with path coefficients p_LH, p_LT, p_TH, p_pHO, p_TS, p_HS, and p_OS.]

We formulate a model of causal relationships. We multiply each equation by the other variables. We have seven unknowns and need seven linear equations.
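A sketch of how the seven path coefficients can be solved, assuming the arrows implied by the labels (L→T, L→H, T→H, pH→O, and T, H, O→S) and error terms uncorrelated with each equation's predictors. Multiplying, for example, Z_S = p_TS Z_T + p_HS Z_H + p_OS Z_O + e by Z_T, Z_H and Z_O and averaging gives r_TS = p_TS + p_HS r_TH + p_OS r_TO, and so on: one linear equation per unknown. The data and the helper `path_coefficients` are hypothetical, not the lecture's beetle data.

```python
import numpy as np

def path_coefficients(R, predictors, target):
    # Correlation form of the normal equations: R[pred, pred] @ p = R[pred, target].
    # Assumes each error term is uncorrelated with that equation's predictors.
    return np.linalg.solve(R[np.ix_(predictors, predictors)], R[predictors, target])

# Simulated (hypothetical) data; column order L, T, H, pH, O, S.
rng = np.random.default_rng(3)
n = 500
L = rng.normal(size=n)
T = 0.7 * L + rng.normal(scale=0.7, size=n)
H = -0.5 * L + 0.3 * T + rng.normal(scale=0.8, size=n)
pH = rng.normal(size=n)
O = -0.4 * pH + rng.normal(scale=0.9, size=n)
S = 0.4 * T + 0.3 * H + 0.5 * O + rng.normal(scale=0.6, size=n)

data = np.column_stack([L, T, H, pH, O, S])
R = np.corrcoef(data, rowvar=False)

# 1 + 2 + 1 + 3 = 7 unknown path coefficients, one equation each.
(p_LT,) = path_coefficients(R, [0], 1)
p_LH, p_TH = path_coefficients(R, [0, 1], 2)
(p_pHO,) = path_coefficients(R, [3], 4)
p_TS, p_HS, p_OS = path_coefficients(R, [1, 2, 4], 5)

print(np.round([p_LT, p_LH, p_TH, p_pHO, p_TS, p_HS, p_OS], 3))
```

The solutions coincide with standardized multiple regression coefficients, so each path coefficient is a direct effect with the other drivers of the same variable held constant.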
[Table: data with columns N, X, A, B, and C; values not preserved in the transcript.]

R² is the explained variance in a bivariate comparison.
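A short check, on simulated (hypothetical) data, that in a bivariate comparison the explained variance R² of the least-squares line equals the squared Pearson correlation r².

```python
import numpy as np

rng = np.random.default_rng(4)
a = rng.normal(size=50)
x = 2.0 * a + rng.normal(size=50)

# Least-squares line x = intercept + slope * a.
slope, intercept = np.polyfit(a, x, 1)
residuals = x - (intercept + slope * a)

# Explained variance and squared correlation.
r_squared = 1 - np.sum(residuals ** 2) / np.sum((x - x.mean()) ** 2)
r = np.corrcoef(a, x)[0, 1]
print(r_squared, r ** 2)
```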
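Logistic and other regression techniques

We use odds:

$$\text{odds} = \frac{P}{1-P}, \qquad \ln\frac{P}{1-P} = Y = a_0 + a_1 A + a_2 B + a_3 C$$

The logistic regression model: P defines a probability according to a logistic model,

$$P = \frac{e^{Y}}{1 + e^{Y}}$$

[Figure: logistic curve of P with a threshold; cases near P = 1 are surely males, cases near P = 0 are surely females.]

[Table: Gender (seven females, seven males) with predictors A, B, and C, plus a new case X to be classified; values not preserved.]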
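A minimal sketch of fitting the logit model ln(P/(1-P)) = a0 + a1 A + a2 B + a3 C by maximum likelihood with Newton-Raphson iterations, one common fitting method; the program used in the lecture may differ. The data are simulated and hypothetical, with 1 coding male and 0 coding female, and `fit_logistic` is an illustrative helper, not a library function.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    # Maximum-likelihood fit of ln(P/(1-P)) = a0 + a1*x1 + ... by Newton-Raphson.
    D = np.column_stack([np.ones(len(y)), X])
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(D @ a)))   # logistic model for P
        W = p * (1.0 - p)                    # Bernoulli variance of each case
        grad = D.T @ (y - p)                 # gradient of the log-likelihood
        hess = D.T @ (D * W[:, None])        # negative Hessian
        a = a + np.linalg.solve(hess, grad)
    return a

# Simulated (hypothetical) predictors A, B, C and gender (1 = male, 0 = female).
rng = np.random.default_rng(5)
n = 400
X = rng.normal(size=(n, 3))
true_a = np.array([0.5, 1.5, -1.0, 0.8])
logit = true_a[0] + X @ true_a[1:]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

a_hat = fit_logistic(X, y)
print("a0..a3:", np.round(a_hat, 3))
```

At the maximum, the gradient D.T @ (y - p) is zero, which is a convenient convergence check.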
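[Table: new case X with its values of A, B, and C and the fitted coefficients a0, a1, a2, and a3; values not preserved.]

$$Y = a_0 + a_1 A + a_2 B + a_3 C = 2.436, \qquad p = \frac{e^{Y}}{1 + e^{Y}} = 0.92$$

Case X is a male with probability 0.92.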
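The arithmetic of the worked case: with the linear predictor Y = 2.436, the odds of being male are e^Y (about 11.4) and the logistic transformation returns p of about 0.92. The function name `probability_male` is illustrative only.

```python
import math

def probability_male(y):
    # Logistic model: P = e^Y / (1 + e^Y); e^Y is the odds P / (1 - P).
    return math.exp(y) / (1.0 + math.exp(y))

odds = math.exp(2.436)
p = probability_male(2.436)
print(round(odds, 2), round(p, 2))
```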
Regression trees

[Table: regions (rows for Argentina_South, Argentina_Pampas, Argentina_East, and further regions) described by AMT (annual mean temperature), TAR (temperature range), RAI (annual mean precipitation), and RAR (precipitation range); values not preserved.]
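[Figure: regression tree. At the root, a split on RAR < 14.5 separates Australia Central (12 cases) from the other cases (29); a split on AMT (threshold not preserved) separates Argentina South (6) from the others (23); a second split on AMT (threshold not preserved) separates Argentina Pampas (6) from the others (17); a split on RAI < 380 separates Argentina East (6) from the others (11).]

Regression tree analysis tries to group cases according to predefined nominal and ordinal variables and returns the variable levels that best group these cases. It uses a heuristic, pattern-seeking algorithm.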
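One greedy splitting step can be sketched as a search over every predictor and every threshold between adjacent observed values. This sketch assumes the Gini impurity criterion of CART-style classification trees, which may differ from the software behind the slide; the data values are hypothetical and only the variable names follow the table. A full tree repeats `best_split` on each resulting group.

```python
import numpy as np

def gini(labels):
    # Gini impurity of a set of class labels: 1 - sum(p_k^2).
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, labels, names):
    # Try every predictor and every midpoint between sorted distinct values;
    # keep the split with the lowest size-weighted impurity of the two groups.
    best = (np.inf, None, None)
    n = len(labels)
    for j, name in enumerate(names):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] < t
            score = (left.sum() * gini(labels[left])
                     + (~left).sum() * gini(labels[~left])) / n
            if score < best[0]:
                best = (score, name, t)
    return best

# Hypothetical cases: columns AMT, TAR, RAI, RAR.
names = ["AMT", "TAR", "RAI", "RAR"]
X = np.array([
    [22.0, 10.0, 300.0, 10.0],   # "Australia_Central"
    [23.0, 14.0, 700.0, 12.0],
    [21.0, 12.0, 280.0, 11.0],
    [21.5, 15.0, 400.0, 30.0],   # "Other"
    [15.0, 11.0, 250.0, 25.0],
    [8.0, 13.0, 900.0, 20.0],
])
labels = np.array(["Australia_Central"] * 3 + ["Other"] * 3)

score, var, threshold = best_split(X, labels, names)
print(var, threshold, score)
```

Because only the best split at each node is kept, the search is heuristic: it does not guarantee the globally best tree.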
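[Table: data with columns N, X, A, and B; values not preserved.]

What is the correlation between B and X? What is the pure correlation between B and X, excluding the influence of A on both X and B? We need the partial correlation of X and B.

[Figure: A, B, and X connected by the correlations r_AB, r_BX, and r_AX.]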
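The partial correlation can be computed from the three pairwise correlations with the standard first-order formula r_XB·A = (r_BX - r_AX r_AB) / sqrt((1 - r_AX²)(1 - r_AB²)). The function name and the example correlations below are hypothetical.

```python
import math

def partial_corr(r_bx, r_ax, r_ab):
    # Correlation of B and X with the linear influence of A removed from both.
    return (r_bx - r_ax * r_ab) / math.sqrt((1 - r_ax ** 2) * (1 - r_ab ** 2))

# Hypothetical correlations: a raw r_BX of 0.5 shrinks once A is held constant.
print(round(partial_corr(0.5, 0.6, 0.7), 3))
```

If A is uncorrelated with both B and X, the partial correlation equals the ordinary correlation r_BX.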
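$$e_B = B - \hat{B}, \qquad e_X = X - \hat{X}$$

where $\hat{B}$ and $\hat{X}$ are the values predicted by regressions on A. Partial regressions are regressions of these residuals, excluding the influence of a third factor.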
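A sketch of the residual route, assuming simple least-squares regressions on A and simulated (hypothetical) data: regress B and X each on A, then correlate the two residual series. The result matches the partial correlation formula computed from the pairwise correlations.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
A = rng.normal(size=n)
B = 0.7 * A + rng.normal(size=n)
X = 0.5 * A + 0.3 * B + rng.normal(size=n)

def residuals(y, x):
    # Residuals of the simple linear regression y = a + b*x + e.
    b, a = np.polyfit(x, y, 1)
    return y - (a + b * x)

# Remove the influence of A from both B and X, then correlate what is left.
e_B = residuals(B, A)
e_X = residuals(X, A)
r_partial_resid = np.corrcoef(e_B, e_X)[0, 1]

# Same quantity from the pairwise correlations.
r = np.corrcoef([X, A, B])
r_xa, r_xb, r_ab = r[0, 1], r[0, 2], r[1, 2]
r_partial_formula = (r_xb - r_xa * r_ab) / np.sqrt((1 - r_xa ** 2) * (1 - r_ab ** 2))
print(r_partial_resid, r_partial_formula)
```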
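[Table: partial linear correlations among X, A, B, and C (r \ p); values not preserved.]

The partial linear correlations of A, B, and C with X. To show the isolated influence of single predictors, we report the squared partial correlation coefficients within the linear regression results.

[Table: multiple regression results for X on A, B, and C, with columns Coeff., Std. err., r² (squared partial correlation), and p; rows Constant, A, B, C, and the model R²; values not preserved.]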
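A sketch of how the squared partial correlations can sit next to ordinary least-squares output: for predictor j, r²_partial = t_j² / (t_j² + df), with t_j the coefficient's t-value and df = n - k - 1 the residual degrees of freedom. The code checks this identity against the residual route (regress X and predictor j on the other predictors, then correlate the residuals). The data are simulated and hypothetical; how the lecture's table was produced is not stated.

```python
import numpy as np

def ols(D, y):
    # Least-squares coefficients and residuals.
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef, y - D @ coef

rng = np.random.default_rng(7)
n = 120
P = rng.normal(size=(n, 3))          # hypothetical predictors A, B, C
P[:, 1] += 0.5 * P[:, 0]             # make B correlated with A
x = 1.0 + P @ np.array([0.6, 0.2, -0.4]) + rng.normal(size=n)

# Multiple regression of X on A, B, C: coefficients, standard errors, t-values.
D = np.column_stack([np.ones(n), P])
coef, e = ols(D, x)
df = n - D.shape[1]
sigma2 = e @ e / df
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(D.T @ D)))
t = coef / se
partial_r2_from_t = t[1:] ** 2 / (t[1:] ** 2 + df)

# Check via residuals of X and of each predictor on the other predictors.
partial_r2_resid = []
for j in range(3):
    others = np.column_stack([np.ones(n), np.delete(P, j, axis=1)])
    _, e_x = ols(others, x)
    _, e_p = ols(others, P[:, j])
    partial_r2_resid.append(np.corrcoef(e_x, e_p)[0, 1] ** 2)

print(np.round(partial_r2_from_t, 3))
```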