Presentation on theme: "More on regression". Presentation transcript:
More on regression
[Figure: diagram relating species richness to pH, humidity, light, temperature, and organic matter content]
Is it possible to infer causal relationships between model drivers from regression analysis?
Is it possible to compare the goodness of fit of different models?
Is it possible to quantify the influence of different drivers?
Path analysis and linear structure models (structural equation modelling, SEM)
Multiple regression: The error term e contains the part of the variance in Y that is not explained by the model. These errors are called residuals. Regression analysis does not study the relationships between the predictor variables.
Path analysis: Path analysis tries to do something that is logically impossible: to derive causal relationships from sets of observations. It defines a whole model and tries to separate correlations into direct and indirect effects.
Path analysis is largely based on the computation of partial coefficients of correlation.
Path coefficients
Path analysis is a model-confirmatory tool. It should not be used to generate models or to search for models that fit the data set.
We start from regression functions.
Using Z-transformed values we get
$e Z_Y = 0$, $Z_Y Z_Y = 1$, $Z_X Z_Y = r_{XY}$
Path analysis is a nice tool to generate hypotheses. It fails at low coefficients of correlation and with circular model structures.
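The three identities can be checked numerically. The sketch below is an illustration under stated assumptions, not the slide's own calculation: it assumes the underlying regression is Z_X = p Z_Y + e (Y as the predictor), reads each product as a mean over all cases, and borrows the X and B columns of the 14-case table from this transcript, with B standing in for Y. The helper names are hypothetical.

```python
import math

# X and B columns of the 14-case table in this transcript; B plays the role of Y.
x = [1.00, 1.30, 1.42, 1.70, 2.47, 3.02, 3.91, 4.42, 5.09, 5.27, 5.58, 6.34, 6.64, 7.32]
y = [0.55, 1.49, 0.13, 0.28, 0.73, 1.73, 0.28, 1.36, 1.89, 0.96, 1.14, 1.31, 2.57, 1.17]

def z_transform(values):
    """Subtract the mean and divide by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def mean_product(u, v):
    """Mean of the case-wise products of two equally long lists."""
    return sum(ui * vi for ui, vi in zip(u, v)) / len(u)

zx = z_transform(x)  # response
zy = z_transform(y)  # predictor (assumption, see lead-in)

r_xy = mean_product(zx, zy)           # Z_X Z_Y = r_XY
p = r_xy / mean_product(zy, zy)       # least-squares slope of Z_X on Z_Y
e = [zxi - p * zyi for zxi, zyi in zip(zx, zy)]  # residuals

print("mean Z_Y Z_Y:", mean_product(zy, zy))
print("mean e Z_Y:  ", mean_product(e, zy))
print("r_XY:", r_xy, "path coefficient p:", p)
```

With Z-transformed values the regression slope and the correlation coefficient coincide, which is why path coefficients can be read on the scale of correlations.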
Species richness and soil characteristics of ground beetles
[Figure: path diagram with the variables species richness, pH, humidity, light, temperature, and organic matter content]
Path coefficients: pLH, pLT, pTH, pPHO, pTS, pHS, pOS
We formulate a model of causal relationships. We multiply each equation by the other variables. We have seven unknowns and need seven linear equations.
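The "multiply each equation by the other variables" step can be sketched on a much smaller model. The ground beetle data are not part of the transcript, so this illustration instead uses the X, A, and B columns of the 14-case table found elsewhere in it and assumes a hypothetical model in which A and B both act on X and are correlated with each other. That gives two unknown path coefficients and two linear equations instead of seven.

```python
import math

x = [1.00, 1.30, 1.42, 1.70, 2.47, 3.02, 3.91, 4.42, 5.09, 5.27, 5.58, 6.34, 6.64, 7.32]
a = [0.68, 0.98, 0.74, 0.12, 0.63, 0.73, 0.19, 0.73, 1.91, 1.49, 1.11, 0.84, 1.72, 0.87]
b = [0.55, 1.49, 0.13, 0.28, 0.73, 1.73, 0.28, 1.36, 1.89, 0.96, 1.14, 1.31, 2.57, 1.17]

def pearson(u, v):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    suu = sum((ui - mu) ** 2 for ui in u)
    svv = sum((vi - mv) ** 2 for vi in v)
    return suv / math.sqrt(suu * svv)

r_ax, r_bx, r_ab = pearson(a, x), pearson(b, x), pearson(a, b)

# Model (assumed): Z_X = p_ax * Z_A + p_bx * Z_B + e.
# Multiplying by Z_A and by Z_B and averaging over cases gives
#   r_ax = p_ax        + p_bx * r_ab
#   r_bx = p_ax * r_ab + p_bx
# Solve the 2x2 system with Cramer's rule.
det = 1.0 - r_ab ** 2
p_ax = (r_ax - r_bx * r_ab) / det
p_bx = (r_bx - r_ax * r_ab) / det

# Decomposition of the correlation between B and X.
direct_b = p_bx            # direct path B -> X
indirect_b = p_ax * r_ab   # part that runs through the association with A

print("p_ax =", p_ax, "p_bx =", p_bx)
print("r_bx =", r_bx, "= direct", direct_b, "+ indirect", indirect_b)
```

The same procedure scales to the seven-coefficient model: one equation per path coefficient, each obtained by multiplying a structural equation by one of its predictors.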
N   X     A     B     C
1   1.00  0.68  0.55  2.16
2   1.30  0.98  1.49  0.45
3   1.42  0.74  0.13  0.55
4   1.70  0.12  0.28  2.34
5   2.47  0.63  0.73  0.60
6   3.02  0.73  1.73  0.14
7   3.91  0.19  0.28  2.60
8   4.42  0.73  1.36  2.74
9   5.09  1.91  1.89  0.99
10  5.27  1.49  0.96  1.21
11  5.58  1.11  1.14  …
12  6.34  0.84  1.31  1.01
13  6.64  1.72  2.57  0.92
14  7.32  0.87  1.17  3.21
R^2 is the explained variance in a bivariate comparison.
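As an illustration of R^2 as explained variance, the following sketch fits a simple least-squares line of X on one predictor at a time and reports 1 - SS_residual / SS_total. It uses only the X, A, and B columns, taking row 11 from the three-column copy of this table that appears later in the transcript; C is left out because its value in row 11 is not legible. The function name is hypothetical.

```python
x = [1.00, 1.30, 1.42, 1.70, 2.47, 3.02, 3.91, 4.42, 5.09, 5.27, 5.58, 6.34, 6.64, 7.32]
a = [0.68, 0.98, 0.74, 0.12, 0.63, 0.73, 0.19, 0.73, 1.91, 1.49, 1.11, 0.84, 1.72, 0.87]
b = [0.55, 1.49, 0.13, 0.28, 0.73, 1.73, 0.28, 1.36, 1.89, 0.96, 1.14, 1.31, 2.57, 1.17]

def bivariate_r2(response, predictor):
    """R^2 of the least-squares line response = intercept + slope * predictor."""
    n = len(response)
    mean_p = sum(predictor) / n
    mean_r = sum(response) / n
    slope = (sum((p - mean_p) * (r - mean_r) for p, r in zip(predictor, response))
             / sum((p - mean_p) ** 2 for p in predictor))
    intercept = mean_r - slope * mean_p
    ss_res = sum((r - (intercept + slope * p)) ** 2 for p, r in zip(predictor, response))
    ss_tot = sum((r - mean_r) ** 2 for r in response)
    return 1.0 - ss_res / ss_tot

r2_a = bivariate_r2(x, a)
r2_b = bivariate_r2(x, b)
print("R^2 of X on A:", round(r2_a, 3))
print("R^2 of X on B:", round(r2_b, 3))
```

Both values should land close to the bivariate R^2 figures of 0.262 and 0.291 that the multiple regression results at the end of the transcript list for A and B.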
Logistic and other regression techniques
We use odds.
The logistic regression model: P defines a probability according to a logistic model.
[Figure: logistic curve of P between 0 and 1 with a threshold at 0.5 and regions labelled "surely males" and "surely females"]
Gender  A      B      C
Female  0.038  0.165  2.211
Female  0.500  0.987  2.894
Female  0.864  0.759  0.860
Female  0.590  1.071  2.434
Female  0.385  0.749  0.984
Female  0.703  0.879  2.745
Female  0.629  1.047  2.774
Male    0.730  0.798  2.951
Male    1.367  1.841  3.174
Male    1.325  0.850  1.337
Male    0.958  1.551  3.000
Male    1.173  1.164  1.077
Male    1.559  1.521  3.266
Male    1.027  1.251  3.315
X       0.900  0.856  2.345
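For reference, the standard form of a logistic model written in terms of odds is given below. It is a sketch under assumptions: the predictors are taken to be the columns A, B, and C of the table, the coefficient names a_0 to a_3 follow the next slide, and Y denotes the linear predictor.

```latex
\begin{align}
  \text{odds} &= \frac{p}{1 - p} \\
  \ln\!\left(\frac{p}{1 - p}\right) &= Y = a_0 + a_1 A + a_2 B + a_3 C \\
  p &= \frac{e^{Y}}{1 + e^{Y}}
\end{align}
```

A case is assigned to one group when p exceeds the threshold of 0.5 (equivalently, when Y > 0) and to the other group otherwise.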
Case X: A = 0.900, B = 0.856, C = 2.345
Coefficients: a1 = 37.425, a2 = 2.900, a3 = 8.000, a0 = -52.5
Y = 2.436
$e^Y$ = 11.43
p = 0.92
X is a male with probability 0.92.
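The arithmetic on this slide can be retraced in a few lines. The sketch assumes that Y is the linear predictor a0 + a1*A + a2*B + a3*C, that the four coefficient values pair with a1, a2, a3, and a0 in that order, and that p = e^Y / (1 + e^Y). Because the printed coefficients are rounded, the recomputed Y differs slightly from the slide's 2.436.

```python
import math

def logistic_probability(y):
    """Convert a linear predictor Y into a probability via e^Y / (1 + e^Y)."""
    return math.exp(y) / (1.0 + math.exp(y))

# Coefficients as read from the slide (pairing with a0..a3 is an assumption).
a0, a1, a2, a3 = -52.5, 37.425, 2.900, 8.000

# The unclassified case X from the table.
case_x = {"A": 0.900, "B": 0.856, "C": 2.345}

y = a0 + a1 * case_x["A"] + a2 * case_x["B"] + a3 * case_x["C"]
p = logistic_probability(y)

# The same conversion applied to the Y value printed on the slide.
p_slide = logistic_probability(2.436)

# Probabilities above the 0.5 threshold are read as "male", as on the slide.
label = "male" if p > 0.5 else "female"

print("Y from rounded coefficients:", round(y, 3))
print("p from rounded coefficients:", round(p, 2))
print("p from the slide's Y = 2.436:", round(p_slide, 2))
print("classification:", label)
```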
Regression trees
AMT = annual mean temperature, TAR = temperature range, RAI = annual mean precipitation, RAR = precipitation range
Region            AMT   TAR   RAI  RAR
Argentina_South   7.3   27.3  217  30
Argentina_South   7.9   25.7  375  66
Argentina_South   7.2   24.4  568  94
Argentina_South   7.1   23.8  685  104
Argentina_South   7.4   26.5  284  48
Argentina_South   7.8   25.3  416  74
Argentina_Pampas  15    30.2  363  33
Argentina_Pampas  15.1  31    342  32
Argentina_Pampas  15.2  31.6  320  30
Argentina_Pampas  15.2  32.2  313  26
Argentina_Pampas  14.7  32.7  275  27
Argentina_Pampas  14.4  32.5  194  17
Argentina_East    18.6  31.8  243  51
Argentina_East    19.2  30    355  73
Regression tree (number of cases in parentheses):
Root
  Split at RAR < 14.5: Australia Central (12) and Other (29)
    Split at AMT < 11.15: Argentina South (6) and Other (23)
      Split at AMT < 16.45: Argentina Pampas (6) and Other (17)
        Split at RAI < 380: Argentina East (6) and Other (11)
Regression tree analysis tries to group cases according to predefined nominal and ordinal variables and returns the variable levels that best group these cases. It uses a heuristic pattern-seeking algorithm.
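How a tree arrives at a level such as AMT < 11.15 can be illustrated with a small search. The slides do not name the algorithm, so the sketch below assumes one common heuristic: try the midpoint between each pair of neighbouring values and keep the one with the lowest weighted Gini impurity. It uses only the 14 rows shown in the table (the full data set has more cases), and the function names are hypothetical.

```python
# AMT values and regions of the 14 rows shown in the table.
amt = [7.3, 7.9, 7.2, 7.1, 7.4, 7.8, 15, 15.1, 15.2, 15.2, 14.7, 14.4, 18.6, 19.2]
region = ["Argentina_South"] * 6 + ["Argentina_Pampas"] * 6 + ["Argentina_East"] * 2

# Binary target for one split: Argentina South against everything else.
target = ["South" if r == "Argentina_South" else "Other" for r in region]

def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value < threshold' split."""
    unique = sorted(set(values))
    best_threshold, best_score = None, float("inf")
    for low, high in zip(unique, unique[1:]):
        threshold = (low + high) / 2  # midpoint between neighbouring values
        left = [lab for v, lab in zip(values, labels) if v < threshold]
        right = [lab for v, lab in zip(values, labels) if v >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

threshold, impurity = best_split(amt, target)
print("best AMT threshold:", threshold, "weighted Gini impurity:", impurity)
```

On these rows the search settles on the midpoint between 7.9 and 14.4, which agrees with the AMT < 11.15 level in the tree.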
N   X     A     B
1   1.00  0.68  0.55
2   1.30  0.98  1.49
3   1.42  0.74  0.13
4   1.70  0.12  0.28
5   2.47  0.63  0.73
6   3.02  0.73  1.73
7   3.91  0.19  0.28
8   4.42  0.73  1.36
9   5.09  1.91  1.89
10  5.27  1.49  0.96
11  5.58  1.11  1.14
12  6.34  0.84  1.31
13  6.64  1.72  2.57
14  7.32  0.87  1.17
What is the correlation between B and X? What is the pure correlation between B and X, excluding the influence of A on both X and B? We need the partial correlation of X and B.
[Figure: diagram of A, B, and X connected by the pairwise correlations r_AB, r_BX, and r_AX]
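The standard first-order partial correlation answers the second question directly from the three pairwise correlations. The subscript notation r_{BX \cdot A} (correlation of B and X with A held constant) is an addition here and does not appear on the slide.

```latex
\begin{equation}
  r_{BX \cdot A} =
  \frac{r_{BX} - r_{AX}\, r_{AB}}
       {\sqrt{\left(1 - r_{AX}^{2}\right)\left(1 - r_{AB}^{2}\right)}}
\end{equation}
```

If A is uncorrelated with both B and X, the numerator and denominator reduce to r_{BX} and 1, so the partial and the ordinary correlation coincide.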
Partial regressions are regressions of residuals: the influence of a third factor is first removed from both variables.
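A minimal sketch of this idea, using the X, A, and B columns of the 14-case table: regress X on A and B on A, keep the two sets of residuals, and correlate them. For comparison, the same quantity is computed from the three pairwise correlations with the usual first-order partial correlation formula. Helper names are hypothetical.

```python
import math

x = [1.00, 1.30, 1.42, 1.70, 2.47, 3.02, 3.91, 4.42, 5.09, 5.27, 5.58, 6.34, 6.64, 7.32]
a = [0.68, 0.98, 0.74, 0.12, 0.63, 0.73, 0.19, 0.73, 1.91, 1.49, 1.11, 0.84, 1.72, 0.87]
b = [0.55, 1.49, 0.13, 0.28, 0.73, 1.73, 0.28, 1.36, 1.89, 0.96, 1.14, 1.31, 2.57, 1.17]

def pearson(u, v):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    suu = sum((ui - mu) ** 2 for ui in u)
    svv = sum((vi - mv) ** 2 for vi in v)
    return suv / math.sqrt(suu * svv)

def residuals(response, predictor):
    """Residuals of the least-squares line response = intercept + slope * predictor."""
    n = len(response)
    mp, mr = sum(predictor) / n, sum(response) / n
    slope = (sum((p - mp) * (r - mr) for p, r in zip(predictor, response))
             / sum((p - mp) ** 2 for p in predictor))
    intercept = mr - slope * mp
    return [r - (intercept + slope * p) for p, r in zip(predictor, response)]

# Route 1: remove the influence of A from X and from B, then correlate what is left.
res_x = residuals(x, a)
res_b = residuals(b, a)
partial_from_residuals = pearson(res_x, res_b)

# Route 2: first-order partial correlation formula.
r_bx, r_ax, r_ab = pearson(b, x), pearson(a, x), pearson(a, b)
partial_from_formula = (r_bx - r_ax * r_ab) / math.sqrt((1 - r_ax ** 2) * (1 - r_ab ** 2))

print("r_BX (ordinary):        ", r_bx)
print("r_BX.A (from residuals):", partial_from_residuals)
print("r_BX.A (from formula):  ", partial_from_formula)
```

Both routes give the same number, and it is clearly smaller than the ordinary correlation between B and X because part of that correlation is shared with A.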
Partial linear correlations (r below the diagonal, p above the diagonal)
r\p  X        A         B         C
X    0        0.29371   0.17742   0.03325
A    0.33073  0         0.12024   0.3568
B    0.41704  0.4732    0         0.27957
C    0.61517  -0.29216  -0.33999  0
The partial linear correlations of A, B, and C on X.

Multiple regression results
          Coeff.  Std.err.  r2     p      R^2
Constant  -0.561  1.317            0.679  0.000
A         1.425   1.286     0.109  0.294  0.262
B         1.388   0.957     0.174  0.177  0.291
C         1.065   0.432     0.378  0.033  0.094
To show the isolated influence of single predictors, the linear regression results include the squared partial correlation coefficients (r2).
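The r2 column can be cross-checked against the coefficients and standard errors with the standard identity r^2 = t^2 / (t^2 + df), where t = coefficient / standard error. The sketch assumes n = 14 cases, as in the data table, and therefore df = 14 - 3 - 1 = 10 residual degrees of freedom. The function name is hypothetical.

```python
# Coefficient and standard error per predictor, as listed in the regression results.
results = {
    "A": (1.425, 1.286),
    "B": (1.388, 0.957),
    "C": (1.065, 0.432),
}

n_cases = 14       # assumption: the 14 cases of the data table
n_predictors = 3   # A, B, C
df = n_cases - n_predictors - 1  # residual degrees of freedom

def squared_partial_r(coeff, std_err, df):
    """Squared partial correlation from the t statistic of a regression coefficient."""
    t = coeff / std_err
    return t * t / (t * t + df)

partial_r2 = {name: squared_partial_r(c, se, df) for name, (c, se) in results.items()}

for name, value in partial_r2.items():
    print(name, round(value, 3))
```

The results agree with the r2 column to three decimals, and their square roots agree with the first column of the partial correlation matrix (0.331, 0.417, 0.615).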