Presentation on theme: "There is a hypothesis about dependent and independent variables The relation is supposed to be linear We have a hypothesis about the distribution of errors."— Presentation transcript:

1 Assumptions of linear regression. Three starting situations: (1) There is a hypothesis about dependent and independent variables, the relation is supposed to be linear, and we have a hypothesis about the distribution of errors around the hypothesized regression line. (2) There is a hypothesis about dependent and independent variables, the relation is non-linear, and we have no data about the distribution of errors around the hypothesized regression line. (3) There is no clear hypothesis about dependent and independent variables, the relation is non-linear, and we have no data about the distribution of errors around the hypothesized regression line.

2 Least squares method. Assumptions: a linear model applies; the x-variable has no error term; the distribution of the y errors around the regression line is normal.
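A minimal sketch of the least-squares idea in Python/NumPy; the (x, y) values are invented for the example, and x is treated as error-free, as the assumptions above require:

```python
import numpy as np

# Hypothetical data: x is assumed error-free, the y errors scatter
# normally around the regression line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# Least squares: minimise the sum of squared vertical deviations of y.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x
residuals = y - y_hat

print(f"y = {intercept:.2f} + {slope:.2f} x")
print("sum of squared residuals:", round(float(np.sum(residuals**2)), 3))
```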

3 The second example is non-linear. We hypothesize the allometric relation W = aB^z. Linearised regression model (log W = log a + z log B): the distribution of errors is assumed to be lognormal. Non-linear regression model (W = aB^z fitted directly): the distribution of errors is assumed to be normal.

4 Y = e^(0.1X) + norm(0; Y) versus Y = X^0.5 · e^norm(0; Y). In both cases we have some sort of autocorrelation. Using logarithms reduces the effect of autocorrelation and makes the distribution of errors more homogeneous. Non-linear estimation instead puts more weight on the larger y-values. If there is no autocorrelation, the log-transformation puts more weight on smaller values.
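A sketch contrasting the two ways of fitting the allometric model W = aB^z, assuming SciPy is available. The data are simulated with multiplicative (lognormal) errors, the situation where the linearised fit matches the error assumption, while the direct non-linear fit weights the large W-values more strongly; the coefficient values are invented and merely echo the species-area numbers used later in the deck:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
B = np.linspace(1, 100, 50)

# Simulated data with multiplicative lognormal errors: W = a * B**z * exp(eps)
a_true, z_true = 1.16, 0.24
W = a_true * B**z_true * np.exp(rng.normal(0.0, 0.1, B.size))

# 1) Linearised model: log W = log a + z log B (errors assumed lognormal)
z_lin, loga_lin = np.polyfit(np.log(B), np.log(W), 1)

# 2) Non-linear model: W = a * B**z fitted directly (errors assumed normal);
#    this puts more weight on the larger W-values.
(a_nl, z_nl), _ = curve_fit(lambda b, a, z: a * b**z, B, W, p0=[1.0, 0.3])

print("linearised fit:", np.exp(loga_lin), z_lin)
print("non-linear fit:", a_nl, z_nl)
```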

5 Linear regression: European bat species and environmental correlates.

6 N = 62. Matrix approach to linear regression: X is not a square matrix, hence X^-1 does not exist; the parameters are therefore estimated from the normal equations b = (X'X)^-1 X'Y.
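A sketch of the matrix approach via the normal equations, with a hypothetical design of n = 62 cases, one predictor, and a leading column of ones for the intercept:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 62
area = rng.uniform(1.0, 100.0, n)

# Design matrix X (n x 2) is not square, so X^-1 does not exist.
X = np.column_stack([np.ones(n), np.log(area)])
Y = 1.16 + 0.24 * np.log(area) + rng.normal(0.0, 0.2, n)

# Normal equations: b = (X'X)^-1 X'Y
b = np.linalg.solve(X.T @ X, X.T @ Y)
print("intercept, slope:", b)
```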

7 The species-area relationship of European bats. 1.16: average number of species per unit area (species density); 0.24: spatial species turnover. What about the part of variance explained by our model?


9 How to interpret the coefficient of determination: total variance = explained (regression) variance + rest (unexplained, residual) variance, and R^2 = explained variance / total variance. Statistical testing is done by an F or a t test.
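A sketch of how this variance decomposition gives R^2 and the F test (SciPy assumed for the F distribution; the data are invented):

```python
import numpy as np
from scipy import stats

# Hypothetical data and a simple bivariate fit
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.3, 3.1, 4.8, 5.2, 6.9, 7.4, 9.1, 9.8])
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

ss_total     = np.sum((y - y.mean())**2)    # total sum of squares
ss_explained = np.sum((y_hat - y.mean())**2)  # explained by the model
ss_residual  = np.sum((y - y_hat)**2)       # rest (unexplained)

r2 = ss_explained / ss_total
k, n = 1, len(y)                            # one predictor, n cases
F = (ss_explained / k) / (ss_residual / (n - k - 1))
p = stats.f.sf(F, k, n - k - 1)
print(f"R^2 = {r2:.3f}, F = {F:.1f}, p = {p:.4f}")
```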


11 The general linear model. A model that assumes that a dependent variable Y can be expressed by a linear combination of predictor variables X, Y = XB + E, is called a linear model. The vector E contains the error terms of each regression. The aim is to minimize E.

12 The general linear model. If the errors of the predictor variables are Gaussian, the error term E should also be Gaussian, and means and variances are additive: total variance = explained variance + unexplained (rest) variance.

13 Multiple regression: 1. model formulation; 2. estimation of model parameters; 3. estimation of statistical significance.

14 Multiple R and R^2

15 Adjusted R^2 (R: correlation matrix; n: number of cases; k: number of independent variables in the model). The usual correction is R^2_adj = 1 - (1 - R^2)(n - 1)/(n - k - 1). D < 0 means the variable is statistically not significant and should be eliminated from the model.
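A minimal multiple-regression sketch computing R^2 and the adjusted R^2 from the formula above; the predictors and coefficients are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 62, 3                                   # cases, independent variables
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 0.8, -0.5, 0.2])
y = X @ beta_true + rng.normal(0.0, 1.0, n)

# Least-squares fit of all predictors at once
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

r2 = 1 - np.sum((y - y_hat)**2) / np.sum((y - y.mean())**2)
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {r2_adj:.3f}")
```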

16 A mixed model

17 The final model. Is this model realistic? Very low species density (log-scale!); realistic increase of species richness with area; increase of species richness with winter length; increase of species richness at higher latitudes; a peak of species richness at intermediate latitudes. The model makes realistic predictions. A problem might arise from the intercorrelation between the predictor variables (multicollinearity). We solve the problem by a step-wise approach, eliminating the variables that are either not significant or give unreasonable parameter values. The variance explanation of this final model is higher than that of the previous one.

18 Multiple regression solves systems of intrinsically linear algebraic equations. The matrix X'X must not be singular; that is, the variables have to be independent, otherwise we speak of multicollinearity. Collinearity of r < 0.7 is in most cases tolerable. To be safely applied, multiple regression needs at least 10 times as many cases as variables in the model. Statistical inference assumes that errors have a normal distribution around the mean. The model assumes linear (or algebraic) dependencies; check first for non-linearities. Check the distribution of residuals Y_exp - Y_obs; this distribution should be random. Check whether the parameters have realistic values. Multiple regression is a hypothesis-testing and not a hypothesis-generating technique! Examples of intrinsically linear models: polynomial regression, general additive model.
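Polynomial regression illustrates the "intrinsically linear" point: the model is non-linear in x but linear in the parameters, so the ordinary least-squares machinery applies. The sketch below (invented data) also shows a crude collinearity check on the predictor columns; x and x^2 are strongly correlated here, which is exactly the multicollinearity warning raised above:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 40)
y = 2.0 + 0.5 * x - 0.08 * x**2 + rng.normal(0.0, 0.5, x.size)

# Design matrix of a quadratic polynomial: non-linear in x,
# but linear in the parameters, so least squares still applies.
X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("coefficients:", b)

# Crude multicollinearity check: pairwise correlation of the predictors
r = np.corrcoef(X[:, 1:], rowvar=False)
print("correlation x vs x^2:", r[0, 1])   # well above 0.7 here
```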

19 Standardized coefficients of correlation. Z-transformed distributions have a mean of 0 and a standard deviation of 1. In the case of bivariate regression Y = aX + b, R_XX = 1, hence B = R_XY. Hence the use of Z-transformed values results in standardized correlation coefficients, termed β-values.
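A sketch confirming the β-value statement: regressing z-transformed Y on z-transformed X gives a slope equal to the correlation coefficient R_XY (data invented for the example):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(10.0, 3.0, 100)
y = 2.0 + 0.7 * x + rng.normal(0.0, 2.0, 100)

def z(v):
    # Z-transformation: mean 0, standard deviation 1
    return (v - v.mean()) / v.std()

# Slope of the regression on z-transformed values ...
beta, _ = np.polyfit(z(x), z(y), 1)
# ... equals the Pearson correlation R_XY
r_xy = np.corrcoef(x, y)[0, 1]
print(f"beta = {beta:.3f}, R_XY = {r_xy:.3f}")
```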

