Linear regression models. Simple Linear Regression.


1 Linear regression models

2 Simple Linear Regression

3 History Developed by Sir Francis Galton in his article “Regression towards mediocrity in hereditary stature”

4 Purposes:
- To describe the linear relationship between two continuous variables, the response variable (y-axis) and a single predictor variable (x-axis)
- To determine how much of the variation in Y can be explained by the linear relationship with X and how much of this variation remains unexplained
- To predict new values of Y from new values of X

5 The linear regression model is: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
- $X_i$ and $Y_i$ are paired observations ($i = 1$ to $n$)
- $\beta_0$ = population intercept (when $X_i = 0$)
- $\beta_1$ = population slope (measures the change in $Y_i$ per unit change in $X_i$)
- $\varepsilon_i$ = the random or unexplained error associated with the $i$th observation. The $\varepsilon_i$ are assumed to be independent and distributed as $N(0, \sigma^2)$.

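6 Linear relationship [Figure: straight line of Y against X, with intercept $\beta_0$ and slope $\beta_1$ shown as the rise in Y per 1.0 unit of X]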

7 Linear models approximate non-linear functions over a limited domain [Figure labels: interpolation (within the range of the observed X values), extrapolation (beyond that range)]

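8 For a given value of X, the sampled Y values are independent with normally distributed errors: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, with $\varepsilon_i \sim N(0, \sigma^2)$, so that $E(\varepsilon_i) = 0$ and $E(Y_i) = \beta_0 + \beta_1 X_i$. [Figure: Y against X, with normal error distributions centred on $E(Y_1)$ at $X_1$ and $E(Y_2)$ at $X_2$]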

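9 Fitting data to a linear model: $Y_i - \hat{Y}_i = \varepsilon_i$ (residual) [Figure: observed value $Y_i$ and fitted value $\hat{Y}_i$ at $X_i$, with the residual as the vertical distance between them]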

10 The residual: $\varepsilon_i = Y_i - \hat{Y}_i$ The residual sum of squares: $SS_{residual} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$

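11 Estimating Regression Parameters The “best fit” estimates for the regression population parameters ($\beta_0$ and $\beta_1$) are the values that minimize the residual sum of squares ($SS_{residual}$) between each observed value and the predicted value of the model: $SS_{residual} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2$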

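12 Sum of squares: $SS_X = \sum_{i=1}^{n} (X_i - \bar{X})^2$ (and likewise $SS_Y = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$) Sum of cross products: $SS_{XY} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$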

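13 Least-squares parameter estimates: $\hat{\beta}_1 = \frac{SS_{XY}}{SS_X}$, where $SS_{XY} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$ and $SS_X = \sum_{i=1}^{n} (X_i - \bar{X})^2$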

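14 Sample variance of X: $s_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$ Sample covariance: $s_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$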

15 Solving for the intercept: $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$ Thus, our estimated regression equation is: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
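
The least-squares estimates can be illustrated with a minimal Python sketch. It assumes the standard closed-form solution (slope = sum of cross products divided by the sum of squares of X, intercept = mean of Y minus slope times mean of X). The function name `fit_line` and the five data points are made up for illustration and do not come from the presentation.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of X and sum of cross products of X and Y
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = ss_xy / ss_x
    # The fitted line passes through the point (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical example data (not from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_line(x, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

For these made-up points the sums are small enough to check by hand: the mean of X is 3, the mean of Y is 4, the sum of squares of X is 10 and the sum of cross products is 6.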

16 Hypothesis Tests with Regression Null hypothesis is that there is no linear relationship between X and Y:
$H_0$: $\beta_1 = 0 \Rightarrow Y_i = \beta_0 + \varepsilon_i$
$H_A$: $\beta_1 \neq 0 \Rightarrow Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
We can use an F-ratio (i.e., the ratio of variances) to test these hypotheses

17 Variance of the error of regression: $\hat{\sigma}^2 = \frac{SS_{residual}}{n-2} = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}$ NOTE: this is also referred to as residual variance, mean squared error (MSE) or residual mean square ($MS_{residual}$)

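18 Mean square of regression: $MS_{regression} = \frac{SS_{regression}}{1} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$ The F-ratio is: $F = MS_{regression} / MS_{residual}$ Under the null hypothesis, this ratio follows the F-distribution with (1, n-2) degrees of freedom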

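19 Variance components and Coefficient of determination: $SS_Y = SS_{regression} + SS_{residual}$, i.e. $\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$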

20 Coefficient of determination: $r^2 = \frac{SS_{regression}}{SS_Y} = \frac{SS_{regression}}{SS_{regression} + SS_{residual}}$

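21 ANOVA table for regression
Source | Degrees of freedom | Sum of squares | Mean square | Expected mean square | F ratio
Regression | 1 | $SS_{regression} = \sum (\hat{Y}_i - \bar{Y})^2$ | $SS_{regression} / 1$ | $\sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$ | $MS_{regression} / MS_{residual}$
Residual | n-2 | $SS_{residual} = \sum (Y_i - \hat{Y}_i)^2$ | $SS_{residual} / (n-2)$ | $\sigma^2$ |
Total | n-1 | $SS_Y = \sum (Y_i - \bar{Y})^2$ | $SS_Y / (n-1)$ | $\sigma_Y^2$ |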
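
The entries of a regression ANOVA table can be computed directly from the data. The sketch below assumes the usual partition of the total sum of squares into regression and residual parts. The function name `regression_anova` and the data are hypothetical. It stops at the F-ratio, because the Python standard library has no F-distribution from which to read a p-value.

```python
def regression_anova(x, y):
    """Return sums of squares, mean squares and the F-ratio for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = ss_xy / ss_x
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]

    # Partition of the total variation in Y
    ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

    ms_regression = ss_regression / 1        # 1 degree of freedom
    ms_residual = ss_residual / (n - 2)      # n - 2 degrees of freedom
    return {
        "ss_regression": ss_regression,
        "ss_residual": ss_residual,
        "ss_total": ss_regression + ss_residual,
        "ms_regression": ms_regression,
        "ms_residual": ms_residual,
        "f_ratio": ms_regression / ms_residual,
    }


# Hypothetical example data (not from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

table = regression_anova(x, y)
for name, value in table.items():
    print(f"{name:>14}: {value:.3f}")
```

The resulting F-ratio would then be compared with an F-distribution on (1, n-2) degrees of freedom, for instance from a printed table or a statistics package.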

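22 Product-moment correlation coefficient: $r = \frac{s_{XY}}{s_X s_Y} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$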
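
As a rough illustration, the product-moment correlation coefficient and the coefficient of determination can be computed from the same sums of squares. This sketch assumes the standard definition of r as the sum of cross products divided by the square root of the product of the two sums of squares. In simple linear regression, r squared equals the proportion of variation in Y explained by the line. The function name `correlation` and the data are illustrative only.

```python
import math


def correlation(x, y):
    """Return the product-moment correlation coefficient r for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return ss_xy / math.sqrt(ss_x * ss_y)


# Hypothetical example data (not from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(x, y)
r_squared = r ** 2  # coefficient of determination for simple linear regression
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```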

23 Parametric Confidence Intervals If we assume our parameter of interest has a particular sampling distribution and we have estimated its expected value and variance, we can construct a confidence interval for a given percentile.
Example: if we assume Y is a normal random variable with unknown mean $\mu$ and variance $\sigma^2$, then the standardized sample mean $\frac{\bar{Y} - \mu}{\sigma / \sqrt{n}}$ is distributed as a standard normal variable. But, since we don’t know $\sigma$, we must divide by the standard error instead: $\frac{\bar{Y} - \mu}{s / \sqrt{n}}$, where $s$ is the sample standard deviation, giving us a t-distribution with (n-1) degrees of freedom. The 100(1-α)% confidence interval for $\mu$ is then given by: $\bar{Y} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}}$
IMPORTANT: this does not mean “There is a 100(1-α)% chance that the true population mean μ occurs inside this interval.” It means that if we were to repeatedly sample the population in the same way, 100(1-α)% of the confidence intervals would contain the true population mean μ.
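
A t-based confidence interval for a mean might be sketched as follows. The Python standard library does not provide t-distribution quantiles, so this sketch assumes the critical value is looked up separately and passed in. The value 2.776 used here is the tabulated two-sided 95% critical value for 4 degrees of freedom. The function name `mean_confidence_interval` and the sample are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, t_crit):
    """Return (lower, upper) limits of a t-based confidence interval for the mean.

    t_crit is the two-sided critical value of the t-distribution with
    len(sample) - 1 degrees of freedom, supplied by the caller.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    # statistics.stdev uses the n - 1 denominator (sample standard deviation)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    half_width = t_crit * std_error
    return mean - half_width, mean + half_width


# Hypothetical sample of n = 5 observations (not from the slides)
sample = [2.0, 4.0, 5.0, 4.0, 5.0]

# Tabulated t critical value for alpha = 0.05 (two-sided), 4 degrees of freedom
lower, upper = mean_confidence_interval(sample, t_crit=2.776)
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```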

24 Publication form of ANOVA table for regression [Table columns: Source, Sum of Squares, df, Mean Square, F, Sig.; rows: Regression, Residual, Total]

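25 Variance of estimated intercept: $\hat{\sigma}^2_{\hat{\beta}_0} = \hat{\sigma}^2 \left( \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right)$, where $\hat{\sigma}^2$ is the residual mean square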

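26 Variance of the slope estimator: $\hat{\sigma}^2_{\hat{\beta}_1} = \frac{\hat{\sigma}^2}{\sum (X_i - \bar{X})^2}$, where $\hat{\sigma}^2$ is the residual mean square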

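27 Variance of the fitted value at $X_i$: $\hat{\sigma}^2_{\hat{Y}_i} = \hat{\sigma}^2 \left( \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right)$, where $\hat{\sigma}^2$ is the residual mean square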

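28 Variance of the predicted value (Ỹ) for a new observation at $\tilde{X}$: $\hat{\sigma}^2_{\tilde{Y}} = \hat{\sigma}^2 \left( 1 + \frac{1}{n} + \frac{(\tilde{X} - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right)$, where $\hat{\sigma}^2$ is the residual mean square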
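
The four variances on the preceding slides can be computed together. The sketch below assumes the standard textbook expressions for simple linear regression, each built from the residual mean square, the sample size, the mean of X and the sum of squares of X. The function name `regression_variances`, the argument `x_new` and the data are hypothetical.

```python
def regression_variances(x, y, x_new):
    """Return variances of the intercept, the slope, the fitted value at x_new
    and a newly predicted observation at x_new."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = ss_xy / ss_x
    intercept = y_bar - slope * x_bar

    # Residual mean square: estimate of the error variance
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ms_residual = ss_residual / (n - 2)

    leverage = 1 / n + (x_new - x_bar) ** 2 / ss_x
    return {
        "intercept": ms_residual * (1 / n + x_bar ** 2 / ss_x),
        "slope": ms_residual / ss_x,
        "fitted": ms_residual * leverage,
        # A new observation adds its own error variance to the fitted-value variance
        "predicted": ms_residual * (1 + leverage),
    }


# Hypothetical example data (not from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

variances = regression_variances(x, y, x_new=3.0)
for name, value in variances.items():
    print(f"variance of {name}: {value:.3f}")
```

The variance of a predicted value is always larger than that of the fitted value at the same X, and both grow as `x_new` moves away from the mean of X.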

29 Regression

30 Assumptions of regression
- The linear model correctly describes the functional relationship between X and Y
- The X variable is measured without error
- For a given value of X, the sampled Y values are independent with normally distributed errors
- Variances are constant along the regression line

31 Residual plot for species-area relationship
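
The values shown in a residual plot are the residuals from the fitted line, paired with X or with the fitted values. A minimal sketch of computing them follows. The data are made up and are not the species-area data from the slide, and the function name `residuals_from_fit` is hypothetical. A pattern in the residuals, such as curvature or a spread that widens with X, would suggest that the linearity or constant-variance assumption does not hold.

```python
def residuals_from_fit(x, y):
    """Fit a least-squares line and return the list of residuals (observed minus fitted)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = ss_xy / ss_x
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical example data (not the species-area data from the slide)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

residuals = residuals_from_fit(x, y)
for xi, ri in zip(x, residuals):
    print(f"x = {xi:.1f}, residual = {ri:+.2f}")
```

Least-squares residuals from a model with an intercept always sum to zero, so a residual plot is read for its pattern rather than for its overall level.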

