Generalized Linear Models (GLM)

1 Generalized Linear Models (GLM)
What are they?
When do we use them?
The full model
The ANCOVA model
The common regression model
The extra sum of squares principle
Assumptions
University of Ottawa - Bio 4158 – Applied Biostatistics © Antoine Morin and Scott Findlay 16/04/2017 3:00 AM

2 What are General(ized) Linear Models?
GLMs are models of the form $Y = Xb + e$, with Y a vector of dependent variables, b a vector of estimated coefficients, X a matrix of independent variables, and e a vector of error terms.
They include:
Multivariate models
Simple linear regression
Multiple regression
Analysis of variance (ANOVA)
Analysis of covariance (ANCOVA)
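To make the shared form of these models concrete, here is a minimal sketch of how one row of the X matrix could be built for a model with one covariate (X1) and one two-level categorical variable (X2). The helper name `design_row`, the example data, and the dummy (0/1) coding are assumptions made for illustration; the slides themselves describe the categorical effect as a deviation from the overall mean, which is a different but equivalent parameterization.

```python
# Hypothetical helper: one design-matrix row per observation for a model
# with a covariate x1 and a two-level categorical variable (levels 1 and 2).
def design_row(x1, level, include_interaction=True):
    """Return [intercept, covariate, level dummy, (dummy * covariate)]."""
    dummy = 1.0 if level == 2 else 0.0   # level 1 is the reference level
    row = [1.0, float(x1), dummy]
    if include_interaction:
        row.append(dummy * float(x1))    # lets the slope differ between levels
    return row

# Made-up (x1, level) pairs, for illustration only.
observations = [(1.5, 1), (2.0, 1), (1.8, 2), (2.4, 2)]
X = [design_row(x1, level) for x1, level in observations]
for row in X:
    print(row)
```

Dropping the last column gives a model with different intercepts and a common slope; dropping the dummy column as well leaves a simple regression on X1.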

3 Some GLM procedures
*either categorical or treated as a categorical variable

4 When do we use ANCOVA?
To compare the relationship between a dependent variable (Y) and an independent variable (X1) for different levels of one or more categorical variables (X2), e.g. the relationship between body mass (Y) and body size (X1) for different taxonomic groups (birds and mammals, X2).
[Figure: body mass (Y) against body size (X1).]

5 When do we use ANCOVA?
In doing comparisons, we assume that the qualitative form of the model is the same for all levels of the categorical variables; otherwise, one is comparing apples and oranges!
[Figure: two panels of Y against X1 for levels 1 and 2 of X2, one with qualitatively similar models and one with qualitatively different models.]

6 When do we use ANCOVA?
ANCOVA is used to compare linear models, although ANCOVA-like extensions have been developed for nonlinear models.
[Figure: two panels of Y against X1 for levels 1 and 2 of X2, one with linear models and one with nonlinear models.]

7 The simple regression model
The regression model is $Y_i = a + bX_i + e_i$. So, all simple regression models are described by 2 parameters, the intercept (a) and the slope (b = ΔY/ΔX). The residual e_i is the difference between the observed and expected values of Y at X_i.
[Figure: regression line of Y against X showing the intercept a, the slope b = ΔY/ΔX, and the residual e_i between the observed and expected Y_i at X_i.]
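As a small illustration of the two-parameter model, the sketch below estimates the intercept and slope by ordinary least squares and returns the residuals. The function name and the data are made up for the example; they are not taken from the course material.

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates of intercept a and slope b, plus residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares of X around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                 # slope: change in Y per unit change in X
    a = mean_y - b * mean_x       # intercept: expected Y at X = 0
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # observed - expected
    return a, b, residuals

# Made-up data for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
a, b, residuals = fit_simple_regression(xs, ys)
print(a, b)
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on any fit.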

8 Simple GLMs
Two linear models may differ as follows:
differences in both intercepts (a) and slopes (b)
different intercepts but the same slopes (ANCOVA model)
[Figure: two panels of Y against X1, one with different a and b, one with different a and the same b.]

9 Simple GLMs
Two linear models may also differ as follows:
different slopes (b) but the same intercepts (a)
same slopes and intercepts (common regression model)
[Figure: two panels of Y against X1, one with the same a and different b, one with the same a and the same b.]

10 Fitting GLMs
Fitting proceeds in hierarchical fashion, starting with the most complex model. Evaluate the significance of a term by fitting two models: one with the term in (model A), the other with it removed (model B). Test for the change in model fit (Δ MF) associated with removal of the term in question: retain the term if Δ MF is large, delete it if Δ MF is small.

11 Model fitting: evaluating the significance of model terms
Fit the higher order model (hom) including all possible terms; retain its SSresidual and MSresidual.
Fit the reduced model (rm); retain its SSresidual.
Test for significance of the removed term with an F statistic computed from these quantities.
Delete the term if p > .05; retain it if p < .05.
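The formula on this slide was not preserved in the transcript. A minimal sketch of the standard extra-sum-of-squares F statistic, which uses exactly the retained quantities (residual SS of both models and residual MS of the higher order model), is given below. The function name and the numbers are hypothetical, and the exact form shown on the original slide is an assumption.

```python
def extra_ss_f(ss_res_reduced, df_res_reduced, ss_res_full, df_res_full):
    """Extra-sum-of-squares F for the term(s) dropped from the full model.

    F = [(SSres_rm - SSres_hom) / (df_rm - df_hom)] / MSres_hom
    """
    df_term = df_res_reduced - df_res_full            # df of the removed term
    ms_term = (ss_res_reduced - ss_res_full) / df_term
    ms_res_full = ss_res_full / df_res_full           # MSresidual of the hom
    return ms_term / ms_res_full, df_term, df_res_full

# Made-up sums of squares for illustration.
f_value, df_num, df_den = extra_ss_f(12.0, 89, 11.0, 88)
print(f_value, df_num, df_den)
```

The resulting F is compared with the F distribution on (df_num, df_den) degrees of freedom to obtain the p-value used in the retain/delete decision.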

12 The full model with 2 independent variables
In the full model, bi is the slope of the regression of Y on X1 (the covariate) estimated for level i of the categorical variable X2, and ai is the difference between the mean of level i of the categorical variable X2 and the overall mean.
[Figure: regression lines of Y against X1 for level 1 and level 2 of variable X2.]
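The model equation itself did not survive in this transcript. One common way to write a model with these terms is sketched below as an assumed reconstruction, not the authors' exact notation: the overall mean $\mu$, the centring of X1 on its mean within level i, and the subscript j for observations within a level are all assumptions.

```latex
Y_{ij} = \mu + a_i + b_i \left( X_{1,ij} - \bar{X}_{1,i} \right) + e_{ij}
```

Here each level i of X2 has its own deviation $a_i$ and its own slope $b_i$, which is what distinguishes the full model from the simpler models that follow.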

13 The full model: null hypotheses
For the full model with 2 independent variables, there are 3 null hypotheses.
[Figure: regression lines of Y against X1 for level 1 and level 2 of variable X2.]

15 Assumptions for full model hypothesis testing
Residuals are independent and normally distributed.
Residual variance is equal for all values of X and independent of the value of the categorical variable (homoscedasticity).
No error in independent variables.
Relationship between Y and covariate is linear.

16 Procedure
Fit full model, test for differences among slopes.
If H02 rejected, run separate regressions for each level of categorical variable(s).
If H02 accepted, proceed to fit ANCOVA model.

17 The ANCOVA model with 2 independent variables
In the ANCOVA model, b is the slope of the regression of Y on X1 (the covariate) pooled over levels of the categorical variable X2, and ai is the difference between the mean of level i of the categorical variable X2 and the overall mean (μ).
[Figure: parallel regression lines of Y against X1 for level 1 and level 2 of variable X2, with the overall mean μ marked.]
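The equation for this slide was lost in transcription. A form consistent with the terms described, given here as an assumed reconstruction only, uses a single pooled slope b in place of level-specific slopes; the overall mean $\mu$, the centring of X1, and the subscript j for observations within level i are assumptions.

```latex
Y_{ij} = \mu + a_i + b \left( X_{1,ij} - \bar{X}_{1,i} \right) + e_{ij}
```

Under this form the lines for the different levels of X2 are parallel and differ only through $a_i$.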

18 The ANCOVA model: null hypotheses
For the ANCOVA model with 2 independent variables, there are 2 null hypotheses.
[Figure: regression lines of Y against X1 for level 1 and level 2 of variable X2, with the overall mean μ marked.]

20 Assumptions for hypothesis testing in ANCOVA model
Residuals are independent and normally distributed.
Residual variance is equal for all values of X and independent of the value of the categorical variable (homoscedasticity).
No error in independent variables.
Relationship between Y and covariate is linear.
The slope of the regression of Y on X1 (the covariate) is the same for all levels of the categorical variable X2 (not an assumption for full model!).

21 Procedure
Fit ANCOVA model; test for differences among intercepts.
If H01 rejected, do multiple comparisons to see which intercepts differ (if there are more than 2 levels for X2).
If H01 accepted, proceed to fit common regression model.
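The hierarchical procedure (test slopes in the full model first, then intercepts in the ANCOVA model) can be summarized as a small decision function. This is a sketch only: the function name, argument names, and return labels are hypothetical, and it assumes the two p-values have already been obtained from the corresponding model fits.

```python
def choose_model(p_slopes, p_intercepts, alpha=0.05):
    """Pick the model implied by the two sequential tests.

    p_slopes:     p-value for equal slopes, from the full model
    p_intercepts: p-value for equal intercepts, from the ANCOVA model
    """
    if p_slopes < alpha:
        # Slopes differ: fit a separate regression for each level.
        return "separate regressions"
    if p_intercepts < alpha:
        # Same slope, different intercepts.
        return "ANCOVA model"
    # Neither slopes nor intercepts differ.
    return "common regression model"

print(choose_model(p_slopes=0.40, p_intercepts=0.30))
```

Note that the intercept test is only consulted when the slope test is not significant, mirroring the order of the slides.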

22 The common regression model with 2 independent variables
In the common regression model, b is the slope of the regression of Y on X1 pooled over levels of the categorical variable X2, a is the pooled intercept, and $\bar{X}_1$ is the pooled average of X1.
[Figure: a single regression line of Y against X1 through the points for level 1 and level 2 of variable X2.]

23 The common regression model: null hypotheses
For the common regression model, there are 2 null hypotheses.
[Figure: a single regression line of Y against X1 for level 1 and level 2 of variable X2, with the intercept a marked.]

24 Assumptions for hypothesis testing in common regression model
Residuals are independent and normally distributed.
Residual variance is equal for all values of X.
No error in independent variable.
Relationship between Y and X is linear.

25 Example 1: effects of sex and age on sturgeon size at The Pas

26 Analysis
Log(fork length) (LFKL) is the dependent variable, log(age) (LAGE) is the covariate, and sex (SEX) is the categorical variable (2 levels).
Q1: is the slope of the regression of LFKL on LAGE the same for both sexes?

27 Effects of sex and age on size of sturgeon at The Pas
[Table: Type III Sum of Squares; columns Df, Sum of Sq, Mean Sq, F Value, Pr(F); rows SEX, LAGE, SEX:LAGE, Residuals.]
Conclusion 1: the slope of the regression of LFKL on LAGE is the same for both sexes (accept H03) since p(SEX:LAGE) > .05.
Q2: is the intercept the same for both males and females?

28 Effects of sex and age on size of sturgeon at The Pas (ANCOVA model)
[Table: Type III Sum of Squares; columns Df, Sum of Sq, Mean Sq, F Value, Pr(F); rows LAGE, SEX, Residuals.]
Conclusion 2: the intercept is the same for both males and females. H02 is accepted since p(SEX) > 0.05, implying that the best model is the common regression model.
Note the reduction in residual MS from the full model to the ANCOVA model, indicating that deleting a model term has a positive effect on model fit.

29 Effects of sex and age on size of sturgeon at The Pas (common regression)
[Table: Coefficients; columns Value, Std. Error, t value, Pr(>|t|); rows (Intercept), LAGE.]
Residual standard error on 90 degrees of freedom; F-statistic on 1 and 90 degrees of freedom, with the p-value reported as 0.

30 Example 2: Effect of location and age on sturgeon size

31 Analysis
Log(fork length) (LFKL) is the dependent variable, log(age) (LAGE) is the covariate, and location (LOCATE) is the categorical variable (2 levels).
Q: is the slope of the regression of LFKL on LAGE the same at both locations?

32 Effect of location and age on sturgeon size
[Table: Type III Sum of Squares; columns Df, Sum of Sq, Mean Sq, F Value, Pr(F); rows LAGE, LOCATE, LAGE:LOCATE.]
Conclusion: the slope of the regression of LFKL on LAGE is different at the two locations (reject H03) since p(LAGE:LOCATE) < .05. So, individual regressions should be fitted for each location.

33 What do you do if?
More than 2 levels of categorical variable?
Follow the above procedure, but if H03 (same slope) is rejected, do pairwise contrasts of individual slopes.
If H03 is accepted but H02 (same intercepts) is rejected, do pairwise comparisons of intercepts.
Always control for experiment-wise Type I error rate.
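One simple way to control the experiment-wise Type I error rate across all pairwise comparisons is to lower the per-comparison significance level. The sketch below computes the Bonferroni and Dunn-Šidák levels; the choice of these two corrections is an assumption, since the slides do not name a method, and the function name is hypothetical.

```python
def per_comparison_alpha(n_levels, family_alpha=0.05):
    """Per-comparison alpha for all pairwise contrasts among n_levels groups."""
    k = n_levels * (n_levels - 1) // 2               # number of pairwise comparisons
    bonferroni = family_alpha / k                     # Bonferroni correction
    sidak = 1.0 - (1.0 - family_alpha) ** (1.0 / k)   # Dunn-Sidak correction
    return k, bonferroni, sidak

k, bonferroni, sidak = per_comparison_alpha(n_levels=3)
print(k, bonferroni, sidak)
```

Each pairwise contrast of slopes or intercepts would then be judged against the corrected level rather than against .05.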

34 What do you do if?
Biological hypothesis implies one-tailed null(s)?
Follow the above procedure, but if H03 (same slope) is rejected, do one-tailed pairwise contrasts of individual slopes.
If H03 is accepted but H02 (same intercepts) is rejected, do one-tailed pairwise comparisons of intercepts.

35 Power analysis in GLM
In any GLM, hypotheses are tested by means of an F-test.
Remember: the appropriate SSerror and dferror depend on the type of analysis and the hypothesis under investigation.
Knowing F, we can compute R2, the proportion of the total variance in Y explained by the factor (source) under consideration.

36 Partial and total R2
The total R2 (R2Y•B) is the proportion of variance in Y accounted for (explained by) a set of independent variables B.
The partial R2 (R2Y•A,B - R2Y•A) is the proportion of variance in Y accounted for by B when the variance accounted for by another set A is removed.
[Figure: partition of the variance in Y into the proportion accounted for by both A and B (R2Y•A,B), the proportion accounted for by A only (R2Y•A, a total R2), and the proportion accounted for by B independent of A (R2Y•A,B - R2Y•A, a partial R2).]

37 Partial and total R2
The total R2 (R2Y•B) for set B equals the partial R2 (R2Y•A,B - R2Y•A) for set B if either (1) the total R2 for A (R2Y•A) is zero; or (2) A and B are independent (in which case R2Y•A,B = R2Y•A + R2Y•B).
[Figure: overlap of sets A and B with Y, showing the proportion of variance accounted for by B (R2Y•B, total R2) and the proportion independent of A (R2Y•A,B - R2Y•A, partial R2).]

38 Partial and total R2
In simple linear regression and single-factor ANOVA, there is only one independent variable X (either continuous or categorical). In these cases, set B includes only one variable X, so the total R2 (R2Y•B) equals R2Y•X, and the partial and total R2 are the same.
[Figure: growth rate λ (cm/day) against water temperature (°C).]

39 Partial and total R2
In ANCOVA and multiple-factor ANOVA, there are several independent variables X1, X2, ... (either continuous or categorical), so set B includes several variables. In this case, the total and partial R2 may be very different.
[Figure: growth rate λ (cm/day) against water temperature (°C) at pH = 6.5 and pH = 4.5.]

40 Defining effect size in GLM
The effect size, denoted f2, is given by the ratio of the factor (source) R2factor to 1 minus the appropriate error R2error: $f^2 = R^2_{factor} / (1 - R^2_{error})$.
Note: both R2factor and R2error depend on the null hypothesis under investigation.

41 Effects of sex and age on size of sturgeon at The Pas (common regression)
[Table: Coefficients; columns Value, Std. Error, t value, Pr(>|t|); rows (Intercept), LAGE.]
Residual standard error on 90 degrees of freedom; F-statistic on 1 and 90 degrees of freedom, with the p-value reported as 0.

42 Defining effect size in GLM: case 1
Case 1: a set B is related to Y, and the total R2 (R2Y•B) is determined. The error variance proportion is then 1 - R2Y•B.
H0: R2Y•B = 0
Example: effect of age on sturgeon size at The Pas
B = {LAGE}

43 Effects of sex and age on size of sturgeon at The Pas
[Table: Type III Sum of Squares; columns Df, Sum of Sq, Mean Sq, F Value, Pr(F); rows SEX, LAGE, SEX:LAGE, Residuals.]
Multiple R2 = 0.697

44 Effects of sex and age on size of sturgeon at The Pas (ANCOVA model)
[Table: Type III Sum of Squares; columns Df, Sum of Sq, Mean Sq, F Value, Pr(F); rows LAGE, SEX, Residuals.]
Multiple R2 = 0.696

45 Defining effect size in GLM: case 2
Case 2: the proportion of variance of Y due to B over and above that due to A is determined (R2Y•A,B - R2Y•A). The error variance proportion is then 1 - R2Y•A,B.
H0: R2Y•A,B - R2Y•A = 0
Example: effect of SEX$*LAGE on sturgeon size at The Pas
B = {SEX$*LAGE}; A,B = {SEX$, LAGE, SEX$*LAGE}
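As a worked illustration of case 2, the sketch below computes the partial R2 and f2 for the SEX$*LAGE term, taking the multiple R2 values reported for the full model (0.697) and the ANCOVA model (0.696) in the sturgeon example. Treating the ANCOVA R2 as R2Y•A is an interpretation of those slides, and the function name is hypothetical.

```python
def effect_size_f2(r2_factor, r2_error):
    """f2 = R2_factor / (1 - R2_error)."""
    return r2_factor / (1.0 - r2_error)

r2_full = 0.697     # R2 for Y on A,B: SEX, LAGE and SEX*LAGE (full model)
r2_ancova = 0.696   # R2 for Y on A:   SEX and LAGE only (ANCOVA model)

# Case 2: variance due to B (the interaction) over and above that due to A.
partial_r2 = r2_full - r2_ancova
f2_case2 = effect_size_f2(partial_r2, r2_full)
print(round(partial_r2, 3), round(f2_case2, 4))
```

For case 1 the same function applies with both arguments set to R2Y•B, since the factor and error proportions then come from the same total R2.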

46 Determining power
Once f2 has been determined, either a priori (as an alternate hypothesis) or a posteriori (the observed effect size), calculate the non-central F parameter φ.
Knowing φ and the factor (source) (ν1) and error (ν2) degrees of freedom, we can determine power (1 - β) from appropriate tables for a given α.
[Figure: power curves of 1 - β against φ for ν1 = 2 at α = .05 and α = .01, with one curve per value of ν2 (labelled in order of decreasing ν2).]

47 Example: effect of pH and nutrient levels on growth rate of bass
Sample of 35 lakes.
3 pH levels: acid, circumneutral, basic.
For each lake, an estimate of growth rate is obtained (e.g. from size-age regression).
What is the probability of detecting a true effect size as large as the sample effect size for pH*N once effects of N and pH have been controlled for, given α = .05?

48 Example: effect of pH and nutrient levels on growth rate of bass
Sample effect size f2 for pH once effects of N and pH*N have been controlled for = 0.14.
Source (pH) df = ν1 = 2; error df = ν2 = 29.
Use tables of φ based on R2 to get power (NOT the same tables as for ANOVA).
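The formula linking f2 to the non-central parameter is not preserved in this transcript. The sketch below uses a widely used convention from Cohen's power-analysis framework, λ = f2(ν1 + ν2 + 1) and φ = sqrt(λ / (ν1 + 1)), applied to the values on this slide (f2 = 0.14, ν1 = 2, ν2 = 29). Whether the course used exactly this convention is an assumption, and the function name is hypothetical.

```python
import math

def noncentrality(f2, df_source, df_error):
    """Return (lambda, phi) for a given effect size and degrees of freedom.

    Assumed convention: lambda = f2 * (v1 + v2 + 1); phi = sqrt(lambda / (v1 + 1)).
    """
    lam = f2 * (df_source + df_error + 1)
    phi = math.sqrt(lam / (df_source + 1))
    return lam, phi

lam, phi = noncentrality(f2=0.14, df_source=2, df_error=29)
print(round(lam, 2), round(phi, 3))
```

The resulting φ, together with ν1, ν2 and α, is what would be looked up in the power tables or charts.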

