
1 Experimental design and analyses of experimental data Lesson 6 Logistic regression Generalized Linear Models (GENMOD)



2 Logistic regression
Used when the data are dichotomous.
Used when the data are fractions between 0 and 1.

3 Example: Does predation of eggs in Oystercatcher nests depend on
the distance from the nest to the nearest Herring gull nest?
the vegetation surrounding the nest?
the number of eggs in the nest?

4 Data:
OBS   DIST   EGGS   VEG   KILLED
  1    0.5      3     B        3
  2    1.0      7     C        5
  3    5.7      5     B        1
  4    3.8      9     A        6
  5    3.0      7     C        5
  6    6.1      8     A        3
...
 57    3.3      3     A        3

5 Analysis of dichotomous data:
Nests are categorized according to whether predation has occurred or not.
No predation is scored as 0.
Predation is scored as 1.

6 [Figure: plus/minus predator visit to Oystercatcher nest]

7 The purpose is to fit a model to the data: a model that predicts the probability of a nest being predated.

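8 The logistic regression model:
$$ y = \pi + \varepsilon $$
where
$$ \pi = \frac{e^{\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p}} $$
and $\varepsilon \sim \mathrm{BIN}(0, \pi(1-\pi))$, i.e. the error has mean 0 and variance $\pi(1-\pi)$.
The logit transformation:
$$ \ln\frac{\pi}{1-\pi} = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p $$
The odds (the ratio between the probability of a positive and a negative event):
$$ \frac{\pi}{1-\pi} $$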
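As a small numerical illustration of the odds and the logit transformation, the following Python sketch converts between probability, odds and logit. The function names (`odds`, `logit`, `inv_logit`) are illustrative only and do not come from the slides.

```python
import math

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)

def logit(p):
    """Logit transformation: natural logarithm of the odds."""
    return math.log(odds(p))

def inv_logit(eta):
    """Inverse logit: map a linear predictor eta back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        eta = logit(p)
        print(f"p={p:.1f}  odds={odds(p):.3f}  logit={eta:+.3f}  back={inv_logit(eta):.3f}")
```

The logit is symmetric around p = 0.5: probabilities p and 1 - p give logits of equal size and opposite sign.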

9 y = 0 ... So that ... [the equations on this slide are missing from the transcript]

10 How to do it in SAS

11
/* Example on logistic regression */
/* The example is inspired by Dorthe Lahrmann's investigations of
   Oystercatchers (Danish: strandskader) on Langli in Ho Bugt */
OPTIONS LINESIZE = 90;
DATA logist;
  INFILE 'h:\lin-mod\logist.prn' FIRSTOBS=2;
  INPUT dist eggs veg $ killed;
  /* dist   = distance to the nearest nest of Herring gull (Danish: sølvmåge) */
  /* eggs   = number of Oystercatcher eggs in a nest */
  /* veg    = vegetation type surrounding an Oystercatcher nest */
  /* killed = number of eggs lost to predators */
  /* If killed > 0 then the nest has been visited by a predator at least once */
  IF killed > 0 THEN visit = 1;
  IF killed = 0 THEN visit = 0;
RUN;

12
/* Example A: Analysis of whether a nest has been visited or not visited
   by predators, i.e. visit = 1 or 0 */
PROC GENMOD;              /* The procedure is Generalized Linear Models */
  TITLE 'Eksempel A';
  CLASS veg;              /* veg is a class variable */
  MODEL visit = dist veg / DIST=binomial LINK=logit TYPE3 DSCALE OBSTATS;
  /* DIST    = distribution function (here chosen as binomial) */
  /* LINK    = the model uses a logit transformation of the data */
  /* TYPE3   = Type 3 statistics are used to evaluate the relative
               contribution of the different factors to the dependent variable */
  /* DSCALE  = tells SAS to estimate the scale parameter as the square root
               of Deviance/DF and to scale the errors accordingly.
               If Deviance/DF is approximately 1, scaling is not needed. */
  /* OBSTATS = gives the predicted values as well as their confidence limits */
RUN;

13
Eksempel A                      10:19 Thursday, November 22, 2001  87

The GENMOD Procedure

Model Information
Description            Value
Data Set               WORK.LOGIST
Distribution           BINOMIAL
Link Function          LOGIT
Dependent Variable     VISIT
Observations Used      57
Number Of Events       52
Number Of Trials       57

Class Level Information
Class    Levels    Values
VEG           3    A B C

14
Criteria For Assessing Goodness Of Fit
Criterion             DF       Value    Value/DF
Deviance              53     20.2819      0.3827
Scaled Deviance       53     53.0000      1.0000
Pearson Chi-Square    53     22.2740      0.4203
Scaled Pearson X2     53     58.2057      1.0982
Log Likelihood         .    -26.5000       .

Value (Deviance, Pearson Chi-Square): these values indicate the fit of the model. Low values (for a given DF) indicate a good fit.
Value/DF: these values should be close to unity if the model's assumptions are met. Values less than unity indicate underdispersion (variance less than expected). Values greater than unity indicate overdispersion (variance greater than expected).
Scaled Deviance, Scaled Pearson X2: values after scaling with DSCALE.

15
Analysis Of Parameter Estimates
Parameter     DF    Estimate    Std Err    ChiSquare    Pr>Chi
INTERCEPT      1      8.5639     2.1271      16.2093    0.0001
DIST           1     -1.0032     0.2651      14.3173    0.0002
VEG A          1      0.2489     0.9555       0.0678    0.7945
VEG B          1      0.4370     0.9250       0.2232    0.6366
VEG C          0      0.0000     0.0000        .         .
SCALE          0      0.6186     0.0000        .         .
NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF.

LR Statistics For Type 3 Analysis
Source    NDF    DDF          F      Pr>F    ChiSquare    Pr>Chi
DIST        1     53    34.8596    0.0001      34.8596    0.0001
VEG         2     53     0.1118    0.8944       0.2237    0.8942
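The relation between the goodness-of-fit table and the SCALE row can be checked by hand. The sketch below, in Python rather than SAS, recomputes the scale parameter and the scaled statistics from the values printed in the Example A listing. The variable names are illustrative. The last step assumes that, under DSCALE, the Type 3 F statistic equals the scaled chi-square divided by NDF, which is what the listing is consistent with.

```python
import math

# Values copied from the goodness-of-fit table for Example A (model with DIST and VEG).
deviance = 20.2819
pearson_chi2 = 22.2740
df = 53

dispersion = deviance / df                  # Deviance/DF (0.3827 in the listing)
scale = math.sqrt(dispersion)               # SCALE parameter (0.6186 in the listing)

scaled_deviance = deviance / dispersion     # equals DF by construction
scaled_pearson = pearson_chi2 / dispersion  # Scaled Pearson X2 (58.2057 in the listing)

# Assumption: F = scaled chi-square / NDF, checked here for the VEG row.
veg_chi2, veg_ndf = 0.2237, 2
veg_f = veg_chi2 / veg_ndf                  # listed as 0.1118

print(f"dispersion={dispersion:.4f} scale={scale:.4f}")
print(f"scaled deviance={scaled_deviance:.4f} scaled Pearson={scaled_pearson:.3f}")
print(f"F for VEG={veg_f:.4f}")
```

Small differences in the last digit can occur because the inputs are the rounded values from the listing.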

16
Example A, model reduced to DIST only:

Criteria For Assessing Goodness Of Fit
Criterion             DF       Value    Value/DF
Deviance              55     20.3675      0.3703
Scaled Deviance       55     55.0000      1.0000
Pearson Chi-Square    55     21.6364      0.3934
Scaled Pearson X2     55     58.4265      1.0623
Log Likelihood         .    -27.5000       .

Analysis Of Parameter Estimates
Parameter     DF    Estimate    Std Err    ChiSquare    Pr>Chi
INTERCEPT      1      8.8288     2.0182      19.1363    0.0001
DIST           1     -1.0012     0.2587      14.9777    0.0001
SCALE          0      0.6085     0.0000        .         .
NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF.

LR Statistics For Type 3 Analysis
Source    NDF    DDF          F      Pr>F    ChiSquare    Pr>Chi
DIST        1     55    36.4999    0.0001      36.4999    0.0001

17
Observation Statistics
VISIT    Pred      Xbeta     Std       HessWgt     Lower     Upper     Resraw
1        0.9998    8.3283    1.8909    0.000652    0.9903    1.0000    0.000242
1        0.9996    7.8277    1.7639    0.001075    0.9875    1.0000    0.000398
1        0.9578    3.1222    0.6185    0.1091      0.8710    0.9871    0.0422
1        0.9935    5.0244    1.0628    0.0175      0.9498    0.9992    0.006533
1        0.9971    5.8253    1.2605    0.007924    0.9663    0.9998    0.002943
1        0.9383    2.7217    0.5356    0.1563      0.8418    0.9775    0.0617
1        0.9971    5.8253    1.2605    0.007924    0.9663    0.9998    0.002943
1        0.9973    5.9255    1.2854    0.007173    0.9679    0.9998    0.002663
0        0.3358   -0.6822    0.5813    0.6023      0.1392    0.6123   -0.3358
1        0.9764    3.7229    0.7525    0.0622      0.9045    0.9945    0.0236
0        0.7150    ...
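To show where the Pred, Lower and Upper columns come from, the following Python sketch recomputes them for the third observation (DIST = 5.7) from the estimates of the model containing only the intercept and DIST. It assumes the limits are Wald limits computed on the logit scale (Xbeta plus or minus 1.96 times Std) and then back-transformed; the function names are illustrative.

```python
import math

# Estimates from the Example A model with intercept and DIST only.
INTERCEPT = 8.8288
SLOPE_DIST = -1.0012
Z_95 = 1.96  # approximate 97.5% normal quantile (assumption: Wald-type limits)

def inv_logit(eta):
    """Map a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def predict(dist, std_xbeta):
    """Return (xbeta, predicted probability, lower, upper) for one nest."""
    xbeta = INTERCEPT + SLOPE_DIST * dist
    lower = inv_logit(xbeta - Z_95 * std_xbeta)
    upper = inv_logit(xbeta + Z_95 * std_xbeta)
    return xbeta, inv_logit(xbeta), lower, upper

# Observation 3: DIST = 5.7 in the data, Std of Xbeta = 0.6185 in the listing.
xbeta, pred, lower, upper = predict(5.7, 0.6185)
print(f"Xbeta={xbeta:.4f} Pred={pred:.4f} Lower={lower:.4f} Upper={upper:.4f}")
```

The results agree with the listed row (Xbeta 3.1222, Pred 0.9578, limits 0.8710 and 0.9871) up to rounding of the parameter estimates.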

18 [Figure: predicted values and 95% confidence limits]

19
/* Example B: Analysis of the fraction of eggs in a nest that are lost */
PROC GENMOD;              /* The procedure is Generalized Linear Models */
  TITLE 'Eksempel B';
  CLASS veg;              /* veg is a class variable */
  MODEL killed/eggs = dist veg eggs / DIST=binomial LINK=logit TYPE3 DSCALE OBSTATS;
  /* DIST    = distribution function (here chosen as binomial) */
  /* LINK    = the model uses a logit transformation of the data */
  /* TYPE3   = Type 3 statistics are used to determine the contribution
               of the individual factors to the dependent variable */
  /* DSCALE  = option that can be used if Deviance/DF is different from 1.
               It reduces the risk of Type I errors if the scale parameter
               is > 1 and the risk of Type II errors if the scale parameter
               is < 1 */
  /* OBSTATS = gives the predicted values and the confidence limits */
RUN;

Note that this procedure takes the absolute number of eggs killed out of the total number of eggs into consideration, and not merely the proportion of killed eggs.

20
Eksempel B                      12:26 Thursday, November 22, 2001  7

The GENMOD Procedure

Model Information
Description            Value
Data Set               WORK.LOGIST
Distribution           BINOMIAL
Link Function          LOGIT
Dependent Variable     KILLED
Dependent Variable     EGGS
Observations Used      57
Number Of Events       183
Number Of Trials       336

Class Level Information
Class    Levels    Values
VEG           3    A B C

21
Criteria For Assessing Goodness Of Fit
Criterion             DF        Value    Value/DF
Deviance              52      53.9491      1.0375
Scaled Deviance       52      52.0000      1.0000
Pearson Chi-Square    52      44.1413      0.8489
Scaled Pearson X2     52      42.5465      0.8182
Log Likelihood         .    -171.3777       .

22
Analysis Of Parameter Estimates
Parameter     DF    Estimate    Std Err    ChiSquare    Pr>Chi
INTERCEPT      1      2.6437     0.5644      21.9369    0.0001
DIST           1     -0.5284     0.0623      71.9060    0.0001
VEG A          1      0.1425     0.3629       0.1541    0.6946
VEG B          1      0.1623     0.3602       0.2029    0.6524
VEG C          0      0.0000     0.0000        .         .
EGGS           1     -0.0314     0.0637       0.2433    0.6219
SCALE          0      1.0186     0.0000        .         .
NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF.

LR Statistics For Type 3 Analysis
Source    NDF    DDF          F      Pr>F    ChiSquare    Pr>Chi
DIST        1     52    97.2164    0.0001      97.2164    0.0001
VEG         2     52     0.1135    0.8929       0.2271    0.8927
EGGS        1     52     0.2443    0.6232       0.2443    0.6211

23
Example B, model reduced to DIST only:

Criteria For Assessing Goodness Of Fit
Criterion             DF        Value    Value/DF
Deviance              55      54.5182      0.9912
Scaled Deviance       55      55.0000      1.0000
Pearson Chi-Square    55      45.0882      0.8198
Scaled Pearson X2     55      45.4867      0.8270
Log Likelihood         .    -179.6600       .

Analysis Of Parameter Estimates
Parameter     DF    Estimate    Std Err    ChiSquare    Pr>Chi
INTERCEPT      1      2.5156     0.2950      72.7128    0.0001
DIST           1     -0.5212     0.0589      78.3656    0.0001
SCALE          0      0.9956     0.0000        .         .
NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF.

LR Statistics For Type 3 Analysis
Source    NDF    DDF           F      Pr>F    ChiSquare    Pr>Chi
DIST        1     55    107.8859    0.0001     107.8859    0.0001
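One way to read the Example B estimates for the model with DIST only is to translate them back from the logit scale. The Python sketch below is an illustration, not part of the original analysis: it computes the predicted probability that an egg is lost at a few distances, the odds ratio per unit of distance, and the distance at which the predicted probability is 0.5. The names and the chosen distances are arbitrary.

```python
import math

# Estimates from the Example B model with intercept and DIST only.
INTERCEPT = 2.5156
SLOPE_DIST = -0.5212

def prob_egg_lost(dist):
    """Predicted probability that an egg is lost at a given distance."""
    eta = INTERCEPT + SLOPE_DIST * dist
    return 1.0 / (1.0 + math.exp(-eta))

# Each extra unit of distance multiplies the odds of losing an egg by exp(slope).
odds_ratio_per_unit = math.exp(SLOPE_DIST)

# The predicted probability is 0.5 where the linear predictor equals 0.
dist_half = -INTERCEPT / SLOPE_DIST

for d in (0.5, 3.0, 6.0):
    print(f"dist={d:.1f}  predicted probability={prob_egg_lost(d):.3f}")
print(f"odds ratio per unit distance = {odds_ratio_per_unit:.3f}")
print(f"distance with probability 0.5 = {dist_half:.2f}")
```

Because the slope is negative, the odds ratio is below 1: the odds of losing an egg fall as the distance to the nearest Herring gull nest increases.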

24 [Figure: predicted values and 95% confidence limits]

25
Criteria For Assessing Goodness Of Fit
Criterion             DF        Value    Value/DF
Deviance              52      53.9491      1.0375
Scaled Deviance       52      52.0000      1.0000
Pearson Chi-Square    52      44.1413      0.8489
Scaled Pearson X2     52      42.5465      0.8182
Log Likelihood         .    -171.3777       .      <- What is this?

26 The likelihood function

27 A nest contains n eggs of which r are eaten by predators. The probability that a given egg is eaten is denoted $\pi$. The probability that exactly r of the eggs are killed is given by the binomial distribution:
$$ P(r) = \binom{n}{r} \pi^{r} (1-\pi)^{n-r} $$
where
$$ \binom{n}{r} = \frac{n!}{r!\,(n-r)!} $$

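28
$r_1$ = number of killed eggs out of $n_1$ eggs in the first nest
$r_2$ = number of killed eggs out of $n_2$ eggs in the second nest
$r_i$ = number of killed eggs out of $n_i$ eggs in the ith nest
The probability of observing exactly $r_1, r_2, \dots, r_i, \dots, r_k$ events is $P(r_1)$ times $P(r_2)$ times ... times $P(r_k)$:
$$ L = P(r_1)\,P(r_2)\,P(r_3) \cdots P(r_i) \cdots P(r_k) = \prod_{i=1}^{k} \binom{n_i}{r_i} \pi_i^{r_i} (1-\pi_i)^{n_i-r_i} $$
$$ \ln L = \ln P(r_1) + \ln P(r_2) + \ln P(r_3) + \dots + \ln P(r_i) + \dots + \ln P(r_k) = \sum_{i=1}^{k} \left[ \ln\binom{n_i}{r_i} + r_i \ln \pi_i + (n_i - r_i) \ln(1-\pi_i) \right] $$
$\ln L$ is the log-likelihood function. Here $\pi_i$ is the probability that an egg in the ith nest is eaten.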
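The log-likelihood can be written out directly as code. The Python sketch below assumes a logit model with an intercept and DIST as the only predictor, and evaluates ln L for the seven nests that are visible in the data slide (not the full set of 57 nests), so the value will not match the Log Likelihood in the SAS listing. The names `NESTS`, `log_likelihood` and `inv_logit` are illustrative.

```python
import math

# (dist, n_eggs, r_killed) for the seven nests shown on the data slide.
NESTS = [
    (0.5, 3, 3), (1.0, 7, 5), (5.7, 5, 1), (3.8, 9, 6),
    (3.0, 7, 5), (6.1, 8, 3), (3.3, 3, 3),
]

def inv_logit(eta):
    """Map a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(beta0, beta1, nests):
    """Binomial log-likelihood ln L summed over nests."""
    total = 0.0
    for dist, n, r in nests:
        pi = inv_logit(beta0 + beta1 * dist)
        total += (
            math.log(math.comb(n, r))      # ln of the binomial coefficient
            + r * math.log(pi)             # killed eggs
            + (n - r) * math.log(1.0 - pi) # surviving eggs
        )
    return total

# Compare a model without any effect of distance with the Example B estimates.
print("ln L at beta = (0, 0):           ", log_likelihood(0.0, 0.0, NESTS))
print("ln L at beta = (2.5156, -0.5212):", log_likelihood(2.5156, -0.5212, NESTS))
```

A larger (less negative) ln L means that the parameter values make the observed counts more probable.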

29 Maximum likelihood
The parameters of the model are found as the values that maximize the likelihood of observing exactly $r_1, r_2, \dots, r_i, \dots$ positive events out of $n_1, n_2, \dots, n_i, \dots$ events.
The maximum value of L can be found by differentiating L with respect to $\beta_0, \beta_1, \dots, \beta_p$ and setting the derivatives equal to 0. This gives the same estimates as differentiating $\ln L$, because $\ln L$ attains its maximum at the same parameter values as L.
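Setting the derivatives of ln L to zero has no closed-form solution for the logit model, so the equations are solved iteratively. The Python sketch below uses Newton-Raphson for a model with an intercept and DIST only. This is one common way to do it and is not claimed to be the exact algorithm SAS uses. It is fitted to the seven nests visible in the data slide only, so the estimates will differ from the SAS output for all 57 nests. All names are illustrative.

```python
import math

# (dist, n_eggs, r_killed) for the seven nests shown on the data slide.
NESTS = [
    (0.5, 3, 3), (1.0, 7, 5), (5.7, 5, 1), (3.8, 9, 6),
    (3.0, 7, 5), (6.1, 8, 3), (3.3, 3, 3),
]

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def score_and_information(b0, b1, nests):
    """Derivatives of ln L (score) and the 2x2 information matrix."""
    u0 = u1 = i00 = i01 = i11 = 0.0
    for x, n, r in nests:
        pi = inv_logit(b0 + b1 * x)
        resid = r - n * pi           # observed minus expected killed eggs
        w = n * pi * (1.0 - pi)      # binomial variance of r
        u0 += resid
        u1 += resid * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (u0, u1), (i00, i01, i11)

def fit(nests, tol=1e-10, max_iter=50):
    """Newton-Raphson: repeat beta <- beta + I^{-1} U until the step is tiny."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (u0, u1), (i00, i01, i11) = score_and_information(b0, b1, nests)
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det   # first row of I^{-1} U
        d1 = (i00 * u1 - i01 * u0) / det   # second row of I^{-1} U
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1

b0_hat, b1_hat = fit(NESTS)
print(f"intercept={b0_hat:.4f} slope={b1_hat:.4f}")
```

At the returned estimates both derivatives of ln L are numerically zero, which is the condition described on the slide.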

