1 Experimental design and analyses of experimental data Lesson 6 Logistic regression Generalized Linear Models (GENMOD)



2 Logistic regression: used when data are dichotomous, and when data are fractions between 0 and 1.

3 Example: Does predation of eggs in nests of Oyster catcher depend on the distance from the nest to the nearest nest of Herring gull? On the vegetation surrounding the nest? On the number of eggs in the nest?

4 Data: a table with the columns OBS, DIST, EGGS, VEG (A, B or C) and KILLED.

5 Analysis of dichotomous data: Nests are categorized according to whether predation has occurred or not. No predation is scored as 0; predation is scored as 1.

6 Plus/minus predator visit to Oyster catcher nest

7 The purpose is to fit a model to the data – a model that predicts the probability of a nest being predated

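8 The logistic regression model: $y = \pi + \varepsilon$, where $\pi = \dfrac{e^{\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p}}$ and $\varepsilon \sim \mathrm{BIN}(0, \pi(1-\pi))$.
The logit-transformation: $\operatorname{logit}(\pi) = \ln\dfrac{\pi}{1-\pi} = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$.
The odds (the ratio between the probability of a positive and a negative event): $\dfrac{\pi}{1-\pi}$.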
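The relation between a probability, its odds and its logit can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the coefficient values `beta0` and `beta1` and the distance `dist` are made-up numbers chosen only for illustration.

```python
import math

def odds(p):
    """Odds: probability of a positive event divided by that of a negative event."""
    return p / (1.0 - p)

def logit(p):
    """Logit transformation: the natural logarithm of the odds."""
    return math.log(odds(p))

def inv_logit(eta):
    """Inverse logit: maps a linear predictor back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients (assumed for illustration, not estimates from the data)
beta0, beta1 = 2.0, -0.05
dist = 40.0                      # hypothetical distance to the nearest gull nest
eta = beta0 + beta1 * dist       # linear predictor on the logit scale
pi = inv_logit(eta)              # predicted probability of predation

print(f"eta = {eta:.2f}, pi = {pi:.3f}, odds = {odds(pi):.3f}")
```

Because the model is linear on the logit scale, a one-unit change in a predictor multiplies the odds by a constant factor, exp(beta1) in this sketch, while the predicted probability always stays between 0 and 1.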

9 y =0 So that

10 How to do it in SAS

11
DATA logist;
  OPTIONS LINESIZE = 90;
  /* Example on logistic regression */
  /* The example is inspired by Dorthe Lahrmann's investigations of
     Oyster catchers (strandskader) on Langli in Ho Bugt */
  INFILE 'h:\lin-mod\logist.prn' FIRSTOBS=2;
  INPUT dist eggs veg $ killed;
  /* dist = Distance to the nearest nest of Herring gull (sølvmåge) */
  /* eggs = Number of Oyster catcher eggs in a nest */
  /* veg = Vegetation type surrounding an Oyster catcher nest */
  IF killed > 0 THEN visit = 1;
  IF killed = 0 THEN visit = 0;
  /* If killed > 0 then the nest has been visited by a predator at least once */

12
/* Example A: Analysis of whether a nest has been visited or not visited
   by predators, i.e. visit = 1 or 0 */
PROC GENMOD; /* The procedure is Generalized Linear Models */
  TITLE 'Eksempel A';
  CLASS veg; /* veg is a class variable */
  MODEL visit = dist veg / DIST=binomial LINK=logit TYPE3 DSCALE OBSTATS;
  /* DIST = distribution function (here chosen as binomial) */
  /* LINK = the model uses a logit-transformation of data */
  /* TYPE3 = type 3 is used in order to evaluate the relative contribution
     of the different factors to the dependent variable */
  /* DSCALE = an option which tells SAS to scale the error in order to meet
     the demands of the model. If Deviance/DF is approximately 1,
     scaling is not needed. */
  /* OBSTATS = gives the predicted values as well as their confidence limits */
RUN;

13 Eksempel A 10:19 Thursday, November 22,
The GENMOD Procedure
Model Information
Description          Value
Data Set             WORK.LOGIST
Distribution         BINOMIAL
Link Function        LOGIT
Dependent Variable   VISIT
Observations Used    57
Number Of Events     52
Number Of Trials     57
Class Level Information
Class   Levels   Values
VEG     3        A B C

14 Criteria For Assessing Goodness Of Fit
Columns: Criterion, DF, Value, Value/DF
Rows: Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson X2, Log Likelihood
Value column: these values indicate the fit of the model. Low values (for a given DF) indicate a good fit.
Value/DF column: these values should be close to unity if the model's assumptions are met. Values less than unity indicate underdispersion (variance less than expected). Values greater than unity indicate overdispersion (variance greater than expected).
Scaled Deviance and Scaled Pearson X2: values after scaling with DSCALE.
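The Value/DF check and the DSCALE scaling can be illustrated with a few lines of arithmetic. This is a hedged Python sketch, not SAS output: the deviance, degrees of freedom and standard error are invented numbers. It assumes, as the GENMOD note on the scale parameter states, that the scale is the square root of Deviance/DF, and it assumes that standard errors are multiplied by that scale.

```python
import math

def dispersion_summary(deviance, df):
    """Return (Value/DF, scale parameter, scaled deviance) for a fitted model."""
    ratio = deviance / df            # should be close to 1 if the assumptions hold
    scale = math.sqrt(ratio)         # scale = sqrt(Deviance/DF)
    scaled_deviance = deviance / ratio
    return ratio, scale, scaled_deviance

def rescale_std_err(std_err, scale):
    """Assumed effect of scaling: standard errors are inflated (or deflated) by the scale."""
    return std_err * scale

# Hypothetical values, chosen only to show overdispersion (Value/DF > 1)
ratio, scale, scaled_dev = dispersion_summary(deviance=110.0, df=55)
print(f"Value/DF = {ratio:.2f}, scale = {scale:.3f}, scaled deviance = {scaled_dev:.1f}")
print(f"std err 0.10 becomes {rescale_std_err(0.10, scale):.3f}")
```

With these made-up numbers the ratio is 2, so the data vary about twice as much as the binomial model expects, and tests based on the unscaled standard errors would be too optimistic.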

15 Analysis Of Parameter Estimates
Columns: Parameter, DF, Estimate, Std Err, ChiSquare, Pr>Chi
Rows: INTERCEPT, DIST, VEG A, VEG B, VEG C, SCALE
NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF.
LR Statistics For Type 3 Analysis
Columns: Source, NDF, DDF, F, Pr>F, ChiSquare, Pr>Chi
Rows: DIST, VEG

16 Criteria For Assessing Goodness Of Fit
Columns: Criterion, DF, Value, Value/DF
Rows: Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson X2, Log Likelihood
Analysis Of Parameter Estimates
Columns: Parameter, DF, Estimate, Std Err, ChiSquare, Pr>Chi
Rows: INTERCEPT, DIST, SCALE
NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF.
LR Statistics For Type 3 Analysis
Columns: Source, NDF, DDF, F, Pr>F, ChiSquare, Pr>Chi
Rows: DIST

17 Observation Statistics
Columns: VISIT, Pred, Xbeta, Std, HessWgt, Lower, Upper, Resraw

18 Predicted values and 95% confidence limits
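One common way to obtain confidence limits that stay between 0 and 1 is to build them on the logit scale and then back-transform. The Python sketch below assumes that this is how the limits are formed from the linear predictor (Xbeta) and its standard error (Std); that is an assumption for illustration, not something stated on the slides, and the input numbers are hypothetical.

```python
import math

def inv_logit(eta):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def predicted_with_limits(xbeta, std, z=1.96):
    """Predicted probability with approximate 95% limits.

    The limits are computed as xbeta +/- z * std on the logit scale
    and then back-transformed, so they are asymmetric around the
    prediction but always lie inside (0, 1).
    """
    pred = inv_logit(xbeta)
    lower = inv_logit(xbeta - z * std)
    upper = inv_logit(xbeta + z * std)
    return pred, lower, upper

# Hypothetical linear predictor and standard error for one nest
pred, lower, upper = predicted_with_limits(xbeta=2.0, std=0.6)
print(f"pred = {pred:.3f}, 95% limits = ({lower:.3f}, {upper:.3f})")
```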

19
/* Example B: Analysis of the fraction of eggs in a nest that are lost */
PROC GENMOD; /* The procedure is Generalized Linear Models */
  TITLE 'Eksempel B';
  CLASS veg; /* veg is a class variable */
  MODEL killed/eggs = dist veg eggs / DIST=binomial LINK=logit TYPE3 DSCALE OBSTATS;
  /* DIST = distribution function (here chosen as binomial) */
  /* LINK = the model uses a logit-transformation of data */
  /* TYPE3 = SS3 is used to determine the contribution of the individual
     factors to the dependent variable */
  /* DSCALE = option that can be used if Deviance/DF is different from 1.
     It reduces the risk of Type I errors if the scale parameter is > 1
     and the risk of Type II errors if the scale parameter is < 1 */
  /* OBSTATS = gives the predicted values and the confidence limits */
RUN;
Note that this procedure takes the absolute number of eggs killed out of the total number of eggs into consideration, and not merely the proportion of killed eggs.
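Why the absolute numbers matter can be seen by comparing two nests with the same proportion of killed eggs but different clutch sizes. The Python sketch below uses invented counts (1 of 2 eggs versus 3 of 6 eggs) and measures how much the binomial log-likelihood drops when the assumed probability moves away from the observed proportion.

```python
import math

def binom_loglik(r, n, pi):
    """Binomial log-likelihood of r events out of n trials for probability pi."""
    return (math.log(math.comb(n, r))
            + r * math.log(pi)
            + (n - r) * math.log(1.0 - pi))

def loglik_drop(r, n, pi):
    """Drop in log-likelihood when pi is used instead of the observed r/n."""
    return binom_loglik(r, n, r / n) - binom_loglik(r, n, pi)

# Both hypothetical nests have 50% of their eggs killed
small = loglik_drop(1, 2, 0.8)   # 1 of 2 eggs killed
large = loglik_drop(3, 6, 0.8)   # 3 of 6 eggs killed

print(f"drop for 1/2: {small:.3f}, drop for 3/6: {large:.3f}")
```

The larger nest penalizes a poorly fitting probability three times as hard, so it carries three times the weight in the fit even though both proportions are 0.5. Analysing only the proportions would discard that information.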

20 Eksempel B 12:26 Thursday, November 22,
The GENMOD Procedure
Model Information
Description          Value
Data Set             WORK.LOGIST
Distribution         BINOMIAL
Link Function        LOGIT
Dependent Variable   KILLED
Dependent Variable   EGGS
Observations Used    57
Number Of Events     183
Number Of Trials     336
Class Level Information
Class   Levels   Values
VEG     3        A B C

21 Criteria For Assessing Goodness Of Fit
Columns: Criterion, DF, Value, Value/DF
Rows: Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson X2, Log Likelihood

22 Analysis Of Parameter Estimates
Columns: Parameter, DF, Estimate, Std Err, ChiSquare, Pr>Chi
Rows: INTERCEPT, DIST, VEG A, VEG B, VEG C, EGGS, SCALE
NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF.
LR Statistics For Type 3 Analysis
Columns: Source, NDF, DDF, F, Pr>F, ChiSquare, Pr>Chi
Rows: DIST, VEG, EGGS

23 Criteria For Assessing Goodness Of Fit
Columns: Criterion, DF, Value, Value/DF
Rows: Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson X2, Log Likelihood
Analysis Of Parameter Estimates
Columns: Parameter, DF, Estimate, Std Err, ChiSquare, Pr>Chi
Rows: INTERCEPT, DIST, SCALE
NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF.
LR Statistics For Type 3 Analysis
Columns: Source, NDF, DDF, F, Pr>F, ChiSquare, Pr>Chi
Rows: DIST

24 Predicted values and 95% confidence limits

25 Criteria For Assessing Goodness Of Fit
Columns: Criterion, DF, Value, Value/DF
Rows: Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson X2, Log Likelihood
Log Likelihood: what is this?

26 The likelihood function

27 A nest contains n eggs of which r are eaten by predators. The probability that a given egg is eaten is denoted π. The probability that exactly r of the eggs are killed is given by the binomial distribution: $P(r) = \binom{n}{r}\,\pi^{r}(1-\pi)^{n-r}$, where $\binom{n}{r} = \dfrac{n!}{r!\,(n-r)!}$.
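The binomial probability can be evaluated directly. The following Python sketch is an illustration only; the clutch size of 3 eggs and the probability 0.5 are assumed values, not taken from the data.

```python
import math

def binom_prob(r, n, pi):
    """Probability that exactly r of n eggs are killed when each egg is
    killed independently with probability pi."""
    return math.comb(n, r) * pi**r * (1.0 - pi)**(n - r)

# Hypothetical nest with n = 3 eggs and pi = 0.5
probs = [binom_prob(r, 3, 0.5) for r in range(4)]
print(probs)        # probabilities of 0, 1, 2 and 3 killed eggs
print(sum(probs))   # the probabilities over all possible outcomes sum to 1
```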

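28 $r_1$ = number of killed eggs out of $n_1$ eggs in the first nest
$r_2$ = number of killed eggs out of $n_2$ eggs in the second nest
$r_i$ = number of killed eggs out of $n_i$ eggs in the ith nest
The probability of observing exactly $r_1, r_2, \dots, r_i, \dots, r_k$ events is
$L = P(r_1)\,P(r_2)\,P(r_3)\cdots P(r_i)\cdots P(r_k) = \prod_{i=1}^{k}\binom{n_i}{r_i}\,\pi_i^{r_i}(1-\pi_i)^{n_i-r_i}$
$\ln L = \ln P(r_1) + \ln P(r_2) + \ln P(r_3) + \dots + \ln P(r_i) + \dots + \ln P(r_k) = \sum_{i=1}^{k}\left[\ln\binom{n_i}{r_i} + r_i\ln\pi_i + (n_i-r_i)\ln(1-\pi_i)\right]$
where $\pi_i$ is the probability that a given egg in the ith nest is eaten. $\ln L$ is the log-likelihood function.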
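That the log-likelihood is simply the logarithm of the product of the individual probabilities can be verified numerically. This Python sketch uses three invented nests and, for simplicity, assumes the same probability `pi` for every nest; in the regression model each nest would have its own probability.

```python
import math

def binom_prob(r, n, pi):
    """Binomial probability of r killed eggs out of n."""
    return math.comb(n, r) * pi**r * (1.0 - pi)**(n - r)

# Hypothetical data: (r_i, n_i) = killed eggs and total eggs in nest i
nests = [(1, 3), (0, 2), (2, 4)]
pi = 0.4   # assumed common probability, for illustration only

# Likelihood: product of the probabilities of the individual nests
L = math.prod(binom_prob(r, n, pi) for r, n in nests)

# Log-likelihood: sum of the log-probabilities
lnL = sum(math.log(binom_prob(r, n, pi)) for r, n in nests)

print(f"L = {L:.6f}, ln L = {lnL:.4f}, log(L) = {math.log(L):.4f}")
```

Working with the sum is preferred in practice because a product of many small probabilities quickly becomes too small to represent accurately.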

29 Maximum likelihood. The parameters of the model, $\beta_0, \beta_1, \dots, \beta_p$, are found as the values that maximize the likelihood of observing exactly $r_1, r_2, \dots, r_i, \dots$ positive events out of $n_1, n_2, \dots, n_i, \dots$ events. The maximum value of L can be found by differentiating L with respect to $\beta_0, \beta_1, \dots, \beta_p$ and setting the derivatives equal to 0. Because the logarithm is a monotonically increasing function, the same parameter values maximize ln L, so it is equivalent, and easier, to differentiate ln L instead.
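The equations obtained by setting the derivatives to 0 have no closed-form solution for logistic regression, so they are solved iteratively. The Python sketch below uses Newton-Raphson for a model with one predictor. It is an illustration under stated assumptions, not the algorithm SAS is documented to use on these slides, and the data `x`, `r` and `n` are invented.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(x, r, n, max_iter=50, tol=1e-10):
    """Maximize the binomial log-likelihood for logit(pi_i) = b0 + b1 * x_i
    by Newton-Raphson. r[i] events were observed out of n[i] trials."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        pi = [inv_logit(b0 + b1 * xi) for xi in x]
        # First derivatives of ln L (the score): sum of (observed - expected)
        resid = [ri - ni * p for ri, ni, p in zip(r, n, pi)]
        u0 = sum(resid)
        u1 = sum(xi * e for xi, e in zip(x, resid))
        # Negative second derivatives of ln L (the information matrix)
        w = [ni * p * (1.0 - p) for ni, p in zip(n, pi)]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        # Newton step: solve the 2x2 system (information) * delta = score
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

# Hypothetical data: predictor value, killed eggs and total eggs per nest
x = [0.0, 1.0, 2.0, 3.0]
r = [4, 3, 1, 0]
n = [4, 4, 4, 4]

b0, b1 = fit_logistic(x, r, n)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

At the solution the score is zero, meaning that the observed number of events equals the expected number under the fitted model, both in total and weighted by the predictor.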