Université d’Ottawa - Bio 4518 - Biostatistiques appliquées © Antoine Morin et Scott Findlay 2016-01-08 04:32 1 Logistic regression.


1 Logistic regression

2 Logistic regression
Member of the GLM family.
Unlike standard linear regression, the dependent variable is binary (0, 1), so each case's value is either 0 or 1.
Normally, 0 is taken to mean the absence of some attribute and 1 its presence.
Logistic regression can be extended to the case where the dependent variable has more than two possible values (e.g. low, medium, high: multinomial regression).

3 Example: incidence of heart attacks in relation to age
Linear regression is inappropriate because:
–Residuals are not normal
–Residuals are heteroscedastic
–Predicted values are nonsense (e.g. what does a predicted value of 0.3 mean?)

4 Logistic regression: dependent variable
The variable of interest is the probability p of obtaining a one as a function of the predictor variables.
The magnitude of the regression coefficients in the model depends on the distribution of the predictor variables in the two groups, Y = 0 and Y = 1.
[Figure: two panels plotting Y (0 or 1) against X]

5 Dependent variable: logit (p)
[Figure: logit(p), on an axis from -4 to 4, plotted against p, on an axis from 0 to 100]
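The logit is the log of the odds, logit(p) = log(p / (1 - p)). As a minimal sketch (the function names are illustrative and not part of the course material), the transform and its inverse can be written in Python as:

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Inverse of logit: maps any real number z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# logit is 0 at p = 0.5, negative below it and positive above it
for p in (0.02, 0.25, 0.5, 0.75, 0.98):
    print(f"p = {p:.2f}  logit(p) = {logit(p):+.3f}")
```

The transform stretches the bounded interval (0, 1) onto the whole real line, which is what allows a linear predictor to be fitted to a probability.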

6 Logistic regression: model coefficients
A negative regression coefficient means the probability of success decreases with increasing value of the predictor.
A positive regression coefficient means the probability of success increases with increasing value of the predictor.
[Figure: two panels plotting Y (0 or 1) against X, one for β > 0 and one for β < 0]

7 Logistic regression: model coefficients
The magnitude of the regression coefficient depends on how abruptly p changes with X, with large values indicating abrupt change.
[Figure: two panels plotting Y (0 or 1) against X, one for β > 0, small, and one for β > 0, large]
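The effect of the sign and magnitude of the coefficient can be sketched numerically. The example below assumes a one-predictor model, logit(p) = intercept + slope × X; the coefficient values are hypothetical and chosen only to show the shapes of the curves.

```python
import math

def prob(x, intercept, slope):
    """P(Y = 1) at predictor value x under a one-predictor logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

xs = [-4, -2, 0, 2, 4]
# Hypothetical coefficients, for illustration only.
cases = (("negative", -1.0), ("positive, small", 0.5), ("positive, large", 3.0))
for label, slope in cases:
    curve = [round(prob(x, 0.0, slope), 3) for x in xs]
    print(f"slope {slope:+.1f} ({label}): {curve}")
```

With a negative slope the probabilities fall as X increases. With a positive slope they rise, and the larger slope moves from near 0 to near 1 over a much narrower range of X.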

8 Least squares estimation (LSE)
An ordinary least squares (OLS) estimate of a model parameter θ is the value that minimizes the sum of squared differences between observed and predicted values (the residual sum of squares, SS_R).
Predicted values are derived from some model whose parameters we wish to estimate.
[Figure: SS_R as a function of θ, with its minimum at the OLS estimate]
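In standard notation, the least squares criterion can be written as follows. The symbols $y_i$ (observed value), $\hat{y}_i(\theta)$ (predicted value under parameter $\theta$) and $SS_R$ (residual sum of squares) are assumptions made here for illustration.

```latex
SS_R(\theta) = \sum_{i=1}^{n} \bigl( y_i - \hat{y}_i(\theta) \bigr)^2,
\qquad
\hat{\theta}_{\mathrm{OLS}} = \arg\min_{\theta} SS_R(\theta)
```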

9 Maximum likelihood estimation (MLE)
A maximum likelihood estimate (MLE) of a model parameter θ for a given distribution is the value that maximizes the probability of generating the observed sample data.
MLEs are obtained by maximizing the likelihood function L or, equivalently, by minimizing the negative log-likelihood function, -log L.
[Figure: L and -log L as functions of θ, with the MLE at the maximum of L and the minimum of -log L]
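The equivalence between maximizing L and minimizing -log L can be checked on a small case. The sketch below assumes the simplest possible model, a single probability p for independent 0/1 observations, and uses a hypothetical sample of 7 ones out of 20. It evaluates both criteria on a grid of candidate values rather than using calculus.

```python
import math

def neg_log_likelihood(p, successes, n):
    """-log L for n independent 0/1 observations with P(Y = 1) = p.

    The binomial coefficient is omitted: it does not depend on p,
    so it does not change where the optimum lies.
    """
    return -(successes * math.log(p) + (n - successes) * math.log(1.0 - p))

# Hypothetical sample: 7 ones out of 20 observations.
successes, n = 7, 20
grid = [i / 100 for i in range(1, 100)]  # candidate values of p

best_by_nll = min(grid, key=lambda p: neg_log_likelihood(p, successes, n))
best_by_lik = max(grid, key=lambda p: math.exp(-neg_log_likelihood(p, successes, n)))
print(best_by_nll, best_by_lik, successes / n)
```

Both searches pick the same grid value, which here is also the observed proportion of ones.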

10 How are the model parameters estimated?
Estimated not by least squares, but by maximum likelihood:
–Based on the likelihood of obtaining the observed results under different values of the model parameters
–In principle, parameter estimates should converge to those maximizing the log-likelihood, or equivalently minimizing -log L
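The iterative search for the maximum likelihood estimates can be sketched in plain Python. This is a minimal illustration for a single predictor using Newton-Raphson steps (which coincide with Fisher scoring for the logit link), not the algorithm of any particular statistics package. The function names and the toy data are hypothetical.

```python
import math

def prob(a, b, x):
    """P(Y = 1) at x for intercept a and slope b."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def neg_log_l(a, b, xs, ys):
    """Negative log-likelihood of the 0/1 data under the logistic model."""
    return -sum(
        y * math.log(prob(a, b, x)) + (1 - y) * math.log(1.0 - prob(a, b, x))
        for x, y in zip(xs, ys)
    )

def fit_logistic(xs, ys, max_iter=50, tol=1e-8):
    """Return (intercept, slope) maximizing the likelihood, by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        # Gradient of log L (g_*) and the 2x2 information matrix (i_*) at (a, b)
        g_a = g_b = i_aa = i_ab = i_bb = 0.0
        for x, y in zip(xs, ys):
            p = prob(a, b, x)
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            i_aa += w
            i_ab += w * x
            i_bb += w * x * x
        # Solve the 2x2 system: information * step = gradient
        det = i_aa * i_bb - i_ab * i_ab
        step_a = (i_bb * g_a - i_ab * g_b) / det
        step_b = (i_aa * g_b - i_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) < tol and abs(step_b) < tol:
            break
    return a, b

# Hypothetical toy data: ones become more frequent as x increases.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
a, b = fit_logistic(xs, ys)
print(f"intercept = {a:.4f}, slope = {b:.4f}, -log L = {neg_log_l(a, b, xs, ys):.4f}")
```

At convergence the gradient of the log-likelihood is essentially zero, so the fitted probabilities sum to the observed number of ones. The toy data deliberately overlap: if the zeros and ones were perfectly separated along x, the estimates would not converge.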

11 Hypothesis testing
Likelihood
–Deviance = -2 log L
–Is approximately distributed as chi-square
–Measures the variation unexplained by the fitted model, analogous to the residual sum of squares
Model comparison
–The change in deviance when model terms are added (or deleted) is also approximately distributed as chi-square, so hypotheses about individual model terms can be tested
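As a worked sketch of the model comparison, the test below uses the null and residual deviances reported in the S-Plus output on slide 13 (heart attack incidence in relation to age). The helper name is hypothetical. It relies on the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so its upper-tail probability is erfc(sqrt(x / 2)).

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Values from the S-Plus output on slide 13.
null_deviance, null_df = 2050.515, 1999      # intercept-only model
resid_deviance, resid_df = 1490.001, 1998    # model including age

change = null_deviance - resid_deviance      # drop in deviance when age is added
df = null_df - resid_df                      # one extra parameter
assert df == 1                               # chi2_sf_1df only covers the 1-df case

p_value = chi2_sf_1df(change)
print(f"change in deviance = {change:.3f} on {df} df, p = {p_value:.3g}")
```

The drop in deviance is very large relative to a chi-square distribution with 1 degree of freedom, so the hypothesis that age has no effect is rejected.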

12 Model assumptions
Observations are independent.
Dependent variable has a binomial distribution.
Little error in measurement of the independent (predictor) variables.

13 Logistic regression in S-Plus
*** Generalized Linear Model ***
Call: glm(formula = cardiaque ~ age, family = binomial(link = logit), data = SDF12, na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = F))
Deviance Residuals:
       Min         1Q    Median         3Q      Max
 -1.545637 -0.5732664 -0.272312 -0.1404323 2.679875
Coefficients:
                  Value  Std. Error   t value
(Intercept) -7.76838060 0.376403465 -20.63844
        age  0.09557905 0.005097055  18.75182
(Dispersion Parameter for Binomial family taken to be 1 )
Null Deviance: 2050.515 on 1999 degrees of freedom
Residual Deviance: 1490.001 on 1998 degrees of freedom
Number of Fisher Scoring Iterations: 4
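The fitted coefficients can be turned back into probabilities by inverting the logit. The sketch below uses the intercept and age coefficient from the output above. The function name and the ages chosen are illustrative, and the unit of age is not stated in the output.

```python
import math

# Coefficients from the S-Plus output above.
intercept = -7.76838060
slope_age = 0.09557905   # change in logit(p) per unit of age

def p_heart_attack(age):
    """Predicted probability of a heart attack at a given age under the fitted model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope_age * age)))

odds_ratio_per_unit = math.exp(slope_age)   # multiplicative change in odds per unit of age
age_at_half = -intercept / slope_age        # age at which the predicted p is 0.5

for age in (40, 60, 80):                    # illustrative ages
    print(f"age {age}: predicted p = {p_heart_attack(age):.3f}")
print(f"odds ratio per unit of age = {odds_ratio_per_unit:.3f}")
print(f"predicted p reaches 0.5 at age {age_at_half:.1f}")
```

Because the coefficient is on the logit scale, exponentiating it gives the factor by which the odds are multiplied for each additional unit of age.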

14 Incidence of heart attack in relation to age

15 Presence of post-operative kyphosis using logistic regression
Kyphosis: a binary variable indicating the presence/absence of a postoperative spinal deformity called kyphosis.
Age: the age of the child in months.
Number: the number of vertebrae involved in the spinal operation.
Start: the beginning of the range of the vertebrae involved in the operation.

16 Evidence that the distribution of predictor variables differs among levels of response variable

17 The model

18 Testing hypotheses

