Logistic Regression Homework Solutions
EPP 245/298 Statistical Analysis of Laboratory Data


December 1, 2004, EPP 245 Statistical Analysis of Laboratory Data

Exercise 11.1
Predict risk of malaria from age and log-transformed antibody level using logistic regression.
First examine some plots.
Then fit the logistic regression.
Interpret the results.
Check model assumptions.

> library(ISwR)
Loading required package: survival
Loading required package: splines
> data(malaria)
> summary(malaria)
    subject           age              ab             mal
 Min.   : 1.00   Min.   : 3.00   Min.   : 2.0   Min.   :0.00
 1st Qu.:        1st Qu.:        1st Qu.:       1st Qu.:0.00
 Median :        Median : 9.00   Median :       Median :0.00
 Mean   :        Mean   : 8.86   Mean   :       Mean   :0.27
 3rd Qu.:        3rd Qu.:        3rd Qu.:       3rd Qu.:1.00
 Max.   :        Max.   :15.00   Max.   :       Max.   :1.00
> attach(malaria)
> plot(age,mal)
> plot(log(ab),mal)
> plot(age, log(ab))


> mal.glm <- glm(mal ~ age+log(ab),binomial)
> summary(mal.glm)

Call:
glm(formula = mal ~ age + log(ab), family = binomial)

Deviance Residuals:
 Min  1Q  Median  3Q  Max

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       **
age
log(ab)                                           ***
---
(Dispersion parameter for binomial family taken to be 1)

    Null deviance:         on 99  degrees of freedom
Residual deviance:         on 97  degrees of freedom
AIC:

Residual deviance shows no lack of fit

> summary(glm(mal ~ log(ab),binomial))

Call:
glm(formula = mal ~ log(ab), family = binomial)

Deviance Residuals:
 Min  1Q  Median  3Q  Max

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       *
log(ab)                                           ***
---
(Dispersion parameter for binomial family taken to be 1)

    Null deviance:         on 99  degrees of freedom
Residual deviance:         on 98  degrees of freedom
AIC:

Number of Fisher Scoring iterations: 4
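The two fits above differ only in the age term, so they can also be compared directly with a likelihood ratio test. The following is a minimal sketch on simulated data, not the ISwR malaria data: the variable lab (standing in for log(ab)) and every simulated value are assumptions made for illustration.

```r
# Simulated stand-in for the malaria data (hypothetical values, not ISwR)
set.seed(1)
n   <- 100
age <- sample(3:15, n, replace = TRUE)
lab <- rnorm(n, mean = 4.5, sd = 1.5)        # plays the role of log(ab)
mal <- rbinom(n, 1, plogis(2 - 0.7 * lab))   # risk depends on antibody level only

full    <- glm(mal ~ age + lab, family = binomial)
reduced <- glm(mal ~ lab,       family = binomial)

# Likelihood ratio test: the drop in deviance is referred to a
# chi-square distribution with df equal to the number of dropped terms
lrt_stat <- deviance(reduced) - deviance(full)
lrt_df   <- df.residual(reduced) - df.residual(full)
lrt_p    <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)

# anova() reports the same comparison in one table
tab <- anova(reduced, full, test = "Chisq")
print(tab)
```

A large p-value here would suggest, as in the homework, that age adds little once the antibody level is in the model.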

Exercise 11.2
The 'gvhd' data frame has 37 rows and 9 columns. It contains data from patients receiving a nondepleted allogeneic bone marrow transplant, with the purpose of finding variables associated with the development of acute graft-versus-host disease.

pnr     a numeric vector. Patient number.
rcpage  a numeric vector. Age of recipient (years).
donage  a numeric vector. Age of donor (years).
type    a numeric vector, type of leukaemia coded 1: AML, 2: ALL, 3: CML for acute myeloid, acute lymphatic, and chronic myeloid leukaemia.
preg    a numeric vector code, indicating whether donor has been pregnant. 0: no, 1: yes.
index   a numeric vector giving an index of mixed epidermal cell-lymphocyte reactions.
gvhd    a numeric vector code, graft versus host disease. 0: no, 1: yes.
time    a numeric vector. Follow-up time.
dead    a numeric vector code. 0: no (censored), 1: yes.

> summary(graft.vs.host)
      pnr         rcpage          donage           type           preg
 Min.   : 1   Min.   :13.00   Min.   :14.00   Min.   :1.000   Min.   :
 1st Qu.:10   1st Qu.:        1st Qu.:        1st Qu.:        1st Qu.:
 Median :19   Median :23.00   Median :23.00   Median :2.000   Median :
 Mean   :19   Mean   :25.43   Mean   :25.81   Mean   :1.973   Mean   :
 3rd Qu.:28   3rd Qu.:        3rd Qu.:        3rd Qu.:        3rd Qu.:
 Max.   :37   Max.   :43.00   Max.   :43.00   Max.   :3.000   Max.   :
     index         gvhd          time            dead
 Min.   :      Min.   :      Min.   : 41.0   Min.   :
 1st Qu.:      1st Qu.:      1st Qu.:        1st Qu.:
 Median :      Median :      Median :        Median :
 Mean   :      Mean   :      Mean   :        Mean   :
 3rd Qu.:      3rd Qu.:      3rd Qu.:        3rd Qu.:
 Max.   :      Max.   :      Max.   :        Max.   :
> attach(graft.vs.host)
> hist(index)
> hist(sqrt(index))
> hist(log(index))


> summary(glm(gvhd ~ rcpage+donage+type+preg+log(index),binomial))

Deviance Residuals:
 Min  1Q  Median  3Q  Max

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       *
rcpage
donage
type
type
preg
log(index)                                        *
---
    Null deviance:         on 36  degrees of freedom
Residual deviance:         on 30  degrees of freedom
AIC:

Number of Fisher Scoring iterations: 6

> drop1(glm(gvhd ~ rcpage+donage+type+preg+log(index), binomial),test="Chisq")
Single term deletions

Model:
gvhd ~ rcpage + donage + type + preg + log(index)
           Df Deviance AIC LRT Pr(Chi)
rcpage
donage
type
preg
log(index)                              *

> drop1(glm(gvhd ~ donage+type+preg+log(index),binomial), test="Chisq")
Single term deletions

Model:
gvhd ~ donage + type + preg + log(index)
           Df Deviance AIC LRT Pr(Chi)
donage
type
preg
log(index)                              *
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

> drop1(glm(gvhd ~ donage+preg+log(index),binomial),test="Chisq")
Single term deletions

Model:
gvhd ~ donage + preg + log(index)
           Df Deviance AIC LRT Pr(Chi)
donage                                  *
preg
log(index)                              ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

> drop1(glm(gvhd ~ donage+log(index),binomial),test="Chisq")
Single term deletions

Model:
gvhd ~ donage + log(index)
           Df Deviance AIC LRT Pr(Chi)
donage                                  **
log(index)                              ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
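The sequence of drop1() calls above removes the least significant term one step at a time until every remaining term is significant. That procedure could be automated roughly as follows. This is only a sketch on simulated data: the helper backward_drop1(), the 0.05 cutoff, and the variables x1, x2, x3 are hypothetical and are not part of the homework solution.

```r
# Simulated data in which only x1 is related to the outcome (hypothetical)
set.seed(2)
n   <- 200
d   <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + 1.5 * d$x1))

# Repeatedly drop the term with the largest likelihood-ratio p-value
# until all remaining terms have p < alpha
backward_drop1 <- function(fit, alpha = 0.05) {
  repeat {
    tab   <- drop1(fit, test = "Chisq")
    p     <- tab[["Pr(>Chi)"]][-1]     # first row is "<none>"
    terms <- rownames(tab)[-1]
    if (length(p) == 0 || max(p) < alpha) break
    worst <- terms[which.max(p)]
    fit   <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

final <- backward_drop1(glm(y ~ x1 + x2 + x3, family = binomial, data = d))
print(formula(final))
print(drop1(final, test = "Chisq"))
```

Doing the steps by hand, as on the slides, has the advantage that each intermediate table gets looked at; the loop is mainly a convenience when there are many candidate terms.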

> summary(glm(gvhd ~ donage+log(index),binomial),test="Chisq")
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       **
donage                                            *
log(index)                                        **

    Null deviance:         on 36  degrees of freedom
Residual deviance:         on 34  degrees of freedom
AIC:

> summary(glm(gvhd ~ donage+sqrt(index),binomial),test="Chisq")
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       **
donage                                            *
sqrt(index)                                       **

    Null deviance:         on 36  degrees of freedom
Residual deviance:         on 34  degrees of freedom
AIC:

> summary(glm(gvhd ~ donage+log(index),binomial),test="Chisq")
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       **
donage                                            *
log(index)                                        **

    Null deviance:         on 36  degrees of freedom
Residual deviance:         on 34  degrees of freedom
AIC:

> summary(glm(gvhd ~ donage+index,binomial),test="Chisq")
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       **
donage                                            *
index                                             **

    Null deviance:         on 36  degrees of freedom
Residual deviance:         on 34  degrees of freedom
AIC:
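The fits above try the index variable on the log, square-root, and original scales and compare residual deviance and AIC. A compact way to line up such a comparison might look like the sketch below. The data are simulated, and the names fits and comparison are assumptions for illustration rather than anything from the gvhd analysis.

```r
# Simulated positive, right-skewed predictor and binary outcome (hypothetical)
set.seed(3)
n     <- 150
index <- rexp(n, rate = 0.5) + 0.2
y     <- rbinom(n, 1, plogis(-1 + 1.2 * log(index)))

# Same model on three scales of the predictor
fits <- list(
  log      = glm(y ~ log(index),  family = binomial),
  sqrt     = glm(y ~ sqrt(index), family = binomial),
  identity = glm(y ~ index,       family = binomial)
)

# Collect residual deviance and AIC side by side, best AIC first
comparison <- data.frame(
  deviance = sapply(fits, deviance),
  AIC      = sapply(fits, AIC)
)
print(comparison[order(comparison$AIC), ])
```

Because all three models have the same number of parameters and a 0/1 response, AIC is the residual deviance plus a constant, so both criteria rank the transformations identically.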

Other Issues
If we want to analyze time to GVHD, we could use the time and the censoring variable.
This is complex because someone who does not have GVHD may develop it later (one type of censoring) and cannot develop it if dead (another type).
We have competing risks.
Some of these issues will be addressed next quarter.

Exercise 11.3
The function confint() gives parameter confidence intervals for nonlinear models that are more accurate than the default intervals computed from the estimate and its standard error.
It uses a profile likelihood technique, which moves the parameter away from its estimate until the resulting change in likelihood/deviance is too large to be plausible at the chosen confidence level.
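To make the profile likelihood idea concrete, the interval for one slope can be traced by hand: hold the slope fixed with an offset, refit the remaining parameters, and find where the deviance has risen by the chi-square cutoff. The sketch below uses simulated data; the variables x, z, y and the helper profile_dev() are hypothetical, and this is not a description of how confint() is implemented internally.

```r
# Simulated binary outcome with two predictors (hypothetical values)
set.seed(4)
n <- 80
x <- rnorm(n)
z <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.3 + 0.9 * x + 0.4 * z))

fit  <- glm(y ~ x + z, family = binomial)
bhat <- coef(fit)[["x"]]
se   <- sqrt(diag(vcov(fit)))[["x"]]

# Deviance of the best-fitting model with the slope of x held fixed at b
profile_dev <- function(b) {
  deviance(glm(y ~ z + offset(b * x), family = binomial))
}

# 95% profile interval: values of b whose deviance is within
# qchisq(0.95, 1) of the minimum deviance
cutoff <- deviance(fit) + qchisq(0.95, df = 1)
excess <- function(b) profile_dev(b) - cutoff
lower  <- uniroot(excess, c(bhat - 6 * se, bhat), tol = 1e-8)$root
upper  <- uniroot(excess, c(bhat, bhat + 6 * se), tol = 1e-8)$root

# Standard-error-based interval for comparison
wald <- bhat + c(-1, 1) * qnorm(0.975) * se
print(rbind(profile = c(lower, upper), wald = wald))
```

The profile interval need not be symmetric about the estimate, which is one reason it can differ from the interval built from the standard error.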

> mal.glm <- glm(mal ~ age+log(ab),binomial)
> summary(mal.glm)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       **
age
log(ab)                                           ***

(1.960)(.19552) = (-1.066, )

> library(MASS)
> confint(mal.glm)
Waiting for profiling to be done...
            2.5 % 97.5 %
(Intercept)
age
log(ab)
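The hand calculation above is the usual standard-error-based (Wald) interval: estimate plus or minus 1.960 standard errors. A minimal sketch of the same arithmetic for every coefficient at once is given here, on simulated data; the variables age and lab and all simulated values are assumptions, not the malaria data.

```r
# Simulated stand-in data (hypothetical values, not ISwR)
set.seed(5)
n   <- 100
age <- sample(3:15, n, replace = TRUE)
lab <- rnorm(n, mean = 4.5, sd = 1.5)        # plays the role of log(ab)
mal <- rbinom(n, 1, plogis(2 - 0.7 * lab))

fit <- glm(mal ~ age + lab, family = binomial)
est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))

# Wald interval: estimate -/+ (normal quantile) * (standard error)
wald <- cbind(lower = est - qnorm(0.975) * se,
              upper = est + qnorm(0.975) * se)
print(wald)

# Exponentiating the slope rows gives intervals on the odds-ratio scale
print(exp(wald[-1, ]))
```

In base R, confint.default(fit) returns the same Wald limits, which makes it a convenient check against the profile likelihood intervals.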

> gvhd.glm <- glm(gvhd ~ donage+log(index),binomial)
> summary(gvhd.glm)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       **
donage                                            *
log(index)                                        **

(1.960)( ) = (0.6296, )
(1.960)( ) = (0.0192, )

> confint(gvhd.glm)
Waiting for profiling to be done...
            2.5 % 97.5 %
(Intercept)
donage
log(index)