
Logistic regression

Analysis of proportion data

We know how many times an event occurred and how many times it did not occur. We want to know whether these proportions are affected by a treatment or a factor. Examples: the proportion dying, the proportion responding to a treatment, the proportion of individuals of one sex, the proportion flowering.

The old-fashioned way

People used to model these data using percentage mortality as the response variable. The problems with this are that the errors are not normally distributed, the variance is not constant, the response is bounded (between 0 and 1), and we lose information about the size of the sample.
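To see why the sample size matters, here is a small Python sketch using hypothetical counts. Two samples with the same 50% mortality carry very different amounts of information, as the binomial standard error sqrt(pq/n) shows, yet percentage mortality alone cannot tell them apart. The function name `proportion_and_se` is illustrative only.

```python
import math


def proportion_and_se(successes, trials):
    """Return the observed proportion p and its binomial standard error sqrt(p*q/n)."""
    p = successes / trials
    q = 1 - p
    return p, math.sqrt(p * q / trials)


# Two hypothetical samples with the same percentage mortality (50%)
small = proportion_and_se(1, 2)     # 1 death out of 2 individuals
large = proportion_and_se(50, 100)  # 50 deaths out of 100 individuals
print(small)  # same proportion, standard error about 0.354
print(large)  # same proportion, standard error about 0.05
```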

However, some data, such as percentage plant cover, are better analyzed with the conventional models (normal errors and constant variance) after an arcsine transformation, with the transformed response measured in radians.
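A minimal Python sketch of the transformation. It assumes the usual arcsine square-root form, arcsin(√p), applied to cover expressed as a proportion. The cover values and the name `arcsine_transform` are hypothetical.

```python
import math


def arcsine_transform(p):
    """Arcsine square-root (angular) transformation of a proportion p in [0, 1].

    Returns an angle in radians, from 0 (p = 0) to pi/2 (p = 1).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))


# Hypothetical percentage plant cover values, converted to proportions first
cover_percent = [5, 25, 50, 75, 95]
transformed = [arcsine_transform(c / 100) for c in cover_percent]
print(transformed)
```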

If the response variable takes the form of a percentage change in some measurement, it is usually better to do one of the following. The first option is analysis of covariance, with final weight as the response variable and initial weight as the covariate. The second is to specify the response variable as a relative growth rate, measured as log(final/initial). Both can be analyzed with normal errors without further transformation.
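The following Python illustration uses hypothetical values and the illustrative helper names `percent_change` and `relative_growth_rate`. It shows one reason the log ratio is convenient. Doubling and halving give +100% and -50% as percentage changes, which are lopsided and bounded below at -100%. As log ratios, the same two changes have equal size and opposite sign.

```python
import math


def percent_change(initial, final):
    """Percentage change from initial to final (bounded below at -100%)."""
    return 100 * (final - initial) / initial


def relative_growth_rate(initial, final):
    """Relative growth rate measured as log(final / initial)."""
    return math.log(final / initial)


# A hypothetical plant that doubles in weight, and one that halves
print(percent_change(10, 20), percent_change(10, 5))              # 100.0 -50.0
print(relative_growth_rate(10, 20), relative_growth_rate(10, 5))  # about 0.693 and -0.693
```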

Rationale for logistic regression

The traditional transformation of proportion data was the arcsine transformation, which stabilized the variance and made the errors closer to normal. There is nothing wrong with this transformation, but a simpler approach is often preferable and is likely to produce a model that is easier to interpret.

The logistic curve

The logistic curve is commonly used to describe data on proportions. It asymptotes at 0 and 1, so it cannot predict negative proportions or responses of more than 100%.

Binomial errors

Let p be the proportion of individuals observed to respond in a given way. The proportion of individuals that respond in the alternative way is 1 - p, which we shall call q. Let n be the size of the sample (or the number of attempts). An important point is that the variance of the binomial distribution is not constant. The variance of a binomial distribution with mean np is

$$s^2 = npq = np(1 - p)$$

so the variance changes with the mean. It is zero when p = 0 or p = 1 and largest when p = 0.5.
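The following Python sketch tabulates npq across the range of p to show how the variance changes with the mean np. The sample size n = 10 is an arbitrary choice, and `binomial_variance` is an illustrative name.

```python
def binomial_variance(n, p):
    """Variance of a binomial count with n trials: n * p * q, where q = 1 - p."""
    q = 1 - p
    return n * p * q


n = 10  # hypothetical sample size
for p in [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]:
    mean = n * p
    print(f"mean = {mean:4.1f}  variance = {binomial_variance(n, p):.2f}")
```

The variance rises from zero, peaks at mean n/2, and falls back to zero. A model that assumes constant variance cannot capture this pattern.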

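The logistic model

The logistic model for p as a function of x is given by

$$p = \frac{e^{a + bx}}{1 + e^{a + bx}}$$

This model is bounded. For b > 0, p approaches 0 as x goes to minus infinity and approaches 1 as x goes to plus infinity (the limits are reversed for b < 0). So p always lies between 0 and 1.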
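A Python sketch of the model with hypothetical parameters a = -2 and b = 0.5. The function computes the same formula, rewritten for positive exponents so that large a + bx does not overflow. In floating point the result can round to exactly 0 or 1 for extreme x, even though the mathematical curve only approaches those limits.

```python
import math


def logistic(x, a, b):
    """Logistic model p = exp(a + b*x) / (1 + exp(a + b*x)).

    For z = a + b*x >= 0 this uses the equivalent form 1 / (1 + exp(-z)),
    which avoids overflow when z is large.
    """
    z = a + b * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical parameters a = -2, b = 0.5; p = 0.5 where a + b*x = 0, i.e. x = 4
for x in [-100, -10, 0, 4, 10, 100]:
    print(x, logistic(x, a=-2.0, b=0.5))
```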

The trick to linearizing the logistic model is a simple transformation called the logit:

$$\ln\left(\frac{p}{q}\right) = \ln\left(\frac{p}{1 - p}\right) = a + bx$$

See the class website for a fuller description of the logit transformation.
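The following Python check uses hypothetical a and b. Applying the logit to points on the logistic curve gives values on the straight line a + bx, so successive differences between equally spaced x values all equal the slope b.

```python
import math


def logit(p):
    """Logit transformation: log(p / q), with q = 1 - p and 0 < p < 1."""
    return math.log(p / (1 - p))


a, b = -2.0, 0.5  # hypothetical intercept and slope
xs = [0, 1, 2, 3, 4, 5]
ps = [math.exp(a + b * x) / (1 + math.exp(a + b * x)) for x in xs]  # logistic curve
zs = [logit(p) for p in ps]  # back on the linear (logit) scale

# Successive differences of the logits are all equal to the slope b
diffs = [z2 - z1 for z1, z2 in zip(zs, zs[1:])]
print(diffs)  # each approximately 0.5
```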

Hypericum cumulicola (Menges et al. 1999; Dolan et al. 1999; Boyle and Menges 2001) is a small, short-lived perennial herb. It is narrowly endemic and endangered. Its flowers are small and bisexual. It is self-compatible but requires pollinators to set seed.

Demographic data

The data cover 15 populations (with various patch sizes) and more than 80 individuals per population each year. Height and number of reproductive structures were recorded, along with survival between August 1994 and August 1995.

[Figure: histogram of height (cm), Hypericum cumulicola, 1994]

Call:
glm(formula = survival ~ rep_structures * height, family = binomial)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
      …       …       …       …       …

Coefficients:
                       Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)            2.043e…   …           …        < 2e-16 ***
rep_structures         …         …           …        …       ***
height                 …         …           …        …       ***
rep_structures:height  1.219e…   …           …        …       **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: … on 878 degrees of freedom
Residual deviance: … on 875 degrees of freedom
AIC: …

Number of Fisher Scoring iterations: 4
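The Python sketch below shows what the "Fisher Scoring iterations" line refers to. It fits a binomial model with a logit link and the same structure as survival ~ rep_structures * height (two main effects plus their interaction) by Fisher scoring, also known as iteratively reweighted least squares. It is not R's `glm` implementation, so its iteration count need not match R's. Because the Hypericum data are not available here, the data are SYNTHETIC, and the "true" coefficients and helper names (`solve`, `fit_logistic`) are hypothetical.

```python
import math
import random


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_logistic(X, y, max_iter=25, tol=1e-8):
    """Fit a binomial GLM with logit link by Fisher scoring (IRLS).

    X: list of rows, each starting with 1.0 for the intercept.
    y: list of 0/1 responses. Returns (coefficients, iterations used).
    """
    k = len(X[0])
    beta = [0.0] * k
    for it in range(1, max_iter + 1):
        # Expected information X'WX and score X'(y - mu) at the current beta
        info = [[0.0] * k for _ in range(k)]
        score = [0.0] * k
        for row, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, row))
            mu = 1.0 / (1.0 + math.exp(-eta))  # back-transform to a probability
            w = mu * (1.0 - mu)                # binomial variance weight
            for i in range(k):
                score[i] += row[i] * (yi - mu)
                for j in range(k):
                    info[i][j] += w * row[i] * row[j]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta, it
    return beta, max_iter


# SYNTHETIC data with hypothetical true coefficients (not the Hypericum data)
random.seed(1)
X, y = [], []
for _ in range(400):
    rep = random.uniform(0, 5)     # hypothetical scaled reproductive-structure count
    height = random.uniform(0, 5)  # hypothetical scaled height
    eta = 1.0 - 0.4 * rep + 0.3 * height + 0.1 * rep * height
    X.append([1.0, rep, height, rep * height])  # intercept, main effects, interaction
    y.append(1 if random.random() < 1 / (1 + math.exp(-eta)) else 0)

beta, iterations = fit_logistic(X, y)
print([round(b, 3) for b in beta], iterations)
```

For the logit link, Fisher scoring coincides with Newton-Raphson, which is why it usually converges in a handful of iterations.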

Calculating a given proportion

You can back-transform from logits (z) to proportions (p) with

$$p = \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}$$

[Figure: survival vs height]

[Figure: survival vs rep_structures]

[Figure: height × reproductive structures interaction; survival plotted against height (cm) for plants with 0, 100, 200 and 1000 fruits]
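The Python sketch below shows how curves like those in an interaction plot could be computed. The linear predictor of survival ~ rep_structures * height is back-transformed with p = 1/(1 + e^(-z)) at several fruit counts, treating the number of fruits as the rep_structures value. The coefficients, the height grid, and the name `predicted_survival` are HYPOTHETICAL, because the fitted estimates are not legible in the model output.

```python
import math


def predicted_survival(height, fruits, b0, b_rep, b_height, b_int):
    """Back-transform the linear predictor of survival ~ rep_structures * height.

    z = b0 + b_rep*fruits + b_height*height + b_int*fruits*height, p = 1 / (1 + e^-z)
    """
    z = b0 + b_rep * fruits + b_height * height + b_int * fruits * height
    return 1.0 / (1.0 + math.exp(-z))


# HYPOTHETICAL coefficients for illustration only (not the fitted Hypericum values)
coef = dict(b0=2.0, b_rep=-0.01, b_height=-0.02, b_int=0.0002)

heights = [10, 30, 50, 70]  # cm, hypothetical grid
for fruits in [0, 100, 200, 1000]:
    curve = [predicted_survival(h, fruits, **coef) for h in heights]
    print(fruits, [round(p, 3) for p in curve])
```

With a nonzero interaction term, the slope of survival on height (on the logit scale, b_height + b_int × fruits) differs between fruit counts. That is why the curves in the plot are not parallel.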